Introduction
Decision Trees are powerful algorithms that can be applied to both classification and regression tasks. They are among the most versatile and reliable machine learning algorithms for non-linear and complex data, and they are the fundamental building block of Random Forests, which will be discussed in the next lecture. A Decision Tree makes its predictions by following a flowchart-like structure of questions. We start by generating a synthetic example with two classes; details about training, prediction, and the limitations of this technique are then presented.
The following code shows how to generate a synthetic dataset using make_moons, which creates two interleaving half circles:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from numpy import random
from sklearn.datasets import make_moons
font = {'size' : 12}
matplotlib.rc('font', **font)
fig = plt.subplots(figsize=(6.0, 4.5), dpi= 90, facecolor='w', edgecolor='k')
X, y = make_moons(n_samples=1000, noise=0.42, random_state=85)
X1=X[:,0] ; X2=X[:,1]
ik = 0; ind = []
for j in X2:
    if y[ik] == 0:
        # keep class-0 points with X1 < -0.2 or X2 > -0.1
        if X1[ik] < -0.2:
            ind.append(ik)
        elif j > -0.1:
            ind.append(ik)
    elif y[ik] == 1:
        # keep class-1 points with X2 < 0
        if j < -0.0:
            ind.append(ik)
    ik += 1
X1 = X1[ind]; X2 = X2[ind]; y = y[ind]
random.seed(42)
x_1=random.randint(-15,-5,100)/10
x_2=random.randint(-15,1,100)/10
y_1=random.randint(0,1,100)
X1=np.concatenate((X1,x_1),axis=0)
X2=np.concatenate((X2,x_2),axis=0)
y=np.concatenate((y,y_1),axis=0)
def moon_dataset(X1, X2, y, axes):
    """Scatter plot of the two classes of the moon dataset."""
    plt.plot(X1[y==0], X2[y==0], "bs", label="Class 0")
    plt.plot(X1[y==1], X2[y==1], "r^", label="Class 1")
    plt.xlabel(r"$X_{1}$", fontsize=18)
    plt.ylabel(r"$X_{2}$", fontsize=18, rotation=0)
    plt.grid(True, which='both')
    plt.axis(axes)
    plt.legend(loc="lower right", fontsize=14)

moon_dataset(X1, X2, y, [-1.5, 3, -1.5, 2])
plt.title('Synthetic Dataset with Two Classes', fontsize=18)
a=np.array(X1)
b=np.array(X2)
a=a.reshape(len(X1),1)
b=b.reshape(len(X2),1)
X=np.concatenate((a,b),axis=1)
plt.show()
A Decision Tree model is trained on the moon dataset using DecisionTreeClassifier:
from sklearn.tree import DecisionTreeClassifier
np.random.seed(42)
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
# Predict the first 5 instances
tree_clf.predict(X[:5])
# The actual 5 instances
y[:5]
from sklearn.model_selection import cross_val_score
Accuracies=cross_val_score(tree_clf,X,y, cv=4, scoring="accuracy")
Accuracies
The trained Decision Tree can be visualized by first using the export_graphviz() function to output a graph definition file, DecisionTrees.dot, and then running "dot -Tps DecisionTrees.dot -o DecisionTrees.eps" to convert the dot file to an EPS picture. See the following code.
# Plot Decision Trees for flow chart
from sklearn.tree import export_graphviz
import os
export_graphviz(tree_clf,out_file="DecisionTrees.dot",
feature_names=['X1','X2'],
class_names=['0','1'],rounded=True,filled=True)
command = "dot -Tps DecisionTrees.dot -o DecisionTrees.eps"
os.system(command)
plt.show()
The Figure below shows the visualization of the Decision Tree trained on the moon dataset.
Let’s see how the tree represented in the Figure above makes predictions. We start at the Root Node (Depth 0): this node asks whether $X_{2}\leqslant-0.045$. If it is, we move down to the root’s left child node (Depth 1); if not, we move to the root’s right child node (Depth 1). A Leaf Node has no child nodes, so it does not ask any questions (see the left and right Leaf Nodes in the Figure above).
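If you want to trace this programmatically, scikit-learn's fitted classifier exposes decision_path() and predict_proba(). The short sketch below prints the node indices visited by the first training instance and the class ratios of the leaf it reaches (the exact indices and ratios depend on the fitted tree):
# Trace the first training instance through the fitted tree
node_indicator = tree_clf.decision_path(X[:1])
print('Nodes visited:', node_indicator.indices)            # e.g. root -> internal node -> leaf
print('Leaf class ratios:', tree_clf.predict_proba(X[:1]))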
For each node, the gini attribute measures its impurity: a node is “pure” (gini=0) if all the training instances it applies to belong to the same class.
The equation below shows how to compute the gini score:
$gini=1-\sum_{k=1}^{n} P_{k}^{2}$
$P_{k}$ is the ratio of class k instances among the training instances at each node. For example, the gini score for the Root Node (Depth 0) is $1-(\frac{575}{877})^{2}-(\frac{302}{877})^{2}=0.452$.
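As a quick sanity check, this can be evaluated directly in Python:
# Gini impurity of the root node: 575 instances of one class and 302 of the other
print(1 - (575/877)**2 - (302/877)**2)   # approximately 0.452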
The gini impurity decreases from the Root Node towards the Leaf Nodes, and the predicted class of an instance is the majority class of the Leaf Node it ends up in. The following code shows how to calculate the gini for each node:
X_p=X.reshape(len(X1),2)
y_p=y.reshape(len(y),1)
x=np.concatenate((X_p,y_p),axis=1)
df=pd.DataFrame(x,columns=['X1','X2','y'])
df
def gini(x):
    """Gini impurity from class counts: 1 minus the sum of squared class ratios."""
    val = 1 - sum([(i/sum(x))**2 for i in x])
    return np.round(val, 3)
Result=df['y'].value_counts()
print('Total numbers=',sum(Result))
print('gini=',gini(Result))
print(Result)
# Left child of the root (Depth 1): X2 <= -0.045
Result = df['y'][df['X2'] <= -0.045].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
# Leaf (Depth 2): X2 <= -0.045 and X1 <= -0.497
Result = df['y'][(df['X2'] <= -0.045) & (df['X1'] <= -0.497)].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
# Leaf (Depth 2): X2 <= -0.045 and X1 > -0.497
Result = df['y'][(df['X2'] <= -0.045) & (df['X1'] > -0.497)].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
# Right child of the root (Depth 1): X2 > -0.045
Result = df['y'][df['X2'] > -0.045].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
# Leaf (Depth 2): X2 > -0.045 and X1 <= 2.034
Result = df['y'][(df['X2'] > -0.045) & (df['X1'] <= 2.034)].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
# Leaf (Depth 2): X2 > -0.045 and X1 > 2.034
Result = df['y'][(df['X2'] > -0.045) & (df['X1'] > 2.034)].value_counts()
print('Total numbers=', sum(Result))
print('gini=', gini(Result))
print(Result)
The Figure below shows the Decision Tree’s decision boundaries. The decision boundary at depth=0 (the root node) is X2 <= -0.045, at depth=1 it is X1 <= -0.497, and at depth=2 it is X1 <= 2.034.
from matplotlib.colors import ListedColormap
import matplotlib as mpl
import matplotlib.pyplot as plt
font = {'size' : 12}
matplotlib.rc('font', **font)
fig = plt.subplots(figsize=(6.0, 4.5), dpi= 90, facecolor='w', edgecolor='k')
x1s = np.linspace(-1.8,3, 100)
x2s = np.linspace(-1.5,2, 100)
x1, x2 = np.meshgrid(x1s, x2s)
X_new = np.c_[x1.ravel(), x2.ravel()]
y_pred = tree_clf.predict(X_new).reshape(x1.shape)
custom_cmap = ListedColormap(['b','r'])
plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", label="Class 0")
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "r^", label="Class 1")
plt.axis([-1.5, 3, -1.5, 2])
plt.xlabel(r"$X_{1}$", fontsize=18)
plt.ylabel(r"$X_{2}$", fontsize=18, rotation=0)
plt.plot([-1.5, 3], [-0.045, -0.045], "k-", linewidth=2)
plt.text(0.6, 0.05, "Depth=0",style='oblique', ha='center',
bbox=dict(facecolor='w', alpha=1,pad=0),
va='top', wrap=True, fontsize=15)
#
plt.plot([-0.497, -0.497], [-1.5, -0.045], "k--", linewidth=2)
plt.text(-0.5, -0.75, "Depth=1",style='oblique', ha='center',
bbox=dict(facecolor='w', alpha=1,pad=0),
va='top', wrap=True, fontsize=15)
#
plt.plot([2.034, 2.034], [-0.045, 2.0], "k:", linewidth=2)
plt.text(2, 1, "Depth=2",style='oblique', ha='center',
bbox=dict(facecolor='w', alpha=1,pad=0),
va='top', wrap=True, fontsize=15)
plt.title('Decision Tree Decision Boundaries with max depth=2', fontsize=14)
plt.legend(loc=2, fontsize=14)
plt.show()
We saw in the Machine Learning main steps lecture that Decision Trees can greatly overfit the training set. That happens because all the parameters are selected freely during training. The Decision Tree’s freedom can be restricted to avoid overfitting; as we mentioned before, this is called regularization. In Scikit-Learn, this is controlled by the max_depth hyperparameter (by default max_depth is unlimited), and reducing max_depth reduces the risk of overfitting. Other hyperparameters include min_samples_leaf (the minimum number of samples a leaf node must have) and max_leaf_nodes (the maximum number of leaf nodes). In general, reducing a max_* hyperparameter or increasing a min_* hyperparameter regularizes the model.
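For example, here is a minimal sketch of a more constrained classifier (the hyperparameter values below are illustrative, not tuned):
# Hypothetical regularized tree: limited depth, minimum leaf size, and a cap on the number of leaves
reg_clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, max_leaf_nodes=20)
reg_clf.fit(X, y)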
The Figure below shows an overfitted model with max_depth=20 on the left. On the right, max_depth is still 20, but min_samples_leaf is restricted to 5, which avoids overfitting.
font = {'size' : 12}
matplotlib.rc('font', **font)
fig = plt.subplots(figsize=(13.0, 4.5), dpi= 90, facecolor='w', edgecolor='k')
ax1=plt.subplot(1,2,1)
X, y = make_moons(n_samples=100, noise=0.42, random_state=85)
x1s = np.linspace(-1.8,3, 100)
x2s = np.linspace(-1.5,2, 100)
x1, x2 = np.meshgrid(x1s, x2s)
X_new = np.c_[x1.ravel(), x2.ravel()]
np.random.seed(42)
tree_clf = DecisionTreeClassifier(max_depth=20)
tree_clf.fit(X, y)
y_pred = tree_clf.predict(X_new).reshape(x1.shape)
custom_cmap = ListedColormap(['b','r'])
plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", label="Class 0")
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "r^", label="Class 1")
plt.axis([-1.5, 3, -1.5, 2])
plt.xlabel(r"$X_{1}$", fontsize=18)
plt.ylabel(r"$X_{2}$", fontsize=18, rotation=0)
plt.title("max_depth=20", fontsize=16)
plt.legend(loc=2, fontsize=14)
ax2=plt.subplot(1,2,2)
# Decision Trees
np.random.seed(42)
tree_clf = DecisionTreeClassifier(max_depth=20,min_samples_leaf=5)
tree_clf.fit(X, y)
y_pred = tree_clf.predict(X_new).reshape(x1.shape)
custom_cmap = ListedColormap(['b','r'])
plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", label="Class 0")
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "r^", label="Class 1")
plt.axis([-1.5, 3, -1.5, 2])
plt.xlabel(r"$X_{1}$", fontsize=18)
plt.ylabel(r"$X_{2}$", fontsize=18, rotation=0)
plt.title("max_depth=20 and min_samples_leaf=5", fontsize=16)
plt.legend(loc=2, fontsize=14)
plt.show()
Decision Trees can also be applied to regression tasks, as we saw in Lecture 3 (for hydrocarbon production). Let’s build a regression tree for a synthetic dataset using Scikit-Learn’s DecisionTreeRegressor.
# Quadratic training set + noise
np.random.seed(52)
n = 100
X = np.random.rand(n, 1)
y = 10-(X-0.5) **2
y = y + np.random.randn(n, 1) / 15
from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)
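As a quick check, the fitted regressor can predict at a few points; each prediction is the mean target value of the training instances that end up in the same leaf (a sketch, the exact numbers depend on the fitted tree):
# Predict at a few points; the outputs are leaf-wise means of the training targets
tree_reg.predict([[0.1], [0.5], [0.9]])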
from sklearn.tree import DecisionTreeRegressor
font = {'size' : 12}
matplotlib.rc('font', **font)
fig = plt.subplots(figsize=(13.0, 4.5), dpi= 90, facecolor='w', edgecolor='k')
ax1=plt.subplot(1,2,1)
reg_tree1 = DecisionTreeRegressor(random_state=20, max_depth=2)
reg_tree1.fit(X, y)
x1 = np.linspace(0, 1, 500).reshape(-1, 1)
y_pred = reg_tree1.predict(x1)
plt.xlabel("$X$", fontsize=18)
plt.ylabel("$y$", fontsize=18, rotation=0)
plt.plot(X, y, "ro",markersize=6,label='Training Data')
plt.plot(x1, y_pred, "b-", linewidth=2, label='Fitted Model')
plt.title("max_depth=2", fontsize=16)
plt.legend(loc=4, fontsize=14)
ax2=plt.subplot(1,2,2)
reg_tree2 = DecisionTreeRegressor(random_state=20, max_depth=20)
reg_tree2.fit(X, y)
x1 = np.linspace(0, 1, 500).reshape(-1, 1)
y_pred = reg_tree2.predict(x1)
plt.xlabel("$X$", fontsize=18)
plt.ylabel("$y$", fontsize=18, rotation=0)
plt.plot(X, y, "ro",markersize=6,label='Training Data')
plt.plot(x1, y_pred, "b-", linewidth=2, label='Fitted Model')
plt.legend(loc=4, fontsize=14)
plt.title("max_depth=20 (overfitting)", fontsize=16)
plt.show()
We saw that the Decision Tree algorithm is simple, easy to use, and powerful. However, like every technique it has limitations. Decision Trees produce orthogonal decision boundaries (all splits are perpendicular to an axis), which makes them sensitive to the orientation of the training data. For example, the following code trains a Decision Tree model on synthetic, axis-aligned data (left) and on the same training set rotated by 45° (right). Both Decision Trees fit their training sets well, but the model on the right is unlikely to generalize well.
np.random.seed(20)
Xs = np.random.rand(100, 2) - 0.5
ys = (Xs[:, 0] > 0).astype(np.float32) * 2
def rotate(x0, y0, x, y, Ang):
    """Rotate coordinates (x, y) around (x0, y0) by Ang degrees."""
    import math
    xa = []
    ya = []
    Angle = Ang * (math.pi/180)
    for i in range(len(x)):
        tmp1 = x[i] - x0
        tmp2 = y[i] - y0
        X = (tmp2)*math.sin(Angle) + (tmp1)*math.cos(Angle)
        xa.append(X + x0)
        Y = (tmp2)*math.cos(Angle) - (tmp1)*math.sin(Angle)
        ya.append(Y + y0)
    xa = np.array(xa).reshape(len(xa), 1)
    ya = np.array(ya).reshape(len(ya), 1)
    X = np.concatenate((xa, ya), axis=1)
    return X
font = {'size' : 12}
matplotlib.rc('font', **font)
fig = plt.subplots(figsize=(13.0, 4.5), dpi= 90, facecolor='w', edgecolor='k')
np.random.seed(20)
X = np.random.rand(200, 2) - 0.5
y = np.array([1 if i > 0 else 0 for i in list(X[:, 0])])
ax1=plt.subplot(1,2,1)
X=rotate(0,0,X[:, 0],X[:, 1],-8)
x1s = np.linspace(-0.75,0.75, 100)
x2s = np.linspace(-0.75,0.75, 100)
x1, x2 = np.meshgrid(x1s, x2s)
X_new = np.c_[x1.ravel(), x2.ravel()]
np.random.seed(62)
tree_clf = DecisionTreeClassifier()
tree_clf.fit(X, y)
y_pred = tree_clf.predict(X_new).reshape(x1.shape)
custom_cmap = ListedColormap(['b','r'])
plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", label="Class 0")
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "r^", label="Class 1")
plt.xlabel(r"$X$", fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
plt.title("Vertical Data", fontsize=16)
plt.legend(loc=2, fontsize=14)
ax2=plt.subplot(1,2,2)
X=rotate(0,0,X[:, 0],X[:, 1],45)
np.random.seed(62)
tree_clf = DecisionTreeClassifier(min_samples_leaf=3)
tree_clf.fit(X, y)
y_pred = tree_clf.predict(X_new).reshape(x1.shape)
custom_cmap = ListedColormap(['b','r'])
plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", label="Class 0")
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "r^", label="Class 1")
plt.xlabel(r"$X$", fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
plt.title("Rotate Data for 45 Degree", fontsize=16)
plt.legend(loc=2, fontsize=14)
plt.show()
As a final example, let’s apply a Decision Tree classifier to a real dataset: the Building Heating Load data, in which the Binary Classes column labels each building’s heating load as a low or high level. First, load the data:
import pandas as pd
df = pd.read_csv('Building_Heating_Load.csv',na_values=['NA','?',' '])
df[0:5]
df_binary=df.copy()
df_binary.drop(['Heating Load','Multi-Classes'], axis=1, inplace=True)
df_binary[0:5]
df_binary['Binary Classes']=df_binary['Binary Classes'].replace('Low Level', 0)
df_binary['Binary Classes']=df_binary['Binary Classes'].replace('High Level', 1)
df_binary[0:5]
import numpy as np
np.random.seed(32)
df_binary=df_binary.reindex(np.random.permutation(df_binary.index))
df_binary.reset_index(inplace=True, drop=True)
df_binary[0:10]
from sklearn.model_selection import StratifiedShuffleSplit
# Training and Test
spt = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in spt.split(df_binary, df_binary['Binary Classes']):
    train_set_strat = df_binary.loc[train_idx]
    test_set_strat = df_binary.loc[test_idx]
# Note that drop() creates a copy and does not affect train_set_strat
X_train = train_set_strat.drop("Binary Classes", axis=1)
y_train = train_set_strat["Binary Classes"].values
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train=X_train.copy()
X_train_Std=scaler.fit_transform(X_train)
# Training Data
X_train_Std[0:5]
np.random.seed(42)
tree_clf = DecisionTreeClassifier(max_depth=10,min_samples_split=5,min_samples_leaf=4)
tree_clf.fit(X_train_Std, y_train)
from sklearn.model_selection import cross_val_score
Accuracies=cross_val_score(tree_clf,X_train_Std, y_train, cv=4, scoring="accuracy")
Accuracies
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(tree_clf,X_train_Std,y_train, cv=4)
from sklearn.metrics import precision_score, recall_score
precision=precision_score(y_train, y_train_pred)
print('Precision= ',precision)
recall=recall_score(y_train, y_train_pred)
print('Recall (sensitivity)= ',recall)
Let’s fine-tune the Decision Tree’s hyperparameters (max_depth, min_samples_leaf, ...) with Scikit-Learn’s Grid Search and Randomized Search, discussed in the previous lecture.
from sklearn.model_selection import GridSearchCV
np.random.seed(42)
tree_clf = DecisionTreeClassifier()
params = {'max_depth': [2,5,10,20,30,40],'max_features': [2,3,4,5],'min_samples_leaf': [2,3,5,7]}
tree_search_cv = GridSearchCV(tree_clf, params, cv=5,scoring='recall')
tree_search_cv.fit(X_train,y_train)
tree_search_cv.best_params_
tree_search_cv.best_estimator_
cvreslt=tree_search_cv.cv_results_
cvreslt_params=[str(i) for i in cvreslt["params"]]
for mean_score, params in sorted(zip(cvreslt["mean_test_score"], cvreslt_params)):
    print(mean_score, params)
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
tree_clf = DecisionTreeClassifier()
params = {'max_depth': randint(low=2, high=30),'min_samples_leaf': randint(low=2, high=10)}
tree_search_cv = RandomizedSearchCV(tree_clf, params,n_iter=50, cv=5,scoring='recall')
tree_search_cv.fit(X_train,y_train)
tree_search_cv.best_params_
cvreslt=tree_search_cv.cv_results_
cvreslt_params=[str(i) for i in cvreslt["params"]]
for mean_score, params in sorted(zip(cvreslt["mean_test_score"], cvreslt_params)):
    print(mean_score, params)
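Finally, here is a minimal sketch of how the tuned model could be evaluated on the held-out test set prepared earlier. Since Decision Trees are insensitive to feature scaling and the searches above were fitted on the unscaled training features, the raw test features are used here as well:
# Sketch: evaluate the best estimator from the search on the held-out test set
X_test = test_set_strat.drop("Binary Classes", axis=1)
y_test = test_set_strat["Binary Classes"].values
y_test_pred = tree_search_cv.best_estimator_.predict(X_test)
print('Test Precision= ', precision_score(y_test, y_test_pred))
print('Test Recall (sensitivity)= ', recall_score(y_test, y_test_pred))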