Train-Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=0)
# The four names on the left (X_train, X_test, y_train, y_test) receive the values that train_test_split returns.
# The arguments in the parentheses are the inputs we pass in (X and y in this case).
# test_size=0.2 means that 20% of the data is held out for testing and the remaining 80% is used for training.
# Different values of random_state produce different splits, but the same value always produces the same split.

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df.label, test_size=0.2)
# Here X is df[features]: the columns of the dataframe df selected by the variable features (presumably a list of column names).
# y is the "label" column of df, just accessed another way (df.label is equivalent to df["label"]).
# No random_state is passed here, so the split will generally differ from run to run.

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_predicted = clf.predict(X_test)
# Import the classifier class and create an instance of it (clf).
# Calling .fit actually trains the model; the arguments to .fit are the training data (X_train, y_train).
# Finally, we call .predict on the trained classifier (clf) with the test features (X_test) to test our model, and store the predictions in y_predicted.

Full Train & Test example:
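A minimal end-to-end sketch that combines the split and the classifier steps above. The notes do not say what df contains, so the iris dataset bundled with scikit-learn stands in for it here; the features list and the renamed "label" column are assumptions made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the notes' unspecified dataframe df.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "label"})

# Assumption: features is a list of every column name except the label.
features = [col for col in df.columns if col != "label"]

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df.label, test_size=0.2, random_state=0)

# Train on the training rows only, then predict labels for the test rows.
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_predicted = clf.predict(X_test)

# iris has 150 rows and 4 feature columns, so the split is 120 / 30.
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

y_predicted holds one predicted label per row of X_test, so it can be compared directly against y_test.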

