1.10. Decision Trees — Scikit-learn 1.1.1 Documentation

Có thể bạn quan tâm

1.10.1. Classification#

DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.

As with other classifiers, DecisionTreeClassifier takes as input two arrays: an array X, sparse or dense, of shape (n_samples, n_features) holding the training samples, and an array Y of integer values, shape (n_samples,), holding the class labels for the training samples:

>>> fromsklearnimport tree >>> X = [[0, 0], [1, 1]] >>> Y = [0, 1] >>> clf = tree.DecisionTreeClassifier() >>> clf = clf.fit(X, Y)

After being fitted, the model can then be used to predict the class of samples:

>>> clf.predict([[2., 2.]]) array([1])

In case that there are multiple classes with the same and highest probability, the classifier will predict the class with the lowest index amongst those classes.

As an alternative to outputting a specific class, the probability of each class can be predicted, which is the fraction of training samples of the class in a leaf:

>>> clf.predict_proba([[2., 2.]]) array([[0., 1.]])

DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, …, K-1]) classification.

Using the Iris dataset, we can construct a tree as follows:

>>> fromsklearn.datasetsimport load_iris >>> fromsklearnimport tree >>> iris = load_iris() >>> X, y = iris.data, iris.target >>> clf = tree.DecisionTreeClassifier() >>> clf = clf.fit(X, y)

Once trained, you can plot the tree with the plot_tree function:

>>> tree.plot_tree(clf) [...]
../_images/sphx_glr_plot_iris_dtc_002.png
Alternative ways to export trees#

We can also export the tree in Graphviz format using the export_graphviz exporter. If you use the conda package manager, the graphviz binaries and the python package can be installed with conda install python-graphviz.

Alternatively binaries for graphviz can be downloaded from the graphviz project homepage, and the Python wrapper installed from pypi with pip install graphviz.

Below is an example graphviz export of the above tree trained on the entire iris dataset; the results are saved in an output file iris.pdf:

>>> importgraphviz >>> dot_data = tree.export_graphviz(clf, out_file=None) >>> graph = graphviz.Source(dot_data) >>> graph.render("iris")

The export_graphviz exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. Jupyter notebooks also render these plots inline automatically:

>>> dot_data = tree.export_graphviz(clf, out_file=None, ... feature_names=iris.feature_names, ... class_names=iris.target_names, ... filled=True, rounded=True, ... special_characters=True) >>> graph = graphviz.Source(dot_data) >>> graph
../_images/iris.svg
../_images/sphx_glr_plot_iris_dtc_001.png

Alternatively, the tree can also be exported in textual format with the function export_text. This method doesn’t require the installation of external libraries and is more compact:

>>> fromsklearn.datasetsimport load_iris >>> fromsklearn.treeimport DecisionTreeClassifier >>> fromsklearn.treeimport export_text >>> iris = load_iris() >>> decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2) >>> decision_tree = decision_tree.fit(iris.data, iris.target) >>> r = export_text(decision_tree, feature_names=iris['feature_names']) >>> print(r) |--- petal width (cm) <= 0.80 | |--- class: 0 |--- petal width (cm) > 0.80 | |--- petal width (cm) <= 1.75 | | |--- class: 1 | |--- petal width (cm) > 1.75 | | |--- class: 2

Examples

  • Plot the decision surface of decision trees trained on the iris dataset

  • Understanding the decision tree structure

Từ khóa » C5.0