1.10. Decision Trees — Scikit-learn 1.1.1 Documentation
1.10.1. Classification
DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.
As with other classifiers, DecisionTreeClassifier takes as input two arrays: an array X, sparse or dense, of shape (n_samples, n_features) holding the training samples, and an array Y of integer values, shape (n_samples,), holding the class labels for the training samples:
>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
>>> Y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, Y)

After being fitted, the model can then be used to predict the class of samples:
>>> clf.predict([[2., 2.]])
array([1])

If multiple classes share the same highest probability, the classifier predicts the class with the lowest index among those classes.
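The tie-breaking rule can be checked with a small sketch. The data below is made up for illustration: all samples are identical, so no split is possible and the single leaf holds an equal number of samples from each class. The variable names (`X_tie`, `tie_clf`, and so on) are not from the original text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Four identical samples (made-up data): no split can separate them,
# so the fitted tree is a single leaf with two samples of each class.
X_tie = np.ones((4, 1))
y_tie = np.array([2, 5, 2, 5])

tie_clf = DecisionTreeClassifier(random_state=0).fit(X_tie, y_tie)

# Both classes receive the same probability in the only leaf.
tie_proba = tie_clf.predict_proba([[1.0]])[0]

# predict resolves the tie in favour of the first entry of classes_.
tie_pred = tie_clf.predict([[1.0]])[0]

print(tie_clf.classes_, tie_proba, tie_pred)
```

Here "lowest index" refers to the position in `classes_`, which holds the sorted class labels, so the tie goes to label 2 rather than label 5.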
As an alternative to outputting a specific class, the probability of each class can be predicted, which is the fraction of training samples of the class in a leaf:
>>> clf.predict_proba([[2., 2.]])
array([[0., 1.]])

DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, …, K-1]) classification.
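To make the leaf-fraction interpretation of predict_proba concrete, the following sketch recomputes the probabilities by hand. It assumes a made-up one-feature dataset and limits the depth so that at least one leaf stays impure; `apply` is used to find which leaf each sample falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (made up for illustration): one feature, three classes.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 1, 1, 2, 2])

# max_depth=1 leaves at least one impure leaf, so fractions are not all 0 or 1.
toy_clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_toy, y_toy)

sample = np.array([[4.5]])

# Index of the leaf the query sample ends up in.
leaf_id = toy_clf.apply(sample)[0]

# Labels of the training samples that fall into the same leaf.
labels_in_leaf = y_toy[toy_clf.apply(X_toy) == leaf_id]

# Fraction of each class among those training samples.
fractions = np.array([(labels_in_leaf == c).mean() for c in toy_clf.classes_])

proba = toy_clf.predict_proba(sample)[0]
print(fractions, proba)
```

The two arrays should agree, since the predicted probability of a class is the share of that class among the training samples in the leaf.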
Using the Iris dataset, we can construct a tree as follows:
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, y)

Once trained, you can plot the tree with the plot_tree function:
>>> tree.plot_tree(clf)
[...]
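As a sketch of how the plot might be labeled and saved, the example below passes feature and class names to plot_tree and writes the figure to an in-memory PNG. The non-interactive "Agg" backend, the depth limit, and the figure size are assumptions chosen for the example, not part of the original text.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no display available
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
# A shallow tree keeps the figure readable; the depth is an arbitrary choice.
plot_clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
plot_clf = plot_clf.fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(10, 6))
annotations = tree.plot_tree(
    plot_clf,
    feature_names=list(iris.feature_names),  # label splits by feature name
    class_names=list(iris.target_names),     # label nodes by majority class
    filled=True,                             # color nodes by majority class
    ax=ax,
)

# Render to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, calling `plt.show()` instead of saving would display the figure directly.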
We can also export the tree in Graphviz format using the export_graphviz exporter. If you use the conda package manager, the graphviz binaries and the Python package can be installed with conda install python-graphviz.
Alternatively, binaries for graphviz can be downloaded from the graphviz project homepage, and the Python wrapper installed from PyPI with pip install graphviz.
Below is an example graphviz export of the above tree trained on the entire iris dataset; the results are saved in an output file iris.pdf:
>>> import graphviz
>>> dot_data = tree.export_graphviz(clf, out_file=None)
>>> graph = graphviz.Source(dot_data)
>>> graph.render("iris")

The export_graphviz exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. Jupyter notebooks also render these plots inline automatically:
>>> dot_data = tree.export_graphviz(clf, out_file=None,
...                                 feature_names=iris.feature_names,
...                                 class_names=iris.target_names,
...                                 filled=True, rounded=True,
...                                 special_characters=True)
>>> graph = graphviz.Source(dot_data)
>>> graph
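When the graphviz binaries are not installed, the DOT source returned by export_graphviz can still be inspected as a plain string. The sketch below assumes a shallow tree (the depth limit is an arbitrary choice) and only uses scikit-learn itself.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
small_clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
small_clf = small_clf.fit(iris.data, iris.target)

# With out_file=None the DOT description is returned as a string.
dot_source = tree.export_graphviz(
    small_clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
)

# The first line opens the graph definition.
print(dot_source.splitlines()[0])
```

The resulting string can be written to a `.dot` file and rendered later on a machine where Graphviz is available.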
Alternatively, the tree can also be exported in textual format with the function export_text. This method doesn’t require the installation of external libraries and is more compact:
>>> from sklearn.datasets import load_iris
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.tree import export_text
>>> iris = load_iris()
>>> decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
>>> decision_tree = decision_tree.fit(iris.data, iris.target)
>>> r = export_text(decision_tree, feature_names=iris['feature_names'])
>>> print(r)
|--- petal width (cm) <= 0.80
|   |--- class: 0
|--- petal width (cm) >  0.80
|   |--- petal width (cm) <= 1.75
|   |   |--- class: 1
|   |--- petal width (cm) >  1.75
|   |   |--- class: 2

Examples
Plot the decision surface of decision trees trained on the iris dataset
Understanding the decision tree structure
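As a further sketch of the textual export described above, export_text can also report the class weights stored in each leaf through its show_weights option. The variable names below are chosen for the example, and the exact split printed may differ between scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
text_clf = DecisionTreeClassifier(random_state=0, max_depth=2)
text_clf = text_clf.fit(iris.data, iris.target)

# show_weights=True adds the per-class weights held in each leaf.
report = export_text(
    text_clf,
    feature_names=list(iris.feature_names),
    show_weights=True,
)
print(report)
```

Each leaf line then carries a `weights:` entry next to the predicted class, which makes impure leaves easy to spot.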