Predict Method for eXtreme Gradient Boosting Model in xgboost
View source: R/xgb.Booster.R
predict.xgb.Booster {xgboost}
Predict method for XGBoost model
Description
Predict values on data based on an XGBoost model.
Usage
## S3 method for class 'xgb.Booster'
predict(
  object,
  newdata,
  missing = NA,
  outputmargin = FALSE,
  predleaf = FALSE,
  predcontrib = FALSE,
  approxcontrib = FALSE,
  predinteraction = FALSE,
  training = FALSE,
  iterationrange = NULL,
  strict_shape = FALSE,
  avoid_transpose = FALSE,
  validate_features = FALSE,
  base_margin = NULL,
  ...
)

Arguments
| object | Object of class xgb.Booster. |
| newdata | Takes data.frame, matrix, dgCMatrix, dgRMatrix, dsparseVector, local data file, or xgb.DMatrix. For single-row predictions on sparse data, it is recommended to use the CSR format. If passing a sparse vector, it will be interpreted as a row vector. Note that, for repeated predictions on the same data, one might want to create a DMatrix to pass here instead of passing R types like matrices or data frames, as predictions will be faster on a DMatrix. If newdata is a data.frame, be aware that its columns must come in the same order and with the same types (in particular, factor for categorical features) as the data used for training, unless validate_features = TRUE is passed. |
| missing | Float value that represents missing values in data (e.g., 0 or some other extreme value). This parameter is not used when newdata is an xgb.DMatrix - in that case, the missing value should instead be passed as an argument to the DMatrix constructor. |
| outputmargin | Whether the prediction should be returned as the original, untransformed sum of the scores from the boosting iterations. E.g., setting outputmargin = TRUE for logistic regression would return log-odds instead of probabilities (see the sketch after this table). |
| predleaf | Whether to predict per-tree leaf indices. |
| predcontrib | Whether to return feature contributions to individual predictions (see Details). |
| approxcontrib | Whether to use a fast approximation for feature contributions (see Details). |
| predinteraction | Whether to return contributions of feature interactions to individual predictions (see Details). |
| training | Whether the prediction result is used for training. For dart booster, training predicting will perform dropout. |
| iterationrange | Sequence of rounds/iterations from the model to use for prediction, specified by passing a vector of two elements with the start and end numbers in the sequence (same format as R's seq - i.e. base-1 indexing, and inclusive of both ends). For example, passing c(1, 20) will predict using the first twenty iterations, while passing c(1, 1) will predict using only the first one. If passing NULL, will either stop at the best iteration if the model used early stopping, or use all of the iterations (rounds) otherwise. If passing "all", will use all of the rounds regardless of whether the model had early stopping or not. Not applicable to the gblinear booster. |
| strict_shape | Whether to always return an array with the same dimensions for the given prediction mode regardless of the model type - meaning that, for example, both a multi-class and a binary classification model would generate output arrays with the same number of dimensions, with the 'class' dimension having size equal to '1' for the binary model. If passing FALSE (the default), dimensions will be simplified according to the model type, so that a binary classification model for example would not have a redundant dimension for 'class'. See documentation for the return type for the exact shape of the output arrays for each prediction mode. |
| avoid_transpose | Whether to output the resulting predictions in the same memory layout in which they are generated by the core XGBoost library, without transposing them to match the expected output shape. Internally, XGBoost uses row-major order for the predictions it generates, while R arrays use column-major order, hence the result needs to be transposed in order to have the expected shape when represented as an R array or matrix, which might be a slow operation. If passing TRUE, then the result will have dimensions in reverse order - for example, rows will be the last dimensions instead of the first dimension. |
| validate_features | When TRUE, validate that the Booster's and newdata's feature_names match (only applicable when both object and newdata have feature names). If the column names differ and newdata is not an xgb.DMatrix, will try to reorder the columns in newdata to match with the booster's. If the booster has feature types and newdata is either an xgb.DMatrix or data.frame, will additionally verify that categorical columns are of the correct type in newdata, throwing an error if they do not match. If passing FALSE, it is assumed that the feature names and types are the same, and come in the same order as in the training data. Note that this check might add some sizable latency to the predictions, so it's recommended to disable it for performance-sensitive applications. |
| base_margin | Base margin used for boosting from an existing model (raw score that gets added to all observations independently of the trees in the model). If supplied, it should be either a vector with length equal to the number of rows in newdata (for objectives which produce a single score per observation), or a matrix with the number of rows matching the number of rows in newdata and the number of columns matching the number of scores estimated by the model (e.g. number of classes for multi-class classification). Note that, if newdata is an xgb.DMatrix object, this argument will be ignored, as it needs to be added to the DMatrix instead (e.g. by passing it as an argument to its constructor, or by calling setinfo.xgb.DMatrix()). |
| ... | Not used. |
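As a quick illustration of the outputmargin and iterationrange arguments above, here is a minimal sketch; it mirrors the binary-classification setup from the Examples section below, and all object names are only illustrative:

# Binary model on the agaricus data (same setup as in the Examples below)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1),
  nrounds = 5,
  params = xgb.params(max_depth = 2, nthread = 2, objective = "binary:logistic")
)

# Probabilities vs. raw margin: for binary:logistic, prob == plogis(margin)
prob   <- predict(bst, agaricus.test$data)
margin <- predict(bst, agaricus.test$data, outputmargin = TRUE)
all.equal(prob, plogis(margin))   # TRUE up to float precision

# iterationrange: predict using only the first two boosting rounds
pred2 <- predict(bst, agaricus.test$data, iterationrange = c(1, 2))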
Details
Note that iterationrange would currently do nothing for predictions from "gblinear", since "gblinear" doesn't keep its boosting history.
One possible practical application of the predleaf option is to use the model as a generator of new features which capture non-linearity and interactions, e.g., as implemented in xgb.create.features().
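For instance, here is a rough sketch of that idea (this is not the actual xgb.create.features() implementation; it assumes the bst and train objects from the Examples section below and uses the Matrix package for the one-hot encoding):

# Leaf indices: one column per tree, one row per observation
leaf_idx <- predict(bst, train$data, predleaf = TRUE)

# One-hot encode each tree's leaf index into a sparse design matrix
library(Matrix)
leaf_factors <- as.data.frame(lapply(as.data.frame(leaf_idx), factor))
leaf_onehot  <- sparse.model.matrix(~ . - 1, data = leaf_factors)

# The encoded leaves can then be appended to the original features
augmented <- cbind(train$data, leaf_onehot)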
Setting predcontrib = TRUE makes it possible to calculate contributions of each feature to individual predictions. For the "gblinear" booster, feature contributions are simply the linear terms (feature_beta * feature_value). For the "gbtree" booster, feature contributions are SHAP values (Lundberg 2017) that sum to the difference between the expected output of the model and the current prediction (where the hessian weights are used to compute the expectations). Setting approxcontrib = TRUE approximates these values following the idea explained in http://blog.datadive.net/interpreting-random-forests/.
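As a small sanity check of that property (assuming the binary-classification bst and test objects from the Examples section below), the contributions plus the baseline column should sum to the untransformed margin:

contr  <- predict(bst, test$data, predcontrib = TRUE)
margin <- predict(bst, test$data, outputmargin = TRUE)
summary(rowSums(contr) - margin)   # differences should be ~0 up to float precision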
With predinteraction = TRUE, SHAP values of contributions of interaction of each pair of features are computed. Note that this operation might be rather expensive in terms of compute and memory. Since it quadratically depends on the number of features, it is recommended to perform selection of the most important features first. See below about the format of the returned results.
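The following sketch (again assuming the bst and test objects from the Examples section below) checks the relationship mentioned in the Value section: collapsing the interaction array along its last dimension approximately reproduces the predcontrib output.

inter <- predict(bst, test$data, predinteraction = TRUE)   # [nrows, nfeats+1, nfeats+1]
contr <- predict(bst, test$data, predcontrib = TRUE)       # [nrows, nfeats+1]
max(abs(apply(inter, c(1, 2), sum) - contr))               # should be close to 0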
The predict() method uses as many threads as defined in the xgb.Booster object (all available threads by default). If you want to change their number, assign a new number to nthread using xgb.model.parameters<-(). Note that converting a matrix to xgb.DMatrix() uses multiple threads too.
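A sketch of lowering the prediction thread count; the exact value format accepted by xgb.model.parameters<-() (a named list here) is an assumption, so consult its help page for the authoritative form:

# Assumed form: assign a named list of parameters to the booster
xgb.model.parameters(bst) <- list(nthread = 1)
pred_single_thread <- predict(bst, test$data)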
Value
A numeric vector or array, with corresponding dimensions depending on the prediction mode and on parameter strict_shape as follows:
If passing strict_shape=FALSE:
For regression or binary classification: a vector of length nrows.
For multi-class and multi-target objectives: a matrix of dimensions [nrows, ngroups].
Note that the objective variant multi:softmax defaults to predicting the most likely class (a vector of length nrows) instead of per-class probabilities.
For predleaf: a matrix with one column per tree.
For multi-class / multi-target models, the columns will be arranged so that the output has the leaves from one group followed by the leaves of the next group (e.g. the order will be group1:tree1, group1:tree2, ..., group2:tree1, group2:tree2, ...).
If there is more than one parallel tree (e.g. random forests), the parallel trees will be the last grouping in the resulting order, which will still be 2D.
For predcontrib: when not multi-class / multi-target, a matrix with dimensions [nrows, nfeats+1]. The last "+ 1" column corresponds to the baseline value.
For multi-class and multi-target objectives, will be an array with dimensions [nrows, ngroups, nfeats+1].
The contribution values are on the scale of untransformed margin (e.g., for binary classification, the values are log-odds deviations from the baseline).
For predinteraction: when not multi-class / multi-target, the output is a 3D array of dimensions [nrows, nfeats+1, nfeats+1]. The off-diagonal (in the last two dimensions) elements represent different feature interaction contributions. The array is symmetric w.r.t. the last two dimensions. The "+ 1" column corresponds to the baseline. Summing this array along the last dimension should produce practically the same result as predcontrib = TRUE.
For multi-class and multi-target, it will be a 4D array with dimensions [nrows, ngroups, nfeats+1, nfeats+1].
If passing strict_shape=TRUE, the result is always a matrix (if 2D) or array (if 3D or higher):
For normal predictions, the dimension is [nrows, ngroups].
For predcontrib=TRUE, the dimension is [nrows, ngroups, nfeats+1].
For predinteraction=TRUE, the dimension is [nrows, ngroups, nfeats+1, nfeats+1].
For predleaf=TRUE, the dimension is [nrows, niter, ngroups, num_parallel_tree].
If passing avoid_transpose=TRUE, then the dimensions in all cases will be in reverse order - for example, for predinteraction, they will be [nfeats+1, nfeats+1, ngroups, nrows] instead of [nrows, ngroups, nfeats+1, nfeats+1].
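To make these shapes concrete, here is a short sketch using the multi-class iris model from the Examples section below (objective "multi:softprob"); the dimensions in the comments are the expected values for that model (150 rows, 3 classes, 4 features), not guaranteed output:

pred <- predict(bst, as.matrix(iris[, -5]))
dim(pred)                                        # [nrows, ngroups] = 150 x 3

contr <- predict(bst, as.matrix(iris[, -5]), predcontrib = TRUE, strict_shape = TRUE)
dim(contr)                                       # [nrows, ngroups, nfeats+1] = 150 x 3 x 5

contr_rev <- predict(bst, as.matrix(iris[, -5]), predcontrib = TRUE,
                     strict_shape = TRUE, avoid_transpose = TRUE)
dim(contr_rev)                                   # reversed: [nfeats+1, ngroups, nrows] = 5 x 3 x 150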
References
Scott M. Lundberg, Su-In Lee, "A Unified Approach to Interpreting Model Predictions", NIPS Proceedings 2017, https://arxiv.org/abs/1705.07874
Scott M. Lundberg, Su-In Lee, "Consistent feature attribution for tree ensembles", https://arxiv.org/abs/1706.06060
See Also
xgb.train()
Examples
## binary classification:

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

## Keep the number of threads to 2 for examples
nthread <- 2
data.table::setDTthreads(nthread)

train <- agaricus.train
test <- agaricus.test

bst <- xgb.train(
  data = xgb.DMatrix(train$data, label = train$label, nthread = 1),
  nrounds = 5,
  params = xgb.params(
    max_depth = 2,
    nthread = nthread,
    objective = "binary:logistic"
  )
)

# use all trees by default
pred <- predict(bst, test$data)

# use only the 1st tree
pred1 <- predict(bst, test$data, iterationrange = c(1, 1))

# Predicting tree leaves:
# the result is an nsamples X ntrees matrix
pred_leaf <- predict(bst, test$data, predleaf = TRUE)
str(pred_leaf)

# Predicting feature contributions to predictions:
# the result is an nsamples X (nfeatures + 1) matrix
pred_contr <- predict(bst, test$data, predcontrib = TRUE)
str(pred_contr)

# verify that contributions' sums are equal to log-odds of predictions (up to float precision):
summary(rowSums(pred_contr) - qlogis(pred))

# for the 1st record, let's inspect its features that had non-zero contribution to prediction:
contr1 <- pred_contr[1, ]
contr1 <- contr1[-length(contr1)]    # drop intercept
contr1 <- contr1[contr1 != 0]        # drop non-contributing features
contr1 <- contr1[order(abs(contr1))] # order by contribution magnitude
old_mar <- par("mar")
par(mar = old_mar + c(0, 7, 0, 0))
barplot(contr1, horiz = TRUE, las = 2, xlab = "contribution to prediction in log-odds")
par(mar = old_mar)

## multi-class classification on the iris dataset:

lb <- as.numeric(iris$Species) - 1
num_class <- 3

set.seed(11)
bst <- xgb.train(
  data = xgb.DMatrix(as.matrix(iris[, -5]), label = lb, nthread = 1),
  nrounds = 10,
  params = xgb.params(
    max_depth = 4,
    nthread = 2,
    subsample = 0.5,
    objective = "multi:softprob",
    num_class = num_class
  )
)

# predict for softprob returns num_class probability numbers per case:
pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)

# convert the probabilities to class labels
pred_labels <- max.col(pred) - 1

# the following should result in the same error as seen in the last iteration
sum(pred_labels != lb) / length(lb)

# compare with predictions from softmax:
set.seed(11)
bst <- xgb.train(
  data = xgb.DMatrix(as.matrix(iris[, -5]), label = lb, nthread = 1),
  nrounds = 10,
  params = xgb.params(
    max_depth = 4,
    nthread = 2,
    subsample = 0.5,
    objective = "multi:softmax",
    num_class = num_class
  )
)

pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)
all.equal(pred, pred_labels)

# prediction from using only 5 iterations should result
# in the same error as seen in iteration 5:
pred5 <- predict(bst, as.matrix(iris[, -5]), iterationrange = c(1, 5))
sum(pred5 != lb) / length(lb)