**Gradient Boosted Machine**

A gradient boosted model **can be either regression or classification**. Both are forward-learning ensemble methods that obtain predictive results through gradually improved estimations.

**Boosting is a flexible nonlinear regression procedure that helps improve the accuracy of trees**. By sequentially applying weak classification algorithms to incrementally changed versions of the data, a series of decision trees is created, producing an ensemble of weak prediction models.
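To make the sequential idea concrete, here is a minimal sketch of boosting for regression, written in plain Python rather than H2O's distributed Java implementation: each weak learner is a one-split "stump" fit to the current residuals, and the ensemble prediction is the shrunken sum of all stumps. The function names and the learning-rate value are illustrative choices, not H2O parameters.

```python
def fit_stump(xs, residuals):
    """Find the 1-D split threshold that best fits the residuals (a weak learner)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_rounds=10, learn_rate=0.5):
    """Sequentially add stumps, each fit to the residuals of the ensemble so far."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learn_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(learn_rate * s(x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2]
model = boost(xs, ys)
```

Each round fits the next weak learner to what the current ensemble still gets wrong, which is the "incrementally changed data" referred to above.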

While boosting trees increases their accuracy, it also decreases speed and interpretability. The gradient boosting method generalizes tree boosting to minimize these issues.

After creating a GBM, H2O displays the confusion matrix that shows the classifications for each group, the associated error by group, and the overall average error.
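The quantities in that display can be sketched in a few lines of plain Python (a hypothetical illustration of the summary H2O reports, not its actual code): rows of the matrix are actual classes, columns are predicted classes, and each class gets an error rate alongside the overall error.

```python
from collections import Counter

def confusion_summary(actual, predicted, classes):
    """Build a confusion matrix plus per-class and overall error rates."""
    matrix = {a: Counter() for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    per_class = {}
    for a in classes:
        total = sum(matrix[a].values())
        wrong = total - matrix[a][a]  # off-diagonal counts for this row
        per_class[a] = wrong / total if total else 0.0
    overall = sum(a != p for a, p in zip(actual, predicted)) / len(actual)
    return matrix, per_class, overall

actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]
matrix, per_class, overall = confusion_summary(actual, predicted,
                                               ["cat", "dog", "bird"])
# per_class["cat"] == 0.5, per_class["bird"] == 0.0, overall == 1/3
```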

**Summary of Features**

H2O's GBM functionalities include:

- supervised learning for regression and classification tasks
- distributed and parallelized computation on either a single node or a multi-node cluster
- fast and memory-efficient Java implementations of the underlying algorithms
- an elegant web interface to mirror the model building and scoring process running in R
- grid search for hyperparameter optimization and model selection
- model export in plain Java code for deployment in production environments
- additional parameters for model tuning

**Theory and Framework**

Gradient boosting is a machine learning technique that combines two powerful tools: **gradient-based optimization and boosting**. Gradient-based optimization uses gradient computations to minimize a model's loss function with respect to the training data. Boosting additively collects an ensemble of weak models to ultimately create a strong learning system for predictive tasks. Here we consider gradient boosting in the example of K-class classification, although the model for regression follows similar logic. The following analysis follows the discussion in Hastie et al. (2010).
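The link between the two tools is easiest to see with squared-error loss. For $L(y, F) = \tfrac{1}{2}(y - F)^2$, the negative gradient of the loss with respect to the current prediction $F$ is exactly the residual $y - F$, so fitting each new weak learner to residuals is a gradient descent step in function space. A tiny numeric illustration (not H2O code; the step size 0.3 is an arbitrary shrinkage choice):

```python
def neg_gradient(y, F):
    # d/dF [ (y - F)^2 / 2 ] = -(y - F), so the negative gradient is y - F
    return y - F

ys = [2.0, 4.0]
F = [0.0, 0.0]  # start from the zero model
for _ in range(20):
    # the "weak learner" here is just the pointwise negative gradient,
    # damped by a shrinkage step of 0.3
    F = [f + 0.3 * neg_gradient(y, f) for y, f in zip(ys, F)]
# after 20 damped steps, F is within ~0.2% of ys
```

In real gradient boosting the pointwise gradient is replaced by a regression tree fit to those gradient values, which is what lets the model generalize to unseen inputs.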

**GBM for classification**

In the above algorithm, the index $m$ tracks the number of weak learners added to the current ensemble. Within this outer loop, there is an inner loop over each of the $K$ classes. In this inner loop, the first step is to compute the residuals $r_{ikm}$, which are in fact the gradient values, for each of the $N$ bins in the CART model, and then to fit a regression tree to these gradient computations. This fitting process is distributed and parallelized; details on this framework can be found on the h2o.ai blog at http://h2o.ai/blog/2013/10/building-distributed-gbm-h2o/.

The final procedure in the inner loop is to add the fitted regression tree to the current model, which improves the accuracy of the model during the inherent gradient descent step. After $M$ iterations, the final "boosted" model can be tested on new data.
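For the K-class case, the residual computation in the inner loop can be sketched as follows (an illustrative reading of the Hastie et al. formulation, not H2O's implementation): with softmax class probabilities $p_k(x) = e^{f_k(x)} / \sum_{\ell} e^{f_\ell(x)}$, the residual for observation $i$ and class $k$ is $r_{ik} = y_{ik} - p_k(x_i)$, where $y_{ik}$ is the one-hot indicator of the true class.

```python
import math

def softmax(scores):
    """Convert per-class scores f_k(x) into class probabilities p_k(x)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_residuals(one_hot_label, scores):
    """Gradient values r_ik = y_ik - p_k(x_i) for one observation."""
    probs = softmax(scores)
    return [y - p for y, p in zip(one_hot_label, probs)]

# an observation of class 0 out of K = 3, with all scores still zero:
r = class_residuals([1, 0, 0], [0.0, 0.0, 0.0])
# probabilities are all 1/3, so the residuals are [2/3, -1/3, -1/3]
```

Note the residuals sum to zero across classes, which is why one regression tree per class is fit in the inner loop.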

**Deep Learning**

**H2O's Deep Learning Architecture**

H2O follows the model of multi-layer, feedforward neural networks for predictive modeling. This section provides a more detailed description of H2O’s Deep Learning features, parameter configurations, and computational implementation.
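A multi-layer feedforward network's prediction is a chain of affine transforms, each followed by a nonlinearity. The following is a minimal pure-Python sketch of that forward pass (hypothetical, for illustration only; H2O's actual implementation is in Java, and the tanh nonlinearity here is one of several activations such networks use):

```python
import math

def forward(x, layers):
    """layers is a list of (weights, biases); each weights entry is a list of rows,
    one row of input coefficients per output unit."""
    a = x
    for i, (W, b) in enumerate(layers):
        # affine transform: z = W a + b
        z = [sum(w * v for w, v in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        # nonlinearity on hidden layers; leave the final layer linear
        a = [math.tanh(v) for v in z] if i < len(layers) - 1 else z
    return a

# a toy network: 2 inputs -> 2 hidden units -> 1 output
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
output = ([[1.0, 1.0]], [0.0])
y = forward([1.0, 2.0], [hidden, output])
```

Training then adjusts the weights and biases by backpropagating the loss gradient, which is where the advanced options listed below (adaptive learning rate, momentum, dropout) come into play.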

**Summary of Features**

H2O's Deep Learning functionalities include:

- purely supervised training protocol for regression and classification tasks
- multi-threaded, distributed parallel computation on either a single node or a multi-node cluster
- advanced training options including adaptive learning rate, momentum training, rate annealing, and dropout
- regularization options to prevent model overfitting
- fast and memory-efficient Java implementations of the underlying algorithms
- an elegant web interface to mirror the model building and scoring process running in R
- grid search for hyperparameter optimization and model selection
- model checkpointing
- model export in plain Java code for deployment in production environments
- additional parameters for model tuning
- deep autoencoders for unsupervised feature learning and anomaly detection