Probability that a given element is not picked in a single draw from $n$ elements: $1-\frac{1}{n}$

If you draw $n$ elements with replacement, the probability that a given element is never picked is: $\left(1-\frac{1}{n}\right)^n$

As $n\to\infty$, this probability approaches $\frac{1}{e}\approx 0.368$

Thus, for large $n$, about $0.632$ of the data points in your original sample are expected to show up in the bootstrap sample (the other $0.368$ won't be present in it)
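
The limit above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `prob_never_picked` and `unique_fraction` are made up for this illustration.

```python
import random


def prob_never_picked(n):
    """Exact probability that a given element is missed in n draws with replacement."""
    return (1 - 1 / n) ** n


def unique_fraction(n, seed=0):
    """Fraction of the n original indices that appear in one bootstrap sample."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    return len(set(sample)) / n


# Compare the exact miss probability with one simulated bootstrap sample.
for n in (10, 100, 10_000):
    print(n, round(prob_never_picked(n), 4), round(unique_fraction(n), 4))
```

As $n$ grows, the first printed value should settle near $0.368$ and the second near $0.632$; the simulated value varies with the seed.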

Bagging

Train a tree on each bootstrap sample, and average their predictions (Bootstrap Aggregating)
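
A rough sketch of bagging in plain Python, assuming 1-D inputs and using one-split trees (stumps) in place of full decision trees to keep it short. `fit_stump` and `bagged_stumps` are hypothetical helper names, not part of any library.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_stumps(xs, ys, n_models=50, seed=0):
    """Train one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
predict = bagged_stumps(xs, ys)
print(predict(1), predict(8))
```

Each stump sees a different resampled dataset, so the individual fits differ; the averaged prediction is what bagging returns.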

Random Forests

Like bagging, but reduces the correlation among trees.

At each split, consider only a random subset of the predictors.

Notes:
Random forests typically consider $\sqrt{\textrm{number of features}}$ or $1/3$ of the features at each split (with 10 features, either rule gives ~3 features), and pick the best split among them.
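
To make the feature-subset step concrete, here is a small sketch of how the candidate features for one split might be drawn. The names `n_candidates` and `candidate_features` and the rule labels `"sqrt"` and `"third"` are assumptions made for this example, not a library API.

```python
import math
import random


def n_candidates(n_features, rule="sqrt"):
    """Number of features to consider at a split under the given rule."""
    if rule == "sqrt":
        k = int(math.sqrt(n_features))
    elif rule == "third":
        k = n_features // 3
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(1, k)  # always consider at least one feature


def candidate_features(n_features, rule="sqrt", rng=random):
    """Draw a random subset of feature indices (without replacement) for one split."""
    k = n_candidates(n_features, rule)
    return sorted(rng.sample(range(n_features), k))


print(n_candidates(10, "sqrt"), n_candidates(10, "third"))  # 3 3
print(candidate_features(10, rng=random.Random(0)))
```

A fresh subset is drawn at every split, and the best split is then searched only among those features.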

Random Forest Lab

Other Ensemble Models

Gradient Boosted Decision Trees

Sequential method

Fit each tree to the residuals of the model so far (on the first iteration, the residuals are the true target values)

Idea: build the model slowly, adding one tree at a time

These trees are NOT independent of each other
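
The sequential procedure can be sketched as follows. This is a simplified illustration for squared-error loss, assuming 1-D inputs, stumps as the base trees, and a starting prediction of zero so that the first tree is fit to the raw targets. `fit_stump`, `gradient_boost`, and the `learning_rate` value are choices made for this example.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each to the residuals left by the previous ones."""
    trees = []
    preds = [0.0] * len(xs)  # start from zero: first residuals equal the targets
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Take only a fraction of each tree's correction, so the model learns slowly.
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: sum(learning_rate * t(x) for t in trees)


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 11, 30, 32, 31]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

Because each tree is trained on what the earlier trees got wrong, the trees cannot be fit in parallel the way bagged trees can.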

GBDT

| Y | Prediction | Residual |
|---|---|---|
| $40$ | $35$ | $5$ |
| $60$ | $67$ | $-7$ |
| $30$ | $28$ | $2$ |
| $33$ | $32$ | $1$ |
| $25$ | $27$ | $-2$ |

| Y (residual from first tree) | Prediction | Residual |
|---|---|---|
| $5$ | $3$ | $2$ |
| $-7$ | $-4$ | $-3$ |
| $2$ | $3$ | $-1$ |
| $1$ | $0$ | $1$ |
| $-2$ | $-2$ | $0$ |

| Y | Prediction (first tree + second tree) |
|---|---|
| $40$ | $38$ |
| $60$ | $63$ |
| $30$ | $31$ |
| $33$ | $32$ |
| $25$ | $25$ |
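
The arithmetic behind the numbers above can be reproduced in a few lines. The variable names are chosen for this illustration; the values are taken directly from the example.

```python
y = [40, 60, 30, 33, 25]
tree1_pred = [35, 67, 28, 32, 27]

# Residuals after the first tree: these become the targets for the second tree.
resid1 = [t - p for t, p in zip(y, tree1_pred)]
print(resid1)  # [5, -7, 2, 1, -2]

tree2_pred = [3, -4, 3, 0, -2]

# Residuals left after the second tree.
resid2 = [r - p for r, p in zip(resid1, tree2_pred)]
print(resid2)  # [2, -3, -1, 1, 0]

# Final prediction is the sum of both trees' predictions.
combined = [a + b for a, b in zip(tree1_pred, tree2_pred)]
print(combined)  # [38, 63, 31, 32, 25]
```

The combined predictions are closer to Y than the first tree's predictions alone, and what remains (`resid2`) would be the target for a third tree.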

Kaggle

Data Science and Machine Learning Competition site