Vishal Rajput
Oct 27, 2023


I agree that it's impossible to visualize a very big GBDT, but even just calculating feature importance is quite a big deal.
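As a minimal sketch (assuming scikit-learn's GradientBoostingRegressor, purely for illustration), feature importance is available directly from a fitted ensemble even when the ensemble is far too large to draw:

```python
# Minimal sketch: impurity-based feature importances from a fitted GBDT.
# scikit-learn is an assumption here; any GBDT library exposes something similar.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, n_informative=3, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, random_state=0).fit(X, y)

# One importance value per feature, summing to 1.
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```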

Now, coming to the second contention: GBDTs do not overfit easily, but if the data is noisy they can become highly overfit.

Sequential Learning: Gradient boosting works by building trees sequentially, where each tree tries to correct the errors of the previous ones. If the data contains a lot of noise, the model might end up fitting to that noise rather than the underlying pattern.

High Capacity: Boosting, by its nature, can produce very complex models. Given enough trees, it can fit very intricate patterns in the data. If those intricate patterns are just noise, the model will overfit, as the sketch below illustrates.
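Here is a rough sketch of that effect (again assuming scikit-learn's GradientBoostingRegressor and its staged_predict method): on a noisy target, training error keeps falling as trees are added while test error eventually rises, because the later trees are fitting the noise rather than the signal.

```python
# Rough sketch: watch train vs. test error as more trees are added to a GBDT
# fitted on heavily noisy data. scikit-learn is assumed for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.8, size=400)  # signal + heavy noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)

# staged_predict yields predictions after each additional tree,
# so we can trace how the errors evolve as the ensemble grows.
for n, (pred_tr, pred_te) in enumerate(
    zip(model.staged_predict(X_tr), model.staged_predict(X_te)), start=1
):
    if n in (10, 100, 1000):
        print(f"trees={n:4d}  train MSE={mean_squared_error(y_tr, pred_tr):.3f}  "
              f"test MSE={mean_squared_error(y_te, pred_te):.3f}")
```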

So you see, overfitting and underfitting are always relative to something; in this case, a plain GBDT, compared to XGBoost 2, will overfit more readily in complex, noisy scenarios.
