A colleague was saying the other day that deep learning models are like sports cars: they need a minimum distance to accelerate before they can reach their top speed. In the same way, deep learning models don't perform well on small datasets because there is no room (i.e. not enough data) for them to rev their engine. That's why a mountain bike (e.g. a CART decision tree) can navigate a forest trail where a Ferrari (e.g. a convolutional neural network) cannot.
I really like my colleague's analogy, but is there any mathematical theory to support it? Are complex models (e.g. neural networks, SVMs) inherently (through their mathematical architecture) more susceptible to overfitting than logistic regression or a decision tree when exposed to small datasets? I feel there is an unspoken rule of "in general, use complicated models on complicated data", but is there any mathematical justification for it?
I understand that deep learning models sometimes perform poorly because the analyst doesn't know how to use them properly (e.g. hyperparameter tuning), but that is a shortcoming of the analyst, not of the model itself.
I know there is a result called the "no free lunch theorem" which says, roughly, that there is no single best algorithm for all problems. Can this theorem be used to justify the claim that smaller datasets don't require complex models? That is, is there some way to show that more complex models (e.g. if we quantify model complexity through the VC dimension) don't necessarily achieve lower generalization error on smaller datasets?
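To make that concrete, the kind of result I have in mind is the classical VC generalization bound (the exact constants vary between textbooks, so treat this as a sketch): with probability at least $1-\delta$ over a sample of size $n$, every hypothesis $h$ in a class of VC dimension $d$ satisfies

$$
R(h) \;\le\; \hat{R}_n(h) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}.
$$

If I read it correctly, for a fixed sample size $n$ the complexity term only grows as $d$ grows, which seems to point in the direction of my colleague's analogy, but I'm not sure this is the right way to formalize the claim (it is only an upper bound, after all).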
So, given a very powerful computer that can simultaneously evaluate millions of hyperparameter combinations: can it be shown statistically that more complex models are not necessarily better for small datasets (e.g. the iris data)?
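To illustrate the kind of experiment I mean, here is a rough sklearn sketch (purely illustrative; the particular models and hyperparameters are arbitrary choices on my part, not a claim about the best setup):

```python
# Illustrative sketch only: compare a low-capacity tree with a
# high-capacity MLP on the iris data using cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Low-capacity model: a shallow CART tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# Higher-capacity model: a multi-layer perceptron
# (inputs are scaled to help it converge).
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=5000, random_state=0),
)

for name, model in [("shallow tree", tree), ("MLP", mlp)]:
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Even if the higher-capacity model matches the tree after extensive tuning, what I'm really asking is whether theory predicts that the extra capacity buys nothing (or is actively harmful) at this sample size.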
Thanks