Author: Arnaud Autef

<aside> 💡 In this reading group session, we present the 2021 Deep Learning (DL) paper "Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data".

In this empirical paper, the authors argue that simple neural network architectures can reach state-of-the-art performance for supervised learning on tabular data.

This goes against the common wisdom that Gradient Boosting with Decision Trees (GBDT) is superior to DL approaches on tabular data.

The main insight of the paper is that regularization techniques are key to unlocking higher performance with neural networks. As long as a broad array of regularization approaches is considered during hyperparameter optimization, neural networks should prevail.

In this session, we:

• Review some DL basics relevant to the paper.
• Review some basics about regularization in supervised learning and regularization techniques for Deep Learning.
• Discuss the empirical results presented in the paper and derive conclusions for practitioners.

</aside>

Contents

# 1 - What is Deep Learning?

Unfortunately, DL is not precisely defined, and we are not going to try to define it here!

Here, we restrict ourselves to Multi-Layer Perceptrons (MLPs):

• They are the "simplest" of DL models, almost always covered first in introductory DL lectures.
• This is the neural network architecture used in the paper discussed today.

## What is an MLP?

### Setup

We consider a supervised regression setting with dataset $\mathcal{D} = (x_i, y_i)_{1 \le i \le n}$ where

• $x_i \in \mathbb{R}^d$ are the input features, of dimension $d$
• $y_i \in \mathbb{R}$ are the responses to model
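To make this setup concrete, here is a minimal NumPy sketch of an MLP forward pass for such a regression problem. The dimensions, the single hidden layer, and the ReLU activation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relu(z):
    """ReLU activation, applied element-wise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass of an MLP regressor.

    x: input feature vector of shape (d,)
    weights, biases: one weight matrix / bias vector per layer
    Hidden layers use ReLU; the output layer is linear (scalar response).
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

# Illustrative sizes: d = 4 input features, one hidden layer of width 8.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(1, 8))]
biases = [np.zeros(8), np.zeros(1)]

x = rng.normal(size=4)       # one input x_i in R^d
y_hat = mlp_forward(x, weights, biases)  # predicted response in R
```

In practice one would use a framework such as PyTorch, which also provides the regularization knobs (dropout, weight decay, etc.) that the paper's hyperparameter search relies on; this sketch only shows the bare computation an MLP performs.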