While training a deep learning network, we want the model to perform well not only on the training data but also on the testing data. If a model performs well on the training data but worse on the testing or unseen data, the model is said to be overfitted. In a deep learning network, the steps that help avoid overfitting are collectively referred to as regularization.
An overfitted model fits the training data closely, but it cannot do the same on unseen data. In such cases, we use regularization. The main regularization techniques in deep learning are:
- L1 and L2 Regularization
- Dropout
- Data Augmentation
L1 and L2 Regularization
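In both L1 and L2 regularization, a penalty term weighted by a coefficient λ is added to the loss function. Without regularization, the loss L is just the error loss Le:
L = Le = ∑(y’ − y)^2
With regularization, it becomes:
L = Le + (λ/2) Lw
Here, y’ is the prediction, y is the target, and Lw is the penalty computed from the weights: the sum of their absolute values for L1, and the sum of their squares for L2. During gradient descent, the weights are optimized against this combined loss, so λ controls how strongly the weight coefficients are pushed toward small values.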
For example, consider two equations:
Y = 5000X^2 + 2000X
Z = 5X^2 + 2X
Here, X is the input. In equation Z, a small change in X barely changes the output. In equation Y, however, the same small change makes a huge difference, because its coefficients are 1000 times larger.
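To make the comparison concrete, the short sketch below evaluates both equations at X = 1.0 and again after a small nudge of 0.01. The starting point and the step size are arbitrary values chosen for illustration only.

```python
def y(x):
    """Equation with large coefficients."""
    return 5000 * x**2 + 2000 * x


def z(x):
    """Same shape of equation, with coefficients 1000 times smaller."""
    return 5 * x**2 + 2 * x


x, dx = 1.0, 0.01  # illustrative input and perturbation

change_y = y(x + dx) - y(x)  # roughly 120.5
change_z = z(x + dx) - z(x)  # roughly 0.1205

print(f"Change in Y: {change_y:.4f}")
print(f"Change in Z: {change_z:.4f}")
```

The same nudge in X moves Y about 1000 times more than Z, which mirrors how large weights make a network overly sensitive to small changes in its input.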
L1 and L2 regularization each have their own significance, but their main idea is the same: introduce a penalty that reduces the size of the weight coefficients. This is one of the most widely used methods for avoiding overfitting in a deep neural network.
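As a minimal sketch of how such a penalty acts during gradient descent, the example below fits a tiny linear model with and without an L2 penalty. It assumes a sum-of-squared-errors loss and a (λ/2)-scaled penalty; the function names, toy data, learning rate, and step count are all hypothetical choices made for illustration.

```python
def predict(weights, x):
    """Linear model: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def regularized_loss(weights, data, lam, kind="l2"):
    """Error loss plus (lam / 2) times the weight penalty."""
    error_loss = sum((predict(weights, x) - y) ** 2 for x, y in data)
    if kind == "l1":
        weight_penalty = sum(abs(w) for w in weights)  # L1: absolute values
    else:
        weight_penalty = sum(w * w for w in weights)   # L2: squares
    return error_loss + (lam / 2) * weight_penalty


def gradient_step(weights, data, lam, lr, kind="l2"):
    """One gradient descent step on the regularized loss."""
    grads = [0.0] * len(weights)
    for x, y in data:
        err = predict(weights, x) - y
        for j, xj in enumerate(x):
            grads[j] += 2 * err * xj  # gradient of the squared error
    for j, w in enumerate(weights):
        if kind == "l1":
            sign = (w > 0) - (w < 0)
            grads[j] += (lam / 2) * sign  # subgradient of (lam/2) * |w|
        else:
            grads[j] += lam * w           # gradient of (lam/2) * w^2
    return [w - lr * g for w, g in zip(weights, grads)]


def train(data, lam, kind="l2", lr=0.01, steps=2000):
    weights = [0.0] * len(data[0][0])
    for _ in range(steps):
        weights = gradient_step(weights, data, lam, lr, kind)
    return weights


# Toy data generated by the weights (1, 2); purely illustrative.
data = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0), ((3.0, 3.0), 9.0)]

w_plain = train(data, lam=0.0)          # no penalty
w_l2 = train(data, lam=1.0, kind="l2")  # with L2 penalty

print("Without penalty:", w_plain)
print("With L2 penalty:", w_l2)
```

With the penalty switched on, the learned weights end up with a smaller overall magnitude than the unpenalized ones, at the cost of a slightly worse fit to the training points.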
Dropout
Dropout is a well-known regularization technique that reduces the chance of overfitting in deep neural networks. A deep neural network has a large number of neurons and is complex, and complex networks tend to overfit because the neurons in one layer depend on those in the previous layer. Dropout randomly deactivates a fraction of the neurons at each training step.
This makes the model simpler, since each neuron has to extract useful features from the input on its own instead of relying on several other neurons. It can also make the model more effective on unseen data. On the other hand, fewer neurons learn at each step than in the full network, so more epochs may be needed to train, even though the time taken by each epoch is reduced.
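The sketch below shows one common way to apply dropout to a list of activations. It assumes the "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name, the seed, and the drop probability are hypothetical choices for illustration.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Randomly zero activations during training (inverted dropout)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time every neuron is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Keep each activation with probability keep_prob and rescale it,
    # so the expected value of the layer output stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
layer_output = [1.0] * 10000

train_output = dropout(layer_output, drop_prob=0.5, rng=rng)
eval_output = dropout(layer_output, drop_prob=0.5, training=False)

dropped_fraction = train_output.count(0.0) / len(train_output)
print(f"Fraction dropped during training: {dropped_fraction:.3f}")
```

Because a different random subset of neurons is silenced at every step, no single neuron can rely on a specific partner always being present.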
Data Augmentation
Deep learning models require a large amount of data to train, but in real-world applications it is hard to gather and annotate data. If a model is trained on a small dataset, it may seem to train quickly, but it will not classify unseen inputs correctly. The model overfits, so data augmentation is often necessary for the model to learn without overfitting.
For image datasets, augmentation applies transformations to the existing images: some images are rotated by some angle, some are zoomed, some are translated, some are scaled, and so on. The process is a little different when it comes to augmenting a sound dataset. For sound datasets, augmentation can be achieved by noise injection, shifting, changes of pitch and speed, and more.
In this way, a single file yields n variants, so the data grows n times. Real-world images are also often degraded by blur, so such effects can be used to introduce further variation in the training set. When the training data is large enough, deep learning models can train more easily, and overfitting is reduced.
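As a small illustration of image augmentation, the sketch below treats an image as a 2D list of pixel values and produces several variants from a single input. The helper names and the particular set of transformations are hypothetical; real pipelines typically rely on an image library rather than plain lists.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def translate_right(image, shift, fill=0):
    """Shift pixels right by `shift` columns (shift >= 0), padding with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[:width - shift] for row in image]


def augment(image):
    """Return the original image plus three transformed variants."""
    return [
        image,
        flip_horizontal(image),
        rotate_90_clockwise(image),
        translate_right(image, 1),
    ]


image = [[1, 2],
         [3, 4]]

variants = augment(image)
for variant in variants:
    print(variant)
```

Here one image becomes four training samples, and adding more transformations increases that multiplier further.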
Other Causes of Overfitting
Besides a lack of regularization, other factors such as weight initialization and the learning rate schedule can also contribute to overfitting. Most deep learning platforms provide tools to help with these issues. For learning rate scheduling in particular, PyTorch learning rate schedulers can be used.
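To show what a learning rate schedule does, here is a plain-Python sketch of step decay, where the learning rate is multiplied by a fixed factor every few epochs. It does not use PyTorch; the function name and the values are hypothetical, although PyTorch's StepLR scheduler follows a similar rule with its `step_size` and `gamma` parameters.

```python
def step_decay(initial_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` epochs, decayed by `gamma` every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Illustrative settings: halve the learning rate every 10 epochs.
schedule = [step_decay(0.1, epoch, step_size=10, gamma=0.5)
            for epoch in range(30)]

for epoch in (0, 10, 20):
    print(f"epoch {epoch}: lr = {schedule[epoch]}")
```

A schedule like this lets training take large steps early on and smaller, more careful steps later.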
Addressing Inaccurate Classification
To summarize, overfitting is a common issue in deep learning development that can be addressed using various regularization techniques. Among them, L1 and L2 are fairly popular regularization methods in classical machine learning, while dropout and data augmentation are more suitable and recommended for overfitting issues in deep neural networks.