*** This post is a summary of the article below. The original article has more details and references.
https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
One possible cause of "internal covariate shift" (the constantly changing distribution of inputs to layers deep in the network) is that weights are updated per mini-batch. Batch Normalisation can stabilise the learning process and reduce the number of training epochs required.
Batch Normalisation
: standardises each input variable's activations per mini-batch. This is done by subtracting the mini-batch mean from each value and dividing by the mini-batch standard deviation, so the rescaled activations have a mean of zero and a standard deviation of one. As a result, all input variables contribute to the model on a comparable scale. This standardisation step is sometimes referred to as "whitening".
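The sketch below (not from the original post) shows the per-mini-batch standardisation described above in plain NumPy. The names gamma, beta, and eps are illustrative stand-ins for the learnable scale, learnable shift, and numerical-stability constant used in practice.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise a mini-batch of activations (illustrative sketch).

    x: array of shape (batch_size, num_features)
    gamma, beta: learnable scale and shift (plain floats here for brevity)
    eps: small constant to avoid division by zero
    """
    mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit standard deviation
    return gamma * x_hat + beta              # rescale and shift

# A mini-batch of 4 samples with 3 features on very different scales
batch = np.array([[1.0, 200.0, 0.01],
                  [2.0, 180.0, 0.03],
                  [3.0, 220.0, 0.02],
                  [4.0, 210.0, 0.04]])
print(batch_norm(batch).mean(axis=0))  # approximately 0 for every feature
print(batch_norm(batch).std(axis=0))   # approximately 1 for every feature
```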
Use cases:
- Dramatic training speed-up of an Inception-based convolutional neural network for photo classification
- Used after each convolution and before the activation in a standard photo-classification model, with significant improvements
- Used in the updated Inception model (GoogLeNet Inception-v3) on the ImageNet dataset, with excellent results
- Used in recurrent neural networks in an end-to-end model for speech recognition, improving the final generalisation error and accelerating training
How to Use
- Use it before or after the activation function of the previous layer (see the Keras sketch after this list):
- after the activation for s-shaped functions ( logistic, hyperbolic tangent )
- before the activation for functions that can produce non-Gaussian distributions, such as the rectified linear activation function (ReLU)
- Use a higher learning rate: training speed-ups have been observed when Batch Norm is combined with larger learning rates.
- It can be used for data preprocessing purposes, standardising raw input variables that have differing scales.
- DO NOT use Dropout: Batch Norm provides a regularisation effect that reduces generalisation error, so dropout may no longer be required.
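A minimal Keras sketch of the placement guidance above, assuming TensorFlow/Keras (the post does not name a framework). The ReLU block places BatchNormalization before the activation, the tanh block places it after, and a larger-than-usual learning rate is used; the layer sizes and the 0.01 value are arbitrary illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(20,)),
    # ReLU block: Dense (no activation) -> BatchNormalization -> Activation("relu"),
    # i.e. normalisation placed *before* the non-linearity.
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    # Tanh block: Dense(activation="tanh") -> BatchNormalization,
    # i.e. normalisation placed *after* the s-shaped non-linearity.
    layers.Dense(32, activation="tanh"),
    layers.BatchNormalization(),
    layers.Dense(1, activation="sigmoid"),
])

# A higher learning rate than usual, since Batch Norm stabilises training.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```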