Batch Normalization for Deep Learning

2023. 4. 5. 06:21 · Data science


*** This post is a summary of the website below. The original article has more details and references.

https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/

 

One possible cause of the difficulty of training deep networks is that the distribution of inputs to layers deep in the network keeps changing as the weights are updated after each mini-batch ("internal covariate shift"). Batch normalization can stabilize the learning process and reduce the number of training epochs required.

 

Batch Normalisation

: standardizes the activations of the prior layer for each input variable, per mini-batch. Each value has the mini-batch mean subtracted and is then divided by the mini-batch standard deviation, so the rescaled activations have a mean of zero and a standard deviation of one, letting all input variables contribute to the model on an equal footing. After standardizing, batch norm applies a learned scale (gamma) and shift (beta) so the layer keeps its expressive power. This kind of standardization is sometimes loosely called "whitening", although whitening in the strict sense also decorrelates the variables.
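A minimal NumPy sketch of that per-feature computation (the function name, epsilon value, and toy data here are my own for illustration; the running statistics used at inference time are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the mini-batch, then rescale and shift."""
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # learned scale (gamma) and shift (beta)

# Toy batch: 4 samples, 3 input variables on very different scales
x = np.array([[1.0, 100.0, 0.01],
              [2.0, 110.0, 0.02],
              [3.0,  90.0, 0.03],
              [4.0, 120.0, 0.04]])
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0))  # ~0 for every feature
print(out.std(axis=0))   # ~1 for every feature (since gamma=1, beta=0)
```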

 

Use cases:

- Dramatically sped up training of an Inception-based convolutional neural network for photo classification

- Applied after each convolution and before the activation in a standard photo-classification network, with a significant improvement

- Used in the updated Inception model (GoogLeNet Inception-v3) on the ImageNet dataset, with excellent results

- Used in the recurrent neural networks of an end-to-end speech recognition model, improving the final generalization error and accelerating training

 

How to Use 

  1. Use it before or after the activation function of the previous layer (see the sketch after this list):
    - after the activation for s-shaped functions (logistic, hyperbolic tangent)
    - before the activation for functions that produce non-Gaussian distributions, such as the rectified linear activation function (ReLU)
  2. Use a higher learning rate: training has been observed to speed up when batch normalization allows larger learning rates.
  3. It can also standardize raw input data during preprocessing when the input variables have different scales.
  4. Dropout may not be needed: batch normalization has a slight regularizing effect that reduces generalization error, so dropout may no longer be required.
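
A minimal Keras sketch of the placement guidance in point 1 (assuming TensorFlow/Keras; the architecture, layer sizes, and learning rate are illustrative assumptions, not values from the source):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative toy classifier; sizes are assumptions, not from the article.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, padding="same"),
    layers.BatchNormalization(),   # before the activation, since ReLU follows
    layers.Activation("relu"),
    layers.Flatten(),
    layers.Dense(64),
    layers.BatchNormalization(),   # same placement pattern for dense layers
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

# Point 2 above: batch normalization often tolerates a higher learning rate.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

For an s-shaped activation such as tanh, the same guidance would place `BatchNormalization()` after the `Activation` layer instead.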

https://gaussian37.github.io/dl-concept-batchnorm/

 
