
Group Normalization


tags: Normalization

Preface

Group Normalization (GN) is a normalization strategy proposed by Kaiming He's team. It is a compromise between Layer Normalization (LN) and Instance Normalization (IN), shown at the far right of Figure 1. It computes the normalization statistics over groups of channels, so GN is also independent of the batch size and can be used when the batch size is small. The authors report in the paper that GN outperforms both LN and IN.

Figure 1: From left to right: BN, LN, IN, and GN

1. GN in Detail

1.1 The GN Algorithm

Like all of the normalization algorithms introduced so far, GN computes a mean and a variance from the layer's input and then uses these two statistics to normalize the input:

$$
\mu_i = \frac{1}{m}\sum_{k \in \mathcal{S}_i} x_k \qquad
\sigma_i = \sqrt{\frac{1}{m}\sum_{k \in \mathcal{S}_i}(x_k-\mu_i)^2 + \epsilon} \qquad
\hat{x}_i = \frac{1}{\sigma_i}(x_i-\mu_i)
$$

All of the normalization methods introduced earlier can be expressed by the formula above; what distinguishes them is how the set $\mathcal{S}_i$ is defined:

For BN, the set contains all values on the same channel, taken across every sample in the batch:

$$\mathcal{S}_i = \{k \mid k_C = i_C\}$$

LN instead takes all values across the channels of the same sample:

$$\mathcal{S}_i = \{k \mid k_N = i_N\}$$

IN crosses neither the batch dimension nor the channel dimension:

$$\mathcal{S}_i = \{k \mid k_N = i_N,\ k_C = i_C\}$$

GN divides the channels into several groups and computes the mean and variance using only the data within each group. The number of groups $G$ is a hyperparameter; the default value in TensorFlow is 32.

$$\mathcal{S}_i = \left\{k \;\middle|\; k_N = i_N,\ \left\lfloor \frac{k_C}{C/G}\right\rfloor = \left\lfloor \frac{i_C}{C/G}\right\rfloor\right\}$$

We can see that when the number of groups is 1, GN is equivalent to LN; when the number of groups equals the number of channels, GN is equivalent to IN.
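These two limiting cases are easy to check numerically. Below is a minimal NumPy sketch (not the paper's code; shapes and group counts are chosen only for illustration) that implements all three normalizations directly from the definitions above:

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    # x: [N, C, H, W]; normalize within each group of C // G channels
    N, C, H, W = x.shape
    x = x.reshape(N, G, C // G, H, W)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(N, C, H, W)

def layer_norm(x, eps=1e-5):
    # normalize each sample over all of C, H, W
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # normalize each (sample, channel) over H, W
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 8, 4, 4)
print(np.allclose(group_norm(x, G=1), layer_norm(x)))     # True: G = 1  -> LN
print(np.allclose(group_norm(x, G=8), instance_norm(x)))  # True: G = C  -> IN
```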

Like the other normalization algorithms, GN can also add learnable parameters $\gamma$ and $\beta$ to preserve the network's representational capacity.
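In practice this is rarely written by hand. As one example (not the paper's code), PyTorch provides a built-in GroupNorm layer whose affine option adds exactly these per-channel $\gamma$ and $\beta$ parameters; the shapes below are chosen only for illustration:

```python
import torch
import torch.nn as nn

gn = nn.GroupNorm(num_groups=8, num_channels=32, eps=1e-5, affine=True)
x = torch.randn(2, 32, 8, 8)   # [N, C, H, W]
y = gn(x)

# gamma (weight) and beta (bias) are learnable, one value per channel
print(gn.weight.shape, gn.bias.shape)  # torch.Size([32]) torch.Size([32])
```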

1.2 Pseudo-code for GN

The paper provides TensorFlow-based reference code for GN:

1  def GroupNorm(x, gamma, beta, G, eps=1e-5):
2      # x: input features with shape [N, C, H, W]
3      # gamma, beta: scale and offset, with shape [1, C, 1, 1]
4      # G: number of groups for GN
5      N, C, H, W = x.shape
6      x = tf.reshape(x, [N, G, C // G, H, W])
7      mean, var = tf.nn.moments(x, [2, 3, 4], keepdims=True)
8      x = (x - mean) / tf.sqrt(var + eps)
9      x = tf.reshape(x, [N, C, H, W])
10     return x * gamma + beta

Line 6 reshapes the tensor to add a "group" dimension, producing a five-dimensional tensor. On line 7, the axes argument [2, 3, 4] means that the normalization statistics are computed without crossing either the batch dimension or the group dimension.
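A minimal usage sketch of the function above, assuming TensorFlow 2.x with eager execution (the shapes and group count are chosen only for illustration): with $\gamma = 1$ and $\beta = 0$, every group of every sample should come out with mean close to 0 and variance close to 1.

```python
import tensorflow as tf

N, C, H, W, G = 2, 32, 8, 8, 8
x = tf.random.normal([N, C, H, W])
gamma = tf.ones([1, C, 1, 1])   # scale, initialized to 1
beta = tf.zeros([1, C, 1, 1])   # offset, initialized to 0

y = GroupNorm(x, gamma, beta, G)   # 8 groups of 4 channels each

# Check the per-group statistics of the output
y_grouped = tf.reshape(y, [N, G, C // G, H, W])
print(tf.reduce_mean(y_grouped, axis=[2, 3, 4]))           # close to 0
print(tf.math.reduce_variance(y_grouped, axis=[2, 3, 4]))  # close to 1
```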

1.3 The Rationale Behind GN

Before deep learning, traditional descriptors such as SIFT and HOG already computed statistics over groups of features: features of the same kind were grouped together and then normalized within each group. In deep learning, each channel's Feature Map can likewise be viewed as a structured feature vector. If a layer has enough convolution filters, some channels are bound to produce similar features, so it is natural to normalize these similar features together.

The authors argue that GN works better than LN because GN imposes fewer constraints: LN assumes that all channels of a layer share a single mean and variance. IN, at the other extreme, gives up the ability to exploit dependencies between channels.
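One way to see this constraint hierarchy is to count how many (mean, variance) pairs each method estimates per sample. A small NumPy sketch (shapes chosen arbitrarily for illustration):

```python
import numpy as np

N, C, H, W, G = 2, 32, 8, 8, 8
x = np.random.randn(N, C, H, W)

# LN: one mean per sample                 -> shape (N,)
ln_mean = x.reshape(N, -1).mean(axis=1)
# GN: one mean per (sample, group)        -> shape (N, G)
gn_mean = x.reshape(N, G, -1).mean(axis=2)
# IN: one mean per (sample, channel)      -> shape (N, C)
in_mean = x.reshape(N, C, -1).mean(axis=2)

print(ln_mean.shape, gn_mean.shape, in_mean.shape)  # (2,) (2, 8) (2, 32)
```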

Summary

As a compromise between IN and LN, GN nevertheless outperforms both, which I find rather puzzling. Although the authors attempt an explanation, it feels somewhat subjective, with a hint of reasoning backwards from the results. I have also run some comparison experiments on normalization methods myself, and the results were not as favorable as the authors claim. So when designing a network, if the batch size can be made reasonably large, BN is still the best choice; if the batch size is small, it may be best to pick the normalization strategy through controlled comparison experiments.
