Holistically-Nested Edge Detection

tags: HED, Edge Detection

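Introduction

This paper proposes a new network architecture for edge detection: Holistically-Nested Edge Detection (HED), as named in the title. "Holistically" means the algorithm trains an image-to-image network; "Nested" emphasizes that the output is produced by progressively integrating and learning from intermediate predictions to obtain increasingly accurate edge maps. The comparison between HED and the classical Canny algorithm in Figure 1 shows that HED produces clearly better edge maps than the Canny operator.

Figure 1: HED vs Canny

Because HED is image-to-image, it extends easily to other tasks such as semantic segmentation. In OCR text detection, text regions usually have strong edge features, so HED can also be extended to scene text detection; the well-known EAST [2] algorithm was inspired by HED.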

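1.1 The HED backbone network

HED was published in 2015. It uses VGG-16, state of the art at the time, as its backbone, and initializes the network weights through transfer learning.

HED uses multi-scale features. Related multi-scale ideas appear in Inception, SSD, FPN, and other methods; Figure 2 compares them.

  • (a) Multi-stream learning: networks with different structures and different parameters are trained on the same image; Inception has a similar structure;

  • (b) Skip-layer network learning: there is one main network, with several skip-layers added from the main network to the output layer; FPN has a similar structure;

  • (c) Single model on multiple inputs: the same network receives input images of different sizes and produces feature maps at different scales; YOLOv2 uses this approach;

  • (d) Training independent network: completely independent networks are trained on the same image to obtain results at several scales, similar to an ensemble model;

  • (e) Holistically-Nested networks: the approach used by HED, described in detail below.

Figure 2: Network structures of several approaches to extracting multi-scale features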

1.2 Holistically-Nested networks

The structure of Holistically-Nested networks is shown in Figure 3 and in the code below:

Figure 3: Structure of Holistically-Nested networks

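# Imports needed by this listing and by side_branch (defined in the listing further down)
from keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, Concatenate, Activation
from keras.models import Model
# Input
img_input = Input(shape=(480,480,3), name='input')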
# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
b1= side_branch(x, 1) # 480 480 1
x = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='block1_pool')(x) # 240 240 64
# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
b2= side_branch(x, 2) # 480 480 1
x = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='block2_pool')(x) # 120 120 128
# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
b3= side_branch(x, 4) # 480 480 1
x = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='block3_pool')(x) # 60 60 256
# Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
b4= side_branch(x, 8) # 480 480 1
x = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='block4_pool')(x) # 30 30 512
# Block 5
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x) # 30 30 512
b5= side_branch(x, 16) # 480 480 1
# fuse
fuse = Concatenate(axis=-1)([b1, b2, b3, b4, b5])
fuse = Conv2D(1, (1,1), padding='same', use_bias=False, activation=None)(fuse) # 480 480 1
# outputs
o1    = Activation('sigmoid', name='o1')(b1)
o2    = Activation('sigmoid', name='o2')(b2)
o3    = Activation('sigmoid', name='o3')(b3)
o4    = Activation('sigmoid', name='o4')(b4)
o5    = Activation('sigmoid', name='o5')(b5)
ofuse = Activation('sigmoid', name='ofuse')(fuse)
# model
model = Model(inputs=[img_input], outputs=[o1, o2, o3, o4, o5, ofuse])

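Both Figure 3 and the source code make the VGG-16 backbone obvious. At the end of each of VGG-16's five blocks, before the max-pooling downsampling (in this listing block 5 has no pooling layer), HED creates a branch through the side_branch function, for five branches in total. The source of side_branch is as follows: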

def side_branch(x, factor):
    x = Conv2D(1, (1, 1), activation=None, padding='same')(x)
    kernel_size = (2*factor, 2*factor)
    x = Conv2DTranspose(1, kernel_size, strides=factor, padding='same', use_bias=False, activation=None)(x)
    return x

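Here Conv2DTranspose is the transposed convolution (deconvolution) operation, and the output shape of each side_branch is noted in the code comments. HED's use of transposed convolution for upsampling is similar to DSSD.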
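As a quick check of the shapes in those comments, the sketch below traces the spatial size through the listing. It assumes the usual Keras rules for padding='same' (pooling gives ceil(size / stride), Conv2DTranspose gives size * stride); the helper names are made up for this illustration and are not part of the original source.

```python
def same_pool_out(size, stride=2):
    # 'same' padding: output size is ceil(input / stride)
    return -(-size // stride)

def same_deconv_out(size, stride):
    # Conv2DTranspose with padding='same': output size is input * stride
    return size * stride

size = 480                      # input height/width used in the listing
factors = [1, 2, 4, 8, 16]      # the factor passed to side_branch in each block
side_sizes = []
for block, factor in enumerate(factors, start=1):
    # the side branch is taken before the block's pooling layer
    side_sizes.append(same_deconv_out(size, factor))
    if block < 5:               # the listing has no pooling after block 5
        size = same_pool_out(size)

print(side_sizes)
```

Each upsampling factor is the total downsampling applied before that block, which is why every side output returns to the input resolution.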

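HED's fuse branch is built by merging the outputs of the five side branches with a Concatenate operation. After a sigmoid activation, the five side branches and the fuse branch together form the network outputs, and every output has the same size as the input image.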

1.3 The HED loss function

1.3.1 Training

Let the HED training set be $S=\{(X_n, Y_n), n=1,...,N\}$, where $X_n = \{x_j^{(n)}, j=1,...,|X_n|\}$ is the raw input image and $Y_n = \{y_j^{(n)}, j=1,...,|X_n|\}$ is the binary edge label map of $X_n$, so $y_j^{(n)}\in\{0,1\}$; $|X_n|$ is the number of pixels in one image.

Let $\mathbf{W}$ denote all parameters of the VGG-16 network. If the network has $M$ side branches, define the side-branch parameters as $\mathbf{w} = (\mathbf{w}^{(1)},...,\mathbf{w}^{(M)})$. The HED objective over the side branches is then defined as:

$$\mathcal{L}_{\text{side}}(\mathbf{W}, \mathbf{w}) = \sum^M_{m=1}\alpha_m \ell_{\text{side}}^{(m)}(\mathbf{W}, \mathbf{w}^{(m)})$$

where $\alpha_m$ is the weight of each side branch's loss; it can be tuned from the training logs or simply set to 1/5 for every branch.

$\ell_{\text{side}}^{(m)}(\mathbf{W},\mathbf{w}^{(m)})$ is the loss of each side branch, a class-balanced cross-entropy loss:

$$\ell_{\text{side}}^{(m)}(\mathbf{W},\mathbf{w}^{(m)}) = -\beta\sum_{j\in Y_+}\log \text{Pr}(y_j=1|X;\mathbf{W},\mathbf{w}^{(m)}) - (1-\beta) \sum_{j\in Y_-}\log \text{Pr}(y_j=0|X;\mathbf{W},\mathbf{w}^{(m)})$$

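where $\beta$ is the class-balancing weight that compensates for the imbalance between positive and negative samples in edge detection: $\beta=\frac{|Y_-|}{|Y|}$ and $1-\beta = \frac{|Y_+|}{|Y|}$. Here $|Y_+|$ is the number of edge pixels and $|Y_-|$ is the number of non-edge pixels.

$\text{Pr}(y_j=1|X;\mathbf{W},\mathbf{w}^{(m)}) = \sigma(a_j^{(m)})$ is the edge probability predicted by the $m$-th side branch at pixel $j$, where $a_j^{(m)}$ is the activation at that pixel and $\sigma(\cdot)$ is the sigmoid function. Collecting these over all pixels gives the side-output map $\hat{Y}_{\text{side}}^{(m)} = \sigma(\hat{A}_{\text{side}}^{(m)})$, with $\hat{A}_{\text{side}}^{(m)} = \{a_j^{(m)}, j=1,...,|Y|\}$.

The class-balanced loss is implemented as follows: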

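# Written against the TensorFlow 1.x API (tf.log, targets=...).
import tensorflow as tf
from keras import backend as K

def cross_entropy_balanced(y_true, y_pred):
    # _to_tensor is a helper from the original source that is not shown here.
    _epsilon = _to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
    # The model outputs probabilities; clip them and convert back to logits.
    y_pred   = tf.clip_by_value(y_pred, _epsilon, 1 - _epsilon)
    y_pred   = tf.log(y_pred/ (1 - y_pred))
    y_true = tf.cast(y_true, tf.float32)
    # Count non-edge (negative) and edge (positive) pixels.
    count_neg = tf.reduce_sum(1. - y_true)
    count_pos = tf.reduce_sum(y_true)
    # beta = |Y_-| / |Y|
    beta = count_neg / (count_neg + count_pos)
    pos_weight = beta / (1 - beta)
    cost = tf.nn.weighted_cross_entropy_with_logits(logits=y_pred, targets=y_true, pos_weight=pos_weight)
    # Scaling by (1 - beta) turns the weights (pos_weight, 1) into (beta, 1 - beta).
    cost = tf.reduce_mean(cost * (1 - beta))
    # Return zero loss when there are no edge pixels.
    return tf.where(tf.equal(count_pos, 0.0), 0.0, cost)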
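The pos_weight trick in that function can be checked against the formula with a few lines of NumPy. This is only an illustrative sketch, not the original implementation: the function names and the random test data are invented here, and both versions average over pixels (as the listing does with reduce_mean) rather than summing as in the equation.

```python
import numpy as np

def balanced_bce_from_formula(y_true, p):
    """Per-pixel mean of -beta*y*log(p) - (1-beta)*(1-y)*log(1-p)."""
    y_true = y_true.astype(np.float64)
    beta = (1.0 - y_true).sum() / y_true.size        # |Y_-| / |Y|
    pos = -beta * y_true * np.log(p)                 # edge pixels
    neg = -(1.0 - beta) * (1.0 - y_true) * np.log(1.0 - p)  # non-edge pixels
    return (pos + neg).mean()

def balanced_bce_via_pos_weight(y_true, p):
    """Same quantity, written the way the listing computes it."""
    y_true = y_true.astype(np.float64)
    beta = (1.0 - y_true).sum() / y_true.size
    pos_weight = beta / (1.0 - beta)
    # weighted cross-entropy: pos_weight scales only the positive term
    wce = -pos_weight * y_true * np.log(p) - (1.0 - y_true) * np.log(1.0 - p)
    return (wce * (1.0 - beta)).mean()

rng = np.random.default_rng(0)
y = (rng.random((8, 8)) < 0.1).astype(np.int64)      # sparse "edge" labels
y[0, 0] = 1                                          # guarantee one edge pixel
p = np.clip(rng.random((8, 8)), 1e-7, 1 - 1e-7)      # predicted probabilities

loss_formula = balanced_bce_from_formula(y, p)
loss_listing = balanced_bce_via_pos_weight(y, p)
print(np.isclose(loss_formula, loss_listing))
```

The two agree because (1 - beta) * pos_weight equals beta, so the positive term ends up weighted by beta and the negative term by 1 - beta.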

As shown in Figure 3, the fuse layer is a weighted sum of the $M$ side branches (this is the role of the $1\times1$ convolution in the code), i.e. $\hat{Y}_{\text{fuse}} \equiv \sigma(\sum_{m=1}^M h_m \hat{A}_{\text{side}}^{(m)})$. The loss of the fuse layer is defined as:

$$\mathcal{L}_{\text{fuse}}(\mathbf{W},\mathbf{w},\mathbf{h}) = \text{Dist}(Y, \hat{Y}_{\text{fuse}})$$

where $\text{Dist}(\cdot,\cdot)$ is the cross-entropy loss. The source code uses the class-balanced cross-entropy loss instead, which I personally think is the more sensible choice.
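To see why a bias-free $1\times1$ convolution over the concatenated side outputs is the same thing as the weighted sum in the fuse formula, here is a small NumPy sketch. The array sizes and random values are arbitrary, and the variable names are chosen for this illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
M, H, W = 5, 4, 4
side_logits = [rng.normal(size=(H, W)) for _ in range(M)]   # stand-ins for A_side^(m)
h = rng.normal(size=M)                                       # fusion weights h_m

# Stack the maps along the channel axis, as Concatenate(axis=-1) does
stacked = np.stack(side_logits, axis=-1)                     # shape (H, W, M)

# A bias-free 1x1 convolution with one output channel is a dot product
# over the channel axis at every pixel
fuse_conv = stacked @ h                                      # shape (H, W)

# The explicit weighted sum from the formula
fuse_sum = sum(h[m] * side_logits[m] for m in range(M))

y_fuse = sigmoid(fuse_conv)
print(np.allclose(fuse_conv, fuse_sum), y_fuse.shape)
```

In the Keras listing the weights $h_m$ are simply the five kernel values of the final Conv2D layer, learned jointly with the rest of the network.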

Finally, the training objective is to minimize the sum of the side-branch loss $\mathcal{L}_{\text{side}}(\mathbf{W}, \mathbf{w})$ and the fuse loss $\mathcal{L}_{\text{fuse}}(\mathbf{W},\mathbf{w},\mathbf{h})$:

$$(\mathbf{W},\mathbf{w},\mathbf{h})^{\star}= \text{argmin}(\mathcal{L}_{\text{side}}(\mathbf{W},\mathbf{w})+\mathcal{L}_{\text{fuse}}(\mathbf{W},\mathbf{w},\mathbf{h}))$$

1.3.2 Testing

Given an image $X$, HED predicts $M$ side-branch outputs and one fuse-layer output:

$$(\hat{Y}_{\text{fuse}}, \hat{Y}_{\text{side}}^{(1)}, ..., \hat{Y}_{\text{side}}^{(M)}) = \text{CNN}(X, (\mathbf{W},\mathbf{w},\mathbf{h})^\star)$$

The HED output is the average of all side-branch outputs and the fuse-layer output:

$$\hat{Y}_{\text{HED}} = \text{Average}(\hat{Y}_{\text{fuse}}, \hat{Y}_{\text{side}}^{(1)}, ..., \hat{Y}_{\text{side}}^{(M)})$$
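The averaging step is a pixel-wise mean over the $M+1$ probability maps. A minimal NumPy sketch follows, with a hypothetical helper name hed_average and random maps standing in for real predictions:

```python
import numpy as np

def hed_average(y_fuse, y_sides):
    """Pixel-wise mean of the fuse map and all side-output maps."""
    maps = np.stack([y_fuse, *y_sides], axis=0)   # shape (M + 1, H, W)
    return maps.mean(axis=0)

rng = np.random.default_rng(0)
y_sides = [rng.random((4, 4)) for _ in range(5)]  # stand-ins for the 5 side outputs
y_fuse = rng.random((4, 4))                       # stand-in for the fuse output
y_hed = hed_average(y_fuse, y_sides)
print(y_hed.shape)
```

With the six-output Keras model from the earlier listing, the maps returned by prediction could presumably be passed to such a helper after dropping the batch and channel axes, assuming the output order o1 to o5 followed by ofuse.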

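Summary

I came across this paper while studying EAST. One of the core ideas of EAST is to build its loss function on semantic segmentation, and its segmentation map is produced by an HED-like structure.

The experimental results show that HED's edge detection is genuinely impressive and that inference is very fast, so it has promising applications.

The drawback of HED is that the model is large: the model trained with Keras exceeds 100 MB. The reason is that the fuse layer merges the feature maps from every VGG-16 block, and each side branch output has the size of the input image. This also causes high GPU memory usage when training HED, although training it on current GPUs is not a problem.

This walkthrough analyzes HED alongside its Keras source code.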