FCN for image segmentation: the training loss stays stuck near one value during training, please advise

1. Train data:   10000 x 3 x 256 x 256
   Train label: 10000 x 1 x 256 x 256 (binary mask, 0 = background, 1 = foreground)
   When generating the LMDBs, the file names in train_data.txt and train_label.txt correspond one-to-one.
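A quick way to rule out data-preparation mistakes is to verify that the two file lists really are aligned before building the LMDBs (a minimal sketch; the helper and the file names/extensions are hypothetical, not from the original post):

```python
# Sanity check that the data and label lists used to build the two
# LMDBs line up entry-by-entry (hypothetical helper and file names).
def check_aligned(data_list, label_list):
    assert len(data_list) == len(label_list), "list lengths differ"
    for d, l in zip(data_list, label_list):
        # Assume matching stems, e.g. img_0001.jpg <-> img_0001.png.
        if d.rsplit(".", 1)[0] != l.rsplit(".", 1)[0]:
            raise ValueError("mismatched pair: %s vs %s" % (d, l))

check_aligned(["img_0001.jpg", "img_0002.jpg"],
              ["img_0001.png", "img_0002.png"])  # passes silently
```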

2. The model prototxt is as follows:
   
name: "FCN"
force_backward: true
layer {
  name: "data"
  type: "Data"
  top: "data"
  transform_param {
    mirror: false
    crop_size: 0
    mean_value: 77
  }
  data_param {
    source: "F:\\fcn\\train_data_lmdb"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  data_param {
    source: "F:\\fcn\\train_label_lmdb"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    pad: 100
    kernel_size: 11
    group: 1
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    stride: 1
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    stride: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6"
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 6
    group: 1
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "Convolution"
  bottom: "fc6"
  top: "fc7"
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 1
    group: 1
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 2
    bias_term: false
    kernel_size: 63
    stride: 32
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 18
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalize: true
  }
  exclude {
    stage: "deploy"
  }
}
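For reference, the Crop offset of 18 can be checked against the layer geometry above with a short script (an editor's sketch using Caffe's output-size formulas, traced for the 256x256 input; not part of the original post):

```python
import math

def conv_out(n, k, s=1, p=0):
    # Caffe convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def pool_out(n, k, s):
    # Caffe pooling rounds up: ceil((n - k) / s) + 1
    return math.ceil((n - k) / s) + 1

def deconv_out(n, k, s):
    # Deconvolution inverts the convolution formula: s * (n - 1) + k
    return s * (n - 1) + k

n = conv_out(256, 11, 4, 100)  # conv1 -> 112
n = pool_out(n, 3, 2)          # pool1 -> 56
n = conv_out(n, 5, 1, 2)       # conv2 -> 56
n = pool_out(n, 3, 2)          # pool2 -> 28
n = conv_out(n, 3, 1, 1)       # conv3/4/5 keep the size -> 28
n = pool_out(n, 3, 2)          # pool5 -> 14
n = conv_out(n, 6)             # fc6 -> 9 (fc7/score_fr are 1x1, size unchanged)
up = deconv_out(n, 63, 32)     # upscore -> 319
print(up)                      # 319; offset 18 + 256 <= 319, so the Crop fits
```

So the geometry itself is consistent; the Crop layer recovers a 256x256 score map aligned with the input.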

3. The training log is attached; the loss has stayed around 0.69 from the very start of training.

Any pointers would be greatly appreciated.
Thanks!
(attachment: Capture.JPG)

jiongnima

This problem has been solved. It was caused by the initialization of the deconvolution layer; the fix was not fine-tuning, but using the data layer provided by the author (python_layer) and keeping "up" in the deconvolution layer's name.
See http://blog.csdn.net/jiongnima ... 49326 for details.
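For readers hitting the same plateau: in the FCN reference code, the solve script initializes every Deconvolution layer whose name contains "up" with a fixed bilinear-interpolation kernel (surgery.interp), which is why the layer name matters. A sketch of that initializer (the pycaffe `net` object in the comment is assumed):

```python
import numpy as np

def bilinear_filler(num_channels, kernel_size):
    # Bilinear-interpolation kernel in the style of FCN's surgery.py,
    # used to initialize Deconvolution weights: one kernel per channel,
    # zero cross-channel weights.
    factor = (kernel_size + 1) // 2
    if kernel_size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    filt = (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
    weights = np.zeros((num_channels, num_channels, kernel_size, kernel_size),
                       dtype=np.float32)
    for c in range(num_channels):
        weights[c, c] = filt
    return weights

# For the "upscore" layer above (2 classes, kernel 63), one would do:
# net.params['upscore'][0].data[...] = bilinear_filler(2, 63)
```

Without this step, an upscore layer with lr_mult 0 and no filler keeps all-zero weights forever, so the score map stays zero and the softmax loss sits at ln(2).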

爱在路上 - hoping to become the ace among rookies

At the start of training the weights give a uniform output distribution, so every class is predicted with the same probability. With 1000 classes that probability is 0.001 and the loss is -ln(0.001) ≈ 6.9; a loss that sits flat at this value (commonly called a plateau) means training shows no sign of converging yet. Try increasing the learning rate, or change the weight initialization.
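The same arithmetic applied to this two-class segmentation problem explains the exact plateau value reported above (a quick check added by the editor, not from the original answer):

```python
import math

# A classifier stuck at the uniform distribution predicts each of the
# K classes with probability 1/K, so the per-pixel softmax loss is ln(K).
print(-math.log(1 / 1000))  # ~6.908 for 1000 classes (the ImageNet case)
print(-math.log(1 / 2))     # ~0.693 for 2 classes: the 0.69 plateau here
```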

Eric_X

Hello,

Thank you very much for the advice, but after changing the weight initialization and increasing the learning rate, the loss still sits around 0.692. Could there be a problem with the design of the network layers? Thanks!

Artisgrammer - just-graduated newbie

I have almost the same problem, only worse: my loss stays completely constant. I'm trying to solve it now; if you make any progress, I'd appreciate hearing about it.

Artisgrammer - just-graduated newbie

It looks like mine is the same as yours, except the log only prints 6 significant digits of the loss, so I can't see any tiny changes behind them.

Artisgrammer - just-graduated newbie

I suspect my training images are too similar to one another. I'm going to switch to FCN-8s, since FCN-32s accuracy is too low anyway.

莫言chank

The data preparation may be wrong.

风兮兮

How should the FCN be constructed? How are the deconv and crop parameters determined?

feynman

Has your problem been solved? I've run into the same one; any advice would be appreciated!

Jungle_KingKing - post-90s IT

Has this been solved? I've run into it too.

阮晋dolphin

I've run into this too; how do you solve it?

hongzhiyang

I've run into this too, but my loss is very large.

NAVY_navy

I've run into this problem as well: the loss is around five or six hundred thousand right from the start. How can I fix it?
