Why does accuracy stay at 1 during Caffe training?

Test net output #0: accuracy = 1
Test net output #1: loss = 0 (* 1 = 0 loss)

gybheroin - post-90s grad student


It's probably overfitting. Try reducing the base learning rate by 10x and see if that helps.
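For reference, the suggested change is one line in the solver prototxt (the file and net names here are assumptions, not taken from the thread):

```
# solver.prototxt (assumed file name)
net: "train_val.prototxt"   # assumed net definition file
base_lr: 0.001              # was 0.01; reduced 10x as suggested
```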

辛淼 - Founder of the CaffeCN community. PhD from Beihang University, currently working at Harvard.


So, do your test set and training set overlap?
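One quick way to check this, assuming DIGITS-style list files (train.txt / val.txt with one "image_path label" entry per line; the file names and format are assumptions):

```python
def list_overlap(train_lines, val_lines):
    """Return the set of image paths that appear in both list files."""
    def paths(lines):
        # The path is everything before the last space (which separates the label).
        return {line.rsplit(" ", 1)[0] for line in lines if line.strip()}
    return paths(train_lines) & paths(val_lines)

train = ["a.png 0", "b.png 1"]
val = ["b.png 1", "c.png 0"]
print(list_overlap(train, val))  # {'b.png'}
```

Any non-empty result means test accuracy is being measured on training images.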

ling_yu - born 1992


This problem has bothered me for a long time too; I'm working on image segmentation. Here is my net:

name: "segnet"
layer {
  name: "features"
  type: "Data"
  top: "features"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "project_shuo/lung_seg/train_db/features"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "labels"
  type: "Data"
  top: "labels"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "project_shuo/lung_seg/train_db/labels"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "features"
  type: "Data"
  top: "features"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "project_shuo/lung_seg/train_db/features"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "labels"
  type: "Data"
  top: "labels"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "project_shuo/lung_seg/train_db/labels"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "features"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 10
    pad: 3
    kernel_size: 7
    stride: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 1
    kernel_size: 5
    pad: 2
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}


layer {
  name: "Sigmoid"
  type: "Sigmoid"
  bottom: "conv2"
  top: "conv2"
}


layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "conv2"
  bottom: "labels"
  top: "accuracy"
  include {
    phase: TEST
  }
}

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv2"
  bottom: "labels"
  top: "loss"
}
I don't know how to fix it. The dataset was built with DIGITS. When I run it, the log always looks like this:
 Waiting for data
I0413 08:23:03.386610  8579 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:23:03.517233  8578 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:23:11.045024  8574 solver.cpp:398]     Test net output #0: accuracy = 1
I0413 08:23:11.045071  8574 solver.cpp:398]     Test net output #1: loss = 0 (* 1 = 0 loss)
I0413 08:23:15.755558  8574 solver.cpp:219] Iteration 0 (-1.60881e-39 iter/s, 43.229s/50 iters), loss = 0
I0413 08:23:15.755609  8574 solver.cpp:238]     Train net output #0: loss = 0 (* 1 = 0 loss)
I0413 08:23:15.755637  8574 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0413 08:24:24.764686  8577 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:24:24.787492  8576 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:24:43.214902  8574 solver.cpp:331] Iteration 20, Testing net (#0)
I0413 08:25:13.256340  8579 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:25:13.279400  8578 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:25:20.754653  8574 solver.cpp:398]     Test net output #0: accuracy = 1
I0413 08:25:20.754693  8574 solver.cpp:398]     Test net output #1: loss = 0 (* 1 = 0 loss)
I0413 08:26:34.470402  8577 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:26:34.493942  8576 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:26:52.914747  8574 solver.cpp:331] Iteration 40, Testing net (#0)
I0413 08:27:23.093843  8579 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:27:23.106179  8578 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:27:30.570837  8574 solver.cpp:398]     Test net output #0: accuracy = 1
I0413 08:27:30.570880  8574 solver.cpp:398]     Test net output #1: loss = 0 (* 1 = 0 loss)
I0413 08:28:21.290916  8574 solver.cpp:219] Iteration 50 (0.163647 iter/s, 305.535s/50 iters), loss = 0
I0413 08:28:21.291126  8574 solver.cpp:238]     Train net output #0: loss = 0 (* 1 = 0 loss)
I0413 08:28:21.291141  8574 sgd_solver.cpp:105] Iteration 50, lr = 0.00996266
I0413 08:28:44.416146  8577 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:28:44.438606  8576 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:29:02.889833  8574 solver.cpp:331] Iteration 60, Testing net (#0)
I0413 08:29:34.297574  8579 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:29:34.316784  8578 data_layer.cpp:73] Restarting data prefetching from start.
I0413 08:29:42.330623  8574 solver.cpp:398]     Test net output #0: accuracy = 1
I0413 08:29:42.330667  8574 solver.cpp:398]     Test net output #1: loss = 0 (* 1 = 0 loss)
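One possible explanation for a loss that is exactly 0 from iteration 0 onward (an observation about this particular net, not something confirmed in the thread): conv2 has num_output: 1, and SoftmaxWithLoss normalizes across channels, so a softmax over a single channel is always 1, making -log(p) exactly 0 regardless of the weights. A minimal numeric sketch:

```python
import math

def softmax(logits):
    """Softmax over per-class logits, as SoftmaxWithLoss computes per pixel."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With a single output channel, the softmax is always 1.0, so the
# cross-entropy loss -log(p[label]) is 0 no matter what the logit is.
p = softmax([0.37])        # any single logit gives p == [1.0]
loss = -math.log(p[0])     # exactly 0
```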
OP, did you ever solve this? The problem of the accuracy always being 1?

phoenixbai


One possibility: your training data isn't shuffled. That is, if negative samples vastly outnumber positives, most batches end up containing nothing but negatives. Ideally each batch should contain both negative samples and a certain proportion of positives.

Also, you could print out the trained model's parameter values and check them; they may have huge absolute values, or be all zeros.
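The batch-balancing idea above can be sketched as follows (a hypothetical pre-processing helper for ordering the training list, not part of Caffe):

```python
import random

def balanced_batches(positives, negatives, batch_size, pos_fraction=0.3):
    """Yield shuffled batches, each containing a fixed share of positives."""
    random.shuffle(positives)
    random.shuffle(negatives)
    n_pos = max(1, int(batch_size * pos_fraction))
    n_neg = batch_size - n_pos
    p = n = 0
    while p + n_pos <= len(positives) and n + n_neg <= len(negatives):
        batch = positives[p:p + n_pos] + negatives[n:n + n_neg]
        random.shuffle(batch)  # mix within the batch as well
        yield batch
        p, n = p + n_pos, n + n_neg
```

Writing the concatenated batches out in order gives a list file where every window of batch_size samples contains positives.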

huxuanlai - https://linkedin.com/in/huxuanlai


Things I already tried:
1. Reducing base_lr from 0.01 to 0.001
2. Verifying there is no overlap between the train and val directories
3. Shuffling the data in both train.txt and val.txt
None of these solved it.

What did work:
The classification label must be an integer. For Caffe's convert_imageset tool (or a create_lmdb_train.sh script that wraps it), each line of the corresponding train.txt/val.txt must have the form "image_path integer_label".

See: https://github.com/BVLC/caffe/ ... t.cpp 
where the label is parsed as label = atoi(line.substr(pos + 1).c_str());
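The expected list-file format can be illustrated with a tiny parser that mirrors that atoi line (a sketch of the parsing behavior, not the actual convert_imageset code):

```python
def parse_list_line(line):
    """Parse one 'image_path label' line the way convert_imageset does:
    everything after the last space is read as an integer label."""
    pos = line.rstrip("\n").rfind(" ")
    path = line[:pos]
    label = int(line[pos + 1:])   # mirrors atoi(line.substr(pos + 1))
    return path, label

print(parse_list_line("images/lung_001.png 1"))  # ('images/lung_001.png', 1)
```

Because only the last space is used as the separator, paths containing spaces still parse, but a non-integer label (or a missing one) fails loudly instead of silently producing bad data.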
