How should batch_size be set?

I saw this being discussed in the group chat, so could someone explain concretely what effect the size of batch_size has? And with multiple GPUs, how should it be set to improve GPU utilization, and what impact will that have on the results?

YJango - graduate student in Japan, artificial intelligence major

I'll just quote the great Hinton directly (two small code sketches of what he describes follow the quote):
It is possible to update the weights after estimating the gradient on a single training case, but it is often more efficient to divide the training set into small “mini-batches” of 10 to 100 cases. This allows matrix-matrix multiplies to be used which is very advantageous on GPU boards or in Matlab.
To avoid having to change the learning rate when the size of a mini-batch is changed, it is helpful to divide the total gradient computed on a mini-batch by the size of the mini-batch, so when talking about learning rates we will assume that they multiply the average, per-case gradient computed on a mini-batch, not the total gradient for the mini-batch.
It is a serious mistake to make the mini-batches too large when using stochastic gradient descent. Increasing the mini-batch size by a factor of N leads to a more reliable gradient estimate but it does not increase the maximum stable learning rate by a factor of N, so the net effect is that the weight updates are smaller per gradient evaluation.
A recipe for dividing the training set into mini-batches
For datasets that contain a small number of equiprobable classes, the ideal mini-batch size is often equal to the number of classes and each mini-batch should contain one example of each class to reduce the sampling error when estimating the gradient for the whole training set from a single mini-batch. For other datasets, first randomize the order of the training examples then use minibatches of size about 10.
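To make the first point concrete, here is a minimal sketch (NumPy, with a hypothetical linear-regression loss and toy data) of mini-batch SGD that divides the summed gradient by the mini-batch size, so the learning rate acts on the average per-case gradient and does not need retuning when the batch size changes:

```python
# Minimal sketch, not a definitive recipe: toy data and loss are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # toy inputs
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(20)
lr = 0.1                                  # learning rate on the *average* gradient
batch_size = 32                           # anywhere in Hinton's 10-100 range

for epoch in range(10):
    perm = rng.permutation(len(X))        # shuffle the training order each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb                 # residuals for this mini-batch
        grad = xb.T @ err / len(idx)      # average per-case gradient, not the sum
        w -= lr * grad                    # one weight update per mini-batch
```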
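And a sketch of the recipe at the end of the quote, for a dataset with a small number of equiprobable classes: the mini-batch size equals the number of classes and each mini-batch holds exactly one example of each class. The helper name make_balanced_batches and the toy labels are my own illustration, not from Hinton's guide:

```python
# Illustrative sketch; make_balanced_batches is a hypothetical helper.
import numpy as np

def make_balanced_batches(labels, rng):
    """Yield index arrays; each batch contains one example per class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Shuffle the examples of each class independently.
    per_class = [rng.permutation(np.flatnonzero(labels == c)) for c in classes]
    n_batches = min(len(p) for p in per_class)
    for i in range(n_batches):
        batch = np.array([p[i] for p in per_class])
        yield rng.permutation(batch)      # also shuffle class order within the batch

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)   # toy 10-class labels
for batch_idx in make_balanced_batches(labels, rng):
    pass                                  # batch_idx indexes one example per class
```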

YJango - graduate student in Japan, artificial intelligence major

One more thing worth mentioning: many of these settings come with no clear-cut guidance. They depend heavily on the task you are working on and require constant trial and error. Andrew Ng gave an introduction to deep learning in which he described a loop: idea ---> experiments ---> error ---> improve the idea. Keep at it, young man (how fast you go around this loop basically depends on how fast your experiments run).