Kaldi conv-relu-batchnorm-layer

conv-relu-batchnorm-layer

xconfig/convolution.py

src/nnet3/nnet-convolutional-component.h

nnet3-TimeHeightConvolution code walkthrough

conv-relu-batchnorm-layer name=cnn1 $cnn_opts height-in=50 height-out=50 time-offsets=-1,0,1 height-offsets=-1,0,1 num-filters-out=64
conv-relu-batchnorm-layer name=cnn3 $cnn_opts height-in=50 height-out=25 height-subsample-out=2 time-offsets=-1,0,1 height-offsets=-1,0,1 num-filters-out=128

Parameter explanation

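  • height-offsets: the kernel extent along the height axis, i.e. the convolution over the feature dimension (e.g. dim=50). If height-offsets=-1,0,1, then height 10 at the output takes its input from heights 9, 10, and 11 at the input.
  • num-filters-out: the number of convolution filters. The output dimension of this layer is num-filters-out * height-out.
  • num-filters-in and height-in: if the previous layer's output dimension is 2048, this layer requires num-filters-in × height-in == 2048. Setting height-in to 1024, for example, gives num-filters-in = 2.
  • height-out and height-subsample-out: height-out is the desired output height, defined relative to height-in; height-subsample-out is the subsampling factor along the height axis. The constraint is height-out × height-subsample-out <= height-in, and the layer's final output dimension is height-out × num-filters-out.
  • height-offsets, time-offsets: together these define the shape of the convolution kernel. If you prefer not to specify them separately, you can define offsets directly instead.
  • required-time-offsets: this can usually be ignored; in general, required-time-offsets is the same as time-offsets.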
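
The dimension rules above can be checked with a small standalone sketch. It is not Kaldi code: the helper names conv_layer_dims and input_heights are hypothetical, the input dimension of 3200 for cnn3 is an assumed value (the preceding layer is not shown), and the way height-subsample-out enters the input-height formula is an assumption based on the 9, 10, 11 example for a subsampling factor of 1.

```python
def conv_layer_dims(input_dim, height_in, height_out, num_filters_out,
                    height_subsample_out=1):
    """Return (num_filters_in, output_dim) for one conv layer (illustrative helper)."""
    # The input dimension must factor as num-filters-in * height-in.
    if input_dim % height_in != 0:
        raise ValueError("input_dim must be a multiple of height_in")
    # Constraint stated in the parameter list above.
    if height_out * height_subsample_out > height_in:
        raise ValueError("height_out * height_subsample_out must be <= height_in")
    num_filters_in = input_dim // height_in
    output_dim = num_filters_out * height_out
    return num_filters_in, output_dim


def input_heights(h_out, height_offsets, height_subsample_out=1):
    """Input heights read by output height h_out.

    Assumption: input height = h_out * height_subsample_out + offset.
    Values outside [0, height_in) would correspond to padding.
    """
    return [h_out * height_subsample_out + off for off in height_offsets]


if __name__ == "__main__":
    # Example from the text: input dim 2048 with height-in=1024 gives 2 input filters.
    print(conv_layer_dims(2048, 1024, 1024, 1))

    # cnn3 from the config above, with an assumed input dim of 3200 (64 filters x 50).
    print(conv_layer_dims(3200, 50, 25, 128, height_subsample_out=2))

    # height-offsets=-1,0,1: output height 10 reads input heights 9, 10, 11.
    print(input_heights(10, [-1, 0, 1]))
```

Running the checks before writing a config line makes a mismatch between the previous layer's output dimension and height-in visible early, rather than when the network is initialised.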