TDNN and TDNNF computation analysis

"Computational cost" here means the number of multiplications needed to process 1 s of audio.

Per frame, the cost is roughly the parameter count: each weight of an affine layer contributes one multiplication per evaluated frame.
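As a quick sanity check, here is a minimal sketch of that relation, assuming a 10 ms frame shift (100 frames/s) and a made-up parameter count:

params = 1_000_000        # hypothetical parameter count
frames_per_sec = 100      # 10 ms frame shift
subsample = 3             # chain models evaluate the output every 3rd frame

mults_per_frame = params  # one multiply per weight per evaluated frame
# dividing by subsample assumes the subsampling reaches every layer (see below)
mults_per_sec = mults_per_frame * frames_per_sec // subsample
print(f"{mults_per_sec:,} multiplications per second")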

TDNN

  • TDNN structure:
fixed-affine-layer name=lda input=Append(-6,-3,0,3,6) affine-transform-file=$dir/configs/lda.mat
relu-renorm-layer name=tdnn1 dim=256 input=lda
relu-renorm-layer name=tdnn2 dim=256 input=Append(-3,6)
relu-renorm-layer name=tdnn3 dim=256 input=Append(-6,3)
relu-renorm-layer name=tdnn4 dim=256 input=Append(-9,9)
relu-renorm-layer name=tdnn5 dim=256 input=Append(-15,3)

relu-renorm-layer name=prefinal-chain input=tdnn5 dim=256 target-rms=0.5
output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
relu-renorm-layer name=prefinal-xent input=tdnn5 dim=256 target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
  • TDNN per-frame computation:
[figure: per-frame multiplication count for each TDNN layer]
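The figure is not reproduced here, but the count can be re-derived from the config. A minimal sketch, assuming a 40-dim input, a square lda.mat (its real dimensions depend on the data), and a hypothetical num_targets; renorm and bias costs are ignored:

feat_dim = 40                     # assumed input feature dim
lda_dim = 5 * feat_dim            # Append(-6,-3,0,3,6); square lda.mat assumed
h = 256                           # hidden dim from the config
num_targets = 3000                # hypothetical number of pdf-ids

layers = [
    ("lda",            lda_dim, lda_dim),  # fixed affine
    ("tdnn1",          lda_dim, h),        # input=lda
    ("tdnn2",          2 * h,   h),        # Append(-3,6) splices 2 frames
    ("tdnn3",          2 * h,   h),        # Append(-6,3)
    ("tdnn4",          2 * h,   h),        # Append(-9,9)
    ("tdnn5",          2 * h,   h),        # Append(-15,3)
    ("prefinal-chain", h,       h),
    ("output",         h,       num_targets),
]
total = 0
for name, d_in, d_out in layers:
    mults = d_in * d_out          # one multiply per weight per frame
    total += mults
    print(f"{name:15s} {mults:>10,}")
print(f"{'total/frame':15s} {total:>10,}")

Only the chain branch used at decode time is counted; the prefinal-xent/output-xent branch matters only during training.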
  • TDNN network diagram:

[figure: TDNN network diagram]

As the diagram shows, the chain model's 3x frame subsampling propagates all the way down to the input: because every splice offset in the config is a multiple of 3, the input itself only needs to be evaluated at every 3rd frame. So although the total context is wide (39 frames to the left, 27 to the right), only 1/3 of the frames in that span actually participate in the computation, giving roughly a 3x speedup.
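To make the 1/3 claim concrete, here is a small sketch that propagates the set of required frames from the subsampled output back through the splice offsets of the config above:

splices = [(-15, 3), (-9, 9), (-6, 3), (-3, 6), (-6, -3, 0, 3, 6)]

def required_input_frames(output_frames, splice_layers):
    frames = set(output_frames)
    for offsets in splice_layers:   # walk from the top layer down to the input
        frames = {t + o for t in frames for o in offsets}
    return frames

outputs = range(0, 300, 3)          # chain output evaluated every 3rd frame
needed = required_input_frames(outputs, splices)
print(len(needed), "input frames needed for", len(outputs), "output frames")
print("all multiples of 3:", all(t % 3 == 0 for t in needed))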

TDNNF

  • TDNNF structure:
dim=320
bn_dim=32
outbn_dim=48
#dim=180
#bn_dim=20
#outbn_dim=32
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree | grep num-pdfs | awk '{print $2}')
learning_rate_factor=$(python3 -c "print(0.5/$xent_regularize)")
affine_opts="l2-regularize=0.01 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true"
tdnnf_opts="l2-regularize=0.01 dropout-proportion=0.0 bypass-scale=0.66"
linear_opts="l2-regularize=0.01 orthonormal-constraint=-1.0"
prefinal_opts="l2-regularize=0.01"
output_opts="l2-regularize=0.002"

mkdir -p $dir/configs
cat <<EOF > $dir/configs/network.xconfig
input dim=50 name=input

# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor

fixed-affine-layer name=lda input=Append(-2,-1,0,1,2) affine-transform-file=$dir/configs/lda.mat
relu-batchnorm-dropout-layer name=tdnn1 $affine_opts dim=$dim
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf3 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf4 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf5 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf6 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=0
tdnnf-layer name=tdnnf7 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf8 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf9 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf10 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf11 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf12 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf13 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
linear-component name=prefinal-l dim=$outbn_dim $linear_opts

prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=$dim small-dim=$outbn_dim
output-layer name=output include-log-softmax=false dim=$num_targets $output_opts

prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=$dim small-dim=$outbn_dim
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts
  • TDNNF per-frame computation:
[figure: per-frame multiplication count for each TDNNF layer]
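As with the TDNN, the count can be re-derived. A tdnnf-layer factors its weight matrix into a semi-orthogonal linear part and an affine part; assuming each part splices two frames when time-stride > 0 (offsets (-s,0) and (0,s)) and a single frame when it is 0, a minimal sketch with the dims from the script and a hypothetical num_targets:

feat = 50                          # input dim=50 from the config
dim, bn, outbn = 320, 32, 48       # dim / bn_dim / outbn_dim
num_targets = 3000                 # hypothetical number of pdf-ids

def tdnnf_mults(dim, bn, stride):
    # linear: 2 spliced frames of dim -> bn; affine: 2 spliced frames
    # of bn -> dim (1 frame each when stride == 0). Assumed structure.
    f = 2 if stride > 0 else 1
    return f * dim * bn + f * bn * dim

strides = [1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3]   # tdnnf2 .. tdnnf13
total  = (5 * feat) * (5 * feat)   # lda: Append(-2..2), square mat assumed
total += (5 * feat) * dim          # tdnn1 on the lda output
total += sum(tdnnf_mults(dim, bn, s) for s in strides)
total += dim * outbn               # prefinal-l linear
total += outbn * dim + dim * outbn # prefinal-chain: small->big->small
total += outbn * num_targets       # output affine
print(f"{total:,} multiplies per frame")

Under this assumption the factorization is what keeps each layer cheap: a stride-s tdnnf-layer costs about 4*dim*bn = 40,960 multiplies, versus 2*dim*dim = 204,800 for an equivalent unfactorized two-frame spliced affine.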
  • TDNNF network diagram:

[figure: TDNNF network diagram]

As the diagram shows, the early layers use many time-stride-1 splices. So even though the output fed to the decoder is subsampled at the top, by the time the computation reaches the bottom layers nearly every input frame is still needed, and the subsampling gives no speedup there.

  • Improvement: drop the time-stride-1 splices and use only time-stride-3 splices, so that the 3-frame subsampling propagates all the way down to the input, as in the plain TDNN above (a sketch comparing the two patterns follows below).
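Reapplying the required-frames idea from the TDNN section shows the effect of the fix. Assuming a tdnnf-layer with time-stride s needs its input at offsets (-s, 0, +s), this sketch compares the config above against a hypothetical all-stride-3 variant:

def evaluated_frames(output_frames, strides):
    # strides listed bottom (tdnnf2) to top (tdnnf13); walk from the
    # output down, tracking the frames each layer's input is needed at.
    # (lda/tdnn1 below tdnnf2 simply follow tdnnf2's frame set.)
    frames = set(output_frames)
    counts = []
    for s in reversed(strides):
        offsets = (-s, 0, s) if s > 0 else (0,)
        frames = {t + o for t in frames for o in offsets}
        counts.append(len(frames))
    return counts[::-1]            # bottom layer first

outputs = range(0, 300, 3)         # 100 chain outputs, every 3rd frame
for strides in ([1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3],   # config above
                [3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3]):  # proposed fix
    print(strides[:5], "->", evaluated_frames(outputs, strides)[:4],
          "frames needed at the bottom layers")

With the stride-1 splices the bottom layers must be evaluated at essentially every frame in the span; with only stride-3 splices they run at roughly one third of the frames, recovering the 3x speedup of the plain TDNN.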