TDNN and TDNNF computation analysis

"Computational cost" here means the number of multiplications needed to process 1 s of audio.

Per frame, the cost is roughly the parameter count: each weight of an affine layer contributes one multiplication per evaluated frame.
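As a quick sanity check, here is a minimal sketch of that relation, assuming a 10 ms frame shift (100 frames/s) and a made-up parameter count:

params = 1_000_000        # hypothetical parameter count
frames_per_sec = 100      # 10 ms frame shift
subsample = 3             # chain models evaluate the output every 3rd frame

mults_per_frame = params  # one multiply per weight per evaluated frame
# dividing by subsample assumes the subsampling reaches every layer (see below)
mults_per_sec = mults_per_frame * frames_per_sec // subsample
print(f"{mults_per_sec:,} multiplications per second")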

TDNN

  • TDNN structure:
fixed-affine-layer name=lda input=Append(-6,-3,0,3,6) affine-transform-file=$dir/configs/lda.mat
relu-renorm-layer name=tdnn1 dim=256 input=lda
relu-renorm-layer name=tdnn2 dim=256 input=Append(-3,6)
relu-renorm-layer name=tdnn3 dim=256 input=Append(-6,3)
relu-renorm-layer name=tdnn4 dim=256 input=Append(-9,9)
relu-renorm-layer name=tdnn5 dim=256 input=Append(-15,3)

relu-renorm-layer name=prefinal-chain input=tdnn5 dim=256 target-rms=0.5
output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
relu-renorm-layer name=prefinal-xent input=tdnn5 dim=256 target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
  • TDNN per-frame computation:
[figure: per-frame multiplication count for each TDNN layer]
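The figure is not reproduced here, but the count can be re-derived from the config. A minimal sketch, assuming a 40-dim input, a square lda.mat (its real dimensions depend on the data), and a hypothetical num_targets; renorm and bias costs are ignored:

feat_dim = 40                     # assumed input feature dim
lda_dim = 5 * feat_dim            # Append(-6,-3,0,3,6); square lda.mat assumed
h = 256                           # hidden dim from the config
num_targets = 3000                # hypothetical number of pdf-ids

layers = [
    ("lda",            lda_dim, lda_dim),  # fixed affine
    ("tdnn1",          lda_dim, h),        # input=lda
    ("tdnn2",          2 * h,   h),        # Append(-3,6) splices 2 frames
    ("tdnn3",          2 * h,   h),        # Append(-6,3)
    ("tdnn4",          2 * h,   h),        # Append(-9,9)
    ("tdnn5",          2 * h,   h),        # Append(-15,3)
    ("prefinal-chain", h,       h),
    ("output",         h,       num_targets),
]
total = 0
for name, d_in, d_out in layers:
    mults = d_in * d_out          # one multiply per weight per frame
    total += mults
    print(f"{name:15s} {mults:>10,}")
print(f"{'total/frame':15s} {total:>10,}")

Only the chain branch used at decode time is counted; the prefinal-xent/output-xent branch matters only during training.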
  • TDNN network diagram:

[figure: TDNN network diagram]

As the diagram shows, the chain model's 3x frame subsampling propagates all the way down to the input: because every splice offset in the config is a multiple of 3, the input itself only needs to be evaluated at every 3rd frame. So although the total context is wide (39 frames to the left, 27 to the right), only 1/3 of the frames in that span actually participate in the computation, giving roughly a 3x speedup.
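To make the 1/3 claim concrete, here is a small sketch that propagates the set of required frames from the subsampled output back through the splice offsets of the config above:

splices = [(-15, 3), (-9, 9), (-6, 3), (-3, 6), (-6, -3, 0, 3, 6)]

def required_input_frames(output_frames, splice_layers):
    frames = set(output_frames)
    for offsets in splice_layers:   # walk from the top layer down to the input
        frames = {t + o for t in frames for o in offsets}
    return frames

outputs = range(0, 300, 3)          # chain output evaluated every 3rd frame
needed = required_input_frames(outputs, splices)
print(len(needed), "input frames needed for", len(outputs), "output frames")
print("all multiples of 3:", all(t % 3 == 0 for t in needed))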

TDNNF

  • TDNNF structure:
dim=320
bn_dim=32
outbn_dim=48
#dim=180
#bn_dim=20
#outbn_dim=32
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree | grep num-pdfs | awk '{print $2}')
learning_rate_factor=$(python3 -c "print(0.5/$xent_regularize)")
affine_opts="l2-regularize=0.01 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true"
tdnnf_opts="l2-regularize=0.01 dropout-proportion=0.0 bypass-scale=0.66"
linear_opts="l2-regularize=0.01 orthonormal-constraint=-1.0"
prefinal_opts="l2-regularize=0.01"
output_opts="l2-regularize=0.002"

mkdir -p $dir/configs
cat <<EOF > $dir/configs/network.xconfig
input dim=50 name=input

# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor

fixed-affine-layer name=lda input=Append(-2,-1,0,1,2) affine-transform-file=$dir/configs/lda.mat
relu-batchnorm-dropout-layer name=tdnn1 $affine_opts dim=$dim
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf3 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf4 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf5 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=1
tdnnf-layer name=tdnnf6 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=0
tdnnf-layer name=tdnnf7 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf8 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf9 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf10 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf11 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf12 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
tdnnf-layer name=tdnnf13 $tdnnf_opts dim=$dim bottleneck-dim=$bn_dim time-stride=3
linear-component name=prefinal-l dim=$outbn_dim $linear_opts

prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=$dim small-dim=$outbn_dim
output-layer name=output include-log-softmax=false dim=$num_targets $output_opts

prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=$dim small-dim=$outbn_dim
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts
  • TDNNF per-frame computation:
[figure: per-frame multiplication count for each TDNNF layer]
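As with the TDNN, the count can be re-derived. A tdnnf-layer factors its weight matrix into a semi-orthogonal linear part and an affine part; assuming each part splices two frames when time-stride > 0 (offsets (-s,0) and (0,s)) and a single frame when it is 0, a minimal sketch with the dims from the script and a hypothetical num_targets:

feat = 50                          # input dim=50 from the config
dim, bn, outbn = 320, 32, 48       # dim / bn_dim / outbn_dim
num_targets = 3000                 # hypothetical number of pdf-ids

def tdnnf_mults(dim, bn, stride):
    # linear: 2 spliced frames of dim -> bn; affine: 2 spliced frames
    # of bn -> dim (1 frame each when stride == 0). Assumed structure.
    f = 2 if stride > 0 else 1
    return f * dim * bn + f * bn * dim

strides = [1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3]   # tdnnf2 .. tdnnf13
total  = (5 * feat) * (5 * feat)   # lda: Append(-2..2), square mat assumed
total += (5 * feat) * dim          # tdnn1 on the lda output
total += sum(tdnnf_mults(dim, bn, s) for s in strides)
total += dim * outbn               # prefinal-l linear
total += outbn * dim + dim * outbn # prefinal-chain: small->big->small
total += outbn * num_targets       # output affine
print(f"{total:,} multiplies per frame")

Under this assumption the factorization is what keeps each layer cheap: a stride-s tdnnf-layer costs about 4*dim*bn = 40,960 multiplies, versus 2*dim*dim = 204,800 for an equivalent unfactorized two-frame spliced affine.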
  • TDNNF network diagram:

[figure: TDNNF network diagram]

As the diagram shows, the early layers use many time-stride-1 splices. So even though the output fed to the decoder is subsampled at the top, by the time the computation reaches the bottom layers nearly every input frame is still needed, and the subsampling gives no speedup there.

  • Improvement: drop the time-stride-1 splices and use only time-stride-3 splices, so that the 3-frame subsampling propagates all the way down to the input, as in the plain TDNN above (a sketch comparing the two patterns follows below).
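Reapplying the required-frames idea from the TDNN section shows the effect of the fix. Assuming a tdnnf-layer with time-stride s needs its input at offsets (-s, 0, +s), this sketch compares the config above against a hypothetical all-stride-3 variant:

def evaluated_frames(output_frames, strides):
    # strides listed bottom (tdnnf2) to top (tdnnf13); walk from the
    # output down, tracking the frames each layer's input is needed at.
    # (lda/tdnn1 below tdnnf2 simply follow tdnnf2's frame set.)
    frames = set(output_frames)
    counts = []
    for s in reversed(strides):
        offsets = (-s, 0, s) if s > 0 else (0,)
        frames = {t + o for t in frames for o in offsets}
        counts.append(len(frames))
    return counts[::-1]            # bottom layer first

outputs = range(0, 300, 3)         # 100 chain outputs, every 3rd frame
for strides in ([1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3],   # config above
                [3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3]):  # proposed fix
    print(strides[:5], "->", evaluated_frames(outputs, strides)[:4],
          "frames needed at the bottom layers")

With the stride-1 splices the bottom layers must be evaluated at essentially every frame in the span; with only stride-3 splices they run at roughly one third of the frames, recovering the 3x speedup of the plain TDNN.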