Debugging decoder_main

Debugging is done with cgdb.

The earlier cmake build did not enable a Debug configuration, so the binary could not be debugged. Add the following to CMakeLists.txt:

```cmake
SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall")
```

Rebuild, from the build directory:

```shell
# cmake clean ..
cmake -DCMAKE_BUILD_TYPE=Debug ..
cmake -DCMAKE_BUILD_TYPE=Debug --build .
make
```

Then debug under the LibTorch runtime:

```shell
cgdb build/bin/decoder_main
# then set breakpoints in the cgdb UI, e.g.:
b asr_decoder.cc:194
# without TLG
run --chunk_size -1 --wav_scp /home/yelong/data/wenet/examples/aishell/s0/data/xueyuan/wav.scp.1 --model_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/final.zip --unit_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang.char.txt

# running without the debugger:
./build/bin/decoder_main --chunk_size -1 --wav_scp /home/yelong/data/wenet/examples/aishell/s0/data/xueyuan/wav.scp.1 --model_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/final.zip --unit_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang.char.txt

# to print the n-best list, add --output_nbest true

# with TLG
run --acoustic_scale 5 --beam 15.0 --lattice_beam 7.5 --max_active 7000 --ctc_weight 1 --rescoring_weight 0 --chunk_size -1 --fst_path /data_local/yelong/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_lin/TLG.fst --wav_scp /home/yelong/data/wenet/examples/aishell/s0/data/xueyuan/wav.scp.1 --model_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/final.zip --dict_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_lin/words.txt --unit_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_lin/units.txt

run --acoustic_scale 5 --beam 15.0 --lattice_beam 7.5 --max_active 7000 --ctc_weight 1 --rescoring_weight 0 --chunk_size -1 --fst_path /data_local/yelong/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/TLG.fst --wav_scp /home/yelong/data/wenet/examples/aishell/s0/data/xueyuan/wav.scp.1 --model_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/final.zip --dict_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/words.txt --unit_path /home/yelong/data/wenet/examples/aishell/s0/exp/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/units.txt
```




Debugging

172.17.84.128:root@067224ac4999:/home/newest_wenet/wenet/

```shell
cd /home/newest_wenet/wenet/runtime/LibTorch/
gdb build/bin/decoder_main
run --acoustic_scale 5 --beam 20.0 --lattice_beam 7.5 --max_active 7000 --ctc_weight 1 --rescoring_weight 0 --chunk_size -1 --blank_skip_thresh 0.98 --fst_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/TLG.fst --wav_scp /home/data/yelong/docker_seewo/corpus/seewo/wav.scp --model_path /home/aban-c009/final.zip --dict_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/words.txt --unit_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/units.txt

run --acoustic_scale 5 --beam 20.0 --lattice_beam 7.5 --max_active 7000 --ctc_weight 1 --rescoring_weight 0 --chunk_size -1 --blank_skip_thresh 0.98 --fst_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/TLG.fst --wav_scp /home/data/yelong/docker_seewo/corpus/200/wav.scp.1 --model_path /home/aban-c009/final.zip --dict_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/words.txt --unit_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_hua_lexicon_eng/units.txt

./build/bin/decoder_main --acoustic_scale 5 --beam 20.0 --lattice_beam 7.5 --max_active 7000 --ctc_weight 1 --rescoring_weight 0 --chunk_size -1 --blank_skip_thresh 0.98 --fst_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_30w_2017_seewo_eng_word_new/TLG.fst --wav_scp /home/data/yelong/docker_seewo/corpus/200/wav.scp.1 --model_path /home/aban-c009/final.zip --dict_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_30w_2017_seewo_eng_word_new/words.txt --unit_path /home/aban-c009/lang_test_aban-c009_ngram_7g_train_30w_2017_seewo_eng_word_new/units.txt

# no TLG
./build/bin/decoder_main --chunk_size -1 --wav_scp /home/data/yelong/docker_seewo/corpus/200/wav.scp.1 --model_path /home/aban-c009/final.zip --unit_path /home/aban-c009/lang.char.txt
```

core/decoder/params.h

Parameter explanations:

```cpp
DEFINE_double(beam, 16.0, "beam in ctc wfst search");
DEFINE_double(lattice_beam, 10.0, "lattice beam in ctc wfst search");
// acoustic_scale multiplies the AM score: graph_cost is the LM score, and
// acoustic_cost = am_score * acoustic_scale, i.e., the scaled acoustic score.
// A large scale raises the total cost and makes the acoustic model count for
// more than the language model, which is reasonable. Note this differs from
// Kaldi, where the scaling factor acts on the LM; here it acts on the AM.
DEFINE_double(acoustic_scale, 1.0, "acoustic scale for ctc wfst search");
// e.g. acoustic_scale = 1:  graph_cost = 15.0703125, acoustic_cost = 32.8631744
// e.g. acoustic_scale = 20: graph_cost = 15.0703125, acoustic_cost = 657.263489
DEFINE_int32(nbest, 10, "nbest for ctc wfst or prefix search");
```
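The two example values above can be reproduced with a quick sketch of how the scale enters the combined cost (graph_cost and the raw AM score are the figures from the comment; the additive graph+acoustic combination is the usual WFST cost convention, assumed here for illustration):

```shell
# acoustic_scale multiplies only the AM score; graph_cost (LM) is untouched
awk 'BEGIN {
  graph_cost = 15.0703125
  am_score   = 32.8631744
  split("1 20", scales, " ")
  for (i = 1; i <= 2; i++) {
    s = scales[i]
    printf "scale=%-3g graph_cost=%g acoustic_cost=%g total=%g\n",
           s, graph_cost, am_score * s, graph_cost + am_score * s
  }
}'
```

With scale 20 the acoustic cost (657.263) dwarfs the graph cost (15.07), so the acoustic model dominates the search.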
Reading the model: model->Read(FLAGS_model_path); enters void TorchAsrModel::Read(const std::string& model_path) in core/decoder/torch_asr_model.cc.

Rescoring: void AsrDecoder::Rescoring() in decoder/asr_decoder.cc.

Model forward pass (acoustic features in, CTC classification probabilities out): this uses libtorch, i.e., the features are converted to torch tensors, fed to the model, and the model produces the output.

core/bin/decoder_main.cc: wenet::DecodeState state = decoder.Decode(); jumps to DecodeState AsrDecoder::Decode(bool block) in core/decoder/asr_decoder.cc, which jumps to DecodeState AsrDecoder::AdvanceDecoding(bool block),

then to void AsrModel::ForwardEncoder( in core/decoder/asr_model.cc, then to void TorchAsrModel::ForwardEncoderFunc( in core/decoder/torch_asr_model.cc.

Getting the encoder output:

auto outputs =
    model_->get_method("forward_encoder_chunk")(inputs).toTuple()->elements();

Getting ctc_log_probs:

torch::Tensor ctc_log_probs =
    model_->run_method("ctc_activation", chunk_out).toTensor()[0];



Back in core/decoder/asr_model.cc, model_->ForwardEncoder(chunk_feats, &ctc_log_probs); yields ctc_log_probs with shape 362 (T/4) by 11755 (V).

Entering the decoding module: searcher_->Search(ctc_log_probs); jumps to void CtcWfstBeamSearch::Search(const std::vector<std::vector<float>>& logp) in core/decoder/ctc_wfst_beam_search.cc:

// Get the best symbol
int cur_best =
    std::max_element(logp[i].begin(), logp[i].end()) - logp[i].begin();

It then jumps to void LatticeFasterDecoderTpl<FST, Token>::AdvanceDecoding( in kaldi/decoder/lattice-faster-decoder.cc; PruneActiveTokens(config_.lattice_beam * config_.prune_scale); jumps to void LatticeFasterDecoderTpl<FST, Token>::PruneActiveTokens(BaseFloat delta) in core/kaldi/decoder/lattice-faster-decoder.cc.

Best path: decoder_.GetBestPath(&lat, false); in core/decoder/ctc_wfst_beam_search.cc.

core/decoder/asr_decoder.cc: searcher_->Likelihood()
core/bin/decoder_main.cc: decoder.Rescoring(); // performs attention rescoring
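The call chain above can be captured once in a gdb command file so every session starts with the same breakpoints. A sketch (the `wenet::` qualification of the symbol names is an assumption; check the actual names with `info functions AsrDecoder` first):

```shell
# Breakpoints on the main stops of the walkthrough above
cat > wenet_bp.gdb <<'EOF'
b wenet::TorchAsrModel::Read
b wenet::AsrDecoder::AdvanceDecoding
b wenet::CtcWfstBeamSearch::Search
b wenet::AsrDecoder::Rescoring
EOF
# count the breakpoint lines just written
grep -c '^b ' wenet_bp.gdb
```

Load it with `gdb -x wenet_bp.gdb build/bin/decoder_main`; with cgdb the gdb options can be passed after a `--` separator, e.g. `cgdb -- -x wenet_bp.gdb build/bin/decoder_main`, then `run` with the usual flags.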


kaldi/decoder/lattice-faster-decoder.cc




$\large c\in \mathbb R^{F\times n}$

$\large x_i\in \mathbb R^{F\times 1}$

Controlling how verbose the log output is:

```shell
export GLOG_logtostderr=1
export GLOG_v=2  # larger value = more logs: a VLOG statement in the code is printed when its level is below this value
```