Jeremy Zheng 7770367ff5 :truck: move background source codes into api-v8 (1 year ago)

css
js
language
README.md
comp_csv.php
dict.js
dict_lookup.php
dict_lookup_pre.php
dict_redis.bat
function.php
get_first_mean.php
get_split_data.php
grm_abbr.php
icon.svg
index.php
mobile.css
multi-core.bat
p_ending.php
redis_comp_part.php
redis_import_dict.php
redis_import_term.php
redis_import_user.php
redis_pali_word_list.php
redis_pali_word_statistic.php
redis_pm_part.php
redis_ref_with_mean.php
redis_refresh_first_mean.php
redis_split_part.php
redis_sys_rgl_part.php
sandhi.php
split.php
turbo_split.php
word_statistics.php

README.md

Turbo Split word-splitting algorithm

Confidence value formula

  • let CV=confidence.value
  • let N=dictionary_num
  • let L=word_spell.length

$CV=\frac{1}{1+640\times{(\frac{1}{1.1+N^{1.18}})}^L}$

Here, the value of CV is kept within the range 0 to 1.
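As a rough illustration, the formula above can be written as a small Python function. This is a minimal sketch, not the repository's PHP implementation: the function name `confidence_value` and its parameter names are hypothetical, and only the constants 640, 1.1 and 1.18 come from the formula.

```python
def confidence_value(dictionary_num: int, word_length: int) -> float:
    """Confidence value CV for one candidate word.

    dictionary_num (N): number of dictionaries that contain the word.
    word_length (L): number of letters in the word's spelling.
    Both names are illustrative; the constants come from the README formula.
    """
    # Inner fraction 1 / (1.1 + N^1.18): shrinks as more dictionaries agree.
    base = 1.0 / (1.1 + dictionary_num ** 1.18)
    # Raising it to the power L rewards longer matches; the result stays in (0, 1).
    return 1.0 / (1.0 + 640.0 * base ** word_length)


if __name__ == "__main__":
    for n in (0, 1, 3, 5):
        print(n, round(confidence_value(n, 4), 4))
```

With these constants, a four-letter word found in five dictionaries scores above 0.8, while the same word found in only one dictionary scores far below it.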

Final confidence value

let word=word_1+word_2+word_3+...+word_N+word_remain

No. Confidence value
1 CV_1=word_1.Cof_val
2 CV_2=word_2.Cof_val
3 CV_3=word_3.Cof_val
... ...
N CV_N=word_N.Cof_val

$Len_{remain}=Len(word_{remain})$

$CV_{final}=CV_1\times CV_2\times CV_3\times \cdots\times CV_N+\frac{150}{(Len_{remain})^{3}+150}-1$
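The final confidence value can be sketched the same way. This is an illustrative Python version under stated assumptions: `final_confidence`, `part_cvs` and `remain_length` are hypothetical names, and the input is simply the list of per-part confidence values plus the length of the unsplit remainder.

```python
from math import prod


def final_confidence(part_cvs, remain_length):
    """Final confidence value for one complete split.

    part_cvs: confidence values CV_1 .. CV_N of the accepted parts.
    remain_length: length of the leftover, unsplit remainder.
    Names are illustrative; the constant 150 comes from the README formula.
    """
    # Product of the per-part confidence values (1.0 for an empty list).
    product = prod(part_cvs)
    # Remainder term: 1 when nothing is left over, approaching 0 as the remainder grows.
    remainder_term = 150.0 / (remain_length ** 3 + 150.0)
    return product + remainder_term - 1.0


if __name__ == "__main__":
    print(final_confidence([0.9, 0.8], 0))
    print(final_confidence([0.9, 0.8], 5))
```

When the remainder is empty the remainder term equals 1, so the result is just the product of the part values. A long remainder pushes the result toward the product minus 1, which can be negative.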

graph TD
1Begin([start开始])
1A[/word_org<br>原单词/]
1B{diphthong ?<br>双元音 ?}
1C(non-diphthong word<br>非双元音词)
1D(diphthong word<br>双元音词)
1Begin-->1A
1A--input<br>输入-->1B

subgraph step 1:diphthong split<br>第一步:双元音切分
1B--No-->1C
1B--Yes-->1D
1D--according to根据<br>diphthong table<br>to split切分-->1C



end

2A([split loop<br>拆分循环])
2B{与阈值比较<br>compare with<br>threshold<br>value=0.8}
2C[(array<br>数组)]
2D(slice the last letter<br>by sandhi rule<br>根据连音规则<br>切除最后一个字母)
2E[/pre-word<br>前半段/]
2F[/post-word<br>后半段/]
2G(pre-word前半段<br>post-word后半段<br>confidence value信心值)
2J{length > 4?<br>长度判定}
2K[(sandhi rule base<br>规则库<br>array $sandhi)]
2L{sandhi rules remained?<br>是否有规则剩余?}

subgraph function of word split<br>单词切分函数<br>recursion depth=18
1C--input输入-->2A
2A-->2D
2D-->2L
2K-->2D
2L--YES-->2E
2L--NO-->2M(slice the last letter<br>切除最后一个字母)
2M-->2D
2E-->2J
2J--length>4<br>input输入-->2D
2J--length<4-->2N([stop停止])
2H(confidence value<br>信心值)-->2B
2B--less than 0.8<br>小于0.8-->2D

end

subgraph confidence value calculator<br>信心值计算器
2E--accuracy estimate<br>准确性评估-->3A(found in how many dictionaries<br>在多少本字典中出现)
3A-->3B(fomular计算公式<br>)
3C[(vocabulary & frequency<br>in dictionaries)]-->3A
3B-->2H
end
2C-->A1

subgraph final confidence value<br>计算总信心值
2B--greater than 0.8<br>大于0.8-->2G
2G-->2C
2G-->2F
2F--input输入<br>recursion depth=18<br>递归深度=18-->2D
A1[/word1<br>/]
A11[/word1_1.spell<br>word1_1.Cof_val<br>word1_1.length/]
A12[/word1_2<br>/]
A121[/word1_21.spell<br>word1_21.Cof_val<br>word1_21.length/]
A122[/word1_22<br>/]
A1221[/word1_221.spell<br>word1_221.Cof_val<br>word1_221.length/]
A1222[/word1_222.spell<br>unsplitable/]
A1--split && C_val>0_8-->A11
A1--remained-->A12
A12--split && C_val>0_8-->A121
A12--remained-->A122
A122--split && C_val>0_8-->A1221
A122--remained-->A1222
A11-->B(total Cof.val calculator)
A121-->B
A1221-->B
A1222-->B
end

subgraph result process<br>结果处理
B-->C(word1_1.spell<br>word1_21.spell<br>word1_221.spell<br>word1_222.spell<br>final Cof.val)
C--push-->Array(Result Array<br>结果数组)
Array--依最终信心值倒序排列<br>orderby CV_final DESC-->result([Final Result])
end
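The flowchart above can be approximated by the following Python sketch. It is a simplified reading of the diagram, not the code in turbo_split.php, and it rests on several assumptions: `dict_counts` is a hypothetical mapping from a word to the number of dictionaries containing it; sandhi rules are left out entirely, so prefixes are shortened one plain letter at a time; the whole input is also tried as a prefix; every prefix above the threshold is explored rather than only the first; and `MIN_PREFIX_LEN = 4` is a guess, because the diagram does not say what happens at length exactly 4. The two formula helpers are repeated so the block runs on its own.

```python
from math import prod

THRESHOLD = 0.8      # threshold value from the flowchart
MAX_DEPTH = 18       # recursion depth from the flowchart
MIN_PREFIX_LEN = 4   # assumption: the flowchart leaves length == 4 unspecified


def confidence_value(dictionary_num, word_length):
    return 1.0 / (1.0 + 640.0 * (1.0 / (1.1 + dictionary_num ** 1.18)) ** word_length)


def final_confidence(part_cvs, remain_length):
    return prod(part_cvs) + 150.0 / (remain_length ** 3 + 150.0) - 1.0


def split_candidates(word, dict_counts, depth=0):
    """Yield (parts, cvs, remainder) for each way of peeling accepted prefixes off word."""
    # Always offer "stop here": whatever is left counts as the unsplit remainder.
    yield [], [], word
    if depth >= MAX_DEPTH:
        return
    # Shorten the candidate prefix one letter at a time (no sandhi rules in this sketch).
    for end in range(len(word), MIN_PREFIX_LEN - 1, -1):
        prefix = word[:end]
        cv = confidence_value(dict_counts.get(prefix, 0), len(prefix))
        if cv <= THRESHOLD:
            continue  # below the threshold: keep shortening
        # Accepted prefix: recurse on the post-word.
        for parts, cvs, remainder in split_candidates(word[end:], dict_counts, depth + 1):
            yield [prefix] + parts, [cv] + cvs, remainder


def turbo_split_sketch(word, dict_counts):
    """Return (parts, remainder, final_cv) tuples, ordered by final_cv descending."""
    results = [
        (parts, remainder, final_confidence(cvs, len(remainder)))
        for parts, cvs, remainder in split_candidates(word, dict_counts)
    ]
    results.sort(key=lambda item: item[2], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical dictionary counts, for illustration only.
    counts = {"dhamma": 6, "cakka": 6}
    for parts, remainder, cv in turbo_split_sketch("dhammacakka", counts):
        print(parts, repr(remainder), round(cv, 4))
```

Exploring every accepted prefix keeps the sketch short but can grow quickly on long inputs; the depth limit of 18 from the diagram bounds the recursion.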



Redis