Linguistics & Polyglot Studies

Frequency of co-occurrence of Chinese characters as an indicator of lexicality (when selecting the vocabulary of Chinese military discourse)

https://doi.org/10.24833/2410-2423-2020-4-24-14-24

Abstract

Foreign language teaching at a non-linguistic college or university should be professionally oriented, which raises the question of how to select the vocabulary relevant to the professional discourse under study. Modern text corpora are too general in subject matter and time span, so a specially compiled collection of texts can serve this purpose instead. For Chinese, the task is complicated by the lack of word segmentation in such texts. Since most Chinese words are written with two characters, it is assumed that one applicable method is a comprehensive frequency analysis of two-character text sequences (character bigrams). The analysis of frequent bigrams showed that 70% of the most frequent bigrams are lexical units representative of the discourse, including 11% that are out-of-vocabulary. The remaining bigrams are syntactic constructions, including structurally incomplete ones, and fragments of longer lexical units. Thus, a high frequency of character co-occurrence can, with a rather high probability (p > 0.7), be considered an indicator of lexicality when identifying representative vocabulary in an unsegmented thematic collection of Chinese texts.
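The bigram frequency analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `count_bigrams`, the restriction of "characters" to the basic CJK Unified Ideographs range, the choice not to count bigrams across punctuation, and the two sample sentences are all assumptions made for illustration.

```python
from collections import Counter
import re

# Assumption: a "character" is any code point in the basic
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def count_bigrams(texts):
    """Count overlapping two-character sequences in unsegmented text."""
    counts = Counter()
    for text in texts:
        # Split on anything that is not a Chinese character, so that
        # bigrams never span punctuation, digits, or Latin letters.
        for run in HAN_RUN.findall(text):
            counts.update(run[i:i + 2] for i in range(len(run) - 1))
    return counts


# Made-up sample sentences, not taken from the article's text collection.
sample = ["军队进行训练。", "军队训练很重要。"]
bigram_counts = count_bigrams(sample)

# Print bigrams from most to least frequent.
for bigram, freq in bigram_counts.most_common():
    print(bigram, freq)
```

Even on this toy input, the two most frequent bigrams (军队 "army" and 训练 "training", each occurring twice) are words, while low-frequency bigrams such as 队进 and 行训 are fragments that straddle word boundaries. This mirrors the kind of distinction the abstract reports, although the sample is far too small to support any statistical claim.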

About the Author

D. S. Korshunov
Military University of Radio Electronics
Russian Federation

Dmitry S. Korshunov – PhD (philology), scientific and pedagogical employee

126, Sovetsky Prospect, Cherepovets, Vologda region, 162600



For citations:


Korshunov D.S. Frequency of co-occurrence of Chinese characters as an indicator of lexicality (when selecting the vocabulary of Chinese military discourse). Linguistics & Polyglot Studies. 2020;6(4):14-24. (In Russ.) https://doi.org/10.24833/2410-2423-2020-4-24-14-24


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2410-2423 (Print)
ISSN 2782-3717 (Online)