Authors: | Mansurova M., Barakhnin V., Khibatkhanuly Ye., Pastushkov I.S. |
Title: | Named entity extraction from semistructured data using machine learning algorithms |
Bibliographic reference: | Mansurova M., Barakhnin V., Khibatkhanuly Ye., Pastushkov I.S. Named entity extraction from semistructured data using machine learning algorithms // Lecture Notes in Computer Science. - 2019. - Vol. 11684 LNAI. - P. 58-69. - ISSN 0302-9743. - EISSN 1611-3349. |
External systems: | DOI: 10.1007/978-3-030-28374-2_6; RSCI (РИНЦ): 41684667; SCOPUS: 2-s2.0-85072857642; WoS: 000611590600006; |
Abstract: | eng: Over the last decades, the intensive development of Internet technologies has led to an information explosion, expressed as exponential growth in data volume, much of it low-quality information. This paper describes intelligent tools that support decision making through automatic knowledge extraction. In the first part of the paper, we consider preprocessing that includes morphological analysis of texts. We then consider a model of text documents in the form of a hypergraph and an implementation of the random walk method to extract semantically close word pairs, in other words, pairs that often appear together. The result of the calculations is a matrix of word affinity coefficients, one for each pair of components of the vocabulary vector. In the second part, we describe the training of a neural network for extracting linguistic constructions, which include possible values of the descriptors of named entities in the text. The neural network can retrieve information on one preselected descriptor, for example location, yielding the names of geographical objects as the final result. In the general case, the neural network can retrieve information on several descriptors simultaneously. |
Keywords: | machine learning algorithms; neural networks; random walk method; semi-structured data; entity extraction; |
Published: | 2019 |
Physical description: | pp. 58-69 |
Conference: | Name: 11th International Conference "Computational Collective Intelligence" Abbreviation: ICCCI 2019 City: Hendaye Country: France Dates: 2019-09-04 - 2019-09-06 |
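The abstract describes modelling text documents as a hypergraph and using a random walk to obtain a matrix of word affinity coefficients. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that each document (or sentence) is one hyperedge over its words, that a walker at a word picks an incident hyperedge uniformly and then a word inside it uniformly, and that the one-step transition probabilities serve as affinity coefficients. The function names `affinity_matrix` and `closest_pairs` and the toy documents are hypothetical.

```python
from collections import defaultdict


def affinity_matrix(hyperedges):
    """Return (vocab, matrix) of one-step random-walk transition probabilities.

    Assumption: each hyperedge is the collection of words of one document.
    matrix[i][j] is the probability of stepping from vocab[i] to vocab[j].
    """
    edges = [set(edge) for edge in hyperedges if edge]
    vocab = sorted({word for edge in edges for word in edge})
    index = {word: i for i, word in enumerate(vocab)}

    # Collect the hyperedges incident to each word.
    incident = defaultdict(list)
    for edge in edges:
        for word in edge:
            incident[word].append(edge)

    size = len(vocab)
    matrix = [[0.0] * size for _ in range(size)]
    for word in vocab:
        pick_edge = 1.0 / len(incident[word])   # uniform choice of hyperedge
        for edge in incident[word]:
            pick_word = 1.0 / len(edge)         # uniform choice of word in it
            for other in edge:
                matrix[index[word]][index[other]] += pick_edge * pick_word
    return vocab, matrix


def closest_pairs(vocab, matrix, top=3):
    """Rank distinct word pairs by the symmetrized affinity coefficient."""
    pairs = []
    for i in range(len(vocab)):
        for j in range(i + 1, len(vocab)):
            score = (matrix[i][j] + matrix[j][i]) / 2
            if score > 0:
                pairs.append((score, vocab[i], vocab[j]))
    pairs.sort(key=lambda item: (-item[0], item[1], item[2]))
    return pairs[:top]


# Toy corpus (hypothetical): three already-lemmatized documents.
documents = [
    ["almaty", "city", "kazakhstan"],
    ["almaty", "city"],
    ["novosibirsk", "city", "russia"],
]
vocab, matrix = affinity_matrix(documents)
for score, first, second in closest_pairs(vocab, matrix):
    print(f"{first} - {second}: {score:.3f}")
```

Computing the transition probabilities exactly, rather than sampling walks, keeps the sketch deterministic. A longer walk could be approximated by raising the matrix to a power, and the morphological preprocessing mentioned in the abstract would run before the documents are passed in.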