Authors: | Ryabko B., Guskov A., Selivanova I. |
Title: | Using data-compressors for statistical analysis of problems on homogeneity testing and classification |
Bibliographic reference: | Ryabko B., Guskov A., Selivanova I. Using data-compressors for statistical analysis of problems on homogeneity testing and classification // 2017 IEEE International Symposium on Information Theory (ISIT). - 2017: IEEE. - P.121-125. |
External systems: | DOI: 10.1109/ISIT.2017.8006502; RSCI (РИНЦ): 31055656; SCOPUS: 2-s2.0-85034015843; WoS: 000430345200025; |
Abstract: | eng: Nowadays data compressors are applied to many problems of text analysis, but many such applications are developed outside of the framework of mathematical statistics. In this paper we overcome this obstacle and show how several methods of classical mathematical statistics can be developed based on applications of the data compressors. |
Keywords: | Information theory; Universal codes; Text analysis; Hypothesis testing; Homogeneity testing; Data compressor; Statistics; Compressors; Classification (of information); Universal code; Homogeneity test; Data compression; Classification; |
Published: | 2017 |
Physical description: | pp. 121-125
Conference: | Name: 2017 IEEE International Symposium on Information Theory; Abbreviation: ISIT; City: Aachen; Country: Germany; Dates: 2017-06-25 - 2017-06-30; Link: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7999336 |
Cited references: | 1. Cilibrasi, R., Vitányi, P., 2005. Clustering by compression. IEEE Transactions on Information Theory 51, 1523-1545. 2. Cilibrasi, R., Vitányi, P., De Wolf, R., 2004. Algorithmic clustering of music based on string compression. Computer Music Journal 28(4), 49-67. 3. Cover, T. M., Thomas, J. A., 2006. Elements of information theory. Wiley-Interscience, New York, NY, USA. 4. Ferragina, P., Giancarlo, R., Greco, V., Manzini, G., Valiente, G., 2007. Compression-based classification of biological sequences and structures via the universal similarity metric: experimental assessment. BMC Bioinformatics 8(1), 1. 5. Kendall, M., Stuart, A., 1961. The advanced theory of statistics; Vol. 2: Inference and relationship. London. 6. Khmelev, D. V., Teahan, W. J., 2003. A repetition based measure for verification of text collections and for text categorization. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp. 104-110. 7. Kukushkina, O. V., Polikarpov, A., Khmelev, D. V., 2001. Using literal and grammatical statistics for authorship attribution. Problems of Information Transmission 37(2), 172-184. 8. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P. M., 2004. The similarity metric. IEEE Transactions on Information Theory 50(12), 3250-3264. 9. Ryabko, B., Astola, J., Malyutov, M., 2016. Compression-Based Methods of Statistical Analysis and Prediction of Time Series. Springer. 10. Teahan, W. J., Harper, D. J., 2003. Using compression-based language models for text categorization. In: Language modeling for information retrieval. Springer, pp. 141-165. 11. Vitányi, P. M., 2011. Information distance in multiples. IEEE Transactions on Information Theory 57(4), 2451-2456. |