Authors: Selivanova I.V., Ryabko B.Ya., Guskov A.E.
Title: Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts
Bibliographic reference: Selivanova I.V., Ryabko B.Ya., Guskov A.E. Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts // Automatic Documentation and Mathematical Linguistics. - 2017. - Vol. 51. - Iss. 3. - P. 120-126. - ISSN 0005-1055. - EISSN 1934-8371.
External systems: DOI: 10.3103/S0005105517030116; РИНЦ (Russian Science Citation Index): 32812733; WoS: 000409073600005;
Abstract (English): A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated based on the data from an archive of scientific texts (arXiv.org) and in the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%; its accuracy depends on the quality of the original data.
Keywords: CyberLeninka; arXiv.org; text compression; information theory; thematic classification of texts; classification;
Published: 2017
Physical description: pp. 120-126
Cited references:
1. A Frequent Concepts Based Document Clustering Algorithm By: Baghel, R.; Dhir, D. R. Int. J. Comput. Appl. Volume: 4 Issue: 5 Pages: 6-12 Published: 2010
2. Frequent Term-based Text Clustering By: Beil, F.; Ester, M.; Xu, X. P 8 ACM SIGKDD INT C Pages: 436-442 Published: 2002
3. Algorithmic clustering of music based on string compression By: Cilibrasi, R; Vitanyi, P; de Wolf, R COMPUTER MUSIC JOURNAL Volume: 28 Issue: 4 Pages: 49-67 Published: DEC 2004
4. Clustering by compression By: Cilibrasi, R; Vitanyi, PMB IEEE TRANSACTIONS ON INFORMATION THEORY Volume: 51 Issue: 4 Pages: 1523-1545 Published: APR 2005
5. A complex approach to the problem of determining the authorship of the text By: Khmelev, D. V. MEZHD K RUSSK YAZ IS Pages: 426-427 Published: 2001
6. Some effective techniques for naive Bayes text classification By: Kim, Sang-Bum; Han, Kyoung-Soo; Rim, Hae-Chang; et al. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING Volume: 18 Issue: 11 Pages: 1457-1466 Published: NOV 2006
7. Title: [not available] By: KUKUSHKINA OV PROBL PEREDACHI INF Volume: 37 Pages: 96 Published: 2001
8. The similarity metric By: Li, M; Chen, X; Li, X; et al. IEEE TRANSACTIONS ON INFORMATION THEORY Volume: 50 Issue: 12 Pages: 3250-3264 Published: DEC 2004
9. Title: [not available] By: Li, M.; Vitanyi, P. M. B. An Introduction to Kolmogorov Complexity and Its Applications Pages: 637 Published: 1997 Publisher: Springer-Verlag, New York
10. Conditional Complexity of Compression for Authorship Attribution By: Malyutov, M. B.; Wickramasinghe, C. I.; Li, S. SFB 649 Discussion Paper No. 57 Pages: 38 Published: 2007 Publisher: Humboldt University, Berlin
11. Title: [not available] By: Malyutov, M. B. SPRINGER LECT NOTES Volume: 4123 Pages: 362-380 Published: 2007
12. Classification of documents in vector space By: Matyasko, A. A.; Khaustov, V. A. INF TEKHN SIST 2012 Pages: 140-141 Published: 2012
13. Document clustering using character n-grams: A comparative evaluation with term-based and word-based clustering By: Miao, Y.; Keselj, V.; Milios, E. CIKM 05 Pages: 357-358 Published: 2005
14. Title: [not available] By: Ryabko, B.; Astola, J.; Malyutov, M. COMPRESSION BASED ME Published: 2016 Publisher: Springer, New York
15. Graph clustering By: Schaeffer, Satu Elisa COMPUTER SCIENCE REVIEW Volume: 1 Issue: 1 Pages: 27-64 Published: AUG 2007
16. Classification of texts with decision trees and neural networks of direct propagation By: Shevelev, O. G.; Petrakov, A. V. Vestn. Tomsk. Gos. Univ. Volume: 290 Pages: 300-307 Published: 2006
17. A comparison among three neural networks for text classification By: Wang, Zhan; He, Yifan; Jiang, Minghu 2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4 Book Series: International Conference on Signal Processing Pages: 1883-+ Published: 2006