Indexed authors: Mansurova M.E., Barakhnin V.B., Aubakirov S.S., Khibatkhanuly Ye., Mussina A.B.
Title: Parallel text document clustering based on genetic algorithm
Bibliographic reference: Mansurova M.E., Barakhnin V.B., Aubakirov S.S., Khibatkhanuly Ye., Mussina A.B. Parallel text document clustering based on genetic algorithm // CEUR Workshop Proceedings. - 2017. - Vol.1839. - P.218-232. - ISSN 1613-0073. - http://ceur-ws.org/Vol-1839/MIT2016-p20.pdf
External systems: RSCI (РИНЦ): 31026606; SCOPUS: 2-s2.0-85020491808;
Abstract: eng: This work describes a parallel implementation of a text document clustering algorithm. The algorithm is based on evaluating the similarity between objects in a competitive situation, which leads to the notion of a function of rival similarity. Attributes of the bibliographic description of scientific articles were chosen as the scales for determining the similarity measure. A genetic algorithm is developed to find the weighting coefficients used in the similarity measure formula. Parallel computing technologies are used to speed up the algorithm. Parallelization is carried out in two stages: in the genetic algorithm and in the clustering itself. The parallel genetic algorithm is implemented with the MPJ Express library, and the parallel clustering algorithm with the Java 8 Streams library. Results of computational experiments showing the benefits of the parallel implementation are presented.
Keywords: Parallel processing systems; Information retrieval; Genetic algorithms; Cluster analysis; Parallel computing; Text processing; Clustering algorithms; Weighting coefficient; Text Document Clustering; Similarity between objects; Scientific articles; Parallel implementations; Parallel genetic algorithms; Computational experiment; Clustering algorithm; Genetic algorithm;
Published: 2017
Physical description: pp. 218-232
Link: http://ceur-ws.org/Vol-1839/MIT2016-p20.pdf
Conference: Name: International Conference "Mathematical and Informational Technologies, MIT-2016"
Abbreviation: MIT-2016
City: Vrnjačka Banja, Budva
Country: Serbia, Montenegro
Dates: 2016-08-28 - 2016-09-05
Link: http://conf.nsc.ru/MIT-2016
Citations:
1. Borisova I.A., Zagoruiko N.G.: Functions rival similarity in the problem of taxonomy. In: Proceedings of Conference with international participation Knowledge - Ontology - Theory. Novosibirsk, Vol. 2. pp. 67-76 (2007).
2. Borisova I.A., Zagoruyko N.G.: Using FRiS-functions to solve the problem SDX // Proceedings of the International Conference Classification, Forecasting, Data Mining CFDM 2009. Varna. pp. 110-116 (2009).
3. Barakhnin V.B., Nekhaeva V.A., Fedotov A.M.: On the statement of the similarity measure for the clustering of text documents // Bulletin of Novosibirsk state University. Series: Information technology. Vol. 6, No. 1. pp. 3-9 (2008).
4. Zagoruiko N.G., Borisova I.A., Dyubanov V.V., Kutnenko O.A.: Functions of rival similarity in algorithms of recognition of combined type // Bulletin of Siberian State Aerospace University named after M.F. Reshetnev. Vol. 5. pp. 19-21 (2010).
5. Gladkov L.A., Kureichik V.V., Kureichik V.M.: Genetic algorithms. Ed. V.M. Kureichik. 2nd ed. Moscow, FIZMATLIT (2006).
6. Navarro, Gonzalo: A guided tour to approximate string matching // ACM Computing Surveys. Vol. 33 (1). pp. 31-88 (2001).
7. Andrei Z. Broder: Identifying and Filtering Near-Duplicate Documents. In: Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching (COM '00). Springer-Verlag London. pp. 1-10 (2000).
8. Back Thomas: Evolutionary Algorithms in Theory and Practice. Oxford Univ. Press. P. 120 (1996).
9. Chetan Chudasama, S.M. Shah, Mahesh Panchal: Comparison of Parents Selection Methods of Genetic Algorithm for TSP // International Conference on Computer Communication and Networks CSI-COMNET-2011.
10. Evaluation of clustering. http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html.
11. Piyatida Rujasiri, Boonorm Chomtee: Comparison of Clustering Techniques for Cluster Analysis // Kasetsart J. (Nat. Sci.). Vol. 43. pp. 378-388 (2009).
12. MPJ-Express. http://mpj-express.org/.
13. Processing Data with Java SE 8 Streams. http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html.