Authors: | Knyazeva A., Kolobov O., Turchanovsky I. |
Title: | An example of empirical approach for bibliographic record linkage |
Bibliographic reference: | Knyazeva A., Kolobov O., Turchanovsky I. An example of empirical approach for bibliographic record linkage // IEEE Xplore. 2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS). - 2016. - Vol.2016-August. - Art.7549290. - ISBN 9781479987092. - ISSN 2151-1349. |
External systems: | DOI: 10.1109/RCIS.2016.7549290; RSCI (РИНЦ): 27573084; SCOPUS: 2-s2.0-84987624239; |
Abstract: | eng: The record linkage problem as applied to bibliographic and authority data is considered. The problem commonly arises when data from several libraries are merged. Two approaches based on empirical analysis of the data are tested. Both of them involve indirect information about a person. The proposed variant of the decision tree method allows us to deal with inconsistent bibliographic data and to apply particular rules one by one to improve record linkage quality. The study was performed on data from several Russian libraries. The data we deal with are in RUSMARC format, a variant of UNIMARC popular in Russia. © 2016 IEEE. |
Keywords: | Bibliographic retrieval systems; Record linkage; Empirical approach; Empirical analysis; Decision tree method; Bibliographic records; Trees (mathematics); Libraries; Information science; Decision trees; Data handling; Bibliographies; Bibliographic data; |
Published: | 2016 |
Physical description: | 7549290 |
Conference: | Name: 10th IEEE International Conference on Research Challenges in Information Science; Abbreviation: IEEE RCIS-2016; City: Grenoble; Country: France; Dates: 2016-06-01 - 2016-06-03; Link: http://wikicfp.com/cfp/servlet/event.showcfp?eventid=48883&copyownerid=82056 |