Authors: Kondrakhin Y., Valeev T., Sharipov R., Yevshin I., Kolpakov F., Kel A.
Title: Prediction of protein-DNA interactions of transcription factors linking proteomics and transcriptomics data
Bibliographic reference: Kondrakhin Y., Valeev T., Sharipov R., Yevshin I., Kolpakov F., Kel A. Prediction of protein-DNA interactions of transcription factors linking proteomics and transcriptomics data // EuPA Open Proteomics. - 2016. - Vol. 13. - P. 14-23. - ISSN 2212-9685.
External systems: DOI: 10.1016/j.euprot.2016.09.001; RSCI: 27570002; SCOPUS: 2-s2.0-84989183286
Abstract: We compared positional weight matrix-based methods for predicting transcription factor (TF) binding sites on a selected fraction of ChIP-seq data, using the partial AUC measure (limited to a false positive rate of 0.1, the range most relevant for genome-scale TF site search). Comparison of three prediction methods, namely additive, multiplicative, and information-vector-based (MATCH), showed an advantage of the MATCH method for the majority of transcription factors tested. We demonstrated that TF site identification methods can help connect the proteomics and phosphoproteomics world of signaling networks to the world of gene regulation and transcriptomics.
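Illustration (not part of the published record): a minimal sketch of the partial AUC measure named in the abstract, assuming the usual definition as the area under the ROC curve restricted to false positive rates from 0 to 0.1. The function names `roc_points` and `partial_auc` and the toy scores are hypothetical and are not taken from the paper. The value is left unnormalized, so a perfect classifier scores `max_fpr`.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) ordered by increasing fpr.

    scores: higher means "more likely a binding site" (assumption).
    labels: 1 for a true site, 0 for a background sequence.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume a whole group of tied scores before emitting a point.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(scores, labels, max_fpr=0.1):
    """Trapezoidal area under the ROC curve for fpr in [0, max_fpr]."""
    points = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): two true sites scored above two background hits.
toy_scores = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]
toy_pauc = partial_auc(toy_scores, toy_labels, max_fpr=0.1)
```

Restricting the area to low false positive rates reflects the abstract's point that this is the regime that matters in a genome-scale search; dividing the result by `max_fpr` would rescale it to the range 0 to 1 if a normalized value is preferred.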
Keywords: transcription factor binding site; The ROC curve; Proteomics versus transcriptomics; Protein-DNA interactions; Position weight matrix approach; ChIP-Seq; Area under curve
Published: 2016
Physical description: pp. 14-23