Ranking Variable Combinations to Characterize Breast Cancer Subtypes using the IBIF-RF Metric

10 pages • Published: March 11, 2020

Isis Narvaez-Bandera and Wandaliz Torres-Garcia

Abstract

Gene interactions play a fundamental role in susceptibility to cancer. However, detecting and ranking these interactions is a complex problem due to the high dimensionality of genomic data. Hence, we aim to find patterns composed of multiple features that molecularly characterize breast cancer subtypes by integrating different omics datasets using a data mining approach. To retrieve biological understanding from these computational results, we developed IBIF-RF (Importance Between Interactive Features using Random Forest), a new metric capable of assessing and holistically ranking the importance of genomic interactions without any prior knowledge of key feature combinations. A set of 247 top-performing features from transcriptomic, proteomic, methylation, and clinical data was used to investigate interactive patterns for classifying breast cancer subtypes using over 1150 samples. The IBIF-RF metric allowed the extraction of 154312, 190481, and 463917 combinations of variables for the TCGA, GSE20685, and GSE21653 datasets, respectively. The single genes MLPH and FOXA1 were the most frequently identified variables across all datasets, followed by two-gene interactions such as CEP55-FOXA1 and FOXC1-THSD4. Moreover, the IBIF-RF metric allowed the definition of two sets of genes frequently found together (1: FOXA1, MLPH, and SIDT1; and 2: CEP55, ASPM, CENPL, AURKA, ESPL1, TTK, UBE2T, NCAPG, GMPS, NDC80, MYBL2, KIF18B, and EXO1).

Keyphrases: analysis of high throughput biological data, breast cancer, cancer genomics, data mining, systems biology

In: Qin Ding, Oliver Eulenstein and Hisham Al-Mubaid (editors). Proceedings of the 12th International Conference on Bioinformatics and Computational Biology, vol 70, pages 11-20.

Links:	https://easychair.org/publications/paper/nHhd
	https://doi.org/10.29007/8xwn

BibTeX entry

@inproceedings{BICOB2020:Ranking_Variable_Combinations_Characterize,
  author    = {Isis Narvaez-Bandera and Wandaliz Torres-Garcia},
  title     = {Ranking Variable Combinations to Characterize Breast Cancer Subtypes using the IBIF-RF Metric},
  booktitle = {Proceedings of the 12th International Conference on Bioinformatics and Computational Biology},
  editor    = {Qin Ding and Oliver Eulenstein and Hisham Al-Mubaid},
  series    = {EPiC Series in Computing},
  volume    = {70},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {https://easychair.org/publications/paper/nHhd},
  doi       = {10.29007/8xwn},
  pages     = {11--20},
  year      = {2020}}
