
SuSiE PCA: a Scalable Bayesian Variable Selection Technique for Principal Component Analysis

EasyChair Preprint 9877

34 pages. Date: March 20, 2023

Abstract

Traditional latent factor models such as principal component analysis (PCA) provide a statistical framework for inferring low-rank latent components across a multitude of biologically relevant settings. However, when this low-rank structure arises from a sparse subspace, existing approaches either cannot perform feature selection or fail to quantify the uncertainty in the features they select. In this paper, we present SuSiE PCA, a highly scalable sparse latent factor approach that explicitly models uncertainty in contributing variables through posterior inclusion probabilities (PIPs). We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches at detecting relevant signals in observed data while remaining robust to model mis-specification. To illustrate its performance in real-data scenarios, we apply SuSiE PCA to multi-tissue eQTL data from GTEx v8 and identify tissue-specific regulatory factors and their contributing eGenes. Next, we evaluate its ability to identify gene regulatory modules using large-scale perturbation screen data. We find that SuSiE PCA discovers modules enriched for genes relevant to ribosome function to a greater extent than competing methods (ribosome pathway: FDR $= 9.2 \times 10^{-82}$, 63 genes involved vs. FDR $= 1.4 \times 10^{-33}$, 35 genes involved), while being $\sim$18x faster. Overall, SuSiE PCA provides an efficient and flexible tool for identifying relevant features in high-dimensional structured biological data.
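
The abstract does not say how a PIP is computed, so the following is only a minimal sketch under an assumption: that PIPs follow the usual SuSiE-style definition, where each of L single effects places a probability vector over the p features and a feature's PIP is the probability that at least one single effect selects it. The function name, the `alpha` array, and the toy numbers are hypothetical illustrations, not the authors' implementation or data.

```python
import numpy as np


def pips_from_single_effects(alpha):
    """Combine single-effect inclusion probabilities into PIPs.

    alpha: array of shape (L, p). Row l is a probability vector over the
    p features for single effect l (hypothetical input, assumed layout).
    Returns a length-p array with PIP_j = 1 - prod_l (1 - alpha[l, j]).
    """
    alpha = np.asarray(alpha, dtype=float)
    return 1.0 - np.prod(1.0 - alpha, axis=0)


# Toy example: 2 single effects over 4 features for one latent factor.
alpha = np.array([
    [0.90, 0.05, 0.05, 0.00],
    [0.10, 0.80, 0.05, 0.05],
])
pips = pips_from_single_effects(alpha)
print(pips)
```

In this toy input the first two features receive high PIPs because each is strongly favored by one of the single effects, while the remaining features stay near zero. In a factor model, a computation like this would presumably be repeated for each latent factor.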

Keyphrases: Principal Component Analysis, Sparse modeling, gene regulatory modules, variational inference

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:9877,
  author    = {Dong Yuan and Nicholas Mancuso},
  title     = {SuSiE PCA: a Scalable Bayesian Variable Selection Technique for Principal Component Analysis},
  howpublished = {EasyChair Preprint 9877},
  year      = {EasyChair, 2023}}