Adapting random forests to predict obesity-associated gene expression

Abstract

Random forests (RFs) are effective at predicting gene expression from genotype data. However, comparisons of RF regressors and classifiers, including the choice of feature selection and encoding, remain under-explored in the context of gene expression prediction. Specifically, we examine the role of ordinal versus one-hot encoding, and of data balancing via oversampling, in predicting obesity-associated gene expression. Our work shows that RFs are competitive with PrediXcan in predicting obesity-associated gene expression in subcutaneous adipose tissue, a tissue highly relevant to obesity. Additionally, RFs generate predictions for obesity-associated genes for which PrediXcan does not.
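The abstract contrasts ordinal and one-hot encoding of genotypes, and oversampling to balance classes for RF classifiers. The following is a minimal plain-Python sketch of both preprocessing steps. It is an illustration under stated assumptions, not the paper's pipeline. It assumes the following:

- Biallelic genotypes are written as two-letter strings.
- The reference allele is "A".
- Expression values have already been binned into class labels ("low", "mid", "high").

The function names, the reference allele and the class labels are all hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

# Assumption: "A" is the reference allele at every SNP (hypothetical).
REF = "A"

def ordinal_encode(genotype: str, ref: str = REF) -> int:
    """Ordinal encoding: number of non-reference alleles (0, 1 or 2)."""
    return sum(1 for allele in genotype if allele != ref)

def one_hot_encode(genotype: str, ref: str = REF) -> List[int]:
    """One-hot encoding: one indicator column per dosage level 0, 1, 2."""
    dosage = ordinal_encode(genotype, ref)
    return [1 if dosage == level else 0 for level in range(3)]

def random_oversample(X: list, y: list, seed: int = 0) -> Tuple[list, list]:
    """Random oversampling: append minority-class rows, drawn with
    replacement, until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

if __name__ == "__main__":
    # Hypothetical samples at a single SNP and hypothetical expression classes.
    samples = ["AA", "AG", "AG", "GG", "AA", "AA"]
    labels = ["low", "low", "low", "low", "high", "mid"]
    X_ord = [[ordinal_encode(g)] for g in samples]   # one column per SNP
    X_hot = [one_hot_encode(g) for g in samples]     # three columns per SNP
    X_bal, y_bal = random_oversample(X_hot, labels)
    print(X_ord)
    print(Counter(y_bal))
```

Ordinal encoding keeps a single column per SNP and imposes an additive ordering on dosage. One-hot encoding triples the number of columns but lets the forest split on each genotype separately. The balanced arrays could then be passed to an RF classifier such as scikit-learn's `RandomForestClassifier`. Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.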

Publication
In the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Theodore Papamarkou
Professor in maths of data science