Adapting random forests to predict obesity-associated gene expression


Random forests (RFs) are effective at predicting gene expression from genotype data. However, the comparison of RF regressors and classifiers, including feature selection and encoding, remains under-explored in the context of gene expression prediction. Specifically, we examine the role of ordinal or one-hot encoding and of data balancing via oversampling in the prediction of obesity-associated gene expression. Our work shows that RFs are competitive with PrediXcan in predicting obesity-associated gene expression in subcutaneous adipose tissue, a tissue highly relevant to obesity. Additionally, RFs generate predictions for obesity-associated genes for which PrediXcan fails to do so.
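
As a rough illustration of the two genotype encodings and the oversampling step mentioned above, the following minimal sketch uses only the Python standard library. The toy genotypes, the labels, and the helper names (`ordinal_encode`, `one_hot_encode`, `random_oversample`) are hypothetical and are not taken from the paper. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) and that balancing is done by simple random oversampling with replacement, which may differ from the procedure the authors used.

```python
import random
from collections import Counter

# Hypothetical genotypes: minor-allele counts (0, 1 or 2), one row per individual.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 0],
    [0, 0, 1],
    [0, 2, 1],
]
# Hypothetical class labels, e.g. low (0) vs high (1) expression of one gene.
labels = [0, 0, 0, 1, 0]


def ordinal_encode(rows):
    """Keep each SNP as one integer feature, preserving the order 0 < 1 < 2."""
    return [list(row) for row in rows]


def one_hot_encode(rows, n_levels=3):
    """Expand each SNP into n_levels indicator features (one per genotype)."""
    encoded = []
    for row in rows:
        features = []
        for value in row:
            features.extend(1 if value == level else 0 for level in range(n_levels))
        encoded.append(features)
    return encoded


def random_oversample(X, y, seed=0):
    """Resample minority-class rows with replacement until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for cls, count in counts.items():
        pool = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - count):
            X_out.append(list(rng.choice(pool)))
            y_out.append(cls)
    return X_out, y_out


X_ordinal = ordinal_encode(genotypes)
X_one_hot = one_hot_encode(genotypes)
X_balanced, y_balanced = random_oversample(X_one_hot, labels)
```

Either encoded matrix could then be passed to an RF implementation of choice, for example scikit-learn's `RandomForestClassifier`. In such a setup, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.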

In 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Theodore Papamarkou
Reader in maths of data science

My research interests span probabilistic machine learning, with a main focus on Bayesian deep learning, and topological deep learning.