Adapting random forests to predict obesity-associated gene expression

Jeremy Watts, Elexis Allen, Ahmad Mitoubsi, Anahita Khojandi, James Eales, Farideh Jalali-Najafabadi, Theodore Papamarkou

July 2022

Abstract

Random forests (RFs) are effective at predicting gene expression from genotype data. However, a comparison of RF regressors and classifiers, including feature selection and encoding, remains under-explored in the context of gene expression prediction. Specifically, we examine the role of ordinal versus one-hot encoding and of data balancing via oversampling in the prediction of obesity-associated gene expression. Our work shows that RFs compete with PrediXcan in predicting obesity-associated gene expression in subcutaneous adipose tissue, a tissue highly relevant to obesity. Additionally, RFs generate predictions for obesity-associated genes for which PrediXcan fails to do so.
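The abstract contrasts ordinal and one-hot encoding of genotype features and mentions balancing by oversampling. The following is a minimal sketch of those two ideas, not the authors' pipeline. It assumes genotypes are coded as minor-allele counts (0, 1, or 2) and that balancing is done by simple random oversampling. The toy data, the "low"/"high" labels, and all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: each row is one individual, each column one SNP,
# coded as the number of minor alleles (0, 1, or 2). This coding is an assumption.
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]

def ordinal_encode(rows):
    """Keep each SNP as a single integer feature, preserving the order 0 < 1 < 2."""
    return [list(row) for row in rows]

def one_hot_encode(rows, levels=(0, 1, 2)):
    """Expand each SNP into one indicator feature per genotype level."""
    return [
        [1 if value == level else 0 for value in row for level in levels]
        for row in rows
    ]

def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

# Hypothetical class labels, e.g. expression below or above some threshold.
labels = ["low", "low", "low", "high"]

ordinal = ordinal_encode(genotypes)
one_hot = one_hot_encode(genotypes)
balanced_x, balanced_y = random_oversample(one_hot, labels)
```

With ordinal encoding each SNP stays a single feature and a tree can split on thresholds such as "at most one minor allele". With one-hot encoding each SNP becomes three indicator features, so a split isolates one genotype at a time. Either encoded matrix could then be passed to an RF regressor or classifier.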

Type

Conference paper

Publication

In 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Gene Expression Random Forests

Theodore Papamarkou

Distinguished professor