Can random forest provide insights into how yeast grows?

I’m not saying this is a good idea, but bear with me.

A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows, one per nutrient, and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient?

The model performed badly: not one of the nutrients was correctly classified. I pointed out that with only one case for each outcome, this was to be expected. Each tree in a random forest is grown on a bootstrap sample of the rows, so a row left out of that sample (an “out-of-bag” row) has no other example of its class in that tree’s training data. A tree cannot predict a class it has never seen, so no out-of-bag prediction can be correct.

As sometimes happens, the question was promptly deleted, which was unfortunate as we could have explored the problem further. A little web searching revealed that the dataset in question is quite well known. It’s published in an article titled “Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast” and has been transformed into a “tidy” format for use in tutorials, here and here. As it turns out, there are 6 cases (rows) for each outcome (limiting nutrient), as experiments were performed at 6 different growth rates. Whilst random forests are good for “large p small n” pr...
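To make the sampling argument above concrete, here is a minimal sketch in Python (the language is purely illustrative; no code from the deleted question survives). It simulates only the row sampling of a random forest, not the trees themselves: for each simulated tree it draws a bootstrap sample of row indices with replacement, then checks, for every out-of-bag row, whether any row of the same class made it into the sample. The function name `oob_class_coverage` and the labels `nutrient_1` to `nutrient_6` are hypothetical placeholders, not the dataset’s real nutrient codes.

```python
import random


def oob_class_coverage(labels, n_trees=1000, seed=0):
    """Fraction of out-of-bag (OOB) rows whose class also appears in the
    bootstrap sample drawn for the same tree.

    Only the row sampling of a random forest is simulated here; no trees
    are actually fitted.
    """
    rng = random.Random(seed)
    n = len(labels)
    covered = 0
    total_oob = 0
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_rows = set(in_bag)
        bag_classes = {labels[i] for i in in_bag}
        for i in range(n):
            if i in in_bag_rows:
                continue  # row was used to train this tree, so it is not OOB
            total_oob += 1
            # A tree can only predict classes it saw during training
            if labels[i] in bag_classes:
                covered += 1
    return covered / total_oob if total_oob else float("nan")


# Hypothetical placeholder labels for the 6 limiting nutrients
nutrients = [f"nutrient_{k}" for k in range(1, 7)]

one_per_class = nutrients                                         # 6 rows, 1 per class
six_per_class = [label for label in nutrients for _ in range(6)]  # 36 rows, 6 per class

print(oob_class_coverage(one_per_class))  # → 0.0
print(oob_class_coverage(six_per_class))  # close to 1.0; exact value depends on the seed
```

With one case per class the coverage is exactly zero, so every out-of-bag prediction must be wrong. With 6 cases per class (6 nutrients × 6 growth rates = 36 rows), nearly every out-of-bag row has same-class rows in the bag. This says nothing about whether the expression values actually separate the nutrients; it only shows why the 6-row version of the data could not be evaluated this way.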
Source: What You're Doing Is Rather Desperate. Category: Bioinformatics. Tags: bioinformatics, genomics, statistics, expression, random forest, rstats, yeast. Source Type: blogs