Informative metabolites identification by variable importance analysis based on random variable combination

Abstract The main goal of metabolomics research is to reveal informative metabolites or biomarkers, a task that can be treated as variable selection. So far, several measures, such as regression coefficients (RC), weights, or variable importance in projection (VIP), have been widely used to assess variable importance when building a partial least squares linear discriminant analysis (PLS-LDA) classification model. A set of metabolites is then selected by ranking the metabolites and applying a fixed threshold. However, these measures do not take into account the combination effect among a subset of variables, which leads to biased results. In this work, a strategy named variable importance analysis based on random variable combination (VIAVC) is developed for the statistical assessment of variable importance. The VIAVC framework has three main parts: (1) a novel variable sampling method, called binary matrix resampling, generates a population of different variable combinations while guaranteeing that each variable is selected with the same probability; (2) the importance of each variable is assessed by the percent decrease or increase of the area under the receiver operating characteristic curve when the variable is excluded from the PLS-LDA model; (3) informative variables are retained iteratively, and the rank of the variables that finally remain is output. The results of the applications to three metabolic datasets illustrate t...
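Parts (1) and (2) of the framework can be illustrated with a minimal Python sketch. It is not the authors' implementation; it rests on assumptions the abstract does not spell out. The sketch assumes that binary matrix resampling fills every column of a 0/1 matrix with the same number of ones and permutes each column independently, so that each row is one variable combination. The names binary_matrix_resampling, percent_auc_change, and toy_auc are hypothetical, and toy_auc is a made-up stand-in for the cross-validated AUC of a PLS-LDA model.

```python
import random


def binary_matrix_resampling(n_rows, n_vars, ratio=0.5, seed=0):
    """Build an n_rows x n_vars 0/1 matrix (assumed scheme, see lead-in).

    Every column holds the same number of ones, so each variable appears
    in the same number of combinations. Each row is one combination.
    """
    rng = random.Random(seed)
    n_ones = round(n_rows * ratio)
    columns = []
    for _ in range(n_vars):
        col = [1] * n_ones + [0] * (n_rows - n_ones)
        rng.shuffle(col)  # permute each column independently
        columns.append(col)
    # Transpose: columns -> rows (one row per variable combination).
    return [list(row) for row in zip(*columns)]


def percent_auc_change(matrix, score_subset):
    """Mean percent change in score when each variable is excluded.

    For every combination containing variable j, the subset is scored
    with and without j. A negative mean means the score drops when j is
    removed, which suggests j is informative.
    """
    n_vars = len(matrix[0])
    changes = []
    for j in range(n_vars):
        deltas = []
        for row in matrix:
            if not row[j]:
                continue
            subset = [k for k, bit in enumerate(row) if bit]
            reduced = [k for k in subset if k != j]
            if not reduced:
                continue  # a model cannot be fitted on zero variables
            with_j = score_subset(subset)
            without_j = score_subset(reduced)
            deltas.append(100.0 * (without_j - with_j) / with_j)
        changes.append(sum(deltas) / len(deltas) if deltas else 0.0)
    return changes


def toy_auc(subset):
    """Hypothetical scorer: only variable 0 carries class information."""
    return 0.7 if 0 in subset else 0.5


matrix = binary_matrix_resampling(n_rows=20, n_vars=4, ratio=0.5, seed=0)
changes = percent_auc_change(matrix, toy_auc)
for j, change in enumerate(changes):
    print(f"variable {j}: mean percent AUC change when excluded = {change:.2f}")
```

In a real analysis, toy_auc would be replaced by a function that fits PLS-LDA on the selected columns and returns a cross-validated AUC. Part (3), the iterative retention of informative variables, would repeat these two steps on the variables that survive each round.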
Source: Metabolomics - Category: Biology Source Type: research