Academic Article

Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations.
Document Type
Article
Source
Communications Biology. 4/3/2024, Vol. 7 Issue 1, p1-11. 11p.
Subject
*SUBSET selection
*MIXED integer linear programming
*PHENOTYPES
*CONDITIONED response
Language
English
ISSN
2399-3642
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm that separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. Because the algorithm is general, it can also address subset selection problems in areas beyond biology. Multi-Attribute Subset Selection is an algorithm for identifying the most descriptive growth conditions in microbial phenotyping experiments. This streamlines high-throughput phenotyping efforts and helps to reveal biological insights. [ABSTRACT FROM AUTHOR]
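The predictor/response split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It replaces the mixed integer linear program with brute-force enumeration of candidate predictor sets, and it assumes an ordinary least-squares fit without an intercept as the error measure. The function names (`least_squares`, `select_predictors`) and the toy matrix `TOY` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def least_squares(A, b):
    """Solve min ||A x - b||^2 via the normal equations; None if singular."""
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T b].
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(k)]
        + [sum(row[i] * bi for row, bi in zip(A, b))]
        for i in range(k)
    ]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None  # predictor columns are (numerically) collinear
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def squared_error(A, b, x):
    """Sum of squared residuals of the linear fit A x ~ b."""
    return sum(
        (sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
        for row, bi in zip(A, b)
    )


def select_predictors(Y, n_predictors):
    """Pick the conditions (columns of Y) that best predict all the others.

    Y has one row per strain and one column per condition. Every subset of
    n_predictors columns is tried; each remaining (response) column is fitted
    as a linear combination of the chosen columns. Returns the best subset
    and its total squared error. Assumption: exhaustive search stands in for
    the MILP solver, so this only scales to small matrices.
    """
    n_conditions = len(Y[0])
    best_subset, best_error = None, float("inf")
    for subset in combinations(range(n_conditions), n_predictors):
        A = [[row[j] for j in subset] for row in Y]
        total = 0.0
        for r in range(n_conditions):
            if r in subset:
                continue
            b = [row[r] for row in Y]
            x = least_squares(A, b)
            if x is None:  # skip degenerate predictor sets
                total = float("inf")
                break
            total += squared_error(A, b, x)
        if total < best_error:
            best_subset, best_error = subset, total
    return best_subset, best_error


# Made-up yields for 4 strains (rows) under 3 conditions (columns).
TOY = [
    [1, 2, 1],
    [2, 4, 2],
    [3, 6, 3],
    [4, 9, 3],
]

subset, error = select_predictors(TOY, 1)
print("predictor conditions:", subset, "total squared error:", round(error, 4))
```

Enumeration costs one fit per subset and per response condition, so it is practical only for a handful of conditions. A MILP formulation, as the abstract describes, lets a solver search the subsets implicitly through binary selection variables.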