A Data-Distribution and Successive Spline Points based discretization approach for evolving gene regulatory networks from scRNA-Seq time-series data using Cartesian Genetic Programming

Biosystems. 2024 Feb;236:105126. doi: 10.1016/j.biosystems.2024.105126. Epub 2024 Jan 24.

ABSTRACT

The inference of gene regulatory networks (GRNs) is a widely addressed problem in Systems Biology. GRNs can be modeled as Boolean networks, which is the simplest approach for this task. However, Boolean models need binarized data. Several approaches have been developed for the discretization of gene expression data (GED). Also, the advance of data extraction technologies, such as single-cell RNA-Sequencing (scRNA-Seq), provides a new vision of gene expression and brings new challenges for dealing with its specificities, such as a large occurrence of zero data. This work proposes a new discretization approach for dealing with scRNA-Seq time-series data, named Distribution and Successive Spline Points Discretization (DSSPD), which considers the data distribution and a proper preprocessing step. Here, Cartesian Genetic Programming (CGP) is used to infer GRNs using the results of DSSPD. The proposal is compared with CGP with the standard data handling and five state-of-the-art algorithms on curated models and experimental data. The results show that the proposal improves the results of CGP in all tested cases and outperforms the state-of-the-art algorithms in most cases.

PMID: 38278505 | DOI: 10.1016/j.biosystems.2024.105126
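To make concrete why Boolean models need binarized data, the following minimal sketch shows a naive discretization of one expression time series and a synchronous Boolean network update. It is not DSSPD and not CGP: the abstract does not describe their internals, so the thresholding rule (mean of the nonzero values), the function names `binarize_series` and `step`, the three-gene rule set, and the sample values are all assumptions chosen purely for illustration.

```python
from statistics import mean


def binarize_series(values):
    """Naive baseline (assumed, not DSSPD): a value maps to 1 if it
    exceeds the mean of the nonzero entries, otherwise to 0."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        # An all-zero series carries no signal, so every point maps to 0.
        return [0] * len(values)
    threshold = mean(nonzero)
    return [1 if v > threshold else 0 for v in values]


def step(state, rules):
    """Synchronous update: every gene is computed from the same
    previous state, and the result is a new state dictionary."""
    return {gene: int(rule(state)) for gene, rule in rules.items()}


# Hypothetical three-gene network; the rules are invented for this example.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

# Made-up expression values with the zero inflation typical of scRNA-Seq.
expression = [0.0, 0.0, 1.2, 3.5, 4.1, 0.0]
bits = binarize_series(expression)

state = {"A": 1, "B": 0, "C": 0}
trajectory = [state]
for _ in range(3):
    state = step(state, rules)
    trajectory.append(state)

print(bits)
print(trajectory)
```

Excluding zeros from the threshold is one simple way to keep dropout-heavy data from dragging the cut-off toward zero; a distribution-aware method such as the one the abstract proposes would presumably handle this more carefully.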
Source: Biosystems | Category: Biotechnology | Source Type: research