Predictive Recognition of DNA-binding Proteins Based on Pre-trained Language Model BERT
J Bioinform Comput Biol. 2023 Dec;21(6):2350028. doi: 10.1142/S0219720023500282. Epub 2024 Jan 23.
ABSTRACT: Identifying proteins is crucial for disease diagnosis and treatment. With the increase of known proteins, large-scale batch predictions are essential. However, traditional biological experiments being time-consuming and expensive are difficult to accomplish this task efficiently. Nevertheless, deep learning algorithms based on big data analysis have manifested potential in this aspect. In recent years, language representation models, especially BERT, have made significant advancements in natural language processing. In...
Source: Journal of Bioinformatics and Computational Biology - January 22, 2024 Category: Bioinformatics Authors: Yue Ma Yongzhen Pei Changguo Li Source Type: research

Imputation for Single-cell RNA-seq Data with Non-negative Matrix Factorization and Transfer Learning
J Bioinform Comput Biol. 2023 Dec;21(6):2350029. doi: 10.1142/S0219720023500294. Epub 2024 Jan 23.
ABSTRACT: Single-cell RNA sequencing (scRNA-seq) has been proven to be an effective technology for investigating the heterogeneity and transcriptome dynamics due to the single-cell resolution. However, one of the major problems for data obtained by scRNA-seq is excessive zeros in the count matrix, which hinders the downstream analysis enormously. Here, we present a method that integrates non-negative matrix factorization and transfer learning (NMFTL) to impute the scRNA-seq data. It borrows gene expression information from the a...
Source: Journal of Bioinformatics and Computational Biology - January 22, 2024 Category: Bioinformatics Authors: Jiadi Zhu Youlong Yang Source Type: research
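
The abstract above is truncated, so the details of NMFTL are not available here. As a rough illustration of the general idea of imputing excess zeros with non-negative matrix factorization, the following is a minimal sketch of plain masked NMF with multiplicative updates. It is not the authors' method and contains no transfer-learning step. The function name `masked_nmf_impute`, its parameters, and the simplifying assumption that every zero is a missing value are all hypothetical choices made for this example.

```python
import numpy as np

def masked_nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill zero entries of a non-negative matrix using masked NMF.

    Assumption (not from the abstract): every zero is treated as missing.
    Real scRNA-seq data also contain true biological zeros.
    """
    X = np.asarray(X, dtype=float)
    mask = (X > 0).astype(float)          # 1 = observed, 0 = treated as missing
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates restricted to the observed entries.
        WH = W @ H
        H *= (W.T @ (mask * X)) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= ((mask * X) @ H.T) / ((mask * WH) @ H.T + eps)
    recon = W @ H
    # Keep observed counts unchanged; fill only the masked zeros.
    return np.where(mask > 0, X, recon)

# Usage: a rank-1 toy matrix with one entry zeroed out.
toy = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
toy[1, 1] = 0.0
imputed = masked_nmf_impute(toy, rank=1)
```

Observed entries are returned untouched and only the zeros are replaced by the low-rank reconstruction, which keeps the sketch conservative. Deciding which zeros are genuine dropouts is the hard part in practice and is outside the scope of this example.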

Algorithms for the Uniqueness of the Longest Common Subsequence
J Bioinform Comput Biol. 2023 Dec;21(6):2350027. doi: 10.1142/S0219720023500270. Epub 2024 Jan 10.
ABSTRACT: Given several number sequences, determining the longest common subsequence is a classical problem in computer science. This problem has applications in bioinformatics, especially determining transposable genes. Nevertheless, related works only consider how to find one longest common subsequence. In this paper, we consider how to determine the uniqueness of the longest common subsequence. If there are multiple longest common subsequences, we also determine which number appears in all/some/none of the longest common subs...
Source: Journal of Bioinformatics and Computational Biology - January 11, 2024 Category: Bioinformatics Authors: Yue Wang Source Type: research
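
To make the uniqueness question concrete, here is a small brute-force sketch for the two-sequence case only. It enumerates every distinct longest common subsequence and then reports which values occur in all, some, or none of them. It is not the algorithm from the paper (the abstract is truncated and addresses several sequences), the enumeration can be exponential in the worst case, and the names `all_lcs` and `classify_elements` are hypothetical.

```python
from functools import lru_cache

def all_lcs(a, b):
    """Return the set of all distinct longest common subsequences of a and b."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def length(i, j):
        # LCS length of the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + length(i + 1, j + 1)
        return max(length(i + 1, j), length(i, j + 1))

    @lru_cache(maxsize=None)
    def collect(i, j):
        # All distinct LCS (as tuples) of the suffixes a[i:] and b[j:].
        if length(i, j) == 0:
            return frozenset({()})
        if a[i] == b[j]:
            return frozenset((a[i],) + rest for rest in collect(i + 1, j + 1))
        found = set()
        if length(i + 1, j) == length(i, j):
            found |= collect(i + 1, j)
        if length(i, j + 1) == length(i, j):
            found |= collect(i, j + 1)
        return frozenset(found)

    return collect(0, 0)

def classify_elements(a, b):
    """Return (is_unique, in_all, in_some, in_none) for the LCS set of a and b."""
    subs = all_lcs(a, b)
    per_lcs = [set(s) for s in subs]
    in_all = set.intersection(*per_lcs)
    in_some = set.union(*per_lcs) - in_all   # in at least one LCS, but not all
    in_none = (set(a) | set(b)) - in_all - in_some
    return len(subs) == 1, in_all, in_some, in_none

# Usage: two longest common subsequences exist here, (1, 3, 4) and (2, 3, 4).
unique, in_all, in_some, in_none = classify_elements([1, 2, 3, 4, 5], [2, 1, 3, 4])
```

Enumerating the full set is only practical for short inputs; it is used here because it makes the all/some/none classification easy to read off directly.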

CNV-FB: A Feature bagging strategy-based approach to detect copy number variants from NGS data
J Bioinform Comput Biol. 2023 Dec;21(6):2350026. doi: 10.1142/S0219720023500269. Epub 2024 Jan 10.
ABSTRACT: Copy number variation (CNV), as a type of genomic structural variation, accounts for a large proportion of structural variation and is related to the pathogenesis and susceptibility to some human diseases, playing an important role in the development and change of human diseases. The development of next-generation sequencing technology (NGS) provides strong support for the design of CNV detection algorithms. Although a large number of methods have been developed to detect CNVs using NGS data, it is still considered a d...
Source: Journal of Bioinformatics and Computational Biology - January 11, 2024 Category: Bioinformatics Authors: Chengyou Li Shiqiang Fan Haiyong Zhao Xiaotong Liu Source Type: research

Small groups in multidimensional feature space: Two examples of supervised two-group classification from biomedicine
We describe a method that considers the size and shape of feature distributions, as well as the pairwise relations between measured features as separate derived features and prognostic factors. Additionally, we explain how to perform similarity calculations that account for the variation in feature values within groups and inaccuracies in individual value measurements. By following these steps, a more accurate and reliable analysis can be achieved when working with biomedical datasets that have a small sample size and multiple features.
PMID: 38212875 | DOI: 10.1142/S0219720023500257
Source: Journal of Bioinformatics and Computational Biology - January 11, 2024 Category: Bioinformatics Authors: Dmitriy Karpenko Aleksei Bigildeev Source Type: research

CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection
J Bioinform Comput Biol. 2023 Oct;21(5):2350024. doi: 10.1142/S0219720023500245. Epub 2023 Oct 28.
ABSTRACT: O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table O-glycosylation (CBDT-Ogl...
Source: Journal of Bioinformatics and Computational Biology - October 29, 2023 Category: Bioinformatics Authors: Ying Zeng Zheming Yuan Yuan Chen Ying Hu Source Type: research

iAMY-RECMFF: Identifying amyloidgenic peptides by using residue pairwise energy content matrix and features fusion algorithm
In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidgenic from non-amyloidgenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then utilized Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were inputted into a support vector...
Source: Journal of Bioinformatics and Computational Biology - October 29, 2023 Category: Bioinformatics Authors: Zizheng Yu Zhijian Yin Hongliang Zou Source Type: research
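
The feature-selection step mentioned above can be illustrated with a short sketch. It assumes that "the Fisher approach" refers to the classical Fisher score (between-class scatter divided by within-class scatter, computed per feature); the truncated abstract does not confirm this. The function names `fisher_scores` and `select_top_k` are hypothetical, and the encoding, fusion, and classifier stages of iAMY-RECMFF are not reproduced.

```python
from statistics import fmean, pvariance

def fisher_scores(X, y):
    """Per-feature Fisher score; X is a list of rows, y the class labels.

    Assumption: score_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj.
    """
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = fmean(col)
        between = within = 0.0
        for lab in labels:
            vals = [v for v, t in zip(col, y) if t == lab]
            between += len(vals) * (fmean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]

# Usage: feature 0 separates the two classes, feature 1 does not.
X_demo = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.2], [3.2, 0.9]]
y_demo = [0, 0, 1, 1]
best = select_top_k(X_demo, y_demo, 1)
```

The indices returned by `select_top_k` would then be used to subset the feature matrix before it is passed to a classifier.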

AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model
This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.
PMID: 37899354 | DOI: 10.1142/S0219720023500221
Source: Journal of Bioinformatics and Computational Biology - October 29, 2023 Category: Bioinformatics Authors: Jiasheng He Shun Zhang Chun Fang Source Type: research
