A computational platform to identify origins of replication sites in eukaryotes.
Abstract The locations of the initiation of genomic DNA replication are defined as origins of replication sites (ORIs), which regulate the onset of DNA replication and play significant roles in the DNA replication process. The study of ORIs is essential for understanding the cell-division cycle and gene expression regulation. Developing computational methods for the accurate identification of ORIs will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. In the pr...
Source: Briefings in Bioinformatics - February 17, 2020 Category: Bioinformatics Authors: Dao FY, Lv H, Zulfiqar H, Yang H, Su W, Gao H, Ding H, Lin H Tags: Brief Bioinform Source Type: research

idenPC-MIIP: identify protein complexes from weighted PPI networks using mutual important interacting partner relation.
In this study, we propose the mutual important interacting partner relation to reflect the co-complex relationship of two proteins based on their interaction neighborhoods. In addition, a new algorithm called idenPC-MIIP is developed to identify protein complexes from weighted PPI networks. The experimental results on two widely used datasets show that idenPC-MIIP outperforms 17 state-of-the-art methods, especially for identification of small protein complexes with only two or three proteins. PMID: 32065215 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - February 17, 2020 Category: Bioinformatics Authors: Wu Z, Liao Q, Liu B Tags: Brief Bioinform Source Type: research
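
The idenPC-MIIP abstract does not say how the mutual important interacting partner relation is computed. The sketch below is for illustration only. It assumes a hypothetical rule: two proteins are mutual important partners when each ranks among the other's k highest-weight neighbours in the weighted PPI network. The rule, the parameter `k` and the function names are assumptions, not the authors' published definition.

```python
from collections import defaultdict


def top_k_partners(edges, k):
    """Map each protein to the set of its k highest-weight neighbours.

    `edges` is an iterable of (protein_a, protein_b, weight) tuples that
    describe an undirected weighted PPI network (hypothetical input format).
    """
    neighbours = defaultdict(list)
    for a, b, w in edges:
        neighbours[a].append((w, b))
        neighbours[b].append((w, a))
    top = {}
    for protein, nbrs in neighbours.items():
        # Sort by descending weight; break ties by name so results are deterministic.
        nbrs.sort(key=lambda item: (-item[0], item[1]))
        top[protein] = {b for _, b in nbrs[:k]}
    return top


def mutual_important_pairs(edges, k=2):
    """Return pairs in which each protein is among the other's top-k partners."""
    top = top_k_partners(edges, k)
    pairs = set()
    for a, partners in top.items():
        for b in partners:
            if a in top.get(b, set()):
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Toy network; the weights are invented for illustration.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7), ("C", "D", 0.2), ("D", "E", 0.9)]
print(sorted(mutual_important_pairs(edges, k=2)))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('D', 'E')]
```

The C-D edge exists, but D is not among C's two strongest partners, so the pair is not mutual. A small triangle such as A-B-C, on the other hand, is detected. This is the kind of small-complex signal the abstract highlights.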

Toward a high-quality pan-genome landscape of Bacillus subtilis by removal of confounding strains.
In this study, we carried out a series of pan-genome analyses of different strain sets of Bacillus subtilis to understand the impact of various strains on the performance and output quality of pan-genome analyses. Consequently, we found that the results obtained by pan-genome analyses of B. subtilis can be influenced by the inclusion of incorrectly classified Bacillus subspecies strains, phylogenetically distinct strains, engineered genome-reduced strains, chimeric strains, strains with a large number of unique genes or a large proportion of pseudogenes, and multiple clonal strains. Since the presence of these confounding ...
Source: Briefings in Bioinformatics - February 17, 2020 Category: Bioinformatics Authors: Wu H, Wang D, Gao F Tags: Brief Bioinform Source Type: research
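
The abstract reports that confounding strains distort pan-genome results. A minimal sketch shows why. It assumes the common partition of gene families into core (present in every strain), unique (present in exactly one strain) and accessory (everything else). The strain and gene names are hypothetical toy values. A single genome-reduced strain is enough to shrink the core genome.

```python
def partition_pan_genome(presence):
    """Split gene families into (core, accessory, unique) sets.

    `presence` maps strain name -> set of gene families found in that strain.
    """
    n_strains = len(presence)
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c == n_strains}
    # With a single strain every gene is core, so "unique" is left empty.
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical presence/absence data; "reduced" mimics an engineered genome-reduced strain.
presence = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2", "g4"},
    "s3": {"g1", "g2", "g3"},
    "reduced": {"g1"},
}
with_reduced = partition_pan_genome(presence)
without_reduced = partition_pan_genome({s: g for s, g in presence.items() if s != "reduced"})
print(sorted(with_reduced[0]), sorted(without_reduced[0]))  # ['g1'] ['g1', 'g2']
```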

Biological applications of knowledge graph embedding models.
Abstract Complex biological systems are traditionally modelled as graphs of interconnected biological entities. These graphs, i.e. biological knowledge graphs, are then processed using graph exploratory approaches to perform different types of analytical and predictive tasks. Despite the high predictive accuracy of these approaches, they have limited scalability due to their dependency on time-consuming path exploratory procedures. In recent years, owing to the rapid advances of computational technologies, new approaches for modelling graphs and mining them with high accuracy and scalability have emerged. Th...
Source: Briefings in Bioinformatics - February 17, 2020 Category: Bioinformatics Authors: Mohamed SK, Nounu A, Nováček V Tags: Brief Bioinform Source Type: research

Exploration of databases and methods supporting drug repurposing: a comprehensive survey.
Abstract Drug development involves a deep understanding of the mechanisms of action and possible side effects of each drug, and sometimes results in the identification of new and unexpected uses for drugs, termed drug repurposing. In the case of both serendipitous observations and systematic mechanistic explorations, confirmation of new indications for a drug requires hypothesis building around relevant drug-related data, such as the molecular targets involved and patient and cellular responses. These datasets are available in public repositories, but apart from sifting through the sheer amount of data imposin...
Source: Briefings in Bioinformatics - February 14, 2020 Category: Bioinformatics Authors: Tanoli Z, Seemab U, Scherer A, Wennerberg K, Tang J, Vähä-Koskela M Tags: Brief Bioinform Source Type: research

Mass spectrometry-based protein identification in proteomics-a review.
Abstract Statistically accurate protein identification is a fundamental cornerstone of proteomics and underpins the understanding and application of this technology across all elements of medicine and biology. Proteomics, as a branch of biochemistry, has in recent years played a pivotal role in extending and developing the science of accurately identifying the biology and interactions of groups of proteins or proteomes. Proteomics has primarily used mass spectrometry (MS)-based techniques for identifying proteins, although other techniques including affinity-based identifications still play significant roles. Her...
Source: Briefings in Bioinformatics - February 11, 2020 Category: Bioinformatics Authors: Noor Z, Ahn SB, Baker MS, Ranganathan S, Mohamedali A Tags: Brief Bioinform Source Type: research

Network analyses in microbiome based on high-throughput multi-omics data.
Abstract Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Notably, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded with the large size of...
Source: Briefings in Bioinformatics - February 11, 2020 Category: Bioinformatics Authors: Liu Z, Ma A, Mathé E, Merling M, Ma Q, Liu B Tags: Brief Bioinform Source Type: research

Erratum to: Machine learning approaches and databases for prediction of drug-target interaction: a survey paper.
PMID: 32047893 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - February 11, 2020 Category: Bioinformatics Authors: Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K Tags: Brief Bioinform Source Type: research

TRlnc: a comprehensive database for human transcriptional regulatory information of lncRNAs.
Abstract Long noncoding RNAs (lncRNAs) have been proven to play important roles in transcriptional processes and biological functions. With the increasing study of human diseases and biological processes, information in human H3K27ac ChIP-seq, ATAC-seq and DNase-seq datasets is accumulating rapidly, resulting in an urgent need to collect and process data to identify transcriptional regulatory regions of lncRNAs. We therefore developed a comprehensive database for human regulatory information of lncRNAs (TRlnc, http://bio.licpathway.net/TRlnc), which aimed to collect available resources of transcriptional regulator...
Source: Briefings in Bioinformatics - February 11, 2020 Category: Bioinformatics Authors: Li Y, Li X, Yang Y, Li M, Qian F, Tang Z, Zhao J, Zhang J, Bai X, Jiang Y, Zhou J, Zhang Y, Zhou L, Xie J, Li E, Wang Q, Li C Tags: Brief Bioinform Source Type: research

EP3: an ensemble predictor that accurately identifies type III secreted effectors.
Abstract Type III secretion systems (T3SS) can be found in many pathogenic bacteria, such as Dysentery bacillus, Salmonella typhimurium, Vibrio cholerae and pathogenic Escherichia coli. During infection, these bacteria use the T3SS to transfer a large number of type III secreted effectors (T3SEs) into host cells, thereby blocking or adjusting the communication channels of the host cells. Therefore, the accurate identification of T3SEs is a prerequisite for further study of pathogenic bacteria. In this article, a new T3SE ensemble predictor was developed, which can accurately distinguish T3SEs fr...
Source: Briefings in Bioinformatics - February 11, 2020 Category: Bioinformatics Authors: Li J, Wei L, Guo F, Zou Q Tags: Brief Bioinform Source Type: research
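
The truncated EP3 abstract does not state which base predictors are used or how they are combined. As a generic illustration of an ensemble predictor, the sketch below combines several classifiers' labels by hard majority voting. The voting rule is an assumption chosen for simplicity, not EP3's documented method.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier labels for each sample by majority vote.

    `predictions` is a list of equal-length label lists, one per base
    classifier (e.g. 1 = T3SE, 0 = non-T3SE). Ties go to the label that
    appears first among the tied labels, as Counter.most_common documents.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical base classifiers labelling four sequences.
base = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
print(majority_vote(base))  # [1, 1, 1, 0]
```

An odd number of binary voters avoids ties entirely. Weighted or stacked combinations are common alternatives when base predictors differ in reliability.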

Biomedical data and computational models for drug repositioning: a comprehensive review.
Abstract Drug repositioning can drastically decrease the cost and duration taken by traditional drug research and development while avoiding the occurrence of unforeseen adverse events. With the rapid advancement of high-throughput technologies and the explosion of various biological data and medical data, computational drug repositioning methods have become appealing and powerful techniques to systematically identify potential drug-target interactions and drug-disease interactions. In this review, we first summarize the available biomedical data and public databases related to drugs, diseases and targets. Then, we ...
Source: Briefings in Bioinformatics - February 10, 2020 Category: Bioinformatics Authors: Luo H, Li M, Yang M, Wu FX, Li Y, Wang J Tags: Brief Bioinform Source Type: research

Meta-Prism: Ultra-fast and highly accurate microbial community structure search utilizing dual indexing and parallel computation.
Abstract Microbiome samples are accumulating at an unprecedented speed. As a result, a massive number of samples have become available for mining the intrinsic patterns among them. However, due to the lack of advanced computational tools, fast yet accurate comparison and search among thousands to millions of samples are still urgently needed. In this work, we proposed the Meta-Prism method for comparing and searching the microbial community structures among tens of thousands of samples. Meta-Prism is at least 10 times faster than contemporary methods serving the same purpose and can provide very accurat...
Source: Briefings in Bioinformatics - February 7, 2020 Category: Bioinformatics Authors: Zhu M, Kang K, Ning K Tags: Brief Bioinform Source Type: research

High-dimensional variable selection for ordinal outcomes with error control.
This study reviews two existing variable selection frameworks, model-X knockoffs and a modified version of reference distribution variable selection (RDVS), both of which utilize artificial variables as benchmarks for decision making. Model-X knockoffs constructs a 'knockoff' variable for each covariate to mimic the covariance structure, while RDVS generates only one null variable and forms a reference distribution by performing multiple runs of model fitting. Herein, we describe how different importance measures for ordinal responses can be constructed that fit into these two selection frameworks, using either penalized r...
Source: Briefings in Bioinformatics - February 7, 2020 Category: Bioinformatics Authors: Fu H, Archer KJ Tags: Brief Bioinform Source Type: research
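
In the model-X knockoffs framework reviewed above, each covariate j gets an importance statistic W_j. A large positive W_j means the real covariate beats its knockoff. Variables are then selected with a data-dependent threshold that controls the false discovery rate at a target level q. The sketch below implements only that final knockoff+ thresholding step. How W_j is built for ordinal responses is the subject of the paper and is not reproduced here. The W values are invented.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics `w` at target FDR `q`.

    Candidate thresholds are the non-zero magnitudes |W_j|. The smallest t with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q is returned; if no candidate
    qualifies, the threshold is infinite and nothing is selected.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        selected = max(1, sum(1 for x in w if x >= t))
        if estimated_false / selected <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of covariates whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical importance statistics for ten covariates.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.5, 1.5]
print(knockoff_plus_threshold(w, 0.2), knockoff_select(w, 0.2))  # 1.5 [0, 1, 2, 3, 4, 5, 9]
```

Negative statistics act as the benchmark. Each W_j <= -t suggests a null variable that would also pass at +t, which is how the artificial variables support decision making.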

Toward a gold standard for benchmarking gene set enrichment analysis.
Abstract MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a cur...
Source: Briefings in Bioinformatics - February 6, 2020 Category: Bioinformatics Authors: Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L Tags: Brief Bioinform Source Type: research
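
A benchmarking framework like the one described above compares many enrichment methods. One of the simplest methods such a benchmark might include is over-representation analysis with a one-sided hypergeometric test. The sketch below shows only that classic baseline test, not the authors' framework. The argument names are illustrative.

```python
from math import comb


def ora_pvalue(universe_size, set_size, n_selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: genes measured; set_size: genes in the gene set;
    n_selected: differentially expressed genes; overlap: DE genes in the set.
    """
    total = comb(universe_size, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes, a 3-gene set, 3 selected genes, all 3 in the set.
print(ora_pvalue(10, 3, 3, 3))  # 1/120, about 0.00833
```

This test treats genes as independent draws. That assumption is one reason benchmarks of enrichment methods need defined gold standards rather than ad hoc biological reasoning.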

Human body-fluid proteome: quantitative profiling and computational prediction.
Abstract Empowered by advances in high-throughput biotechnologies, recent research on body-fluid proteomes has led to the discovery of numerous novel disease biomarkers and therapeutic drugs. In the meantime, tremendous progress has been made in characterizing the body-fluid proteomes, resulting in a collection of over 15 000 different proteins detected in major human body fluids. However, a common challenge remains for current proteomics technologies: how to effectively handle the large variety of protein modifications in those fluids. To this end, computational effort utilizing statistical and machine-lea...
Source: Briefings in Bioinformatics - February 5, 2020 Category: Bioinformatics Authors: Huang L, Shao D, Wang Y, Cui X, Li Y, Chen Q, Cui J Tags: Brief Bioinform Source Type: research

Systematical identification of cell-specificity of CTCF-gene binding based on epigenetic modifications.
Abstract The CCCTC-binding factor (CTCF) mediates transcriptional regulation and is linked to epigenetic modifications in cancers. However, the inverse regulatory relationship between CTCF and epigenetic modifications has not been systematically unveiled, especially the mechanism by which histone modifications mediate CTCF binding. Here, we developed a systematic approach to investigate how epigenetic changes affect CTCF binding. Through integrated analysis of CTCF binding in 30 cell lines, we concluded that CTCF generally binds with higher intensity in normal cell lines than in cancers, and higher i...
Source: Briefings in Bioinformatics - February 5, 2020 Category: Bioinformatics Authors: Wu J, Zhang L, Song Q, Yu L, Wang S, Zhang B, Wang W, Xia P, Chen X, Xiao Y, Xu C Tags: Brief Bioinform Source Type: research

GCdiscrimination: identification of gastric cancer based on a milliliter of blood.
Abstract Gastric cancer (GC) continues to be one of the major causes of cancer deaths worldwide. Meanwhile, liquid biopsies have received extensive attention in the screening and detection of cancer, along with a better understanding and clinical practice of biomarkers. In this work, 58 routine blood biochemical indices were tentatively used as integrated markers, further expanding the scope of liquid biopsies, and a discrimination system for GC consisting of the 17 top-ranked indices, elaborated by the random forest method, was constructed to assist in preliminary assessment prior to histological and gastroscopic diagnos...
Source: Briefings in Bioinformatics - February 3, 2020 Category: Bioinformatics Authors: Wu J, Yang Y, Cheng L, Wu J, Xi L, Ma Y, Zhang P, Xu X, Zhang D, Li S Tags: Brief Bioinform Source Type: research

EPSD: a well-annotated data resource of protein phosphorylation sites in eukaryotes.
Abstract As an important post-translational modification (PTM), protein phosphorylation is involved in the regulation of almost all biological processes in eukaryotes. Owing to the rapid progress in mass spectrometry-based phosphoproteomics, a large number of phosphorylation sites (p-sites) have been characterized but remain to be curated. Here, we briefly summarized the current progress in the development of data resources for the collection, curation, integration and annotation of p-sites in eukaryotic proteins. Also, we designed the eukaryotic phosphorylation site database (EPSD), which contained 1 616...
Source: Briefings in Bioinformatics - February 2, 2020 Category: Bioinformatics Authors: Lin S, Wang C, Zhou J, Shi Y, Ruan C, Tu Y, Yao L, Peng D, Xue Y Tags: Brief Bioinform Source Type: research

Domain-specific introduction to machine learning terminology, pitfalls and opportunities in CRISPR-based gene editing.
Abstract The use of machine learning (ML) has become prevalent in the genome engineering space, with applications ranging from predicting target site efficiency to forecasting the outcome of repair events. However, jargon and ML-specific accuracy measures have made it hard to assess the validity of individual approaches, potentially leading to misinterpretation of ML results. This review aims to close the gap by discussing ML approaches and pitfalls in the context of CRISPR gene-editing applications. Specifically, we address common considerations, such as algorithm choice, as well as problems, such as overestimati...
Source: Briefings in Bioinformatics - February 2, 2020 Category: Bioinformatics Authors: O'Brien AR, Burgio G, Bauer DC Tags: Brief Bioinform Source Type: research
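
The review warns that accuracy measures can be misinterpreted. The toy sketch below illustrates one well-known pitfall: on an imbalanced dataset, a predictor that never calls a guide efficient still reaches high accuracy, while precision and recall reveal that it finds nothing. The 95/5 split is a hypothetical example, not data from the review.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = efficient guide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Convention: report 0.0 when there are no positive predictions/labels.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical set: 95 inefficient guides and 5 efficient ones.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
print(confusion_metrics(y_true, always_negative))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0}
```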

Deep learning-based clustering approaches for bioinformatics.
Abstract Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps in analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts and images. Further, clustering is used to gain insights into biological processes at the genomic level; e.g. clustering of gene expression data provides insights into the natural structure inherent in the data and helps in understanding gene functions, cellular processes, cell subtypes and gene regulation. Subsequently, clustering approaches, including hierarchical, c...
Source: Briefings in Bioinformatics - February 2, 2020 Category: Bioinformatics Authors: Karim MR, Beyan O, Zappa A, Costa IG, Rebholz-Schuhmann D, Cochez M, Decker S Tags: Brief Bioinform Source Type: research

Computational annotation of miRNA transcription start sites.
In this study, we summarized recent computational methods and their results on miRNA TSS annotation. We collected and performed a comparative analysis of miRNA TSS annotations from 14 representative studies. We further compiled a robust set of miRNA TSSs (RSmirT) that are supported by multiple studies. Integrative genomic and epigenomic data analysis on RSmirT revealed the genomic and epigenomic features of miRNA TSSs as well as their relations to protein-coding and long non-coding genes. CONTACT: xiaoman@mail.ucf.edu, haihu@cs.ucf.edu. PMID: 32003428 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - January 31, 2020 Category: Bioinformatics Authors: Wang S, Talukder A, Cha M, Li X, Hu H Tags: Brief Bioinform Source Type: research
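
The abstract describes compiling a robust set of miRNA TSSs (RSmirT) supported by multiple studies, but it does not give the exact support criterion. The sketch below assumes a hypothetical rule. A TSS call counts as supported by another study if that study reports a TSS for the same miRNA, chromosome and strand within `window` bp. A call is kept if at least `min_studies` studies support it. Identifiers and coordinates are toy values.

```python
def consensus_tss(predictions, window=500, min_studies=2):
    """Return sorted (mirna, chrom, strand, pos, n_supporting_studies) tuples.

    `predictions` maps study name -> list of (mirna, chrom, strand, position).
    """
    robust = set()
    for study, calls in predictions.items():
        for mirna, chrom, strand, pos in calls:
            support = {study}
            for other, other_calls in predictions.items():
                if other == study:
                    continue
                # Another study supports this call if it reports a nearby TSS.
                if any(m == mirna and c == chrom and s == strand and abs(p - pos) <= window
                       for m, c, s, p in other_calls):
                    support.add(other)
            if len(support) >= min_studies:
                robust.add((mirna, chrom, strand, pos, len(support)))
    return sorted(robust)


# Toy annotations from three hypothetical studies.
predictions = {
    "study1": [("miR-a", "chr1", "+", 1000)],
    "study2": [("miR-a", "chr1", "+", 1200), ("miR-b", "chr2", "-", 5000)],
    "study3": [("miR-b", "chr2", "-", 9000)],
}
print(consensus_tss(predictions))
# [('miR-a', 'chr1', '+', 1000, 2), ('miR-a', 'chr1', '+', 1200, 2)]
```

Nearby supported calls from different studies are kept separately here. A real pipeline would likely merge them into one representative TSS.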

A network-based algorithm for the identification of moonlighting noncoding RNAs and its application in sepsis.
Abstract Moonlighting proteins provide more options for cells to execute multiple functions without increasing the genome and transcriptome complexity. Although there have long been calls for computational methods for the prediction of moonlighting proteins, no method has been designed for determining moonlighting long noncoding RNAs (mlncRNAs). Previously, we developed an algorithm MoonFinder for the identification of mlncRNAs at the genome level based on the functional annotation and interactome data of lncRNAs and proteins. Here, we update MoonFinder to MoonFinder v2.0 by providing an extensi...
Source: Briefings in Bioinformatics - January 31, 2020 Category: Bioinformatics Authors: Liu X, Xu Y, Wang R, Liu S, Wang J, Luo Y, Leung KS, Cheng L Tags: Brief Bioinform Source Type: research

Erratum to: LARMD: integration of bioinformatic resources to profile ligand-driven protein dynamics with a case on the activation of estrogen receptor.
PMID: 31996905 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - January 30, 2020 Category: Bioinformatics Authors: Yang JF, Wang F, Chen YZ, Hao GF, Yang GF Tags: Brief Bioinform Source Type: research

Closing the circle: current state and perspectives of circular RNA databases.
Abstract Circular RNAs (circRNAs) are covalently closed RNA molecules that have been linked to various diseases, including cancer. However, a precise function and working mechanism are lacking for the vast majority of them. Following the many different experimental and computational approaches developed to identify circRNAs, multiple circRNA databases have been developed as well. Unfortunately, there are several major issues with the current circRNA databases, which substantially hamper progress in the field. First, as the overlap in content is limited, a true reference set of circRNAs is lacking. This results from the low abundance an...
Source: Briefings in Bioinformatics - January 30, 2020 Category: Bioinformatics Authors: Vromman M, Vandesompele J, Volders PJ Tags: Brief Bioinform Source Type: research

Design powerful predictor for mRNA subcellular location prediction in Homo sapiens.
Abstract Messenger RNAs (mRNAs) have the special responsibility of transmitting genetic information from DNA to discrete locations in the cytoplasm. The localization of mRNA may provide spatial and temporal regulation of mRNA and protein functions. In situ hybridization and quantitative transcriptomics analysis can provide detailed information about mRNA subcellular localization; however, they are time consuming and expensive. It is therefore highly desirable to develop computational tools for timely and effective prediction of mRNA subcellular location. In this work, by using binomial distribution and one-way analysis of vari...
Source: Briefings in Bioinformatics - January 28, 2020 Category: Bioinformatics Authors: Zhang ZY, Yang YH, Ding H, Wang D, Chen W, Lin H Tags: Brief Bioinform Source Type: research
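
The abstract mentions one-way analysis of variance, but it is truncated before saying how ANOVA is applied. A common use in sequence-based predictors is ranking features by how strongly they separate classes. Assuming that use, the sketch below computes the one-way ANOVA F statistic for a single feature, such as a k-mer frequency, measured across subcellular-location classes. It does not reproduce the paper's binomial-distribution step.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature measured in several classes.

    `groups` is a list of lists: the feature's values within each class
    (e.g. each subcellular location). A larger F means the class means differ
    more relative to the within-class spread.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical feature values in two location classes.
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

Features can then be sorted by F, and a top-ranked subset can be fed to the classifier.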

CpG-island-based annotation and analysis of human housekeeping genes.
Abstract By reviewing previous CpG-related studies, we propose that the transcriptional regulation of about half of the human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise CGI definition and the positioning of CGIs within gene structures, as well as specific CGI-associated regulatory mechanisms, all remain to be explained at the individual gene and gene-family levels, together with consideration of species and lineage specificity. Although previous studies have already classified CGIs into high-CpG (HCGI), int...
Source: Briefings in Bioinformatics - January 25, 2020 Category: Bioinformatics Authors: Zhang L, Dai Z, Yu J, Xiao M Tags: Brief Bioinform Source Type: research
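
The abstract notes that the precise CGI definition is still open. For orientation, the sketch below checks one widely used classic definition (Gardiner-Garden and Frommer): length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. Observed/expected is computed as CpG count × length / (C count × G count). The thresholds are parameters because other definitions use different cut-offs.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / length if length else 0.0
    obs_exp = cpg * length / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply length, GC-content and CpG observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))       # (1.0, 2.0)
print(looks_like_cgi("CG" * 100))  # True
print(looks_like_cgi("AT" * 100))  # False
```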

Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions.
In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark ...
Source: Briefings in Bioinformatics - January 25, 2020 Category: Bioinformatics Authors: Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T Tags: Brief Bioinform Source Type: research
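
The study above replaces the multiple linear regression (MLR) normally used to weight a classical scoring function's energy terms with ML models such as GBDT and RF. For reference, the sketch below shows only the MLR baseline. It fits affinity ≈ w0 + Σ w_j · term_j by ordinary least squares, solving the normal equations in pure Python so the example stays self-contained (numpy.linalg.lstsq would be the usual choice). The energy-term values are invented toy data.

```python
def fit_linear_sf(terms, affinities):
    """Fit intercept + per-term weights by ordinary least squares.

    `terms` is a list of per-complex energy-term vectors and `affinities` the
    measured binding affinities. Solves (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting.
    """
    X = [[1.0] + list(row) for row in terms]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, affinities)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("energy terms are collinear; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def score(weights, term_row):
    """Predicted affinity for one complex under the fitted linear SF."""
    return weights[0] + sum(w * t for w, t in zip(weights[1:], term_row))


# Toy data generated exactly from affinity = 1 + 2*term1 - 3*term2.
terms = [[1, 0], [0, 1], [1, 1], [2, 1]]
affinities = [3, -2, 0, 2]
weights = fit_linear_sf(terms, affinities)
print([round(w, 6) for w in weights])  # [1.0, 2.0, -3.0]
```

An ML-based SF keeps the same energy terms as input features but swaps this linear map for a non-linear regressor.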

Erratum to: Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when.
PMID: 31960888 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - January 21, 2020 Category: Bioinformatics Authors: Galano-Frutos JJ, García-Cebollada H, Sancho J Tags: Brief Bioinform Source Type: research

Deep learning for drug response prediction in cancer.
Abstract Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning (ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling...
Source: Briefings in Bioinformatics - January 17, 2020 Category: Bioinformatics Authors: Baptista D, Ferreira PG, Rocha M Tags: Brief Bioinform Source Type: research

Machine learning approaches and databases for prediction of drug-target interaction: a survey paper.
Abstract The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. Novel and efficient prediction approaches are needed to avoid relying solely on costly, laborious and not-always-deterministic experiments to determine drug-target interactions (DTIs). These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been propos...
Source: Briefings in Bioinformatics - January 17, 2020 Category: Bioinformatics Authors: Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K Tags: Brief Bioinform Source Type: research

Structural dynamics and allostery of Rab proteins: strategies for drug discovery and design.
Abstract Rab proteins represent the largest family of the Ras superfamily of guanosine triphosphatases (GTPases). Aberrant human Rab proteins are associated with multiple diseases, including cancers and neurological disorders. Rab subfamily members display subtle conformational variations that confer specificity in their physiological functions and can be targeted for subfamily-specific drug design. However, drug discovery efforts have not focused much on targeting Rab allosteric non-nucleotide binding sites, which are subject to less evolutionary pressure to be conserved and hence are likely to offer subfamily specific...
Source: Briefings in Bioinformatics - January 17, 2020 Category: Bioinformatics Authors: Kumar AP, Verma CS, Lukman S Tags: Brief Bioinform Source Type: research

NCMCMDA: miRNA-disease association prediction through neighborhood constraint matrix completion.
In this study, we proposed a new computational model named Neighborhood Constraint Matrix Completion for MiRNA-Disease Association prediction (NCMCMDA) to predict potential miRNA-disease associations. The main task of NCMCMDA was to recover the missing miRNA-disease associations based on the known miRNA-disease associations and integrated disease (miRNA) similarity. In this model, we innovatively integrated neighborhood constraint with matrix completion, which provided a novel idea of utilizing similarity information to assist the prediction. After the recovery task was transformed into an optimization problem, we solved i...
Source: Briefings in Bioinformatics - January 12, 2020 Category: Bioinformatics Authors: Chen X, Sun LG, Zhao Y Tags: Brief Bioinform Source Type: research

A survey and systematic assessment of computational methods for drug response prediction.
This study provides insights and lessons for future research into drug response prediction. PMID: 31927568 [PubMed - as supplied by publisher] (Source: Briefings in Bioinformatics)
Source: Briefings in Bioinformatics - January 11, 2020 Category: Bioinformatics Authors: Chen J, Zhang L Tags: Brief Bioinform Source Type: research

SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references.
Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch...
Source: Briefings in Bioinformatics - January 10, 2020 Category: Bioinformatics Authors: Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y Tags: Brief Bioinform Source Type: research
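
SCDC's ENSEMBLE step combines deconvolution results from several scRNA-seq references. The sketch below shows only the generic single-reference step underneath such methods, not SCDC's weighted algorithm. It estimates cell-type proportions p ≥ 0 with bulk ≈ S·p, where S is a genes × cell-types signature matrix, then rescales p to sum to one. Non-negative least squares is solved by projected gradient descent. The signature values are hypothetical.

```python
def deconvolve(bulk, signature, steps=2000):
    """Estimate cell-type proportions from a bulk profile.

    `bulk` is a list of gene expression values; `signature` is a list of rows
    (one per gene) giving mean expression per cell type.
    """
    n_types = len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / L (L = largest eigenvalue of S^T S),
    # so this step size is safe for the least-squares objective.
    lr = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [sum(s * q for s, q in zip(row, p)) - b for row, b in zip(signature, bulk)]
        grad = [sum(row[k] * r for row, r in zip(signature, residual)) for k in range(n_types)]
        p = [max(0.0, q - lr * g) for q, g in zip(p, grad)]  # project onto p >= 0
    total = sum(p)
    return [q / total for q in p] if total > 0 else p


# Hypothetical 3-gene x 2-cell-type signature; true mixture is 30% / 70%.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [3.0, 7.0, 5.0]
print([round(q, 4) for q in deconvolve(bulk, signature)])  # [0.3, 0.7]
```

With several references, one could run this per reference and average the estimates with weights. Choosing those weights is the part SCDC's ENSEMBLE method addresses.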

On the critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation.
Abstract A review recently published in this journal by Fang (2019) showed that methods trained to predict protein stability changes upon mutation have a critical bias: they neglect that a protein variation (A → B) and its reverse (B → A) must have free energy differences of equal magnitude and opposite sign (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method published in 2015 (INPS) addressed the bias issue directly. We include ...
Source: Briefings in Bioinformatics - December 28, 2019 Category: Bioinformatics Authors: Savojardo C, Martelli PL, Casadio R, Fariselli P Tags: Brief Bioinform Source Type: research
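
The antisymmetry requirement ΔΔG_AB = −ΔΔG_BA can be checked directly on a predictor's outputs. The sketch below uses two simple diagnostics. The first is the mean bias of δ_i = ΔΔG_AB,i + ΔΔG_BA,i, which is 0 for a perfectly antisymmetric predictor. The second is the Pearson correlation between forward and reverse predictions, which is −1 for a perfectly antisymmetric predictor. It also shows one simple way to enforce the property: average each prediction with the negated prediction for the reverse variant. The prediction values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def antisymmetry_report(forward, reverse):
    """Return (mean of forward + reverse, correlation of forward vs reverse)."""
    deltas = [f + r for f, r in zip(forward, reverse)]
    return sum(deltas) / len(deltas), pearson(forward, reverse)


def antisymmetrize(forward, reverse):
    """Make predictions antisymmetric by construction."""
    sym_forward = [(f - r) / 2 for f, r in zip(forward, reverse)]
    return sym_forward, [-v for v in sym_forward]


# Hypothetical predictor that under-predicts reverse variants.
forward = [-1.0, -2.0, 0.5]
reverse = [0.2, 0.4, -0.1]
print(antisymmetry_report(forward, reverse))  # mean bias about -0.667, correlation about -1
print(antisymmetry_report(*antisymmetrize(forward, reverse)))  # bias 0.0, correlation about -1
```

In the example, forward and reverse predictions are perfectly anti-correlated yet still biased. This is why a correlation near −1 alone does not establish antisymmetry.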

Drug-target prediction utilizing heterogeneous bio-linked network embeddings.
Abstract To enable modularization for network-based prediction, we reviewed known methods for the various subtasks involved in creating a drug-target prediction framework and benchmarked them to determine the highest-performing approaches. Accordingly, our contributions are as follows: (i) from a network perspective, we benchmarked the association-mining performance of 32 distinct subnetwork permutations, arranged based on a comprehensive heterogeneous biomedical network derived from 12 repositories; (ii) from a methodological perspective, we identified the best prediction s...
Source: Briefings in Bioinformatics - December 27, 2019 Category: Bioinformatics Authors: Zong N, Wong RSN, Yu Y, Wen A, Huang M, Li N Tags: Brief Bioinform Source Type: research

Computational resources and strategies to assess single-molecule dynamics of the translation process in S. cerevisiae.
Abstract This work provides a systematic and comprehensive overview of available resources for the molecular-scale modelling of the translation process through agent-based modelling. The case study is the translation in Saccharomyces cerevisiae, one of the most studied yeasts. The data curation workflow encompassed structural information about the yeast (i.e. the simulation environment), and the proteins, ribonucleic acids and other types of molecules involved in the process (i.e. the agents). Moreover, it covers the main process events, such as diffusion (i.e. motion of molecules in the environment) a...
Source: Briefings in Bioinformatics - December 27, 2019 Category: Bioinformatics Authors: T Magalhães B, Lourenço A, Azevedo NF Tags: Brief Bioinform Source Type: research

Deep learning for mining protein data.
Abstract The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief n...
Source: Briefings in Bioinformatics - December 20, 2019 Category: Bioinformatics Authors: Shi Q, Chen W, Huang S, Wang Y, Xue Z Tags: Brief Bioinform Source Type: research

Current challenges and best-practice protocols for microbiome analysis.
Abstract Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and by computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also hav...
Source: Briefings in Bioinformatics - December 18, 2019 Category: Bioinformatics Authors: Bharti R, Grimm DG Tags: Brief Bioinform Source Type: research

Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches.
Abstract Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and have improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources, and methods that can incorporate multiple sources efficiently, is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of canc...
Source: Briefings in Bioinformatics - December 15, 2019 Category: Bioinformatics Authors: Güvenç Paltun B, Mamitsuka H, Kaski S Tags: Brief Bioinform Source Type: research
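The drug response review above lists matrix factorization among the integration approaches. As a generic illustration only (not a method taken from the review), a cell-line by drug response matrix with unmeasured pairs can be completed by masked alternating least squares. The function name `als_factorize`, the rank, the regularization weight and the toy data are all assumptions.

```python
import numpy as np

def als_factorize(R, mask, rank=1, lam=1e-3, n_iter=100, seed=0):
    """Masked alternating least squares so that R ~ U @ V.T on observed cells.

    R:    cell lines x drugs response matrix (unobserved values are ignored)
    mask: same shape as R, 1.0 where a response was measured, else 0.0
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Small positive random initialization of the latent factors.
    U = rng.uniform(0.1, 1.0, size=(n, rank))
    V = rng.uniform(0.1, 1.0, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Ridge update of each cell-line factor from the drugs it was tested on.
        for i in range(n):
            obs = mask[i] > 0
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ R[i, obs])
        # Ridge update of each drug factor from the cell lines tested with it.
        for j in range(m):
            obs = mask[:, j] > 0
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ R[obs, j])
    return U, V

# Toy rank-1 response matrix: 4 hypothetical cell lines x 3 hypothetical drugs.
R = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])
mask = np.ones_like(R)
mask[3, 2] = 0.0           # pretend this cell line-drug pair was never measured
R_obs = R * mask           # hide the unmeasured response
U, V = als_factorize(R_obs, mask)
predicted = (U @ V.T)[3, 2]
print(f"predicted {predicted:.3f} vs hidden {R[3, 2]:.1f}")
```

The hidden response is recovered here only because the toy matrix is exactly rank 1; integrating side information such as kernels or networks, as the review discusses, would typically add further terms to this objective.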

A survey on adverse drug reaction studies: data, tasks and machine learning methods.
Abstract MOTIVATION: Adverse drug reaction (ADR) or drug side effect studies play a crucial role in drug discovery. Recently, with the rapid increase of both clinical and non-clinical data, machine learning methods have emerged as prominent tools for analyzing and predicting ADRs. Nonetheless, challenges remain in ADR studies. RESULTS: In this paper, we summarize ADR data sources and review ADR studies in three tasks: drug-ADR benchmark data creation, drug-ADR prediction and ADR mechanism analysis. We focus on the machine learning methods used in each task and then compare performance...
Source: Briefings in Bioinformatics - December 15, 2019 Category: Bioinformatics Authors: Nguyen DA, Nguyen CH, Mamitsuka H Tags: Brief Bioinform Source Type: research

An extensive review of tools for manual annotation of documents.
Abstract MOTIVATION: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Annotation tools are also used to extract new information for a particular use case. However, owing to the large number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature and then installing and trying various tools. METHODS: We searched for annotation tools and selected a subset of them according to five require...
Source: Briefings in Bioinformatics - December 15, 2019 Category: Bioinformatics Authors: Neves M, Ševa J Tags: Brief Bioinform Source Type: research

Predicting drug-induced hepatotoxicity based on biological feature maps and diverse classification strategies.
In this study, we developed a drug-induced hepatotoxicity prediction model based on toxicogenomics data that takes into account both the biological context and computational efficiency. Specifically, we proposed a novel gene selection algorithm named BioCB, which considers gene participation, to choose discriminative genes and make prediction more efficient. Then, instead of using the raw gene expression levels to characterize each drug, we developed a two-dimensional biological process feature pattern map to represent each drug. Next, we employed two strategies to handle the maps and identify hepatotoxicity: the direct us...
Source: Briefings in Bioinformatics - December 14, 2019 Category: Bioinformatics Authors: Su R, Wu H, Liu X, Wei L Tags: Brief Bioinform Source Type: research

Current RNA-seq methodology reporting limits reproducibility.
Abstract Ribonucleic acid sequencing (RNA-seq) identifies and quantifies RNA molecules from a biological sample. Transformation from raw sequencing data to meaningful gene or isoform counts requires an in silico bioinformatics pipeline. Such pipelines are modular in nature, built using selected software and biological references. Software is usually chosen and parameterized according to the sequencing protocol and biological question. However, while biological and technical noise is alleviated through replicates, biases due to the pipeline and choice of biological references are often overlooked. Here, we show tha...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Simoneau J, Dumontier S, Gosselin R, Scott MS Tags: Brief Bioinform Source Type: research

The effect of tissue composition on gene co-expression.
In this study, we illustrate the effect of variable cell-type composition on correlation-based network estimation and provide a mathematical decomposition of the tissue-level correlation. We show that a class of deconvolution methods developed to separate tumor and stromal signatures can be applied to two-component cell-type mixtures. In simulated and real data, we identify conditions in which a deconvolution approach would be beneficial. Our results suggest that uncorrelated cell-type-specific markers are ideally suited to deconvolute both the expression and co-expression patterns of an individual cell type. We provide a ...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Zhang Y, Cuerdo J, Halushka MK, McCall MN Tags: Brief Bioinform Source Type: research
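The co-expression entry above argues that variable cell-type composition distorts tissue-level correlation. A toy simulation makes the effect concrete; it is not the authors' decomposition or deconvolution method, and the cell-type means, proportion range and sample size are assumptions. Two genes that are uncorrelated within each cell type become strongly negatively correlated in the mixture when one marks cell type A and the other marks cell type B. Regressing each gene on the known proportion, a simple stand-in for deconvolution, brings the correlation back near zero.

```python
import numpy as np

def simulate_mixture(n_samples=500, seed=0):
    """Toy two-cell-type mixture of two genes.

    Within each cell type the genes are drawn independently, so any
    tissue-level correlation comes only from the varying mixing proportion.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical per-sample proportion of cell type A.
    p = rng.uniform(0.1, 0.9, size=n_samples)
    # Assumed cell-type means: gene 1 marks type A, gene 2 marks type B.
    mean_a = np.array([10.0, 2.0])
    mean_b = np.array([2.0, 10.0])
    # Independent noise within each cell type (no within-type co-expression).
    expr_a = mean_a + rng.normal(0.0, 1.0, size=(n_samples, 2))
    expr_b = mean_b + rng.normal(0.0, 1.0, size=(n_samples, 2))
    # Tissue-level expression is the proportion-weighted sum of the two types.
    tissue = p[:, None] * expr_a + (1.0 - p)[:, None] * expr_b
    return p, expr_a, tissue

p, expr_a, tissue = simulate_mixture()
within_a = np.corrcoef(expr_a[:, 0], expr_a[:, 1])[0, 1]
tissue_r = np.corrcoef(tissue[:, 0], tissue[:, 1])[0, 1]

# Regressing each gene on the known proportion removes the shared component.
design = np.column_stack([np.ones_like(p), p])
coef, *_ = np.linalg.lstsq(design, tissue, rcond=None)
resid = tissue - design @ coef
adjusted_r = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]

print(f"within A: {within_a:.2f}  tissue: {tissue_r:.2f}  adjusted: {adjusted_r:.2f}")
```

In real tissue the proportions are unknown and must themselves be estimated, which is the gap deconvolution methods aim to fill.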

Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when.
Abstract The increasing ease with which large amounts of genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physical-chemical properties to predict whether replacement of one amino acid residue with another will be tolerated or cause disease. Those approaches achieve up to 80-85% accuracy as binary classifiers (neutral/pathogenic). As such accuracy is insufficient to base medical decisions on, and does not appear to be increasing,...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Galano-Frutos JJ, García-Cebollada H, Sancho J Tags: Brief Bioinform Source Type: research

Deep learning of pharmacogenomics resources: moving towards precision oncology.
Abstract The recent accumulation of cancer genomic data provides an opportunity to understand how a tumor's genomic characteristics can affect its responses to drugs. This field, called pharmacogenomics, is a key area in the development of precision oncology. Deep learning (DL) methodology has emerged as a powerful technique to characterize and learn from rapidly accumulating pharmacogenomics data. We introduce the fundamentals and typical model architectures of DL. We review the use of DL in classification of cancers and cancer subtypes (diagnosis and treatment stratification of patients), prediction of drug resp...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Chiu YC, Chen HH, Gorthi A, Mostavi M, Zheng S, Huang Y, Chen Y Tags: Brief Bioinform Source Type: research

Fold-LTR-TCP: protein fold recognition based on triadic closure principle.
In this study, protein fold recognition is treated as an information retrieval task. A Learning to Rank (LTR) model was employed to search the query protein against the template proteins in a supervised manner, in order to find the templates in the same fold as the query protein. The triadic closure principle (TCP) was applied to the ranking list generated by the LTR model to improve its accuracy by considering the relationships among the query protein and the template proteins in the ranking list. Finally, a predictor called Fold-LTR-TCP was proposed. The rigorous test on the LE benchmark dataset showed that the Fold-L...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Liu B, Zhu Y, Yan K Tags: Brief Bioinform Source Type: research
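The Fold-LTR-TCP entry above applies the triadic closure principle to an LTR ranking list, but the abstract does not give the exact rule. The sketch below is a hypothetical re-scoring in that spirit: if the query is close to template A and A is close to template B, then B receives extra support. The function `triadic_rerank`, its `top_k` and `alpha` parameters, the averaging rule and the toy scores are assumptions for illustration, not the published method.

```python
from typing import Dict, List

def triadic_rerank(
    query_scores: Dict[str, float],
    template_sim: Dict[str, Dict[str, float]],
    top_k: int = 3,
    alpha: float = 0.5,
) -> List[str]:
    """Re-rank templates, rewarding those similar to other top-ranked templates.

    query_scores: hypothetical first-pass query-template scores (e.g. from LTR).
    template_sim: hypothetical template-template similarities in [0, 1].
    """
    ranked = sorted(query_scores, key=query_scores.get, reverse=True)
    anchors = ranked[:top_k]  # top templates used to close triangles
    new_scores = {}
    for t in ranked:
        # Evidence from closed triangles: query -> anchor -> t.
        support = [query_scores[a] * template_sim.get(a, {}).get(t, 0.0)
                   for a in anchors if a != t]
        bonus = sum(support) / len(support) if support else 0.0
        new_scores[t] = query_scores[t] + alpha * bonus
    return sorted(new_scores, key=new_scores.get, reverse=True)

# Toy example: C scores low on its own but is similar to both top templates.
scores = {"A": 0.9, "B": 0.8, "D": 0.55, "C": 0.5}
sim = {"A": {"C": 0.9}, "B": {"C": 0.9}, "C": {"A": 0.9, "B": 0.9}}
reranked = triadic_rerank(scores, sim, top_k=2)
print(reranked)  # -> ['A', 'C', 'B', 'D']
```

Template C moves from last to second because it closes triangles with both anchors, while D, which is similar to neither, drops to last.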

Pathway Tools version 23.0 update: software for pathway/genome informatics and systems biology.
Abstract MOTIVATION: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. ...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Karp PD, Midford PE, Billington R, Kothari A, Krummenacker M, Latendresse M, Ong WK, Subhraveti P, Caspi R, Fulcher C, Keseler IM, Paley SM Tags: Brief Bioinform Source Type: research

Landscape of cancer diagnostic biomarkers from specifically expressed genes.
In this study, we comprehensively surveyed the specifically expressed genes (SEGs) using SEGtool, based on large-scale gene expression data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects. In 15 solid tumors, we identified 233 cancer-specific SEGs (cSEGs), which were specifically expressed in only one cancer and showed great potential as diagnostic biomarkers. Among them, three cSEGs (OGDH, MUDENG and ACO2) had a sample frequency > 80% in kidney cancer, suggesting their high sensitivity. Furthermore, we identified 254 cSEGs as early-stage diagnostic biomarkers across 17 canc...
Source: Briefings in Bioinformatics - December 8, 2019 Category: Bioinformatics Authors: Lv Y, Lin SY, Hu FF, Ye Z, Zhang Q, Wang Y, Guo AY Tags: Brief Bioinform Source Type: research
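The landscape entry above reports cSEGs with a sample frequency above 80% in kidney cancer. Assuming sample frequency means the fraction of tumor samples in which a gene counts as specifically expressed, it could be computed as below. The cutoff used here (a quantile of expression in a reference sample set) and the toy values are hypothetical stand-ins, not SEGtool's criterion.

```python
import numpy as np

def sample_frequency(tumor, reference, quantile=0.95):
    """Fraction of tumor samples whose expression exceeds a reference cutoff.

    The cutoff (the given quantile of reference-sample expression) is an
    assumption for illustration, not SEGtool's definition.
    """
    cutoff = np.quantile(np.asarray(reference, dtype=float), quantile)
    tumor = np.asarray(tumor, dtype=float)
    return float(np.mean(tumor > cutoff))

# Hypothetical expression values for one gene.
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # e.g. other tissues
tumor = [12, 15, 9, 20, 11, 14, 13, 16, 18, 10]  # e.g. one cancer cohort
freq = sample_frequency(tumor, reference)
print(f"sample frequency: {freq:.0%}, passes 80% threshold: {freq > 0.8}")
```

With linear interpolation the reference cutoff is 9.55, so nine of the ten tumor samples exceed it.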