STRAS: a Snakemake pipeline for genome-wide short tandem repeats annotation and score

Abstract: High-throughput whole genome sequencing (WGS) is used clinically to find single nucleotide variants and small indels. Several bioinformatics tools have been developed to call short tandem repeat (STR) copy numbers from WGS data, such as ExpansionHunter Denovo, GangSTR and HipSTR. However, expansion disorders are rare, and it is hard to find candidate expansions in a single patient's sequencing data among ~800,000 STR calls. In this paper I describe a Snakemake pipeline for genome-wide STRs Annotation and Score (STRAS) that uses a Random Forest (RF) model to predict pathogenicity. The predictor was validated on benchmark data from ClinVar and PubMed. True positive rate was 93.8%. True negative rate was 98.0%. Precision was 98.6% and recall rate was 93.8%. F1-score was 0.961. Sensitivity was 93.8% and specificity was 99.6%. These results show that STRAS could be a useful tool for clinical researchers to find STR loci of interest and filter out neutral STRs. STRAS is freely available at https://github.com/fancheyu5/STRAS.
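As a quick arithmetic check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only and is not part of STRAS; the function name `f1_score` is a hypothetical helper, and the inputs are the rounded percentages given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumed exact for this check).
precision = 0.986
recall = 0.938

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.961
```

The result agrees with the reported F1-score of 0.961.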
Source: Human Genetics - Category: Genetics & Stem Cells Source Type: research