Genome-Enabled Prediction Using the BLR (Bayesian Linear Regression) R-Package
The BLR (Bayesian linear regression) package of R implements several Bayesian regression models for continuous traits. The package was originally developed for implementing the Bayesian LASSO (BL) of Park and Casella (J Am Stat Assoc 103(482):681–686, 2008), extended to accommodate fixed effects and regressions on pedigree using methods described by de los Campos et al. (Genetics 182(1):375–385, 2009). In 2010 we further developed the code into an R-package, reprogrammed some internal aspects of the algorithm in the C language to increase computational speed, and further documented the package (Plant Genome J 3...
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Implementing a QTL Detection Study (GWAS) Using Genomic Prediction Methodology
Genomic prediction exploits historical genotypic and phenotypic data to predict performance on selection candidates based only on their genotypes. It achieves this by a process known as training that derives the values of all the chromosome fragments that can be characterized by regressing the historical phenotypes on some or all of the genotyped loci. A genome-wide association study (GWAS) involves a genome-wide search for chromosome fragments with significant association with phenotype. One Bayesian approach to GWAS makes inferences using samples from the posterior distribution of genotypic effects obtained in the traini...

Bayesian Methods Applied to GWAS
Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, Bayes' theorem is used to combine prior beliefs about marker effects, which are expressed in terms of prior distributions, with information from data for inference. Often, the analyses are too complex for closed-form solutions and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses ...

Association Weight Matrix: A Network-Based Approach Towards Functional Genome-Wide Association Studies
In this chapter we describe the Association Weight Matrix (AWM), a novel procedure to exploit the results from genome-wide association studies (GWAS) and, in combination with network inference algorithms, generate gene networks with regulatory and functional significance. In simple terms, the AWM is a matrix with rows represented by genes and columns represented by phenotypes. Individual {i, j}th elements in the AWM correspond to the association of the SNP in the ith gene to the jth phenotype. While our main objective is to provide a recipe-like tutorial on how to build and use AWM, we also take the opportunity to briefly ...
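The matrix layout described in the AWM abstract (rows are genes, columns are phenotypes, each cell is a SNP-to-phenotype association) can be sketched as follows. This is a minimal illustration, not the chapter's procedure: the gene names, phenotype names, effect values, and the `build_awm` helper are all hypothetical.

```python
# Hypothetical GWAS output: (gene, phenotype, standardized SNP effect).
# All names and numbers are invented for illustration only.
gwas_results = [
    ("GENE_A", "weight", 2.1),
    ("GENE_A", "height", -0.4),
    ("GENE_B", "weight", 0.3),
    ("GENE_B", "height", 1.8),
]


def build_awm(results):
    """Return (genes, phenotypes, matrix), where matrix[i][j] holds the
    association of the SNP in gene i with phenotype j."""
    genes = sorted({gene for gene, _, _ in results})
    phenotypes = sorted({phenotype for _, phenotype, _ in results})
    row = {gene: i for i, gene in enumerate(genes)}
    col = {phenotype: j for j, phenotype in enumerate(phenotypes)}
    # Cells with no reported association stay at 0.0.
    matrix = [[0.0] * len(phenotypes) for _ in genes]
    for gene, phenotype, effect in results:
        matrix[row[gene]][col[phenotype]] = effect
    return genes, phenotypes, matrix


genes, phenotypes, awm = build_awm(gwas_results)
for gene, values in zip(genes, awm):
    print(gene, dict(zip(phenotypes, values)))
```

A matrix in this shape is what a downstream network inference step would take as input, for example by correlating gene rows across phenotypes.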

Mixed Effects Structural Equation Models and Phenotypic Causal Networks
Complex networks with causal relationships among variables are pervasive in biology. Their study, however, requires special modeling approaches. Structural equation models (SEM) allow the representation of causal mechanisms among phenotypic traits and inferring the magnitude of causal relationships. This information is important not only in understanding how variables relate to each other in a biological system, but also in predicting how this system reacts under external interventions, which are common in fields related to health and food production. Nevertheless, fitting a SEM requires defining a priori the causal structure ...

Epistasis, Complexity, and Multifactor Dimensionality Reduction
Genome-wide association studies (GWASs) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology. Conversion of this wealth of new information about genomic variation to knowledge about public health and human biology will depend critically on the complexity of the genotype to phenotype mapping relationship. We review here computational approaches to genetic analysis that embrace, rather than ignore, the complexity of human health. We focus on multifactor dimensionality reduction (MDR) as an approach for modeling one of these complexities: epistasis or gene–...

Using PLINK for Genome-Wide Association Studies (GWAS) and Data Analysis
Within this chapter we introduce the basic PLINK functions for reading in data, applying quality control, and running association analyses. Three worked examples are provided to illustrate: data management and assessment of population substructure, association analysis of a quantitative trait, and qualitative or case–control association analyses.

Statistical Analysis of Genomic Data
In this chapter we describe methods for statistical analysis of GWAS data with the goal of quantifying evidence for genomic effects associated with trait variation, while avoiding spurious associations due to evidence not being well quantified or due to population structure.

Overview of Statistical Methods for Genome-Wide Association Studies (GWAS)
This chapter provides an overview of statistical methods for genome-wide association studies (GWAS) in animals, plants, and humans. The simplest form of GWAS, a marker-by-marker analysis, is illustrated with a simple example. The problem of selecting a significance threshold that accounts for the large amount of multiple testing that occurs in GWAS is discussed. Population structure causes false positive associations in GWAS if not accounted for, and methods to deal with this are presented. Methodology for more complex models for GWAS, including haplotype-based approaches, accounting for identical by descent versus identic...
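A marker-by-marker analysis and a multiple-testing threshold can be sketched in a few lines. This is an illustrative example only: it uses a plain least-squares slope for a single marker and the Bonferroni correction as one common way to set a genome-wide threshold, which is not necessarily the method the chapter recommends. The genotype codes, phenotype values, and marker count are invented.

```python
def regression_slope(genotypes, phenotypes):
    """Least-squares slope of phenotype on genotype (allele count 0/1/2)
    for a single marker."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx


def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance level that keeps the family-wise error
    rate at or below alpha across n_tests independent tests."""
    return alpha / n_tests


# Invented data for one marker: allele counts and trait values.
genotypes = [0, 1, 2, 1, 0, 2]
phenotypes = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]

slope = regression_slope(genotypes, phenotypes)
threshold = bonferroni_threshold(alpha=0.05, n_tests=50_000)
print(f"estimated allele effect: {slope:.2f}")
print(f"per-marker significance threshold: {threshold:.1e}")
```

Bonferroni treats the tests as independent, so it tends to be conservative when markers are in linkage disequilibrium.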

Quality Control for Genome-Wide Association Studies
This chapter overviews the quality control (QC) issues for SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating the quality of the genotypes are discussed, followed by a worked-out example of a QC pipeline starting with raw data and finishing with a fully filtered dataset ready for downstream analysis. The emphasis is on automation of data storage, filtering, and manipulation to ensure data integrity throughout the process and on how to extract a global summary from these high-dimensional datasets to allow better-informed downstream analytical decisions. All examples will be ru...
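Two QC metrics that are widely used for SNP genotypes are the call rate and the minor allele frequency (MAF). The sketch below filters SNPs on both. It is an assumption-laden illustration rather than the chapter's pipeline: the abstract does not list its metrics, the thresholds are arbitrary, the data are invented, and genotypes are assumed to be coded as allele counts (0, 1, 2) with `None` for a missing call.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1.0 - freq_b)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Illustrative thresholds; real studies choose their own."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Invented genotypes for three SNPs across ten samples.
snps = {
    "snp1": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],            # well called, polymorphic
    "snp2": [0, 0, None, None, 0, 0, 0, 0, 0, 0],      # low call rate
    "snp3": [0] * 10,                                  # monomorphic
}

kept = [name for name, calls in snps.items() if passes_qc(calls)]
print(kept)
```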

Managing Large SNP Datasets with SNPpy
Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
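The two advantages named in the SNPpy abstract, validation by the database and export through SQL, can be illustrated with a small self-contained sketch. It deliberately does not use SNPpy, PostgreSQL, or SQLAlchemy: it relies on Python's built-in `sqlite3` module as a stand-in, and the table names, column names, and genotype codes are hypothetical rather than SNPpy's actual schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: constraints let the database validate the data.
conn.executescript("""
CREATE TABLE snp (
    snp_id     TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    position   INTEGER NOT NULL CHECK (position > 0)
);
CREATE TABLE genotype (
    sample_id     TEXT NOT NULL,
    snp_id        TEXT NOT NULL REFERENCES snp(snp_id),
    genotype_call TEXT NOT NULL
                  CHECK (genotype_call IN ('AA', 'AB', 'BB', 'NC')),
    PRIMARY KEY (sample_id, snp_id)
);
""")

conn.execute("INSERT INTO snp VALUES (?, ?, ?)", ("rs1", "1", 12345))
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?)",
    [("s1", "rs1", "AA"), ("s2", "rs1", "AB")],
)

# Validation: an invalid genotype code is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO genotype VALUES (?, ?, ?)", ("s3", "rs1", "XX"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Export: a plain SQL query pulls the calls for one SNP.
rows = conn.execute(
    "SELECT sample_id, genotype_call FROM genotype "
    "WHERE snp_id = ? ORDER BY sample_id",
    ("rs1",),
).fetchall()
print(rejected, rows)
```

Putting the checks in the schema means malformed records are refused at load time instead of being discovered during analysis.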

Designing a GWAS: Power, Sample Size, and Data Structure
In this chapter we describe a novel Bayesian approach to designing GWAS with the goal of ensuring robust detection of effects of genomic loci associated with trait variation.

Descriptive Statistics of Data: Understanding the Data Set and Phenotypes of Interest
A good understanding of the design of an experiment and the observational data that have been collected as part of the experiment is a key prerequisite for correct and meaningful preparation of field data for further analysis. In this chapter, I provide guidelines on how to gain an understanding of the field data, the preparation steps that arise as a consequence of the experimental or data structure, and how to fit a linear model to extract data for further analysis.

Genomic Selection in Animal Breeding Programs
Genomic selection can have a major impact on animal breeding programs, especially where traits that are important in the breeding objective are hard to select for otherwise. Genomic selection provides more accurate estimates for breeding value earlier in the life of breeding animals, giving more selection accuracy and allowing lower generation intervals. From sheep to dairy cattle, the rates of genetic improvement could increase from 20 to 100 % and hard-to-measure traits can be improved more effectively.

Incorporating Prior Knowledge to Increase the Power of Genome-Wide Association Studies
Typical methods of analyzing genome-wide single nucleotide variant (SNV) data in cases and controls involve testing each variant’s genotypes separately for phenotype association, and then using a substantial multiple-testing penalty to minimize the rate of false positives. This approach, however, can result in low power for modestly associated SNVs. Furthermore, simply looking at the most associated SNVs may not directly yield biological insights about disease etiology. SNVset methods attempt to address both limitations of the traditional approach by testing biologically meaningful sets of SNVs (e.g., genes or pathwa...