QTL-mapping and genomic prediction for bovine respiratory disease in U.S. Holsteins using sequence imputation and feature selection
Background: National genetic evaluations for disease resistance do not exist, precluding the genetic improvement of cattle for these traits. To construct and compare genomic prediction models, we imputed BovineHD genotypes to whole genome sequence for 2703 Holsteins that were cases or controls for Bovine Respiratory Disease (BRD) and were sampled from either California or New Mexico. The sequence variation reference dataset comprised variants called for 1578 animals from Run 5 of the 1000 Bull Genomes Project, including 450 Holsteins and 29 animals sequenced from this study population. Genotypes for 9,282,726 variants with minor allele frequencies ≥5 percent were imputed and used to obtain genomic predictions in GEMMA using a Bayesian Sparse Linear Mixed Model.
Results: Variation explained by markers increased from 13.6 percent using BovineHD data to 14.4 percent using imputed whole genome sequence data, and the resolution of genomic regions detected as harbouring QTL substantially increased. Explained variation in the analysis of the combined California and New Mexico data was less than when data for each state were analysed separately, and the estimated genetic correlation between risk of BRD in California and New Mexico Holsteins was -0.36. Consequently, genomic predictions trained using the data from one state did not accurately predict disease risk in the other state. To determine whether a prediction model could be developed with utility in both states, we selected variants within genomic regions harbouring: 1) genes involved in the normal immune response to infection by pathogens responsible for BRD, detected by RNA-Seq analysis, and/or 2) QTL identified in the association analysis of the imputed sequence variants. The model based on QTL-selected variants is biased, but when trained in one state it generated BRD risk predictions with positive accuracies in the other state.
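As a rough illustration of two quantities mentioned above, the following Python sketch filters variants at a 5 percent minor allele frequency threshold and scores cross-state predictions by their correlation with observed case/control status. It is not the study pipeline (which used imputation software and GEMMA). The function names, the toy genotypes, and the choice of Pearson correlation as the accuracy measure are assumptions made for illustration only.

```python
from math import sqrt


def minor_allele_frequency(dosages):
    """MAF for one variant from 0/1/2 allele dosages; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(genotypes, threshold=0.05):
    """Return IDs of variants whose MAF is at least `threshold`."""
    return [vid for vid, dosages in genotypes.items()
            if minor_allele_frequency(dosages) >= threshold]


def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of prediction accuracy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: variant ID -> allele dosages across animals.
genotypes = {
    "var_a": [0, 1, 2, 1, 0, 0, 1, 0, 0, 0],     # MAF 0.25, kept
    "var_b": [0] * 19 + [1],                     # MAF 0.025, dropped
    "var_c": [2, 2, 2, 2, None, 2, 2, 2, 2, 2],  # monomorphic, dropped
}
kept = filter_by_maf(genotypes)

# Hypothetical risk predictions from a model trained in one state,
# compared with observed case (1) / control (0) status in the other state.
predicted_risk = [0.10, 0.40, 0.35, 0.80]
observed_status = [0, 0, 1, 1]
accuracy = pearson(predicted_risk, observed_status)
```

A positive value of `accuracy` in this sketch corresponds to the "positive accuracies" reported for the QTL-selected model, under the assumption that accuracy is measured as a correlation between predicted and observed values.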
Conclusions: We demonstrate the utility of sequence-based and biology-driven model development for genomic selection. Disease phenotypes cannot be routinely recorded in most livestock species, and the observed phenotypes may vary in their genomic architecture due to variation in pathogen composition across environments. Elucidation of trait biology and genetic architecture may guide the development of prediction models with utility across breeds and environments.
This work is licensed under a Creative Commons license: cc-by