Data quality control in genetic case-control association studies

Head and neck cancer, case-control, genome-wide association study, genetic susceptibility, single-nucleotide polymorphism. This protocol deals with the quality control (QC) of genotype data from genome-wide and candidate-gene case-control association studies. Introduction: data for genome-wide association studies (GWAS) demand a fair amount of preprocessing and quality control (QC), especially SNP genotypes. Due to varied study designs and genotyping platforms between multiple.

Currently available robust approaches in this area are mainly based on the optimal trend tests for some specific genetic models, such as recessive, additive. In human case-control association studies, one of the chi-square tests typically carried out is based on a 2. Quality control and statistical analysis, author Andries T. Data quality control in genetic case-control association studies, Carl A. All the examples above were for cohort studies or clinical trials in which we compared either cumulative incidence or incidence rates among two or more exposure groups. Traditionally, association mapping studies with the case-control design have been used to test for disease-marker association by selecting one affected sibling per sibship to form the case group, and comparing the allele or genotype frequencies with a. This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. In this context we consider the utility of quantitative scores e. Here we extend these methods and describe a system of QC/QA for genotypic data in genome. Statistical analysis of genome-wide association (GWAS) data, Jim Stankovich. Several genetic association tests have been proposed see e.

Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance. Studies of genetic association aim to test whether single-locus alleles or genotype frequencies (or, more generally, multilocus haplotype frequencies) differ between two groups of individuals, usually diseased. When case-control studies were first developed, most were conducted retrospectively, and it is sometimes assumed that the rare-disease assumption applies to all case-control studies. In many ways, selection of cases is the easier task, because clinical. Laurie C, Mirel D, Pugh E, Bierut L, Bhangale T, Boehm F, Caporaso N, Edenburgh H, Gabriel S, Harris E, et al. A genome-wide association study identifies two novel susceptible regions for squamous. A genetic association case-control study compares the frequency of alleles or genotypes at genetic marker loci, usually single-nucleotide polymorphisms. Population stratification is a major cause of false-positive results in genetic association studies, particularly in case-control studies. For psychiatric traits, PRS is also significantly associated with case-control status. Genome-wide association studies, quality control, Illumina, R statistics 1. What is a false-positive or false-negative association, and how can a genome-wide study minimize these types of errors?

For example, in a genetic study including subjects from Asia and. This article provides a broad outline of the design and analysis of such studies, focusing on case-control studies in candidate genes or regions. In the present study, we investigated the associations between genetic risk scores (GRSs) and narcolepsy along with their predictive power. Methods. A popular statistical method is the model-free Pearson's chi-square test. Calculates the Cochran-Armitage trend test p-value for different genetic models. Quality control and quality assurance in genotypic data for genome-wide association studies. Automated quality control for genome-wide association studies, Sally R. The efficiency of a genetic association study critically hinges on the statistical methods adopted. Nyholt, Human Genetics, volume 109, pages 564-565 (2001). In fact, concern about the effect of population substructure on case-control studies is common; see Spielman et al. For analysis of case-control genetic association studies, it has recently been shown that gene-environment independence in the population can be leveraged to increase ef.

This protocol describes how to appropriately design a genetic association case-control study, either focussing on a candidate gene or region, or implementing a genome-wide approach. To identify genetic variants for risk of squamous cell carcinoma of the head and neck. Khoury 2. Case-control studies using parents of case subjects as the control subjects provide an innovative way to study associations of genetic markers with disease risk. Understand the conditions under which population stratification can occur. Anderson 1,2, Fredrik H Pettersson 1, Geraldine M Clarke 1, Lon R Cardon 3, Andrew P. Quality control of common and rare variants, SpringerLink. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. What is the relationship between genomic coverage and the power of a genetic association study?

To locate disease variants and dichotomous trait loci, association studies of genetic markers are often conducted with a case-control design. A large-scale genome-wide association study of Asian. However, the test depends on the scores based on the underlying genetic model, and thus it may have substantial loss of. This article outlines the design and analysis of genetic association studies, but it focuses specifically on case-control studies in candidate genes or regions.
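The dependence of the trend test on genotype scores can be made concrete with a short sketch. The following is a minimal, illustrative Python implementation of the Cochran-Armitage trend test for a single biallelic marker, assuming genotype counts are ordered by the number of copies of the risk allele (0, 1, 2). The function name `trend_test` and the `SCORES` table are assumptions made for illustration and are not taken from any of the sources quoted here.

```python
import math

# Conventional score vectors for 0, 1 and 2 copies of the risk allele.
# The choice of scores encodes the assumed genetic model.
SCORES = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def trend_test(cases, controls, scores=SCORES["additive"]):
    """Cochran-Armitage trend test for one marker.

    cases, controls: genotype counts ordered by risk-allele copies (0, 1, 2).
    Returns (chi2, p_value), with chi2 referred to a 1-df chi-square.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    n_total = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]

    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, totals))

    # Trend statistic and its variance under the null of no association.
    t = n_total * sum_xr - n_cases * sum_xn
    var = (n_cases * n_controls / n_total) * (n_total * sum_x2n - sum_xn ** 2)
    if var <= 0:
        raise ValueError("scores do not vary across observed genotypes")

    chi2 = t * t / var
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling `trend_test(cases, controls, SCORES["recessive"])` on the same counts shows how the statistic changes with the assumed model, which is the sensitivity that robust alternatives such as MAX3 are meant to address.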

GWAS for multiple sclerosis (MS): data cleaning, quality control, results. Ancestral heterogeneity among the samples in a case-control association study can induce spurious associations. Efficient study designs for test of genetic association using sibship data and unrelated cases and controls. Genetic case-control association studies: correcting for multiple testing, Dale R. Genomic control for association studies.

While the protocol applies to genotypes after they have been determined (called) from probe intensity data, it is still important to understand how the genotype calling was conducted. Coleman: Jonathan Coleman is a PhD student at the MRC Social, Genetic and Developmental Psychiatry Centre (SGDP), using genomic methods to explore differential response to psychological treatments for anxiety disorders. New models of collaboration in genome-wide association studies. A popular method for the case-control design is Pearson's chi-square test. Robust tests for matched case-control genetic association. Zondervan 1. 1 Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, United Kingdom OX3 7BN. However, in a true case-control study we don't measure and compare incidence. The steps described involve the identification and removal of DNA samples and markers that introduce bias to the study. Data manipulation and converting format for popular genetic software. This paper provides details on the necessary steps to assess and control data in genome-wide association studies (GWAS) using genotype information on a large number of genetic markers for a large number of individuals.

Suitability of some data quality control thresholds for. In fact, this is the sine qua non of association-based genetic studies. Association studies can focus on candidate genes, a particular genomic region, or adopt a genome-wide association approach, each of which has implications for marker selection. Another robust test is the MAX3, which was also proposed as an efficiency-robust test for unmatched genetic association studies [7, 28]. One simple and perhaps one of the most natural test, is the singular marker. Highlights: identifies rational quality control thresholds for genome-wide association study. Yang: accounting for unmeasured population substructure in case-control studies of genetic association using a. In genetic association studies, departure from HWE in cases has also been used to test genetic association in the case-control design (Nielsen and others, 1998). Linkage disequilibrium-based quality control for large. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed.
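Departure from Hardy-Weinberg equilibrium (HWE) at a marker can be checked with a one-degree-of-freedom goodness-of-fit test. The sketch below is a minimal Python illustration, assuming a biallelic marker with genotype counts given as (AA, Aa, aa); the function name `hwe_chi_square` is hypothetical. For small counts an exact test is often preferred in practice, so this asymptotic version should be read as a simplification.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) on 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")

    # Estimated frequency of allele A from the observed genotypes.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        # Monomorphic marker: no departure from HWE can be measured.
        return 0.0, 1.0

    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a 1-df chi-square.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In QC the test is usually applied to controls, whereas the association use mentioned above applies it to cases; the arithmetic is the same in both settings.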

Second, we illustrate commonly used tests of association between SNPs and phenotypic traits of interest while controlling for potential confounders. A case-control study consisting of 903 narcolepsy patients and 1,981 healthy control subjects was performed. SNP QC commonly uses expert-guided filters based on QC variables e. The central theme in case-control genetic association studies is to efficiently identify genetic markers associated with case-control status. Quality control, imputation and analysis of genome-wide genotyping data from the Illumina HumanCoreExome microarray, Jonathan R. Linkage mapping of complex diseases is often followed by association studies between phenotypes and marker genotypes through use of case-control or family-based designs. Thorough data quality control (QC) is a key step to the success of high-throughput genotyping approaches.

We studied the processes that can bias the outcomes away from a true representation of. Population stratification: an overview, ScienceDirect Topics. Basic statistical analysis in genetic case-control studies, Geraldine M Clarke 1, Carl A Anderson 2, Fredrik H Pettersson 1, Lon R Cardon 3, Andrew P Morris 1, and Krina T Zondervan 1.

Designing candidate gene and genome-wide case-control. To detect single nucleotide polymorphisms (SNPs) that are associated with a common disease in a case-control genome-wide association study (GWAS), powerful yet robust tests are desirable. Samples in genetic case-control association analyses. The abundant single nucleotide polymorphisms (SNPs) are the markers of choice in genetic case-control association studies. Be familiar with the methods used to address population stratification. Powerful statistical methods are critical to accomplishing this goal. Association tests through combining p-values for case-control.

Statistical methods to test for association in case-control GWA studies: allele counting, chi-square test, logistic regression, multiple testing and power, example. We describe disease models, measures of association and testing at genotypic (individual) versus allelic. Practice of epidemiology: on information coded in gene. A fundamental issue in a genetic association study is to efficiently identify associated genetic markers, typically single nucleotide polymorphisms, or SNPs. Aug 26, 2010: this protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. Analysis of case-control studies of genetic and environmental factors with missing genetic information and haplotype-phase ambiguity, Christine Spinka 1, Raymond J. However, using departure from HWE in cases as a test statistic has lower power for the additive model and no power at all for the multiplicative model (Nielsen and others, 1998). Genomic control, a new approach to genetic-based association studies. Case-control association studies use genetic markers as putative etiologic risk factors. Robust trend tests for genetic association in case-control. There is no follow-up period in case-control studies.
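The allele-counting chi-square test listed above can be sketched in a few lines. The following Python example is a minimal illustration under stated assumptions: a biallelic marker, genotype counts supplied as (AA, Aa, aa) for each group, and a 2 x 2 table of allele counts tested on one degree of freedom without continuity correction. The function name `allelic_chi_square` is hypothetical.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square test on the 2 x 2 table of allele counts.

    Each argument is a tuple of genotype counts (n_AA, n_Aa, n_aa).
    Returns (chi2, p_value) on 1 degree of freedom.
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        # Each individual contributes two alleles.
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa

    a, b = allele_counts(case_genotypes)      # cases: A alleles, a alleles
    c, d = allele_counts(control_genotypes)   # controls: A alleles, a alleles

    row_cases, row_controls = a + b, c + d
    col_A, col_a = a + c, b + d
    if min(row_cases, row_controls, col_A, col_a) == 0:
        raise ValueError("table has an empty row or column")

    n = row_cases + row_controls
    chi2 = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_A * col_a)

    # Upper tail of a 1-df chi-square.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Because alleles rather than individuals are counted, the test treats the two alleles of a person as independent observations, so it is usually applied only to markers that do not depart from Hardy-Weinberg equilibrium.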

In genetics, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. Even in this era of genome-wide studies, case-control studies still form the majority of published reports. Association between genetic risk scores and risk of. Genetic case-control association studies in neuropsychiatry. Genetic association analysis has been performed in a case-control setting for identification of the genetic determinants of a certain phenotype. Case-control studies use subjects who already have a disease, trait or other condition and determine if there are characteristics of these patients that differ from those who do not have the disease or trait. We formulate the two degrees of freedom associated with a given genotype distribution in terms of two biologically relevant parameters: (1) the probability f that an individual's two. Quality control procedures for genome-wide association studies. independent, but that assumption is false if there is population substructure or related individuals within one or both of the samples. The approach is controversial and has tended to produce associations in neuropsychiatry that do not stand the test of time.

Describe what is meant by population stratification. Collaborators in OPHG's Human Genome Epidemiology Network (HuGENet), which helps to translate genetic research findings into opportunities for preventive medicine and public health by advancing the synthesis, interpretation, and dissemination of population-based data on human genetic variation in health and disease. By analogy to the unmatched counterpart, Zheng and Tian proposed the MAX3 statistic for matched case-control association studies, which is defined as. Manhattan plot and Q-Q plot for the genome-wide association study results. Dec 30, 2005: we studied a trend test for genetic association between disease and the number of risk alleles using case-control data. How to deal with the early GWAS data when imputing and. Donnelly P, Faraone SV, Frazer K, Gabriel S, et al. Discuss how population stratification may affect the interpretation of case-control genetic association studies.

Combining identity by descent and association in genetic case. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. Association studies are subdivided into two types of. In genetic case-control studies, the frequency of alleles or genotypes is compared between the cases and controls. Robust statistical tests of genetic association for the case. Practice of epidemiology: robust estimation for secondary. The strategy for marker selection will affect the statistical power of the study to detect a disease association and is a crucial element of study design.

Teo a,b. Introduction: the genome-wide association study (GWAS) is increasingly common as an experimental design for investigating the genetic basis of common diseases and complex traits in humans. This protocol details the data quality assessment and control steps that are typically carried out during case-control association studies. Li M, Boehnke M and Abecasis GR. Am J Hum Genet 2006; 78.

Genetic model selection in two-phase analysis for case. In addition to outlining the published ideas on this method, we describe several extensions. Analysis for population association studies generally involves preliminary analyses such as quality control, Hardy-Weinberg equilibrium tests, and examination of linkage disequilibrium and recombination, followed by tests of association for single and/or multiple SNPs; both may involve case-control (binary) phenotypes or continuous outcomes.

The transition from genetic linkage analyses to association studies (Risch and Merikangas, 1996). The simulated data used here have passed standard quality control. Common statistical issues in genome-wide association studies. Case-control genetic association studies in gastrointestinal disease. Given these limitations, the case-control study remains the mainstay of genetic association studies, and the most important issues relate to choice of the two study groups. Significant differences in allele or genotype frequencies between cases (affected individuals) and controls (unaffected subjects) are taken as evidence for its involvement in the phenotypic trait under study. Hardy-Weinberg equilibrium, missing proportion (MSP) and minor allele frequency (MAF) to.

Subsequent analyses such as genome-wide association studies rely. Quality control (QC) is thus a critical step in large-scale studies of genetic variation. Method for the study of associations between disease and genetic markers, W. Marker selection for genetic case-control association studies.

It is our intention to use dbGaP data to conduct secondary analysis of the influence of admixture on the outcome of data quality control (QC) in genetic association studies, to inform future studies of the optimal QC metric for the genetic association analysis of admixed populations. Weiss, in Clinical and Translational Science (Second Edition), 2017. Dai, M2C200, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center.

When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. Genetic association studies are used to find candidate genes or genome regions that contribute to a specific disease by testing for a correlation between disease status and genetic variation. To identify genetic factors influencing quantitative traits of biomedical importance, we conducted a genome-wide association study in 8,842 samples from population-based cohorts recruited in Korea. However, it actually only applies to those case-control studies in which controls are sampled only from the non-diseased rather than the whole population. Recently, we performed the first genome-wide association study of response. Review and recommendations. As our knowledge of genetic variation grows, our ability to use this information to.

Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary disease traits. Following extensive research, several criteria and thresholds have been established for data QC at the sample and variant level. Quality control for genome-wide association studies.
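Variant-level QC criteria of this kind are typically applied as simple per-marker filters. The sketch below is a minimal Python illustration for one SNP, assuming genotypes are coded as the number of copies of one allele (0, 1 or 2) with `None` marking a missing call. The function names and the default thresholds (5% missingness, 1% minor allele frequency) are placeholder assumptions chosen for the example, not recommendations from the sources quoted here; suitable values depend on the study.

```python
def snp_qc_metrics(genotypes):
    """Return (missing_proportion, minor_allele_frequency) for one SNP.

    genotypes: one entry per individual, 0/1/2 allele copies or None if missing.
    """
    if not genotypes:
        raise ValueError("no individuals supplied")

    called = [g for g in genotypes if g is not None]
    missing = 1 - len(called) / len(genotypes)
    if not called:
        # Nothing was called, so no allele frequency can be estimated.
        return missing, 0.0

    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    return missing, maf


def passes_qc(genotypes, max_missing=0.05, min_maf=0.01):
    """Apply illustrative missingness and MAF thresholds to one SNP."""
    missing, maf = snp_qc_metrics(genotypes)
    return missing <= max_missing and maf >= min_maf
```

A Hardy-Weinberg check in controls is commonly added as a third filter, and analogous per-sample metrics (call rate across markers, for example) are computed by applying the same logic along the other axis of the genotype matrix.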

Marees, Hilde Kluiver, Sven Stringer, Florence Vorspan, Emmanuel Curis, Cynthia Marie-Claire, Eske M. It can be thought of as a specific type of confounding by race or ethnicity. For each study design our goal is to achieve control similar to that obtained for a family-based study, but with the convenience found in a population-based. Genome-wide association study of 14,000 cases of seven. Hence, for case-control studies, test statistics are generally inflated relative to expectation under the assumption of an in. The quality control (QC) filtering of single nucleotide polymorphisms (SNPs) is an important step in genome-wide association studies to minimize potential false findings. The fundamental goal of a case-control association study is to test for an allelic frequency difference between cases and controls to find SNPs that affect disease susceptibility. A powerful allele-based test for case-control association studies. Biases in study design and errors in genotype calling have the potential to introduce systematic biases into genetic case-control association studies, leading to an increase in the number of false-positive and false-negative associations (see Box 1 for a glossary of terms).
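The inflation of test statistics mentioned above is commonly summarised by the genomic control inflation factor, the ratio of the observed median 1-df chi-square statistic to its expected median under the null (about 0.455). The Python sketch below is a minimal illustration; the function names are hypothetical, and leaving statistics unchanged when the factor falls below 1 is a common convention assumed here rather than something stated in the sources.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic control lambda: observed median over the expected null median."""
    if not chi2_stats:
        raise ValueError("no test statistics supplied")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each 1-df chi-square statistic by lambda, never inflating them."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]
```

A factor close to 1 suggests little inflation; values well above 1 point to population stratification, cryptic relatedness or other systematic bias that the preceding QC steps are meant to reduce.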