MendelGWAS.jl

This package performs standard Genome-Wide Association Studies.

Overview

Mendel GWAS is a component of the umbrella OpenMendel project. This analysis option performs a standard Genome-Wide Association Study (GWAS) to assess common genetic variants in unrelated individuals.

Appropriate Problems and Data Sets

The input data for a Mendel GWAS analysis are unrelated individuals genotyped at a large number of autosomal or X-linked SNPs. Mendel GWAS reads the genotypes from compressed SNP data files.

Installation

Note: The three OpenMendel packages (1) SnpArrays, (2) MendelSearch, and (3) MendelBase must be installed before any other OpenMendel package will run. It is easiest if these three packages are installed in the above order and before any other OpenMendel package.

Within Julia, use the package manager to install MendelGWAS:

pkg> add https://github.com/OpenMendel/MendelGWAS.jl.git

This package supports Julia v1.0+.
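The install order described in the note above can also be scripted with Julia's Pkg API. The following is a minimal sketch, not an official installer: the helper names `openmendel_url` and `install_openmendel` are hypothetical, and the URLs for SnpArrays, MendelSearch, and MendelBase are assumed to follow the same pattern as the MendelGWAS URL shown above.

```julia
using Pkg

# Install order from the note above; MendelGWAS goes last.
const OPENMENDEL_PACKAGES = ["SnpArrays", "MendelSearch", "MendelBase", "MendelGWAS"]

# Assumption: every package lives under the OpenMendel GitHub organization
# and follows the URL pattern of the MendelGWAS repository.
openmendel_url(name::AbstractString) = "https://github.com/OpenMendel/$(name).jl.git"

# Hypothetical helper: add each package in order. Requires network access.
function install_openmendel(packages = OPENMENDEL_PACKAGES)
    for name in packages
        Pkg.add(Pkg.PackageSpec(url = openmendel_url(name)))
    end
end

# install_openmendel()   # uncomment to run the installation
```

The call is left commented out so that loading the snippet does not trigger any downloads.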

Input Files

The MendelGWAS analysis package uses the following input files. Example input files can be found in the data subfolder of the MendelGWAS project.

Control file

The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run GWAS:

#
# Input and Output files.
#
plink_input_basename = gwas 1 data
output_file = gwas 1 Output.txt
manhattan_plot_file = gwas 1 Manhattan Plot Output.png
#
# Analysis parameters for GWAS option.
#
regression = linear
regression_formula = Trait ~ Sex

In the example above, there are five keywords. The keyword plink_input_basename tells MendelGWAS that the input data comprise three PLINK format files: gwas 1 data.fam, gwas 1 data.bim, and gwas 1 data.bed. The next two keywords specify the output files: gwas 1 Output.txt (the results file) and gwas 1 Manhattan Plot Output.png (a plot of the results). The last two keywords specify analysis parameters. The text after each “=” is the keyword value.
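To make the Keyword = Keyword_Value(s) format concrete, here is a small sketch that reads such lines into a dictionary. It is for illustration only and is not the reader that OpenMendel uses. The function name `parse_control_lines` is hypothetical, and the sketch assumes that lines starting with “#” are comments, as in the example above.

```julia
# Illustrative parser only; OpenMendel has its own Control file reader.
function parse_control_lines(lines)
    keywords = Dict{String,String}()
    for line in lines
        text = strip(line)
        # Skip blank lines and comment lines (assumed to start with "#").
        (isempty(text) || startswith(text, "#")) && continue
        parts = split(text, "=", limit = 2)
        length(parts) == 2 || continue      # ignore lines without "="
        # Store keyword names in lowercase; values keep their case and spaces.
        keywords[lowercase(strip(parts[1]))] = String(strip(parts[2]))
    end
    return keywords
end

example = """
#
# Input and Output files.
#
plink_input_basename = gwas 1 data
output_file = gwas 1 Output.txt
manhattan_plot_file = gwas 1 Manhattan Plot Output.png
#
# Analysis parameters for GWAS option.
#
regression = linear
regression_formula = Trait ~ Sex
"""

keywords = parse_control_lines(split(example, "\n"))
println(keywords["plink_input_basename"])
```

Splitting on the first “=” only (limit = 2) keeps values that contain spaces, such as the file names above, in one piece.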

Keywords

This is a list of OpenMendel keywords relevant to GWAS. A list of OpenMendel keywords common to most analysis packages can be found in the MendelBase documentation. The names of keywords are not case sensitive. (The keyword values may be case sensitive.)

| Keyword | Default Value | Allowed Values | Short Description |
| --- | --- | --- | --- |
| affected_designator | affected | | Under logistic regression, the trait label assigned to cases |
| distribution | | Binomial(), Gamma(), Normal(), Poisson(), etc. | Name of distribution function |
| link | | LogitLink(), IdentityLink(), LogLink(), etc. | Name of link function |
| lrt_threshold | 5e-8 | | Threshold for the score test p-value below which a likelihood ratio test is performed |
| maf_threshold | 0.01 | | Threshold for the minor allele frequency below which SNPs are not analyzed |
| manhattan_plot_file | | | Name of file to hold Manhattan plot |
| regression | | Linear, Logistic, or Poisson | Type of standard regression to perform; if left blank, the keywords distribution and link must be assigned values |
| regression_formula | | | Defines the regression model to analyze under the null hypothesis. See below for a description of the syntax to use |
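As an illustration of what the maf_threshold keyword controls, the sketch below computes a minor allele frequency for one autosomal SNP and compares it with the default threshold. The genotype coding and the function name `minor_allele_frequency` are assumptions made for this example; they do not reflect how SnpArrays stores or summarizes genotypes.

```julia
# Assumption: each genotype is the number of copies (0, 1, or 2) of one
# allele at an autosomal SNP, with `missing` marking an untyped individual.
# This coding is for illustration only; it is not the SnpArrays format.
function minor_allele_frequency(genotypes)
    observed = collect(skipmissing(genotypes))
    isempty(observed) && return NaN
    p = sum(observed) / (2 * length(observed))   # frequency of the counted allele
    return min(p, 1 - p)                         # the rarer allele's frequency
end

maf_threshold = 0.01   # default value of the maf_threshold keyword

snp = [0, 0, 1, missing, 0, 2, 0, 0, 1, 0]
maf = minor_allele_frequency(snp)
analyze = maf >= maf_threshold   # SNPs below the threshold are not analyzed
println("MAF = ", round(maf, digits = 3), ", analyzed: ", analyze)
```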

The value of the keyword regression_formula takes the following form: the trait variable is separated from the predictors by “~”. For example, regression_formula = Case_Control ~ Sex + BMI means that the trait labeled Case_Control will be modeled as a grand mean (intercept term) plus the covariates Sex and BMI. Case_Control and BMI must be field names used in the pedigree file. (If you use a PLINK-style FAM file, which has no header row, refer to the trait as “Trait”.) Interactions between predictors are written with “&”: for example, Case_Control ~ Sex + BMI + Sex&BMI adds the Sex-by-BMI interaction term to the previous main effects. (A product term, for example Sex*BMI, is a shortcut for the main effects plus their interaction.) If the right-hand side (after the “~”) is blank, the model uses only the grand mean. Do not include a term for the SNPs; the SNP term is added automatically for each alternative model.
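The formula syntax described above can be sketched with a few lines of string handling. This is only a reading aid, not the parser that MendelGWAS uses: the function name `expand_formula` is hypothetical, and the sketch handles only two-way products.

```julia
# Illustrative only: split a regression_formula value into the trait and
# its predictor terms, expanding "A*B" into "A", "B", and "A&B".
function expand_formula(formula::AbstractString)
    sides = split(formula, "~", limit = 2)
    length(sides) == 2 || error("formula must contain \"~\"")
    trait = String(strip(sides[1]))
    terms = String[]
    for raw in split(sides[2], "+")
        term = replace(strip(raw), " " => "")
        isempty(term) && continue           # blank right-hand side: grand mean only
        if occursin("*", term)
            factors = split(term, "*")
            # This sketch handles only two-way products such as Sex*BMI.
            length(factors) == 2 || error("only two-way products are handled here")
            a, b = String.(factors)
            append!(terms, [a, b, a * "&" * b])
        else
            push!(terms, term)
        end
    end
    return trait, unique(terms)
end

trait, terms = expand_formula("Case_Control ~ Sex*BMI")
println(trait, ": ", join(terms, ", "))
```

Under this sketch, Case_Control ~ Sex*BMI and Case_Control ~ Sex + BMI + Sex&BMI expand to the same list of terms, which is the equivalence the paragraph above describes.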

Under logistic regression, the cases are those individuals whose value in the trait field matches the label assigned to the keyword affected_designator. The trait field is the field listed on the left-hand side (before the “~”) of the regression_formula. The controls are those individuals with non-missing trait values who are not cases. Individuals with missing trait values are neither cases nor controls.
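The case/control rule above can be written out as a short function. This is a sketch under stated assumptions: the function name `classify_trait` is hypothetical, missing trait values are represented by Julia's `missing`, and labels are compared by exact match.

```julia
# Illustrative sketch of the case/control rule described above.
# `trait_values` holds one label per individual, with `missing` for
# individuals whose trait was not recorded.
function classify_trait(trait_values, affected_designator = "affected")
    map(trait_values) do value
        if ismissing(value)
            missing                     # neither case nor control
        elseif string(value) == affected_designator
            :case                       # exact match (assumed case sensitive)
        else
            :control                    # any other non-missing value
        end
    end
end

labels = classify_trait(["affected", "unaffected", missing, "affected"])
println(labels)
```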

Data Files

GWAS requires a Control file and a Pedigree file. Genotype data is provided in a SNP data file, with a SNP Definition File describing the SNPs. OpenMendel will also accept PLINK format FAM and BIM files. Details on the format and contents of the Control and data files can be found on the MendelBase documentation page. There are example data files in the GWAS data folder.

Running the Analysis

To run this analysis package, first launch Julia. Then load the package with the command:

 julia> using MendelGWAS

Next, if necessary, change to the directory containing your files, for example,

 julia> cd(expanduser("~/path/to/data/files/"))

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

 julia> GWAS("Control_file.txt")

Note: The package is called MendelGWAS but the analysis function is called simply GWAS.
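Before calling GWAS, it can help to confirm that the three PLINK files implied by plink_input_basename are in the working directory. The check below is a minimal sketch. The helper name `missing_plink_files` is hypothetical and is not part of MendelGWAS, and the basename is the one from the example Control file above.

```julia
# Hypothetical helper: return the PLINK files that are absent for a basename.
function missing_plink_files(plink_basename::AbstractString, dir::AbstractString = pwd())
    expected = [joinpath(dir, plink_basename * ext) for ext in (".fam", ".bim", ".bed")]
    return filter(!isfile, expected)
end

absent = missing_plink_files("gwas 1 data")
if isempty(absent)
    println("All PLINK input files found.")
    # using MendelGWAS
    # GWAS("Control_file.txt")
else
    println("Missing input files: ", join(absent, ", "))
end
```

The GWAS call is left commented out so the check can be run on its own; uncomment it once MendelGWAS is installed and the files are in place.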

Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

OPENMENDEL: a cooperative programming project for statistical genetics. Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K. Hum Genet. 2019 Mar 26. doi: 10.1007/s00439-019-02001-z. [Epub ahead of print] PMID: 30915546

Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.