
MendelEstimateFrequencies.jl

This analysis option calculates a likelihood-based estimate of allele frequencies.
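The package's own likelihood computations run over whole pedigrees and are not reproduced here. As a hedged illustration of what a likelihood-based allele-frequency estimate involves, the Julia sketch below applies a different, simpler technique: the textbook EM (gene-counting) algorithm for unrelated individuals typed at the ABO blood-group locus. ABO is a classic noncodominant marker, because A and B are both dominant over O, so phenotypes A and B each hide two genotypes. The functions abo_em and homozygous_count and their keyword arguments are hypothetical and are not part of MendelEstimateFrequencies.

```julia
# Hypothetical sketch, not MendelEstimateFrequencies code: EM (gene counting)
# for ABO allele frequencies from phenotype counts of unrelated individuals.

# Expected number of XX homozygotes among n people showing dominant phenotype X
# (genotype XX or XO), given current frequencies p = freq(X) and pO = freq(O).
homozygous_count(n, p, pO) = n == 0 ? 0.0 : n * p^2 / (p^2 + 2p * pO)

function abo_em(nA, nB, nAB, nO; maxiter = 1000, tol = 1e-12)
    n = nA + nB + nAB + nO
    pA, pB, pO = 1/3, 1/3, 1/3                  # uniform starting values
    for _ in 1:maxiter
        # E step: split phenotypes A and B into expected genotype counts.
        nAA = homozygous_count(nA, pA, pO); nAO = nA - nAA
        nBB = homozygous_count(nB, pB, pO); nBO = nB - nBB
        # M step: count alleles among the 2n genes of the expected genotypes.
        newA = (2nAA + nAO + nAB) / (2n)
        newB = (2nBB + nBO + nAB) / (2n)
        newO = (nAO + nBO + 2nO) / (2n)
        change = abs(newA - pA) + abs(newB - pB) + abs(newO - pO)
        pA, pB, pO = newA, newB, newO
        change < tol && break                   # stop once updates stall
    end
    return (pA, pB, pO)
end

# Phenotype counts for 10,000 people generated exactly under Hardy-Weinberg
# proportions with allele frequencies A = 0.2, B = 0.1, O = 0.7.
pA, pB, pO = abo_em(3200, 1500, 400, 4900)
```

Because these counts match Hardy-Weinberg proportions exactly, the iteration returns values very close to (0.2, 0.1, 0.7). Each EM step never decreases the likelihood, which is why gene counting is a common teaching example for likelihood-based frequency estimation with ambiguous genotypes.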


Overview

Mendel Estimate Frequencies is a component of the umbrella OpenMendel project.

Appropriate Problems and Data Sets

The Estimate Frequencies model applies to pedigrees, including those with missing marker data. This option can handle noncodominant markers and markers with more than two alleles. With too many marker alleles, computational efficiency suffers and large-sample statistical assumptions become suspect. We recommend consolidating alleles until at most eight alleles remain and each has a frequency of 0.05 or greater.
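The recommendation above does not say how alleles should be consolidated. The Julia sketch below assumes one simple strategy: repeatedly pool the two rarest alleles into a single combined allele until at most maxalleles (default 8) remain and every frequency is at least minfreq (default 0.05). The function consolidate and its parameters are hypothetical, not part of MendelEstimateFrequencies, and pooling by frequency alone ignores any biological reason to group particular alleles.

```julia
# Hypothetical sketch (not part of MendelEstimateFrequencies): consolidate
# alleles by repeatedly pooling the two rarest ones.
function consolidate(freqs::AbstractDict{String,Float64};
                     maxalleles::Int = 8, minfreq::Float64 = 0.05)
    alleles = collect(freqs)                    # Vector of name => frequency pairs
    # Pooling is needed while too many alleles remain or one is too rare.
    needs_merge(a) = length(a) > 1 &&
        (length(a) > maxalleles || minimum(last, a) < minfreq)
    while needs_merge(alleles)
        sort!(alleles; by = last)               # rarest alleles first
        (n1, f1), (n2, f2) = alleles[1], alleles[2]
        # Replace the two rarest alleles by one pooled allele named "n1+n2".
        alleles = push!(alleles[3:end], (n1 * "+" * n2) => f1 + f2)
    end
    return Dict(alleles)
end

c = consolidate(Dict("A1" => 0.40, "A2" => 0.30, "A3" => 0.20,
                     "A4" => 0.06, "A5" => 0.03, "A6" => 0.01))
```

Here A6 and A5 are pooled first (0.04); since that pool is still below 0.05, it is pooled again with A4, leaving four alleles: A1, A2, A3, and A6+A5+A4 with frequency 0.10.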

Installation

Note: The three OpenMendel packages (1) SnpArrays, (2) MendelSearch, and (3) MendelBase must be installed before any other OpenMendel package will run. It is easiest to install these three packages in the order listed, before any other OpenMendel package.

Within Julia, use the package manager to install MendelEstimateFrequencies (type ] at the julia> prompt to enter the pkg> mode shown below):

 pkg> add https://github.com/OpenMendel/MendelEstimateFrequencies.jl.git

This package supports Julia v1.0 and later.

Input Files

The Mendel EstimateFrequencies analysis package uses the following input files. Example input files can be found in the data subfolder of the Mendel EstimateFrequencies project. (An analysis won’t always need every file type below.)

Control file

The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run EstimateFrequencies:

 #
 # Input and Output files.
 #
 locus_file = estimate frequencies 2 LocusFrame.txt
 pedigree_file = estimate frequencies 2 PedigreeFrame.txt
 phenotype_file = estimate frequencies 2 PhenotypeFrame.txt
 output_file = estimate frequencies 2 Output.txt
 #
 # Analysis parameters for Estimate Frequencies option.
 #

In the example above, three keywords specify the input files, with the values estimate frequencies 2 LocusFrame.txt, estimate frequencies 2 PedigreeFrame.txt, and estimate frequencies 2 PhenotypeFrame.txt. One keyword specifies the standard output file, estimate frequencies 2 Output.txt. No analysis parameters are specified for this run, so all analysis parameters take their default values. The text after the ‘=’ is the keyword value. A list of OpenMendel keywords common to most analysis packages can be found here. Keyword names are not case sensitive, although keyword values may be.
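Because keyword names are case-insensitive while values may not be, a reader of this format only needs to normalize the left-hand side of each line. The Julia sketch below shows one way to do that. parse_control is a hypothetical helper, not the reader MendelBase actually uses, and it assumes that lines beginning with # are comments (as in the example Control file) and that each value is the raw text after the first ‘=’.

```julia
# Hypothetical sketch (not MendelBase's actual reader) of parsing a Control
# file written as "Keyword = Keyword_Value(s)" lines.
function parse_control(text::AbstractString)
    settings = Dict{String,String}()
    for raw in split(text, '\n')
        line = strip(raw)
        # Skip blank lines and comment lines beginning with '#'.
        (isempty(line) || startswith(line, "#")) && continue
        eq = findfirst(isequal('='), line)
        eq === nothing && error("no '=' in Control file line: $line")
        # Keyword names are case-insensitive, so normalize them to lower case;
        # values are kept exactly as written, since they may be case sensitive.
        key = lowercase(strip(line[1:prevind(line, eq)]))
        settings[key] = strip(line[nextind(line, eq):end])
    end
    return settings
end

ctl = """
#
# Input and Output files.
#
locus_file = estimate frequencies 2 LocusFrame.txt
Pedigree_File = estimate frequencies 2 PedigreeFrame.txt
"""
s = parse_control(ctl)
```

Both locus_file and the mixed-case Pedigree_File end up under lower-case keys, while values such as estimate frequencies 2 LocusFrame.txt keep their spaces and capitalization. A file on disk could be read the same way with parse_control(read("Control_file.txt", String)).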

Data Files

EstimateFrequencies requires a Control file and a Pedigree file. Genotype data can be included in the Pedigree file, in which case a Locus file is required. Alternatively, genotype data can be provided in a SNP data file, in which case a SNP Definition file is required. OpenMendel also accepts PLINK-format FAM and BIM files. Details on the format and contents of the Control and data files can be found on the MendelBase documentation page, and example data files are in the EstimateFrequencies data folder.

Running the Analysis

To run this analysis package, first launch Julia. Then load the package with the command:

 julia> using MendelEstimateFrequencies

Next, if necessary, change to the directory containing your files, for example,

 julia> cd(expanduser("~/path/to/data/files/"))  # cd does not expand "~" by itself

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

 julia> EstimateFrequencies("Control_file.txt")
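When many runs are scripted, it can be convenient to generate the Control file from Julia rather than by hand. The sketch below uses a hypothetical helper, write_control (not part of MendelEstimateFrequencies), that writes keyword => value pairs in the Keyword = Keyword_Value(s) format and reproduces the four file keywords of the example Control file.

```julia
# Hypothetical helper (not part of MendelEstimateFrequencies): write
# keyword => value pairs as a Control file in "Keyword = Keyword_Value(s)" form.
function write_control(path::AbstractString, settings::AbstractVector{<:Pair})
    open(path, "w") do io
        println(io, "#")
        println(io, "# Control file written by write_control.")
        println(io, "#")
        for (key, value) in settings
            println(io, key, " = ", value)      # one keyword per line
        end
    end
    return path
end

# Reproduce the input and output keywords of the example Control file.
settings = [
    "locus_file"     => "estimate frequencies 2 LocusFrame.txt",
    "pedigree_file"  => "estimate frequencies 2 PedigreeFrame.txt",
    "phenotype_file" => "estimate frequencies 2 PhenotypeFrame.txt",
    "output_file"    => "estimate frequencies 2 Output.txt",
]
control = write_control(joinpath(mktempdir(), "Control_file.txt"), settings)
```

In practice the file would be written into the directory holding the data files, and its path then passed to EstimateFrequencies as in the command above; the temporary directory here only keeps the example self-contained.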

Note: The package is called MendelEstimateFrequencies but the analysis function is called simply EstimateFrequencies.

Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

OPENMENDEL: a cooperative programming project for statistical genetics. Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K. Hum Genet. 2019 Mar 26. doi: 10.1007/s00439-019-02001-z. [Epub ahead of print] PMID: 30915546

Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.