MendelGameteCompetition.jl

This analysis option implements a gamete competition analysis, a generalization of the transmission disequilibrium test (TDT).

Overview

Mendel Gamete Competition is a component of the umbrella OpenMendel project. The gamete competition model is an application of the Bradley-Terry model and can be considered a parametric form of the TDT for use with pedigree data: besides p-values, it also yields a measure of the strength of the allelic associations. The Bradley-Terry model was originally applied to problems such as ranking teams in a sports league based on their intra-league win/loss records. In genetics, alleles assume the role of teams, and transmission parameters (the τs) assume the role of the winning propensities [2, 3]. As implemented in this version of OpenMendel, the gamete competition is an affected-only association analysis, and we assume the allele frequencies for the marker are known without error.

Let allele i be assigned a segregation parameter τi. Then the probability that a heterozygous parent with genotype i/j transmits allele i is the ratio τi/(τi + τj). Because this ratio is invariant when τi and τj are multiplied by the same constant c, we impose the constraint that the most frequent allele k has segregation parameter τk = 1. These propensities replace the usual Mendelian segregation parameters for heterozygous parents’ transmissions in the Elston-Stewart-Ott representation of the pedigree likelihood. Under the null hypothesis of no association between the marker and the trait, Mendelian segregation ratios hold for heterozygous parents’ transmissions, so τi = 1 for all alleles i and the likelihood reverts to the standard one. Transmissions from homozygous parents always conform to standard Mendelian segregation ratios under both the null and the alternative; thus, as in the TDT, only heterozygous parents are informative.

To test whether Mendelian segregation can be rejected, we estimate the τs by maximum likelihood and conduct a likelihood ratio test. P-values are calculated assuming that the likelihood ratio test statistic is asymptotically chi-square distributed, with degrees of freedom equal to the number of alleles minus 1.
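
The transmission rule and the test described above can be written compactly as follows. The symbols Λ (test statistic), L (pedigree likelihood), n (number of alleles), and the hat notation for the maximum likelihood estimates are notation assumed here for illustration; they do not appear in the package documentation.

```latex
% Transmission probability for a heterozygous i/j parent,
% and its invariance under rescaling by a constant c > 0
\[
\Pr(\text{$i/j$ parent transmits } i) = \frac{\tau_i}{\tau_i + \tau_j},
\qquad
\frac{c\,\tau_i}{c\,\tau_i + c\,\tau_j} = \frac{\tau_i}{\tau_i + \tau_j}.
\]

% Likelihood ratio test of H_0: tau_i = 1 for every allele i
\[
\Lambda = 2\left[\ln L(\hat{\tau}) - \ln L(\tau_1 = \dots = \tau_n = 1)\right]
\;\sim\; \chi^2_{\,n-1}
\quad \text{asymptotically under } H_0 .
\]
```

Setting every τ to 1 gives a transmission probability of 1/2 for each allele of a heterozygous parent, which is the Mendelian case.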

Appropriate Problems and Data Sets

The Gamete Competition model applies to pedigrees, including those with missing marker data. With too many marker alleles, computational efficiency suffers and large-sample statistical assumptions become suspect. We recommend consolidating alleles until at most eight alleles remain and each has a frequency of 0.05 or greater. If the fraction of missing data is large, ethnic stratification may come into play. One remedy is to limit analysis to a single ethnic group; another is to use ethnic-specific allele frequencies.
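
The documentation does not say how alleles should be consolidated. As a rough sketch of one possible strategy, the hypothetical helper below repeatedly merges the two rarest allele classes until at most eight classes remain and each has a frequency of at least 0.05. The function name, the "+" naming of merged classes, and the example frequencies are all assumptions made for illustration; none of this is part of OpenMendel.

```julia
# Hypothetical helper (not part of OpenMendel): merge the two rarest allele
# classes until the recommended limits are met.
function consolidate_alleles(freqs::Dict{String,Float64};
                             min_freq = 0.05, max_alleles = 8)
    # Work on (name, frequency) pairs sorted from most to least frequent.
    classes = [(name, f) for (name, f) in freqs]
    sort!(classes; by = last, rev = true)
    while length(classes) > 1 &&
          (length(classes) > max_alleles || last(classes[end]) < min_freq)
        n1, f1 = pop!(classes)   # rarest class
        n2, f2 = pop!(classes)   # second-rarest class
        push!(classes, (n2 * "+" * n1, f1 + f2))
        sort!(classes; by = last, rev = true)
    end
    return classes
end

# Made-up allele frequencies, for illustration only.
example_freqs = Dict("1" => 0.60, "2" => 0.30, "3" => 0.06,
                     "4" => 0.03, "5" => 0.01)
consolidated = consolidate_alleles(example_freqs)
for (name, f) in consolidated
    println(name, "  ", round(f; digits = 2))
end
```

In this example alleles 4 and 5 are pooled first, and because the pooled class is still below 0.05 it is then merged with allele 3. The genotypes in the Pedigree file and the frequencies in the Locus file would need to be recoded to match whatever consolidation is chosen.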

Installation

Note: The three OpenMendel packages (1) SnpArrays, (2) MendelSearch, and (3) MendelBase must be installed before any other OpenMendel package will run. It is easiest if these three packages are installed in the above order and before any other OpenMendel package.

Within Julia, use the package manager to install MendelGameteCompetition:

pkg> add https://github.com/OpenMendel/MendelGameteCompetition.jl.git

This package supports Julia v1.0+.
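
Because the installation order matters, it may be convenient to script it. The sketch below assumes that the three prerequisite packages live under the same GitHub organization and follow the same URL pattern as the MendelGameteCompetition URL shown above; `package_url` and `install_openmendel` are hypothetical helper names, not part of any OpenMendel package.

```julia
using Pkg

# Install order taken from the note above; the target package goes last.
const OPENMENDEL_PACKAGES = ["SnpArrays", "MendelSearch", "MendelBase",
                             "MendelGameteCompetition"]

# Assumption: every package follows the URL pattern of the one shown above.
package_url(name) = "https://github.com/OpenMendel/$(name).jl.git"

# Hypothetical helper: add each package in order.
function install_openmendel(names = OPENMENDEL_PACKAGES)
    for name in names
        Pkg.add(Pkg.PackageSpec(url = package_url(name)))
    end
end

# install_openmendel()   # uncomment to run; requires network access
```

Calling `install_openmendel()` is equivalent to issuing one `pkg> add` command per package in the listed order.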

Input Files

The MendelGameteCompetition analysis package uses the following input files. Example input files can be found in the data subfolder of the MendelGameteCompetition project. (An analysis won’t always need every file type below.)

Control file

The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run Gamete Competition:

#
# Input and Output files.
#
locus_file = gamete competition LocusFrame.txt
pedigree_file = gamete competition PedigreeFrame.txt
output_file = gamete competition Output.txt
gamete_competition_output_table = gamete competition Output Table.txt
#
# Analysis parameters for Gamete Competition option.
#
trait = ACE
affected_designator = 1
standard_errors = true

In the example above, there are seven keywords. The first four keywords specify the input and output files: gamete competition LocusFrame.txt, gamete competition PedigreeFrame.txt, gamete competition Output.txt, and gamete competition Output Table.txt. The last three keywords specify the analysis parameters: trait, affected_designator, and standard_errors. The text after each ‘=’ is the keyword value.
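
To make the `Keyword = Keyword_Value(s)` format concrete, here is a minimal sketch of how such a file could be read into a dictionary. It is an illustration only and is not how MendelBase parses Control files; `parse_control_text` is a hypothetical name, and treating lines that start with `#` as comments is inferred from the example above.

```julia
# Hypothetical helper (not MendelBase's parser): read "keyword = value" lines,
# skipping blank lines and lines that start with '#'.
function parse_control_text(text::AbstractString)
    keywords = Dict{String,String}()
    for raw in split(text, '\n')
        line = strip(raw)
        (isempty(line) || startswith(line, "#")) && continue
        parts = split(line, '='; limit = 2)
        length(parts) == 2 || continue   # ignore lines without '='
        # Keyword names are not case sensitive, so store them in lowercase.
        keywords[lowercase(strip(parts[1]))] = String(strip(parts[2]))
    end
    return keywords
end

example = """
#
# Input and Output files.
#
locus_file = gamete competition LocusFrame.txt
pedigree_file = gamete competition PedigreeFrame.txt
trait = ACE
affected_designator = 1
standard_errors = true
"""

settings = parse_control_text(example)
println(settings["locus_file"])
```

Splitting on the first `=` only keeps values that contain spaces, such as the file names in the example, intact.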

Keywords

This is a list of OpenMendel keywords specific to Gamete Competition. A list of OpenMendel keywords common to most analysis packages can be found in the MendelBase documentation. The names of keywords are not case sensitive. (The keyword values may be case sensitive.)

Keyword | Default Value | Allowed Values | Short Description
GameteCompetition_output_file | GameteCompetition_Output_File.txt | User-defined output file name | Creates a lod score table output file
repetitions | | |
xlinked_analysis | FALSE | TRUE, FALSE | Whether or not markers are on the X chromosome

Data Files

Gamete Competition requires a Control file and a Pedigree file. Genotype data can be included in the Pedigree file, in which case a Locus file is required. Alternatively, genotype data can be provided in a SNP data file, in which case a SNP Definition file is required. OpenMendel will also accept PLINK format FAM and BIM files. Details on the format and contents of the Control and data files can be found on the MendelBase documentation page. There are example data files in the Gamete Competition data folder.

Running the Analysis

To run this analysis package, first launch Julia. Then load the package with the command:

 julia> using MendelGameteCompetition

Next, if necessary, change to the directory containing your files, for example,

 julia> cd(expanduser("~/path/to/data/files/"))

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

 julia> GameteCompetition("Control_file.txt")

Note: The package is called MendelGameteCompetition but the analysis function is called simply GameteCompetition.

Interpreting the results

There are two forms of output. A table is output to the screen that corresponds to a data frame that can be used in other analyses as desired. For the SNP data provided in the example data files in the Gamete Competition data folder, the results are:

Row | Marker | LowAllele | Low τ | HighAllele | High τ | Pvalue
1 | SNP1 | 1 | 1.0 | 2 | 5.41918 | 2.80892e-5
2 | SNP2 | 1 | 1.0 | 2 | 5.0539 | 2.57148e-5
3 | SNP3 | 1 | 1.0 | 2 | 5.23133 | 3.22822e-6
4 | SNP4 | 1 | 1.0 | 2 | 4.52948 | 3.96425e-5
5 | SNP5 | 1 | 1.0 | 2 | 4.59438 | 3.25329e-5
6 | SNP6 | 1 | 0.654208 | 2 | 1.0 | 0.00527033
7 | SNP7 | 2 | 1.0 | 1 | 8.04062 | 2.28436e-6
8 | ID | 1 | 1.0 | 2 | 6.67263 | 4.60103e-6
9 | SNP9 | 1 | 1.0 | 2 | 7.45069 | 7.69303e-6
10 | CT | 1 | 1.0 | 2 | 8.51002 | 7.94566e-7
10 CT 1 1.0 2 8.51002 7.94566e-7

For each marker, the table reports the allele with the smallest transmission parameter and its τ, the allele with the largest transmission parameter and its τ, and the p-value for the test of association between the marker and the trait. In the example provided, the CT marker is the most strongly associated with the trait because it has the smallest p-value. For this marker, allele 1 is the most frequent allele, so it is assigned τ1 = 1. Allele 2 is ~8.51 times more likely than allele 1 to be transmitted from a 1/2 parent. Details of the analysis are provided in the output text file. For each marker, this file lists the iterations of the numerical loglikelihood maximization, the maximum likelihood estimates, their standard errors, and their correlations (see the example output, gamete competition Output.txt).
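
The τ estimates can also be read as transmission probabilities by applying the ratio τi/(τi + τj) from the Overview. The short sketch below does this for two rows of the table; `transmission_probability` is a hypothetical helper written for this illustration and is not a function exported by the package.

```julia
# Probability that an i/j heterozygous parent transmits allele i,
# using the Bradley-Terry ratio tau_i / (tau_i + tau_j).
transmission_probability(tau_i, tau_j) = tau_i / (tau_i + tau_j)

# τ estimates copied from the CT and SNP6 rows of the results table.
p_ct   = transmission_probability(8.51002, 1.0)    # allele 2 from a 1/2 parent
p_snp6 = transmission_probability(0.654208, 1.0)   # allele 1 from a 1/2 parent

println("CT:   ", round(p_ct; digits = 3))
println("SNP6: ", round(p_snp6; digits = 3))
```

Under these estimates, a 1/2 parent transmits allele 2 at the CT marker with probability of about 0.895, and allele 1 at SNP6 with probability of about 0.395. Under Mendelian segregation both values would be 0.5.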

Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

1. Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K (2019). OPENMENDEL: a cooperative programming project for statistical genetics. Human Genetics. Epub 2019 Mar 26. doi: 10.1007/s00439-019-02001-z. PMID: 30915546.

2. Sinsheimer JS, Blangero J, Lange K (2000). Gamete competition models. American Journal of Human Genetics 66:1168-1172.

3. Sinsheimer JS, McKenzie CA, Keavney B, Lange K (2001). SNPs and snails and puppy dogs’ tails: Analysis of SNP data using the gamete competition model. Annals of Human Genetics 65:483-490.

Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.