Heritability analysis and testing SNP association using maximum likelihoods of variance component models¶

Lange Symposium

Juhyun Kim, juhkim111@ucla.edu

Department of Biostatistics, UCLA

Feb 22, 2020

Machine information:

versioninfo()

VarianceComponentModels.jl¶

VarianceComponentModels.jl is a package that resides in OpenMendel ecosystem. It implements computation routines for fitting and testing variance component model of form

$$\text{vec}(Y) \sim \text{Normal}(XB, \Sigma_1 \otimes V_1 + \cdots + \Sigma_m \otimes V_m)$$

where $\otimes$ is the Kronecker product.

In this model, data is represented by

Y: n x d response matrix
X: n x p covariate matrix
V=(V1,...,Vm): a tuple of m n x n covariance matrices

and parameters are

B: p x d mean parameter matrix
Σ=(Σ1,...,Σm): a tuple of m d x d variance components.

Package Features¶

Maximum likelihood estimation (MLE) and restricted maximum likelihood estimation (REML) of mean parameters $B$ and variance component parameters $Σ$
Allow constraints in the mean parameters $B$
Choice of optimization algorithms: Fisher scoring and minorization-maximization algorithm
Heritability analysis in genetics

Heritability Analysis¶

Variance component estimation can be used to estimate heritability of a quantitative trait.

Data files¶

For this analysis, we use a sample data set EUR_subset from SnpArrays.jl. This data set is available in the data folder of the package.

using SnpArrays

┌ Info: Recompiling stale cache file /Users/juhyun-kim/.julia/compiled/v1.2/SnpArrays/iEYce.ji for SnpArrays [4e780e97-f5bf-4111-9dc4-b70aaf691b06]
└ @ Base loading.jl:1240

datapath = normpath(SnpArrays.datadir())

"/Users/juhyun-kim/.julia/packages/SnpArrays/Arv5H/data"

EUR_subset.bed, EUR_subset.bim, and EUR_subset.fam is a set of Plink files in binary format.

using Glob
readdir(glob"EUR_subset.*", datapath)

3-element Array{String,1}:
 "/Users/juhyun-kim/.julia/packages/SnpArrays/Arv5H/data/EUR_subset.bed"
 "/Users/juhyun-kim/.julia/packages/SnpArrays/Arv5H/data/EUR_subset.bim"
 "/Users/juhyun-kim/.julia/packages/SnpArrays/Arv5H/data/EUR_subset.fam"

Read in binary SNP data¶

We use the SnpArrays.jl package to read in binary SNP data and compute the empirical kinship matrix.

# read in genotype data from Plink binary file
const EUR_subset = SnpArray(SnpArrays.datadir("EUR_subset.bed"))

379×54051 SnpArray:
 0x03  0x03  0x03  0x02  0x02  0x03  …  0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x02  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03  0x03     0x02  0x02  0x02  0x03  0x03  0x02
 0x03  0x03  0x03  0x00  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x00  0x03  0x03     0x02  0x02  0x02  0x03  0x03  0x03
 0x02  0x03  0x03  0x03  0x03  0x03  …  0x03  0x03  0x03  0x03  0x03  0x02
 0x02  0x03  0x03  0x02  0x02  0x03     0x03  0x03  0x02  0x02  0x03  0x03
 0x02  0x03  0x03  0x03  0x02  0x02     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x00  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x02  0x03  0x03  0x02  0x03  0x02     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03  …  0x03  0x03  0x02  0x02  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x02
 0x03  0x02  0x03  0x02  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
    ⋮                             ⋮  ⋱     ⋮                             ⋮
 0x03  0x03  0x03  0x00  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x02  0x03     0x02  0x02  0x02  0x03  0x02  0x03
 0x03  0x03  0x03  0x02  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x02  0x03  0x03  0x02  0x03  0x03  …  0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x00  0x00  0x03     0x02  0x02  0x02  0x03  0x03  0x03
 0x02  0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x02  0x03  0x03  0x03  0x03  0x03  …  0x03  0x03  0x02  0x02  0x03  0x03
 0x03  0x03  0x03  0x00  0x03  0x03     0x03  0x03  0x03  0x03  0x03  0x03
 0x02  0x03  0x03  0x02  0x00  0x02     0x03  0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x02  0x02  0x03     0x03  0x03  0x03  0x03  0x03  0x03

`EUR_subset`¶

EUR_subset contains 379 individuals and 54,051 SNPs. There is no missing genotype in EUR_subset.

Minor allele frequencies (MAF) for each SNP:

maf_EUR = maf(EUR_subset)

54051-element Array{Float64,1}:
 0.09762532981530347
 0.01319261213720313
 0.04485488126649073
 0.48944591029023743
 0.32189973614775724
 0.09102902374670185
 0.3733509234828496 
 0.05277044854881263
 0.0554089709762533 
 0.11345646437994727
 0.20448548812664913
 0.16226912928759896
 0.27176781002638517
 ⋮                  
 0.341688654353562  
 0.13192612137203164
 0.24802110817941958
 0.21240105540897103
 0.12532981530343013
 0.13192612137203164
 0.07387862796833777
 0.07783641160949872
 0.13588390501319259
 0.0554089709762533 
 0.01319261213720313
 0.02638522427440637

Histogram of minor allele frequency:

## generate a histogram of MAF
# using Plots, PyPlot
# gr(size=(600, 500), html_output_format=:png)
# hist_maf = histogram(maf_EUR, xlab = "Minor Allele Frequency (MAF)", 
#                    ylab = "Number of SNPs", label="")
# png(hist_maf, "hist_MAF.png")

image info

Note that about 29% of SNPs are rare variants (MAF < 0.05).

count(!iszero, maf_EUR .< 0.05) / length(maf_EUR)

0.2914839688442397

Empirical kinship matrix¶

For a measure of relatedness, we compute empirical kinship matrix based on all SNPs by the genetic relation matrix (GRM). If there are missing genotypes, they are imputed on the fly by drawing according to the minor allele frequencies.

Kinship coefficients summarize genetic similarity between pairs of individuals. To estimate kinship coefficient $\Phi_{ij}$ between individuals $i$ and $j$ using GRM:

$$\widehat{\Phi}_{GRMij} = \frac{1}{2S} \sum_{k=1}^S \frac{(G_{ik}-2p_k)(G_{jk}-2p_k)}{2p_k(1-p_k)},$$

where

$S$: number of SNPs in this set
$p_k$: minor allele frequency of SNP $k$
$G_{ik} \in \{0,1,2\}$: number of copies of minor alleles at the $k$-th SNP of the $i$-th individual

## GRM using SNPs with maf > 0.01 (default) 
Φgrm = grm(EUR_subset; method = :GRM) # classical genetic relationship matrix
# Φgrm = grm(EUR_subset; method = :MoM) # method of moment method
# Φgrm = grm(EUR_subset; method = :Robust) # robust method

379×379 Array{Float64,2}:
  0.526913     -0.010026     -0.0012793    …   0.00536883    0.00713397 
 -0.010026      0.500049      0.00147092      -0.00178778   -0.00344277 
 -0.0012793     0.00147092    0.521904        -0.0109387    -0.00262695 
 -0.00239381    0.00550462    0.00755985      -0.00265867   -0.000141742
 -0.00391296    0.00422806    0.0222034       -0.0107694    -0.00248895 
 -0.000555581   0.000696874   0.0125771    …  -0.0100831    -0.00575495 
 -0.0095376     0.00231344   -0.00259641      -0.00282701    0.000732385
 -0.00823869    0.00556861    0.0060825       -0.00911662   -0.00638629 
  0.00117402   -0.00444907   -0.0029182       -0.00244795    0.00634087 
 -0.0111617     0.00436269    0.000537307     -0.00483523   -0.00621726 
 -0.00252813   -0.000626719   0.00753937   …  -0.00180836    0.00714953 
  0.0112036    -0.0024306     0.00446458      -0.00983116   -0.00296109 
 -0.000451414   0.00707358   -0.00620136      -0.00473171   -0.00720874 
  ⋮                                        ⋱                            
  0.00299953   -0.0113682    -0.00331268       0.00565771    0.00808339 
 -0.000521854  -0.00646386   -0.00512474       0.00167572    0.00482433 
  0.000970612  -0.00430486   -0.0129335        0.00490171    0.0147362  
 -0.000163565  -0.00179531   -0.0108611    …   0.000459623   0.0036255  
 -0.00668885    0.00617908   -0.00415613       0.0031146    -0.00221311 
  0.00297165   -0.00743712   -0.0137732        0.0058214     0.00386105 
 -0.0024849     0.00320647   -0.0127255       -0.0058388     0.00697399 
 -0.00166949   -0.0086805     0.00108433       0.0173396     0.0188838  
 -0.00412641    0.00783768   -0.00977001   …   0.00661336    0.00828826 
  0.00678741   -0.00598927   -0.00121965      -0.00176491   -0.00611699 
  0.00536883   -0.00178778   -0.0109387        0.492959      0.0131705  
  0.00713397   -0.00344277   -0.00262695       0.0131705     0.512193

Simulating phenotypes¶

We simulate phenotype vector from

$$\mathbf{y} \sim \text{Normal}(\mathbf{1}, 0.1 \widehat{\Phi}_{GRM} + 0.9 \mathbf{I})$$

where $\widehat{\Phi}_{GRM}$ is the estimated empirical kinship matrix Φgrm.

The data should be available in pheno.txt.

## simulate `pheno.txt` 
# using LinearAlgebra, DelimitedFiles
# Random.seed!(1234)
# Ω = 0.1 * Φgrm + 0.9 * Matrix(1.0*I, nobs, nobs)
# Ωchol = cholesky(Symmetric(Ω))
# y = ones(nobs) + Ωchol.L * randn(nobs)
# writedlm("pheno.txt", y)

Phenotypes¶

Read in the phenotype data and plot a histogram.

using DelimitedFiles 
pheno = readdlm("pheno.txt")

379×1 Array{Float64,2}:
  1.846582104608307  
  0.12019614558345848
  0.5172368025545149 
  0.11933401051509984
  1.8407354203053767 
  3.155309404417616  
  1.518422163488851  
  0.737544574135081  
  1.4904102203720164 
  0.4942945743765427 
  0.4566487030521649 
  0.9830094325553045 
  1.1241872723791884 
  ⋮                  
  0.03800817892237962
  0.7685596964598539 
  0.9285816069462199 
 -1.3005655794765896 
  1.27142883079584   
  1.8149274022746835 
  2.353663701577899  
  1.3085170568729798 
  1.2023649250831836 
  2.523945778298307  
  2.339893360260807  
  0.08293644385047372

Histogram of phenotype values:

## generate histogram of phenotype values
#hist_pheno = histogram(pheno, xlab="Phenotype", ylab="Frequency", label="")
#png(hist_pheno, "hist_pheno.png")

image info

Pre-processing data for heritability analysis¶

To prepare variance component model fitting, we form an instance of VarianceComponentVariate. The two covariance matrices are $(2\Phi, I)$.

using VarianceComponentModels, LinearAlgebra
# no. of observations 
nobs = size(pheno, 1)

# form data as VarianceComponentVariate
X = ones(nobs)
EURdata = VarianceComponentVariate(pheno, X, (2Φgrm, Matrix(1.0I, nobs, nobs)))
fieldnames(typeof(EURdata))

┌ Info: Precompiling VarianceComponentModels [813005db-34b4-5f71-be9e-1bbf0a1d8f1c]
└ @ Base loading.jl:1242

(:Y, :X, :V)

EURdata

VarianceComponentVariate{Float64,2,Array{Float64,2},Array{Float64,1},Array{Float64,2}}([1.846582104608307; 0.12019614558345848; … ; 2.339893360260807; 0.08293644385047372], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], ([1.053826221313203 -0.020052039279012966 … 0.010737654900514952 0.014267936018676375; -0.020052039279012966 1.0000975487266066 … -0.0035755533910532874 -0.0068855303347012935; … ; 0.010737654900514952 -0.0035755533910532874 … 0.9859187797469418 0.02634099295946666; 0.014267936018676375 -0.0068855303347012935 … 0.02634099295946666 1.0243852451056223], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

Heritability of single trait¶

Before fitting the variance component model, we pre-compute the eigen-decomposition of $2\Phi_{\text{GRM}}$, the rotated responses, and the constant part in log-likelihood, and store them as a TwoVarCompVariateRotate instance, which is re-used in various variane component estimation procedures.

We use Fisher scoring algorithm to fit variance component model for our trait.

# pre-compute eigen-decomposition 
@time EURdata_rotated = TwoVarCompVariateRotate(EURdata)
fieldnames(typeof(EURdata_rotated))

# form data set for trait 
trait_data = TwoVarCompVariateRotate(EURdata_rotated.Yrot, 
    EURdata_rotated.Xrot, EURdata_rotated.eigval, EURdata_rotated.eigvec, 
    EURdata_rotated.logdetV2)

# initialize model parameters
trait_model = VarianceComponentModel(trait_data)

# estimate variance components
_, _, _, Σcov, = mle_fs!(trait_model, trait_data; solver=:Ipopt, verbose=false)
σ2a = trait_model.Σ[1][1] # additive genetic variance 
σ2e = trait_model.Σ[2][1] # environmental variance 
@show σ2a, σ2e

  1.121293 seconds (2.83 M allocations: 142.585 MiB, 5.98% gc time)

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

(σ2a, σ2e) = (0.5360487493497349, 0.39861050027076955)

(0.5360487493497349, 0.39861050027076955)

Additive genetic variance:

σ2a

0.5360487493497349

Environmental/non-genetic variance:

σ2e

0.39861050027076955

# heritability and its standard error from single trait analysis
h, hse = heritability(trait_model.Σ, Σcov)


[h[1], hse[1]]

2-element Array{Float64,1}:
 0.5735231845909452
 0.2564683288674743

We can also run MM algorithm.

trait_model = VarianceComponentModel(trait_data)
@time _, _, _, Σcov, = mle_mm!(trait_model, trait_data; verbose=false)
σ2a = trait_model.Σ[1][1]
σ2e = trait_model.Σ[2][1]
@show σ2a, σ2e

  1.063961 seconds (135.49 k allocations: 12.157 MiB)
(σ2a, σ2e) = (0.5242689405328554, 0.4101012322722476)

(0.5242689405328554, 0.4101012322722476)

Heritability and its standard error.

h, hse = heritability(trait_model.Σ, Σcov)
[h[1], hse[1]]

2-element Array{Float64,1}:
 0.5610934036549248 
 0.26548354772408295

Multivariate trait analysis¶

For the joint analysis of multiple traits, go to VarianceComponentModels documentation.

Exercise¶

Repeat the above analysis computing the empirical kinship matrix using the method of moment method or the robust method. See SnpArrays.jl documentation.

# Φgrm = grm(EUR_subset; method = )

Testing SNP association using maximum likelihoods of variance component models¶

credit: Heritability tutorial by Sarah Ji, Janet Sinsheimer and Hua Zhou

Suppose we want to see a particular SNP has an effect on a given phenotype after accounting for relatedness among individuals. Here we fit variance component model with a single SNP s as fixed effect.

$$\hspace{5em} \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{G}_s \gamma + \mathbf{g} + \mathbf{\epsilon} \hspace{5em} (1)$$

\begin{equation} \begin{array}{ll} \mathbf{g} \sim N(\mathbf{0}, \sigma_g^2\mathbf{\Phi}) \\ \mathbf{\epsilon} \sim N(\mathbf{0}, \sigma_e^2\mathbf{I}) \end{array} \end{equation}

where

$\mathbf{y}$: phenotype

and

Fixed effects:
- $\mathbf{X}$: matrix of covariates including intercept
- $\beta$: vector of covariate effects, including intercept
- $\mathbf{G}_s$: genotype of SNP s
- $\gamma$: (scalar) association parameter of interest, measuring the effect of genotype on phenotype
Random effects:
- $\mathbf{g}$: random vector of polygenic effects with $\mathbf{g} \sim N(\mathbf{0}, \sigma_g^2 \mathbf{\Phi})$
  - $\sigma_g^2$: additive genetic variance
  - $\mathbf{\Phi}$: matrix of pairwise measures of genetic relatedness
- $\epsilon$: random vector with $\epsilon \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})$
  - $\sigma_e^2$: non-genetic variance due to non-genetic effects assumed to be acting independently on individuals

To test whether SNP s is associated with phenotype, we fit two models. First consider the model without SNP s as fixed effects (aka null model):

$$\hspace{5em} \mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{g} + \mathbf{\epsilon} \hspace{5em} (2)$$

and the model with SNP s as fixed effects (1). Then we can compare the log likelihood to see if there is improvement in the model fit with inclusion of the SNP of interest.

Fit the null model¶

using VarianceComponentModels
# null data model has two variance components but no SNP fixed effects
# form data as VarianceComponentVariate matrix 
X = ones(nobs)
nulldata = VarianceComponentVariate(pheno, X, (2Φgrm, Matrix(1.0I, nobs, nobs)))

VarianceComponentVariate{Float64,2,Array{Float64,2},Array{Float64,1},Array{Float64,2}}([1.846582104608307; 0.12019614558345848; … ; 2.339893360260807; 0.08293644385047372], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], ([1.053826221313203 -0.020052039279012966 … 0.010737654900514952 0.014267936018676375; -0.020052039279012966 1.0000975487266066 … -0.0035755533910532874 -0.0068855303347012935; … ; 0.010737654900514952 -0.0035755533910532874 … 0.9859187797469418 0.02634099295946666; 0.014267936018676375 -0.0068855303347012935 … 0.02634099295946666 1.0243852451056223], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

nullmodel = VarianceComponentModel(nulldata)

VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0], ([1.0], [1.0]), Array{Float64}(undef,0,1), Char[], Float64[], -Inf, Inf)

@time nulllogl, nullmodel, = fit_mle!(nullmodel, nulldata; algo=:FS)

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  5.6731456e+02 0.00e+00 1.00e+02   0.0 0.00e+00    -  0.00e+00 0.00e+00   0 
   5  5.2391619e+02 0.00e+00 4.25e-03 -11.0 2.02e-03    -  1.00e+00 1.00e+00f  1 MaxS
  10  5.2391619e+02 0.00e+00 1.92e-05 -11.0 1.17e-05    -  1.00e+00 1.00e+00f  1 MaxS
  15  5.2391619e+02 0.00e+00 1.14e-07 -11.0 6.93e-08    -  1.00e+00 1.00e+00f  1 MaxSA

Number of Iterations....: 18

                                   (scaled)                 (unscaled)
Objective...............:   5.1219287461557849e+02    5.2391619033590780e+02
Dual infeasibility......:   5.2488922822599992e-09    5.3690314416598994e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   5.2488922822599992e-09    5.3690314416598994e-09


Number of objective function evaluations             = 19
Number of objective gradient evaluations             = 19
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 18
Total CPU secs in IPOPT (w/o function evaluations)   =      0.015
Total CPU secs in NLP function evaluations           =      0.131

EXIT: Optimal Solution Found.
  1.072384 seconds (1.71 M allocations: 88.389 MiB, 4.58% gc time)

(-523.9161903359078, VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.9900140588281162], ([0.5360487493497349], [0.39861050027076955]), Array{Float64}(undef,0,1), Char[], Float64[], -Inf, Inf), ([0.2872182098049488], [0.2789237673497682]), [0.08249430004355958 -0.07781584431646911; -0.07781584431646914 0.07779846799258762], [0.03243058347883627], [0.001051742744777768])

The null model log-likelihood (no SNP effects)

nulllogl

-523.9161903359078

The null model mean effects (a grand mean)

nullmodel.B

1×1 Array{Float64,2}:
 0.9900140588281162

The null model additive genetic variance

nullmodel.Σ[1]

1×1 Array{Float64,2}:
 0.5360487493497349

The null model environmental variance

nullmodel.Σ[2]

1×1 Array{Float64,2}:
 0.39861050027076955

Fit the alternative model¶

snp_vec = convert(Vector{Float64}, EUR_subset[:, 10]) 
Xalt = [ones(nobs) snp_vec]
altdata = VarianceComponentVariate(pheno, Xalt, (2Φgrm, Matrix(1.0I, nobs, nobs)))

VarianceComponentVariate{Float64,2,Array{Float64,2},Array{Float64,2},Array{Float64,2}}([1.846582104608307; 0.12019614558345848; … ; 2.339893360260807; 0.08293644385047372], [1.0 3.0; 1.0 3.0; … ; 1.0 2.0; 1.0 3.0], ([1.053826221313203 -0.020052039279012966 … 0.010737654900514952 0.014267936018676375; -0.020052039279012966 1.0000975487266066 … -0.0035755533910532874 -0.0068855303347012935; … ; 0.010737654900514952 -0.0035755533910532874 … 0.9859187797469418 0.02634099295946666; 0.014267936018676375 -0.0068855303347012935 … 0.02634099295946666 1.0243852451056223], [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))

altmodel = VarianceComponentModel(altdata)

VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0; 0.0], ([1.0], [1.0]), Array{Float64}(undef,0,2), Char[], Float64[], -Inf, Inf)

@time altlogl, altmodel, = fit_mle!(altmodel, altdata; algo=:FS)

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  5.6689443e+02 0.00e+00 1.00e+02   0.0 0.00e+00    -  0.00e+00 0.00e+00   0 
   5  5.2302301e+02 0.00e+00 1.54e-03 -11.0 9.40e-04    -  1.00e+00 1.00e+00f  1 MaxS
  10  5.2302301e+02 0.00e+00 2.88e-06 -11.0 2.06e-06    -  1.00e+00 1.00e+00f  1 MaxSA
  15  5.2302301e+02 0.00e+00 6.40e-09 -11.0 4.58e-09    -  1.00e+00 1.00e+00f  1 MaxS

Number of Iterations....: 15

                                   (scaled)                 (unscaled)
Objective...............:   5.0937186058543728e+02    5.2302300937938026e+02
Dual infeasibility......:   6.4017217632119081e-09    6.5732876919355476e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   6.4017217632119081e-09    6.5732876919355476e-09


Number of objective function evaluations             = 16
Number of objective gradient evaluations             = 16
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 15
Total CPU secs in IPOPT (w/o function evaluations)   =      0.369
Total CPU secs in NLP function evaluations           =      0.058

EXIT: Optimal Solution Found.
  0.506979 seconds (894.73 k allocations: 49.159 MiB, 3.66% gc time)

(-523.0230093793803, VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.64308909871664; 0.12582254534186557], ([0.5131884765245983], [0.41657713663291]), Array{Float64}(undef,0,2), Char[], Float64[], -Inf, Inf), ([0.2881234472862555], [0.2805079673800254]), [0.08301512087611566 -0.07854568365270753; -0.07854568365270752 0.0786847197636734], [0.26113129412822383; 0.09394055576184406], [0.06818955277308096 -0.02433230943958347; -0.02433230943958347 0.008824828016844134])

The alternative model log-likelihood:

altlogl

-523.0230093793803

The alternative model mean effects (a grand mean)

altmodel.B

2×1 Array{Float64,2}:
 0.64308909871664   
 0.12582254534186557

The alternative model additive genetic variance

altmodel.Σ[1]

1×1 Array{Float64,2}:
 0.5131884765245983

The alternative model environmental variance

altmodel.Σ[2]

1×1 Array{Float64,2}:
 0.41657713663291

Likelihood ratio test¶

We use likelihood ratio test (LRT) to test the goodness-of-fit between two models.

Our likelihood ratio test statistic is 1.786 (distributed chi-squared), with one degree of freedom.

using Distributions
LRT = 2(altlogl - nulllogl)

1.7863619130550887

The associated p-value:

pval = ccdf(Chisq(1), LRT)

0.18137005667849493

We see that adding this SNP as a covariate to the model does not fit significantly better than the null model. In other words, the SNP does not explain more of the variation in our trait.

Exercise¶

Use minorization-maximization algorithm (algo=:MM) to find MLEs of both null model and alternative model. Then conduct the likelihood ratio test.

Table of Contents

Heritability analysis and testing SNP association using maximum likelihoods of variance component models¶

VarianceComponentModels.jl¶

Package Features¶

Heritability Analysis¶

Data files¶

Read in binary SNP data¶

EUR_subset¶

Empirical kinship matrix¶

Simulating phenotypes¶

Phenotypes¶

Pre-processing data for heritability analysis¶

Heritability of single trait¶

Multivariate trait analysis¶

Exercise¶

Testing SNP association using maximum likelihoods of variance component models¶

Fit the null model¶

Fit the alternative model¶

Likelihood ratio test¶

Exercise¶

`EUR_subset`¶