MLE and REML
Machine information
versioninfo()
Julia Version 0.6.0 Commit 903644385b (2017-06-19 13:05 UTC) Platform Info: OS: macOS (x86_64-apple-darwin13.4.0) CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz WORD_SIZE: 64 BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell) LAPACK: libopenblas64_ LIBM: libopenlibm LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
Demo data
For demonstration, we generate a random data set.
# generate data from a d-variate response variane component model srand(123) n = 1000 # no. observations d = 2 # dimension of responses m = 2 # no. variance components p = 2 # no. covariates # n-by-p design matrix X = randn(n, p) # p-by-d mean component regression coefficient B = ones(p, d) # a tuple of m covariance matrices V = ntuple(x -> zeros(n, n), m) for i = 1:m-1 Vi = randn(n, 50) copy!(V[i], Vi * Vi') end copy!(V[m], eye(n)) # last covarianec matrix is idendity # a tuple of m d-by-d variance component parameters Σ = ntuple(x -> zeros(d, d), m) for i in 1:m Σi = randn(d, d) copy!(Σ[i], Σi' * Σi) end # form overall nd-by-nd covariance matrix Ω Ω = zeros(n * d, n * d) for i = 1:m Ω += kron(Σ[i], V[i]) end Ωchol = cholfact(Ω) # n-by-d responses Y = X * B + reshape(Ωchol[:L] * randn(n*d), n, d);
Maximum likelihood estimation (MLE)
To find the MLE of parameters $(B,\Sigma_1,\ldots,\Sigma_m)$, we take 3 steps:
Step 1 (Construct data). Construct an instance of VarianceComponentVariate, which consists fields
Y: $n$-by-$d$ responsesX: $n$-by-$p$ covariate matrixV=(V[1],...,V[m]): a tuple of $n$-by-$n$ covariance matrices. The last covariance matrix must be positive definite and usually is the identity matrix.
using VarianceComponentModels vcdata = VarianceComponentVariate(Y, X, V) fieldnames(vcdata)
3-element Array{Symbol,1}:
:Y
:X
:V
In the absence of covariates $X$, we can simply initialize by vcdata = VarianceComponentVariate(Y, V).
Step 2 (Construct a model). Construct an instance of VarianceComponentModel, which consists of fields
B: $n$-by-$p$ mean regression coefficientsΣ=(Σ[1],...,Σ[m]): variane component parameters respectively.
When constructed from a VarianceComponentVariate instance, the mean parameters $B$ are initialized to be zero and the tuple of variance component parameters $\Sigma$ to be (eye(d),...,eye(d)).
vcmodel = VarianceComponentModel(vcdata) fieldnames(vcmodel)
7-element Array{Symbol,1}:
:B
:Σ
:A
:sense
:b
:lb
:ub
vcmodel
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
The remaining fields A, sense, b, lb, ub specify (optional) constraints on the mean parameters B:
A is an constraint matrix with $pd$ columns, sense is a vector of charaters taking values '<', '=' or '>', and lb and ub are the lower and upper bounds for vec(B). By default, A, sense, b are empty, lb is -Inf, and ub is Inf. If any constraits are non-trivial, final estimates of B are enforced to satisfy them.
When a better initial guess is available, we can initialize by calling vcmodel=VarianceComponentModel(B0, Σ0) directly.
Step 3 (Fit model). Call optmization routine fit_mle!. The keywork algo dictates the optimization algorithm: :MM (minorization-maximization algorithm) or :FS (Fisher scoring algorithm).
vcmodel_mle = deepcopy(vcmodel) @time logl, vcmodel_mle, Σse, Σcov, Bse, Bcov = fit_mle!(vcmodel_mle, vcdata; algo = :MM);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
1 -3.881454e+03
2 -3.853179e+03
3 -3.846525e+03
4 -3.844906e+03
5 -3.844506e+03
6 -3.844406e+03
7 -3.844381e+03
8 -3.844375e+03
9 -3.844374e+03
10 -3.844373e+03
0.290970 seconds (10.45 k allocations: 24.036 MiB, 4.73% gc time)
The output of fit_mle! contains
- final log-likelihood
logl
-3844.3731814180805
- fitted model
fieldnames(vcmodel_mle)
7-element Array{Symbol,1}:
:B
:Σ
:A
:sense
:b
:lb
:ub
vcmodel_mle
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955346 1.01632], ([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated varianec component parameters
Σse
([0.0765136 0.263047; 0.263047 0.904332], [0.0844292 0.0917441; 0.0917441 0.0996927])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}:
0.00585433 -0.00467019 -0.00467019 … -1.07903e-6 -1.557e-7
-0.00467019 0.0691937 0.00372555 -1.557e-7 -1.27444e-6
-0.00467019 0.00372555 0.0691937 -8.83212e-6 -1.27444e-6
0.00372555 -0.055198 -0.055198 -1.27444e-6 -1.04316e-5
-7.4779e-6 -1.07903e-6 -1.07903e-6 0.00102878 0.000148477
-1.07903e-6 -8.83212e-6 -1.557e-7 … 0.000148477 0.00121477
-1.07903e-6 -1.557e-7 -8.83212e-6 0.00841698 0.00121477
-1.557e-7 -1.27444e-6 -1.27444e-6 0.00121477 0.00993864
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}:
0.042559 0.0487086
0.0430588 0.049178
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}:
0.00181127 -1.98035e-5 0.000240705 -2.59506e-6
-1.98035e-5 0.00185406 -2.59506e-6 0.000247285
0.000240705 -2.59506e-6 0.00237252 -2.63542e-5
-2.59506e-6 0.000247285 -2.63542e-5 0.00241848
Restricted maximum likelihood estimation (REML)
REML (restricted maximum likelihood estimation) is a popular alternative to the MLE. To find the REML of a variane component model, we replace the above step 3 by
Step 3. Call optmization routine fit_reml!.
vcmodel_reml = deepcopy(vcmodel) @time logl, vcmodel_reml, Σse, Σcov, Bse, Bcov = fit_reml!(vcmodel_reml, vcdata; algo = :MM);
MM Algorithm
Iter Objective
-------- -------------
0 -4.215053e+03
1 -3.925799e+03
2 -3.865114e+03
3 -3.851105e+03
4 -3.847732e+03
5 -3.846903e+03
6 -3.846698e+03
7 -3.846647e+03
8 -3.846634e+03
9 -3.846631e+03
10 -3.846630e+03
0
The output of fit_reml! contains
- the final log-likelihood at REML estimate
logl
-3844.3777179025096
- REML estimates
fieldnames(vcmodel_reml)
7-element Array{Symbol,1}:
:B
:Σ
:A
:sense
:b
:lb
:ub
vcmodel_reml
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955345 1.01632], ([0.380594 -0.305485; -0.305485 4.51994], [1.84285 0.261963; 0.261963 2.17842]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated varianec component parameters
Σse
([0.0765055 0.26305; 0.26305 0.904446], [0.0845559 0.0919325; 0.0919325 0.0999526])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}:
0.0058531 -0.00467005 -0.00467005 … -1.06597e-6 -1.51499e-7
-0.00467005 0.0691951 0.00372613 -1.51499e-7 -1.26041e-6
-0.00467005 0.00372613 0.0691951 -8.86843e-6 -1.26041e-6
0.00372613 -0.0552092 -0.0552092 -1.26041e-6 -1.0486e-5
-7.50035e-6 -1.06597e-6 -1.06597e-6 0.00101633 0.000144472
-1.06597e-6 -8.86843e-6 -1.51499e-7 … 0.000144472 0.0012014
-1.06597e-6 -1.51499e-7 -8.86843e-6 0.00845158 0.0012014
-1.51499e-7 -1.26041e-6 -1.26041e-6 0.0012014 0.00999052
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}:
0.0425909 0.0487744
0.043091 0.0492444
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}:
0.00181398 -1.98331e-5 0.000237127 -2.55589e-6
-1.98331e-5 0.00185683 -2.55589e-6 0.000243624
0.000237127 -2.55589e-6 0.00237894 -2.6426e-5
-2.55589e-6 0.000243624 -2.6426e-5 0.00242501
Optimization algorithms
Finding the MLE or REML of variance component models is a non-trivial nonlinear optimization problem. The main complications are the non-convexity of objective function and the positive semi-definiteness constraint of variane component parameters $\Sigma_1,\ldots,\Sigma_m$. In specific applications, users should try different algorithms with different starting points in order to find a better solution. Here are some tips for efficient computation.
In general the optimization algorithm needs to invert the $nd$ by $nd$ overall covariance matrix $\Omega = \Sigma_1 \otimes V_1 + \cdots + \Sigma_m \otimes V_m$ in each iteration. Inverting a matrix is an expensive operation with $O(n^3 d^3)$ floating operations. When there are only two varianec components ($m=2$), this tedious task can be avoided by taking one (generalized) eigendecomposion of $(V_1, V_2)$ and rotating data $(Y, X)$ by the eigen-vectors.
vcdatarot = TwoVarCompVariateRotate(vcdata) fieldnames(vcdatarot)
5-element Array{Symbol,1}:
:Yrot
:Xrot
:eigval
:eigvec
:logdetV2
Two optimization algorithms are implemented: Fisher scoring (mle_fs!) and the minorization-maximization (MM) algorithm (mle_mm!). Both take the rotated data as input. These two functions give finer control of the optimization algorithms. Generally speaking, MM algorithm is more stable while Fisher scoring (if it converges) yields more accurate answer.
vcmodel_mm = deepcopy(vcmodel) @time mle_mm!(vcmodel_mm, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
1 -3.881454e+03
2 -3.853179e+03
3 -3.846525e+03
4 -3.844906e+03
5 -3.844506e+03
6 -3.844406e+03
7 -3.844381e+03
8 -3.844375e+03
9 -3.844374e+03
10 -3.844373e+03
0.018754 seconds (9.15 k allocations: 680.172 KiB)
# MM estimates vcmodel_mm.B
2×2 Array{Float64,2}:
1.092 1.04727
0.955346 1.01632
# MM estimates vcmodel_mm.Σ
([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275])
Fisher scoring (mle_fs!) uses either Ipopt.jl (keyword solver=:Ipopt) or KNITRO.jl (keyword solver=:Knitro) as the backend solver. Ipopt is open source and installation of Ipopt.jl package alone is sufficient.
# Fisher scoring using Ipopt vcmodel_ipopt = deepcopy(vcmodel) @time mle_fs!(vcmodel_ipopt, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 21
Total number of variables............................: 6
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 4.2109423e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0
5 3.8445586e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS
10 3.8443870e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS
15 3.8443742e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS
20 3.8443733e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS
25 3.8443732e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS
30 3.8443732e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS
35 3.8443732e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS
40 3.8443732e+03 0.00e+00 9.19e-05 -11.0 5.55e-06 - 1.00e+00 1.00e+00f 1 MaxS
45 3.8443732e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
50 3.8443732e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA
55 3.8443732e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA
60 3.8443732e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00f 1 MaxSA
Number of Iterations....: 63
(scaled) (unscaled)
Objective...............: 3.4496886481728075e+02 3.8443731733053696e+03
Dual infeasibility......: 2.2693631692678575e-07 2.5290047242499938e-06
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 2.2693631692678575e-07 2.5290047242499938e-06
Number of objective function evaluations = 64
Number of objective gradient evaluations = 64
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 63
Total CPU secs in IPOPT (w/o function evaluations) = 0.020
Total CPU secs in NLP function evaluations = 0.256
EXIT: Solved To Acceptable Level.
# Ipopt estimates vcmodel_ipopt.B
2×2 Array{Float64,2}:
1.092 1.04727
0.955346 1.01632
# Ipopt estimates vcmodel_ipopt.Σ
([0.380552 -0.305594; -0.305594 4.52106], [1.84008 0.265385; 0.265385 2.17287])
Knitro is a commercial software and users need to follow instructions at KNITRO.jl for proper functioning. Following code invokes Knitro as the backend optimization solver.
using KNITRO # Fisher scoring using Knitro vcmodel_knitro = deepcopy(vcmodel) @time mle_fs!(vcmodel_knitro, vcdatarot; solver=:Knitro, maxiter=1000, verbose=true); # Knitro estimates vcmodel_knitro.B # Knitro estimates vcmodel_knitro.Σ
Starting point
Here are a few strategies for successful optimization.
- For $d>1$ (multivariate response), initialize $B, \Sigma$ from univariate estimates.
- Use REML estimate as starting point for MLE.
- When there are only $m=2$ variance components, pre-compute
TwoVarCompVariateRotateand use it for optimization.
Constrained estimation of B
Many applications invoke constraints on the mean parameters B. For demonstration, we enforce B[1,1]=B[1,2] and all entries of B are within [0, 2].
# set up constraints on B vcmodel_constr = deepcopy(vcmodel) vcmodel_constr.A = [1.0 0.0 -1.0 0.0] vcmodel_constr.sense = '=' vcmodel_constr.b = 0.0 vcmodel_constr.lb = 0.0 vcmodel_constr.ub = 2.0 vcmodel_constr
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), [1.0 0.0 -1.0 0.0], '=', 0.0, 0.0, 2.0)
We first try the MM algorithm.
# MM algorithm for constrained estimation of B @time mle_mm!(vcmodel_constr, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
1 -3.881820e+03
2 -3.853477e+03
3 -3.846807e+03
4 -3.845184e+03
5 -3.844783e+03
6 -3.844683e+03
7 -3.844658e+03
8 -3.844652e+03
9 -3.844650e+03
10 -3.844650e+03
0.031954 seconds (10.70 k allocations: 781.828 KiB)
fieldnames(vcmodel_constr)
7-element Array{Symbol,1}:
:B
:Σ
:A
:sense
:b
:lb
:ub
vcmodel_constr.B
2×2 Array{Float64,2}:
1.07177 1.07177
0.955683 1.01591
vcmodel_constr.Σ
([0.380624 -0.305498; -0.305498 4.51948], [1.84051 0.265065; 0.265065 2.17336])
Now let's try Fisher scoring.
# Fisher scoring using Ipopt for constrained estimation of B vcmodel_constr = deepcopy(vcmodel) vcmodel_constr.A = [1.0 0.0 -1.0 0.0] vcmodel_constr.sense = '=' vcmodel_constr.b = 0.0 vcmodel_constr.lb = 0.0 vcmodel_constr.ub = 2.0 vcmodel_constr @time mle_fs!(vcmodel_constr, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 21
Total number of variables............................: 6
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 4.2114270e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0
5 3.8448353e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS
10 3.8446636e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS
15 3.8446509e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS
20 3.8446499e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS
25 3.8446498e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS
30 3.8446498e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS
35 3.8446498e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS
40 3.8446498e+03 0.00e+00 9.19e-05 -11.0 5.56e-06 - 1.00e+00 1.00e+00f 1 MaxS
45 3.8446498e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
50 3.8446498e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA
55 3.8446498e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA
60 3.8446498e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00f 1 MaxSA
Number of Iterations....: 63
(scaled) (unscaled)
Objective...............: 3.4484507551949008e+02 3.8446498170293403e+03
Dual infeasibility......: 2.2694405349430929e-07 2.5301808715939735e-06
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 2.2694405349430929e-07 2.5301808715939735e-06
Number of objective function evaluations = 64
Number of objective gradient evaluations = 64
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 63
Total CPU secs in IPOPT (w/o function evaluations) = 0.016
Total CPU secs in NLP function evaluations = 0.417
EXIT: Solved To Acceptable Level.
vcmodel_constr.B
2×2 Array{Float64,2}:
1.07177 1.07177
0.955683 1.01591
vcmodel_constr.Σ
([0.380539 -0.305626; -0.305626 4.52116], [1.8405 0.264881; 0.264881 2.17348])