MLE and REML
Machine information
versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Demo data
For demonstration, we generate a random data set.
# generate data from a d-variate response variane component model
using Random, LinearAlgebra
Random.seed!(123)
n = 1000 # no. observations
d = 2 # dimension of responses
m = 2 # no. variance components
p = 2 # no. covariates
# n-by-p design matrix
X = randn(n, p)
# p-by-d mean component regression coefficient
B = ones(p, d)
# a tuple of m covariance matrices
V = ntuple(x -> zeros(n, n), m)
for i = 1:m-1
Vi = randn(n, 50)
copyto!(V[i], Vi * Vi')
end
copyto!(V[m], Matrix(I, n, n)) # last covarianec matrix is idendity
# a tuple of m d-by-d variance component parameters
Σ = ntuple(x -> zeros(d, d), m)
for i in 1:m
Σi = randn(d, d)
copyto!(Σ[i], Σi' * Σi)
end
# form overall nd-by-nd covariance matrix Ω
Ω = zeros(n * d, n * d)
for i = 1:m
Ω += kron(Σ[i], V[i])
end
Ωchol = cholesky(Ω)
# n-by-d responses
Y = X * B + reshape(Ωchol.L * randn(n*d), n, d);
Maximum likelihood estimation (MLE)
To find the MLE of parameters $(B,\Sigma_1,\ldots,\Sigma_m)$, we take 3 steps:
Step 1 (Construct data). Construct an instance of VarianceComponentVariate
, which consists fields
Y
: $n$-by-$d$ responsesX
: $n$-by-$p$ covariate matrixV=(V[1],...,V[m])
: a tuple of $n$-by-$n$ covariance matrices. The last covariance matrix must be positive definite and usually is the identity matrix.
using VarianceComponentModels
vcdata = VarianceComponentVariate(Y, X, V)
fieldnames(typeof(vcdata))
(:Y, :X, :V)
In the absence of covariates $X$, we can simply initialize by vcdata = VarianceComponentVariate(Y, V)
.
Step 2 (Construct a model). Construct an instance of VarianceComponentModel
, which consists of fields
B
: $n$-by-$p$ mean regression coefficientsΣ=(Σ[1],...,Σ[m])
: variane component parameters respectively.
When constructed from a VarianceComponentVariate
instance, the mean parameters $B$ are initialized to be zero and the tuple of variance component parameters $\Sigma$ to be (eye(d),...,eye(d))
.
vcmodel = VarianceComponentModel(vcdata)
fieldnames(typeof(vcmodel))
(:B, :Σ, :A, :sense, :b, :lb, :ub)
vcmodel
VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
The remaining fields A
, sense
, b
, lb
, ub
specify (optional) constraints on the mean parameters B
:
\[A * \text{vec}(B) \,\, =(\text{or } \ge \text{or } \le) \,\, b\]
\[lb \le \text{vec}(B) \le ub\]
A
is an constraint matrix with $pd$ columns, sense
is a vector of charaters taking values '<'
, '='
or '>'
, and lb
and ub
are the lower and upper bounds for vec(B)
. By default, A
, sense
, b
are empty, lb
is -Inf
, and ub
is Inf
. If any constraits are non-trivial, final estimates of B
are enforced to satisfy them.
When a better initial guess is available, we can initialize by calling vcmodel=VarianceComponentModel(B0, Σ0)
directly.
Step 3 (Fit model). Call optmization routine fit_mle!
. The keywork algo
dictates the optimization algorithm: :MM
(minorization-maximization algorithm) or :FS
(Fisher scoring algorithm).
vcmodel_mle = deepcopy(vcmodel)
@time logl, vcmodel_mle, Σse, Σcov, Bse, Bcov = fit_mle!(vcmodel_mle, vcdata; algo = :MM);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
1 -3.881454e+03
2 -3.853179e+03
3 -3.846525e+03
4 -3.844906e+03
5 -3.844506e+03
6 -3.844406e+03
7 -3.844381e+03
8 -3.844375e+03
9 -3.844374e+03
10 -3.844373e+03
5.031460 seconds (11.29 M allocations: 568.015 MiB, 4.78% gc time)
The output of fit_mle!
contains
- final log-likelihood
logl
-3844.3731814180887
- fitted model
fieldnames(typeof(vcmodel_mle))
(:B, :Σ, :A, :sense, :b, :lb, :ub)
vcmodel_mle
VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955346 1.01632], ([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated varianec component parameters
Σse
([0.0765136 0.263047; 0.263047 0.904332], [0.0844292 0.0917441; 0.0917441 0.0996927])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}:
0.00585433 -0.00467019 -0.00467019 … -1.07903e-6 -1.557e-7
-0.00467019 0.0691937 0.00372555 -1.557e-7 -1.27444e-6
-0.00467019 0.00372555 0.0691937 -8.83212e-6 -1.27444e-6
0.00372555 -0.055198 -0.055198 -1.27444e-6 -1.04316e-5
-7.4779e-6 -1.07903e-6 -1.07903e-6 0.00102878 0.000148477
-1.07903e-6 -8.83212e-6 -1.557e-7 … 0.000148477 0.00121477
-1.07903e-6 -1.557e-7 -8.83212e-6 0.00841698 0.00121477
-1.557e-7 -1.27444e-6 -1.27444e-6 0.00121477 0.00993864
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}:
0.0425562 0.0483834
0.0430596 0.0492809
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}:
0.00181103 -1.96485e-5 0.000243441 -4.38252e-6
-1.96485e-5 0.00185413 -4.38252e-6 0.000246407
0.000243441 -4.38252e-6 0.00234096 -5.73331e-6
-4.38252e-6 0.000246407 -5.73331e-6 0.00242861
Restricted maximum likelihood estimation (REML)
REML (restricted maximum likelihood estimation) is a popular alternative to the MLE. To find the REML of a variane component model, we replace the above step 3 by
Step 3. Call optmization routine fit_reml!
.
vcmodel_reml = deepcopy(vcmodel)
@time logl, vcmodel_reml, Σse, Σcov, Bse, Bcov = fit_reml!(vcmodel_reml, vcdata; algo = :MM);
MM Algorithm
Iter Objective
-------- -------------
0 -4.215053e+03
1 -3.925799e+03
2 -3.865114e+03
3 -3.851105e+03
4 -3.847732e+03
5 -3.846903e+03
6 -3.846698e+03
7 -3.846647e+03
8 -3.846634e+03
9 -3.846631e+03
10 -3.846630e+03
0.726373 seconds (388.90 k allocations: 82.673 MiB, 13.22% gc time)
The output of fit_reml!
contains
- the final log-likelihood at REML estimate
logl
-3844.3777179025055
- REML estimates
fieldnames(typeof(vcmodel_reml))
(:B, :Σ, :A, :sense, :b, :lb, :ub)
vcmodel_reml
VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955345 1.01632], ([0.380594 -0.305485; -0.305485 4.51994], [1.84285 0.261963; 0.261963 2.17842]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated variance component parameters
Σse
([0.0765055 0.26305; 0.26305 0.904446], [0.0845559 0.0919325; 0.0919325 0.0999526])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}:
0.0058531 -0.00467005 -0.00467005 … -1.06597e-6 -1.51499e-7
-0.00467005 0.0691951 0.00372613 -1.51499e-7 -1.26041e-6
-0.00467005 0.00372613 0.0691951 -8.86843e-6 -1.26041e-6
0.00372613 -0.0552092 -0.0552092 -1.26041e-6 -1.0486e-5
-7.50035e-6 -1.06597e-6 -1.06597e-6 0.00101633 0.000144472
-1.06597e-6 -8.86843e-6 -1.51499e-7 … 0.000144472 0.0012014
-1.06597e-6 -1.51499e-7 -8.86843e-6 0.00845158 0.0012014
-1.51499e-7 -1.26041e-6 -1.26041e-6 0.0012014 0.00999052
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}:
0.0425881 0.0484485
0.0430919 0.0493475
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}:
0.00181375 -1.96783e-5 0.000239868 -4.34611e-6
-1.96783e-5 0.00185691 -4.34611e-6 0.000242745
0.000239868 -4.34611e-6 0.00234726 -5.73082e-6
-4.34611e-6 0.000242745 -5.73082e-6 0.00243518
Optimization algorithms
Finding the MLE or REML of variance component models is a non-trivial nonlinear optimization problem. The main complications are the non-convexity of objective function and the positive semi-definiteness constraint of variane component parameters $\Sigma_1,\ldots,\Sigma_m$. In specific applications, users should try different algorithms with different starting points in order to find a better solution. Here are some tips for efficient computation.
In general the optimization algorithm needs to invert the $nd$ by $nd$ overall covariance matrix $\Omega = \Sigma_1 \otimes V_1 + \cdots + \Sigma_m \otimes V_m$ in each iteration. Inverting a matrix is an expensive operation with $O(n^3 d^3)$ floating operations. When there are only two varianec components ($m=2$), this tedious task can be avoided by taking one (generalized) eigendecomposion of $(V_1, V_2)$ and rotating data $(Y, X)$ by the eigen-vectors.
vcdatarot = TwoVarCompVariateRotate(vcdata)
fieldnames(typeof(vcdatarot))
(:Yrot, :Xrot, :eigval, :eigvec, :logdetV2)
Two optimization algorithms are implemented: Fisher scoring (mle_fs!
) and the minorization-maximization (MM) algorithm (mle_mm!
). Both take the rotated data as input. These two functions give finer control of the optimization algorithms. Generally speaking, MM algorithm is more stable while Fisher scoring (if it converges) yields more accurate answer.
vcmodel_mm = deepcopy(vcmodel)
@time mle_mm!(vcmodel_mm, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
1 -3.881454e+03
2 -3.853179e+03
3 -3.846525e+03
4 -3.844906e+03
5 -3.844506e+03
6 -3.844406e+03
7 -3.844381e+03
8 -3.844375e+03
9 -3.844374e+03
10 -3.844373e+03
0.055578 seconds (21.91 k allocations: 1.394 MiB)
# MM estimates
vcmodel_mm.B
2×2 Array{Float64,2}:
1.092 1.04727
0.955346 1.01632
# MM estimates
vcmodel_mm.Σ
([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275])
Fisher scoring (mle_fs!
) uses either Ipopt.jl (keyword solver=:Ipopt
) or KNITRO.jl (keyword solver=:Knitro
) as the backend solver. Ipopt is open source and installation of Ipopt.jl package alone is sufficient.
# Fisher scoring using Ipopt
vcmodel_ipopt = deepcopy(vcmodel)
@time mle_fs!(vcmodel_ipopt, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 21
Total number of variables............................: 6
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 4.2109423e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0
5 3.8445586e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS
10 3.8443870e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS
15 3.8443742e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS
20 3.8443733e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS
25 3.8443732e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS
30 3.8443732e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS
35 3.8443732e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS
40 3.8443732e+03 0.00e+00 9.19e-05 -11.0 5.55e-06 - 1.00e+00 1.00e+00f 1 MaxS
45 3.8443732e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
50 3.8443732e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA
55 3.8443732e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA
60 3.8443732e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00h 1 MaxSA
Number of Iterations....: 63
(scaled) (unscaled)
Objective...............: 3.4496886481728791e+02 3.8443731733053728e+03
Dual infeasibility......: 2.2693631660531264e-07 2.5290047206674095e-06
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 2.2693631660531264e-07 2.5290047206674095e-06
Number of objective function evaluations = 64
Number of objective gradient evaluations = 64
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 63
Total CPU secs in IPOPT (w/o function evaluations) = 1.739
Total CPU secs in NLP function evaluations = 0.293
EXIT: Solved To Acceptable Level.
2.745554 seconds (4.30 M allocations: 210.935 MiB, 2.63% gc time)
# Ipopt estimates
vcmodel_ipopt.B
2×2 Array{Float64,2}:
1.092 1.04727
0.955346 1.01632
# Ipopt estimates
vcmodel_ipopt.Σ
([0.380552 -0.305594; -0.305594 4.52106], [1.84008 0.265385; 0.265385 2.17287])
Knitro is a commercial software and users need to follow instructions at KNITRO.jl for proper functioning. Following code invokes Knitro as the backend optimization solver.
using KNITRO
# Fisher scoring using Knitro
vcmodel_knitro = deepcopy(vcmodel)
@time mle_fs!(vcmodel_knitro, vcdatarot; solver=:Knitro, maxiter=1000, verbose=true);
# Knitro estimates
vcmodel_knitro.B
# Knitro estimates
vcmodel_knitro.Σ
Starting point
Here are a few strategies for successful optimization.
- For $d>1$ (multivariate response), initialize $B, \Sigma$ from univariate estimates.
- Use REML estimate as starting point for MLE.
- When there are only $m=2$ variance components, pre-compute
TwoVarCompVariateRotate
and use it for optimization.
Constrained estimation of B
Many applications invoke constraints on the mean parameters B
. For demonstration, we enforce B[1,1]=B[1,2]
and all entries of B
are within [0, 2].
# set up constraints on B
vcmodel_constr = deepcopy(vcmodel)
vcmodel_constr.A = [1.0 0.0 -1.0 0.0]
vcmodel_constr.sense = '='
vcmodel_constr.b = 0.0
vcmodel_constr.lb = 0.0
vcmodel_constr.ub = 2.0
vcmodel_constr
VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), [1.0 0.0 -1.0 0.0], '=', 0.0, 0.0, 2.0)
We first try the MM algorithm.
# MM algorithm for constrained estimation of B
@time mle_mm!(vcmodel_constr, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm
Iter Objective
-------- -------------
0 -6.253551e+03
1 -3.881820e+03
2 -3.853477e+03
3 -3.846807e+03
4 -3.845184e+03
5 -3.844783e+03
6 -3.844683e+03
7 -3.844658e+03
8 -3.844652e+03
9 -3.844650e+03
10 -3.844650e+03
0.185885 seconds (179.51 k allocations: 9.295 MiB)
fieldnames(typeof(vcmodel_constr))
(:B, :Σ, :A, :sense, :b, :lb, :ub)
vcmodel_constr.B
2×2 Array{Float64,2}:
1.07177 1.07177
0.955683 1.01591
vcmodel_constr.Σ
([0.380624 -0.305498; -0.305498 4.51948], [1.84051 0.265065; 0.265065 2.17336])
Now let's try Fisher scoring.
# Fisher scoring using Ipopt for constrained estimation of B
vcmodel_constr = deepcopy(vcmodel)
vcmodel_constr.A = [1.0 0.0 -1.0 0.0]
vcmodel_constr.sense = '='
vcmodel_constr.b = 0.0
vcmodel_constr.lb = 0.0
vcmodel_constr.ub = 2.0
vcmodel_constr
@time mle_fs!(vcmodel_constr, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 21
Total number of variables............................: 6
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 4.2114270e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0
5 3.8448353e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS
10 3.8446636e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS
15 3.8446509e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS
20 3.8446499e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS
25 3.8446498e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS
30 3.8446498e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS
35 3.8446498e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS
40 3.8446498e+03 0.00e+00 9.19e-05 -11.0 5.56e-06 - 1.00e+00 1.00e+00f 1 MaxS
45 3.8446498e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
50 3.8446498e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA
55 3.8446498e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA
60 3.8446498e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00f 1 MaxSA
Number of Iterations....: 63
(scaled) (unscaled)
Objective...............: 3.4484507551949685e+02 3.8446498170293398e+03
Dual infeasibility......: 2.2694405475622814e-07 2.5301808856629548e-06
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 2.2694405475622814e-07 2.5301808856629548e-06
Number of objective function evaluations = 64
Number of objective gradient evaluations = 64
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 63
Total CPU secs in IPOPT (w/o function evaluations) = 0.028
Total CPU secs in NLP function evaluations = 0.634
EXIT: Solved To Acceptable Level.
0.760983 seconds (102.63 k allocations: 8.135 MiB)
vcmodel_constr.B
2×2 Array{Float64,2}:
1.07177 1.07177
0.955683 1.01591
vcmodel_constr.Σ
([0.380539 -0.305626; -0.305626 4.52116], [1.8405 0.264881; 0.264881 2.17348])