MLE and REML
Machine information
versioninfo()
Julia Version 0.6.0 Commit 903644385b (2017-06-19 13:05 UTC) Platform Info: OS: macOS (x86_64-apple-darwin13.4.0) CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz WORD_SIZE: 64 BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell) LAPACK: libopenblas64_ LIBM: libopenlibm LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
Demo data
For demonstration, we generate a random data set.
# generate data from a d-variate response variane component model srand(123) n = 1000 # no. observations d = 2 # dimension of responses m = 2 # no. variance components p = 2 # no. covariates # n-by-p design matrix X = randn(n, p) # p-by-d mean component regression coefficient B = ones(p, d) # a tuple of m covariance matrices V = ntuple(x -> zeros(n, n), m) for i = 1:m-1 Vi = randn(n, 50) copy!(V[i], Vi * Vi') end copy!(V[m], eye(n)) # last covarianec matrix is idendity # a tuple of m d-by-d variance component parameters Σ = ntuple(x -> zeros(d, d), m) for i in 1:m Σi = randn(d, d) copy!(Σ[i], Σi' * Σi) end # form overall nd-by-nd covariance matrix Ω Ω = zeros(n * d, n * d) for i = 1:m Ω += kron(Σ[i], V[i]) end Ωchol = cholfact(Ω) # n-by-d responses Y = X * B + reshape(Ωchol[:L] * randn(n*d), n, d);
Maximum likelihood estimation (MLE)
To find the MLE of parameters $(B,\Sigma_1,\ldots,\Sigma_m)$, we take 3 steps:
Step 1 (Construct data). Construct an instance of VarianceComponentVariate
, which consists fields
Y
: $n$-by-$d$ responsesX
: $n$-by-$p$ covariate matrixV=(V[1],...,V[m])
: a tuple of $n$-by-$n$ covariance matrices. The last covariance matrix must be positive definite and usually is the identity matrix.
using VarianceComponentModels vcdata = VarianceComponentVariate(Y, X, V) fieldnames(vcdata)
3-element Array{Symbol,1}: :Y :X :V
In the absence of covariates $X$, we can simply initialize by vcdata = VarianceComponentVariate(Y, V)
.
Step 2 (Construct a model). Construct an instance of VarianceComponentModel
, which consists of fields
B
: $n$-by-$p$ mean regression coefficientsΣ=(Σ[1],...,Σ[m])
: variane component parameters respectively.
When constructed from a VarianceComponentVariate
instance, the mean parameters $B$ are initialized to be zero and the tuple of variance component parameters $\Sigma$ to be (eye(d),...,eye(d))
.
vcmodel = VarianceComponentModel(vcdata) fieldnames(vcmodel)
7-element Array{Symbol,1}: :B :Σ :A :sense :b :lb :ub
vcmodel
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
The remaining fields A
, sense
, b
, lb
, ub
specify (optional) constraints on the mean parameters B
:
A
is an constraint matrix with $pd$ columns, sense
is a vector of charaters taking values '<'
, '='
or '>'
, and lb
and ub
are the lower and upper bounds for vec(B)
. By default, A
, sense
, b
are empty, lb
is -Inf
, and ub
is Inf
. If any constraits are non-trivial, final estimates of B
are enforced to satisfy them.
When a better initial guess is available, we can initialize by calling vcmodel=VarianceComponentModel(B0, Σ0)
directly.
Step 3 (Fit model). Call optmization routine fit_mle!
. The keywork algo
dictates the optimization algorithm: :MM
(minorization-maximization algorithm) or :FS
(Fisher scoring algorithm).
vcmodel_mle = deepcopy(vcmodel) @time logl, vcmodel_mle, Σse, Σcov, Bse, Bcov = fit_mle!(vcmodel_mle, vcdata; algo = :MM);
MM Algorithm Iter Objective -------- ------------- 0 -6.253551e+03 1 -3.881454e+03 2 -3.853179e+03 3 -3.846525e+03 4 -3.844906e+03 5 -3.844506e+03 6 -3.844406e+03 7 -3.844381e+03 8 -3.844375e+03 9 -3.844374e+03 10 -3.844373e+03 0.290970 seconds (10.45 k allocations: 24.036 MiB, 4.73% gc time)
The output of fit_mle!
contains
- final log-likelihood
logl
-3844.3731814180805
- fitted model
fieldnames(vcmodel_mle)
7-element Array{Symbol,1}: :B :Σ :A :sense :b :lb :ub
vcmodel_mle
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955346 1.01632], ([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated varianec component parameters
Σse
([0.0765136 0.263047; 0.263047 0.904332], [0.0844292 0.0917441; 0.0917441 0.0996927])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}: 0.00585433 -0.00467019 -0.00467019 … -1.07903e-6 -1.557e-7 -0.00467019 0.0691937 0.00372555 -1.557e-7 -1.27444e-6 -0.00467019 0.00372555 0.0691937 -8.83212e-6 -1.27444e-6 0.00372555 -0.055198 -0.055198 -1.27444e-6 -1.04316e-5 -7.4779e-6 -1.07903e-6 -1.07903e-6 0.00102878 0.000148477 -1.07903e-6 -8.83212e-6 -1.557e-7 … 0.000148477 0.00121477 -1.07903e-6 -1.557e-7 -8.83212e-6 0.00841698 0.00121477 -1.557e-7 -1.27444e-6 -1.27444e-6 0.00121477 0.00993864
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}: 0.042559 0.0487086 0.0430588 0.049178
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}: 0.00181127 -1.98035e-5 0.000240705 -2.59506e-6 -1.98035e-5 0.00185406 -2.59506e-6 0.000247285 0.000240705 -2.59506e-6 0.00237252 -2.63542e-5 -2.59506e-6 0.000247285 -2.63542e-5 0.00241848
Restricted maximum likelihood estimation (REML)
REML (restricted maximum likelihood estimation) is a popular alternative to the MLE. To find the REML of a variane component model, we replace the above step 3 by
Step 3. Call optmization routine fit_reml!
.
vcmodel_reml = deepcopy(vcmodel) @time logl, vcmodel_reml, Σse, Σcov, Bse, Bcov = fit_reml!(vcmodel_reml, vcdata; algo = :MM);
MM Algorithm Iter Objective -------- ------------- 0 -4.215053e+03 1 -3.925799e+03 2 -3.865114e+03 3 -3.851105e+03 4 -3.847732e+03 5 -3.846903e+03 6 -3.846698e+03 7 -3.846647e+03 8 -3.846634e+03 9 -3.846631e+03 10 -3.846630e+03 0
The output of fit_reml!
contains
- the final log-likelihood at REML estimate
logl
-3844.3777179025096
- REML estimates
fieldnames(vcmodel_reml)
7-element Array{Symbol,1}: :B :Σ :A :sense :b :lb :ub
vcmodel_reml
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([1.092 1.04727; 0.955345 1.01632], ([0.380594 -0.305485; -0.305485 4.51994], [1.84285 0.261963; 0.261963 2.17842]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf)
- standard errors of the estimated varianec component parameters
Σse
([0.0765055 0.26305; 0.26305 0.904446], [0.0845559 0.0919325; 0.0919325 0.0999526])
- covariance matrix of the variance component parameters estimates
Σcov
8×8 Array{Float64,2}: 0.0058531 -0.00467005 -0.00467005 … -1.06597e-6 -1.51499e-7 -0.00467005 0.0691951 0.00372613 -1.51499e-7 -1.26041e-6 -0.00467005 0.00372613 0.0691951 -8.86843e-6 -1.26041e-6 0.00372613 -0.0552092 -0.0552092 -1.26041e-6 -1.0486e-5 -7.50035e-6 -1.06597e-6 -1.06597e-6 0.00101633 0.000144472 -1.06597e-6 -8.86843e-6 -1.51499e-7 … 0.000144472 0.0012014 -1.06597e-6 -1.51499e-7 -8.86843e-6 0.00845158 0.0012014 -1.51499e-7 -1.26041e-6 -1.26041e-6 0.0012014 0.00999052
- standard errors of the estimated mean parameters
Bse
2×2 Array{Float64,2}: 0.0425909 0.0487744 0.043091 0.0492444
- covariance matrix of the mean parameter estimates
Bcov
4×4 Array{Float64,2}: 0.00181398 -1.98331e-5 0.000237127 -2.55589e-6 -1.98331e-5 0.00185683 -2.55589e-6 0.000243624 0.000237127 -2.55589e-6 0.00237894 -2.6426e-5 -2.55589e-6 0.000243624 -2.6426e-5 0.00242501
Optimization algorithms
Finding the MLE or REML of variance component models is a non-trivial nonlinear optimization problem. The main complications are the non-convexity of objective function and the positive semi-definiteness constraint of variane component parameters $\Sigma_1,\ldots,\Sigma_m$. In specific applications, users should try different algorithms with different starting points in order to find a better solution. Here are some tips for efficient computation.
In general the optimization algorithm needs to invert the $nd$ by $nd$ overall covariance matrix $\Omega = \Sigma_1 \otimes V_1 + \cdots + \Sigma_m \otimes V_m$ in each iteration. Inverting a matrix is an expensive operation with $O(n^3 d^3)$ floating operations. When there are only two varianec components ($m=2$), this tedious task can be avoided by taking one (generalized) eigendecomposion of $(V_1, V_2)$ and rotating data $(Y, X)$ by the eigen-vectors.
vcdatarot = TwoVarCompVariateRotate(vcdata) fieldnames(vcdatarot)
5-element Array{Symbol,1}: :Yrot :Xrot :eigval :eigvec :logdetV2
Two optimization algorithms are implemented: Fisher scoring (mle_fs!
) and the minorization-maximization (MM) algorithm (mle_mm!
). Both take the rotated data as input. These two functions give finer control of the optimization algorithms. Generally speaking, MM algorithm is more stable while Fisher scoring (if it converges) yields more accurate answer.
vcmodel_mm = deepcopy(vcmodel) @time mle_mm!(vcmodel_mm, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm Iter Objective -------- ------------- 0 -6.253551e+03 1 -3.881454e+03 2 -3.853179e+03 3 -3.846525e+03 4 -3.844906e+03 5 -3.844506e+03 6 -3.844406e+03 7 -3.844381e+03 8 -3.844375e+03 9 -3.844374e+03 10 -3.844373e+03 0.018754 seconds (9.15 k allocations: 680.172 KiB)
# MM estimates vcmodel_mm.B
2×2 Array{Float64,2}: 1.092 1.04727 0.955346 1.01632
# MM estimates vcmodel_mm.Σ
([0.380637 -0.305465; -0.305465 4.51938], [1.84009 0.265569; 0.265569 2.17275])
Fisher scoring (mle_fs!
) uses either Ipopt.jl (keyword solver=:Ipopt
) or KNITRO.jl (keyword solver=:Knitro
) as the backend solver. Ipopt is open source and installation of Ipopt.jl package alone is sufficient.
# Fisher scoring using Ipopt vcmodel_ipopt = deepcopy(vcmodel) @time mle_fs!(vcmodel_ipopt, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.4, running with linear solver mumps. NOTE: Other linear solvers might be more efficient (see Ipopt documentation). Number of nonzeros in equality constraint Jacobian...: 0 Number of nonzeros in inequality constraint Jacobian.: 0 Number of nonzeros in Lagrangian Hessian.............: 21 Total number of variables............................: 6 variables with only lower bounds: 0 variables with lower and upper bounds: 0 variables with only upper bounds: 0 Total number of equality constraints.................: 0 Total number of inequality constraints...............: 0 inequality constraints with only lower bounds: 0 inequality constraints with lower and upper bounds: 0 inequality constraints with only upper bounds: 0 iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 0 4.2109423e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 5 3.8445586e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS 10 3.8443870e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS 15 3.8443742e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS 20 3.8443733e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS 25 3.8443732e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS 30 3.8443732e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS 35 3.8443732e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS 40 3.8443732e+03 0.00e+00 9.19e-05 -11.0 5.55e-06 - 1.00e+00 1.00e+00f 1 MaxS 45 3.8443732e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 50 3.8443732e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA 55 3.8443732e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA 60 3.8443732e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00f 1 MaxSA Number of Iterations....: 63 (scaled) (unscaled) Objective...............: 3.4496886481728075e+02 3.8443731733053696e+03 Dual infeasibility......: 2.2693631692678575e-07 2.5290047242499938e-06 Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00 Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00 Overall NLP error.......: 2.2693631692678575e-07 2.5290047242499938e-06 Number of objective function evaluations = 64 Number of objective gradient evaluations = 64 Number of equality constraint evaluations = 0 Number of inequality constraint evaluations = 0 Number of equality constraint Jacobian evaluations = 0 Number of inequality constraint Jacobian evaluations = 0 Number of Lagrangian Hessian evaluations = 63 Total CPU secs in IPOPT (w/o function evaluations) = 0.020 Total CPU secs in NLP function evaluations = 0.256 EXIT: Solved To Acceptable Level.
# Ipopt estimates vcmodel_ipopt.B
2×2 Array{Float64,2}: 1.092 1.04727 0.955346 1.01632
# Ipopt estimates vcmodel_ipopt.Σ
([0.380552 -0.305594; -0.305594 4.52106], [1.84008 0.265385; 0.265385 2.17287])
Knitro is a commercial software and users need to follow instructions at KNITRO.jl for proper functioning. Following code invokes Knitro as the backend optimization solver.
using KNITRO # Fisher scoring using Knitro vcmodel_knitro = deepcopy(vcmodel) @time mle_fs!(vcmodel_knitro, vcdatarot; solver=:Knitro, maxiter=1000, verbose=true); # Knitro estimates vcmodel_knitro.B # Knitro estimates vcmodel_knitro.Σ
Starting point
Here are a few strategies for successful optimization.
- For $d>1$ (multivariate response), initialize $B, \Sigma$ from univariate estimates.
- Use REML estimate as starting point for MLE.
- When there are only $m=2$ variance components, pre-compute
TwoVarCompVariateRotate
and use it for optimization.
Constrained estimation of B
Many applications invoke constraints on the mean parameters B
. For demonstration, we enforce B[1,1]=B[1,2]
and all entries of B
are within [0, 2].
# set up constraints on B vcmodel_constr = deepcopy(vcmodel) vcmodel_constr.A = [1.0 0.0 -1.0 0.0] vcmodel_constr.sense = '=' vcmodel_constr.b = 0.0 vcmodel_constr.lb = 0.0 vcmodel_constr.ub = 2.0 vcmodel_constr
VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([0.0 0.0; 0.0 0.0], ([1.0 0.0; 0.0 1.0], [1.0 0.0; 0.0 1.0]), [1.0 0.0 -1.0 0.0], '=', 0.0, 0.0, 2.0)
We first try the MM algorithm.
# MM algorithm for constrained estimation of B @time mle_mm!(vcmodel_constr, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
MM Algorithm Iter Objective -------- ------------- 0 -6.253551e+03 1 -3.881820e+03 2 -3.853477e+03 3 -3.846807e+03 4 -3.845184e+03 5 -3.844783e+03 6 -3.844683e+03 7 -3.844658e+03 8 -3.844652e+03 9 -3.844650e+03 10 -3.844650e+03 0.031954 seconds (10.70 k allocations: 781.828 KiB)
fieldnames(vcmodel_constr)
7-element Array{Symbol,1}: :B :Σ :A :sense :b :lb :ub
vcmodel_constr.B
2×2 Array{Float64,2}: 1.07177 1.07177 0.955683 1.01591
vcmodel_constr.Σ
([0.380624 -0.305498; -0.305498 4.51948], [1.84051 0.265065; 0.265065 2.17336])
Now let's try Fisher scoring.
# Fisher scoring using Ipopt for constrained estimation of B vcmodel_constr = deepcopy(vcmodel) vcmodel_constr.A = [1.0 0.0 -1.0 0.0] vcmodel_constr.sense = '=' vcmodel_constr.b = 0.0 vcmodel_constr.lb = 0.0 vcmodel_constr.ub = 2.0 vcmodel_constr @time mle_fs!(vcmodel_constr, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
This is Ipopt version 3.12.4, running with linear solver mumps. NOTE: Other linear solvers might be more efficient (see Ipopt documentation). Number of nonzeros in equality constraint Jacobian...: 0 Number of nonzeros in inequality constraint Jacobian.: 0 Number of nonzeros in Lagrangian Hessian.............: 21 Total number of variables............................: 6 variables with only lower bounds: 0 variables with lower and upper bounds: 0 variables with only upper bounds: 0 Total number of equality constraints.................: 0 Total number of inequality constraints...............: 0 inequality constraints with only lower bounds: 0 inequality constraints with lower and upper bounds: 0 inequality constraints with only upper bounds: 0 iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 0 4.2114270e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 5 3.8448353e+03 0.00e+00 7.87e-01 -11.0 4.94e-02 - 1.00e+00 1.00e+00f 1 MaxS 10 3.8446636e+03 0.00e+00 2.25e-01 -11.0 1.38e-02 - 1.00e+00 1.00e+00f 1 MaxS 15 3.8446509e+03 0.00e+00 6.23e-02 -11.0 3.78e-03 - 1.00e+00 1.00e+00f 1 MaxS 20 3.8446499e+03 0.00e+00 1.70e-02 -11.0 1.03e-03 - 1.00e+00 1.00e+00f 1 MaxS 25 3.8446498e+03 0.00e+00 4.61e-03 -11.0 2.79e-04 - 1.00e+00 1.00e+00f 1 MaxS 30 3.8446498e+03 0.00e+00 1.25e-03 -11.0 7.56e-05 - 1.00e+00 1.00e+00f 1 MaxS 35 3.8446498e+03 0.00e+00 3.39e-04 -11.0 2.05e-05 - 1.00e+00 1.00e+00f 1 MaxS 40 3.8446498e+03 0.00e+00 9.19e-05 -11.0 5.56e-06 - 1.00e+00 1.00e+00f 1 MaxS 45 3.8446498e+03 0.00e+00 2.49e-05 -11.0 1.51e-06 - 1.00e+00 1.00e+00f 1 MaxS iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 50 3.8446498e+03 0.00e+00 6.76e-06 -11.0 4.08e-07 - 1.00e+00 1.00e+00f 1 MaxSA 55 3.8446498e+03 0.00e+00 1.83e-06 -11.0 1.11e-07 - 1.00e+00 1.00e+00f 1 MaxSA 60 3.8446498e+03 0.00e+00 4.97e-07 -11.0 3.00e-08 - 1.00e+00 1.00e+00f 1 MaxSA Number of Iterations....: 63 (scaled) (unscaled) Objective...............: 3.4484507551949008e+02 3.8446498170293403e+03 Dual infeasibility......: 2.2694405349430929e-07 2.5301808715939735e-06 Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00 Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00 Overall NLP error.......: 2.2694405349430929e-07 2.5301808715939735e-06 Number of objective function evaluations = 64 Number of objective gradient evaluations = 64 Number of equality constraint evaluations = 0 Number of inequality constraint evaluations = 0 Number of equality constraint Jacobian evaluations = 0 Number of inequality constraint Jacobian evaluations = 0 Number of Lagrangian Hessian evaluations = 63 Total CPU secs in IPOPT (w/o function evaluations) = 0.016 Total CPU secs in NLP function evaluations = 0.417 EXIT: Solved To Acceptable Level.
vcmodel_constr.B
2×2 Array{Float64,2}: 1.07177 1.07177 0.955683 1.01591
vcmodel_constr.Σ
([0.380539 -0.305626; -0.305626 4.52116], [1.8405 0.264881; 0.264881 2.17348])