Simulating responses
The rvarlmm()
and rvarlmm!()
functions can be used to generate a respone from user-supplied data and parameters. The rand!()
command can be used to overwrite the response in a VarLmmModel object based on the parameters and optional user-supplied distribution.
The rand!(m::WSVarLmmModel; respdist = MvNormal, γωdist = MvNormal, Σγω = [], kwargs...)
function replaces the responses m.data[i].y
with a simulated response based on:
- The data in the model object's data
X, Z, W
matrices. - The parameter values in the model.
- The conditional distribution of the response given the random effects.
- The distribution of the random effects.
- If simulating from MvTDistribution, you must specify the degrees of freedom via
df = x
.
The rvarlmm()
takes arrays of matricies of the data in addition to the reponse. It generates a simulated response from the VarLMM model based on:
Xs
: array of each clustersX
: mean fixed effects covariatesZs
: array of each clustersZ
: random location effects covariatesWs
: array of each clustersW
: within-subject variance fixed effects covariatesβ
: mean fixed effects vectorτ
: within-subject variance fixed effects vectorrespdist
: the distribution for response. Default is MvNormal.Σγ
: random location effects covariance matrix.Σγω
: joint random location and random scale effects covariance matrix (if generating from full model).- If simulating from MvTDistribution, you must specify the degrees of freedom via
df = x
.
The rvarlmm!()
function can be used to generate a simulated response from the VarLMM model based on a dataframe and place the generated response into the datatable with the respname
field.
Note: the dataframe MUST be ordered by grouping variable for it to generate in the correct order. This can be checked via dataframe == sort(dataframe, idvar)
. The response is based on:
meanformula
: represents the formula for the mean fixed effectsβ
(variables in X matrix)reformula
: represents the formula for the mean random effects γ (variables in Z matrix)wsvarformula
: represents the formula for the within-subject variance fixed effects τ (variables in W matrix)idvar
: the id variable for groupings.dataframe
: the dataframe holding all of the data for the model. For this function it must be in order.β
: mean fixed effects vectorτ
: within-subject variance fixed effects vectorrespdist
: the distribution for response. Default is MvNormal.Σγ
: random location effects covariance matrix.Σγω
: joint random location and random scale effects covariance matrix (if generating from full model)respname
: symbol representing the simulated response variable name.- If simulating from MvTDistribution, you must specify the degrees of freedom via
df = x
.
For both functions, only one of the Σγ or Σγω matrices have to be specified in order to use the function. Σγ can be used to specify that the generative model will not include a random scale component. It outputs ys
: an array of reponse y
that match the order of the data arrays (Xs, Zs, Ws
).
We can start by loading the pacakges, data, and fitting a model.
using CSV, DataFrames, Random, WiSER
filepath = normpath(joinpath(dirname(pathof(WiSER)), "../data/"))
df = DataFrame(CSV.File(filepath * "sbp.csv"))
vlmm = WSVarLmmModel(
@formula(sbp ~ 1 + agegroup + gender + bmi_std + meds),
@formula(sbp ~ 1 + bmi_std),
@formula(sbp ~ 1 + agegroup + meds + bmi_std),
:id, df);
WiSER.fit!(vlmm)
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
run = 1, ‖Δβ‖ = 0.037311, ‖Δτ‖ = 0.166678, ‖ΔL‖ = 0.100999, status = Optimal, time(s) = 0.215811
run = 2, ‖Δβ‖ = 0.005220, ‖Δτ‖ = 0.006748, ‖ΔL‖ = 0.048735, status = Optimal, time(s) = 0.083461
Within-subject variance estimation by robust regression (WiSER)
Mean Formula:
sbp ~ 1 + agegroup + gender + bmi_std + meds
Random Effects Formula:
sbp ~ 1 + bmi_std
Within-Subject Variance Formula:
sbp ~ 1 + agegroup + meds + bmi_std
Number of individuals/clusters: 500
Total observations: 5011
Fixed-effects parameters:
───────────────────────────────────────────────────────────
Estimate Std. Error Z Pr(>|Z|)
───────────────────────────────────────────────────────────
β1: (Intercept) 106.308 0.14384 739.07 <1e-99
β2: agegroup 14.9844 0.0633245 236.63 <1e-99
β3: gender: Male 10.0749 0.100279 100.47 <1e-99
β4: bmi_std 0.296424 0.0139071 21.31 <1e-99
β5: meds: OnMeds -10.1107 0.122918 -82.26 <1e-99
τ1: (Intercept) -2.5212 0.393792 -6.40 <1e-09
τ2: agegroup 1.50759 0.135456 11.13 <1e-28
τ3: meds: OnMeds -0.435225 0.0621076 -7.01 <1e-11
τ4: bmi_std 0.0052695 0.0224039 0.24 0.8140
───────────────────────────────────────────────────────────
Random effects covariance matrix Σγ:
"γ1: (Intercept)" 1.00196 0.0181387
"γ2: bmi_std" 0.0181387 0.000549357
Once the model has been fit, we can overwrite the response variable simulating a new response based on the model's current parameters. This is done by calling the rand!()
function on the model object. Here we simulate from a multivariate normal dsitribution for $y$.
yoriginal = copy(vlmm.data[1].y)
Random.seed!(123)
WiSER.rand!(vlmm; respdist = MvNormal)
[yoriginal vlmm.data[1].y]
9×2 Matrix{Float64}:
159.586 163.223
161.849 161.898
160.484 160.667
161.134 165.167
165.443 162.258
160.053 163.019
162.1 162.065
163.153 161.422
166.675 160.552
Other response distributions have been coded. To get a list of available distributions use respdists()
respdists()
6-element Vector{Symbol}:
:MvNormal
:MvTDist
:Gamma
:InverseGaussian
:InverseGamma
:Uniform
WiSER.rand!(vlmm; respdist = InverseGamma)
vlmm.data[1].y
9-element Vector{Float64}:
163.1357991396798
165.1430739396795
162.08532565729797
159.6822122192519
161.57688777672846
164.58912923247945
157.98192168704628
164.12521616207954
163.73003300792996
We can also simulate a response variable from a dataframe and a formula.
If you don't want to overwrite the response variable in the dataframe, you can use the respname
optional keyword argument to specify the desired variable name to save the response variable as.
df = DataFrame(id = [1; 1; 2; 3; 3; 3; 4], y = randn(7),
x2 = randn(7), x3 = randn(7), z2 = randn(7), w2 = randn(7), w3 = randn(7))
f1 = @formula(y ~ 1 + x2 + x3)
f2 = @formula(y ~ 1 + z2)
f3 = @formula(y ~ 1 + w2 + w3)
β = zeros(3)
τ = zeros(3)
Σγ = [1. 0.; 0. 1.]
rvarlmm!(f1, f2, f3, :id, df, β, τ;
Σγ = Σγ, respname = :response)
[df[!, :y] df[!, :response]]
7×2 Matrix{Float64}:
-0.474718 0.0945213
-0.615475 2.06463
-0.577114 3.7559
-1.14809 1.66547
-0.53171 -0.250939
1.26381 -0.400644
-1.32798 -1.67888