Linear Mixed Effects Model in the Presence of High-dimensional Fixed Effects

The linear mixed effects model (LMM) has been widely used in many fields, and it is a central part of my Ph.D. research; a great deal of fascinating theoretical work has been done on it. As the number of available covariates keeps growing, however, the traditional LMM (which assumes p ≪ n) can no longer solve some problems. For example, when data are collected from field experiments and the design information is available, a plain high-dimensional linear model is clearly inadequate for modeling the relationship between phenotype and omic features. There is a need to develop working, user-friendly algorithms and software for fitting high-dimensional LMMs.

Two highly recommended books on LMMs:

  • Asymptotic Analysis of Mixed Effects Models: Theory, Applications, and Open Problems, by Jiming Jiang
  • Mixed Models: Theory and Applications with R, by Eugene Demidenko

Introduction

LMM has a long history.

High-dimensional model

\[\mathbf{Y}=\mathbf{1}_N \beta_0 + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \text{ E}(\boldsymbol{\epsilon})=\mathbf{0}_N, \text{ Var}(\boldsymbol{\epsilon})=\sigma^2 \mathbf{I}_{N \times N}.\]

High-dimensional linear models have been well studied: asymptotic theorems have been proposed and proved. Most of them, however, rely on one critical assumption, namely the independence of observations. When samples are correlated, whether through the design or through repeated measurements, existing high-dimensional linear model theory cannot be applied directly.
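For comparison, a standard way to encode such correlation is to add a random-effects term to the model above. This is only a minimal sketch with a single random effect: here \(\mathbf{Z}\) is a known \(N \times q\) design matrix (e.g., group or plot membership), \(\boldsymbol{\alpha}\) is a vector of random effects with variance \(\sigma_\alpha^2\), and none of this notation appears in the model above.

\[\mathbf{Y}=\mathbf{1}_N \beta_0 + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\alpha} + \boldsymbol{\epsilon}, \quad \boldsymbol{\alpha} \sim (\mathbf{0}_q, \sigma_\alpha^2 \mathbf{I}_{q \times q}), \quad \text{Var}(\mathbf{Y}) = \sigma_\alpha^2 \mathbf{Z}\mathbf{Z}^\top + \sigma^2 \mathbf{I}_{N \times N}.\]

Observations that share a random effect are correlated through \(\mathbf{Z}\mathbf{Z}^\top\), which is exactly the structure the independent-errors model above cannot capture.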

Why REstricted Maximum Likelihood estimation (REML)?

First of all, both MLE and REML are likelihood-based approaches. Likelihood-based inference essentially depends on a probability model, abstracted from the observed data, to capture the underlying system that generated the data. It is extremely powerful, with greater power to detect signal from noise. But it also draws criticism: a probability model is usually too idealized for a real-world problem and requires strong assumptions to hold. When the assumptions are violated or the model is simply wrong, likelihood inference can lead down rabbit holes.

So why still REML? MLE is known to be biased, and with many fixed effects even inconsistent, for the variance components. REML adjusts for the degrees of freedom spent on estimating the fixed effects and provides a consistent (in fact, asymptotically normal) estimator. These theoretical properties are very appealing: they not only guarantee the accuracy of the estimates when the model is correct, but also naturally lead to statistical inference based on the asymptotic distribution.
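A minimal numerical sketch of this fixed-effects adjustment, in the degenerate case with no random effects, where the ML and REML estimators of \(\sigma^2\) reduce to RSS/n and RSS/(n−p) respectively. The simulation below (using only numpy) is my own illustration, not part of any software discussed here:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 20, 1.0          # small n, relatively many fixed effects
n_rep = 2000

ml_est, reml_est = [], []
for _ in range(n_rep):
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)
    y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)

    ml_est.append(rss / n)           # ML: ignores d.o.f. spent on beta, biased downward
    reml_est.append(rss / (n - p))   # REML-type: adjusts for estimating beta

print(f"true sigma^2       = {sigma2}")
print(f"mean ML estimate   = {np.mean(ml_est):.3f}")    # about sigma2 * (n - p) / n = 0.6
print(f"mean REML estimate = {np.mean(reml_est):.3f}")  # about 1.0
```

With p growing relative to n, the ML estimate is pulled further below the truth, while the REML-type estimate stays on target; the same mechanism is what makes REML attractive for variance components in the mixed model.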

What is the curse?