Mixed Model vs The World

At one of my first lab meetings as a postdoc, I started advertising multilevel/mixed models to my colleagues to convince them to give up multiple testing. On that occasion, my supervisor struck me with the question:

Why should we do a mixed model when we can run a separate regression for each subject?

My first untold answer - written on my face - was: why would you run a separate regression for each subject?

But I soon understood that skepticism does not promote change, so I engineered a comparison between mixed model and standard linear regression that allowed the visual inspection of the difference between the two.

I obtained two plots that I love very much and that nobody in my lab has really cared about or understood since then (very sadly, they always get skimmed off paper drafts).

But making them was epic nonetheless. Ladies and gentlemen, here you have it.

Method

As I could not be satisfied with a twofold comparison, I fitted 3 types of linear regression (a one-line sketch of each formula follows the list):

  • Non-pooled standard linear regression (N-SLR): if one wants to obtain individual coefficients (as the mixed model does) from a standard linear regression, this is the most straightforward way of doing it. I entered Subject as a predictor and excluded the intercept from the equation

  • Individual standard linear regression (I-SLR): this is the one my supervisor suggested. Thanks to a for loop (may the Gods of real programmers forgive me), I ran as many regressions as there are subjects. Each regression includes one individual subject’s data only

  • Multilevel/mixed model (MM): I entered Subject as a random effect
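For orientation, here is a minimal sketch of the three model formulas; the full, runnable code appears in the sections below.

# n-slr: Subject entered as fixed-effect dummies, global intercept removed
lm(Reaction ~ Days + Subject - 1, data = sleepstudy)

# i-slr: one separate regression per subject (run inside a loop below)
lm(Reaction ~ Days, data = subset(sleepstudy, Subject == "308"))

# mm: Subject entered as a random intercept
lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)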

I am going to use the sleepstudy dataset and exclude some observations at random, thus leaving a number of subjects with an incomplete record. This is a typical situation where mixed modelling is recommended for robustly handling missing data.

The way it does that is to weight each subject’s individual coefficient by the number of observations when calculating the average coefficient, assuming that the individual coefficient of a subject with missing data will be less reliable (a small numeric sketch of this weighting follows below). This sounds clear enough; however, when I provide this explanation in real life, this is what most people hear:

That’s why we are going ahead with this demonstration.
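To make the weighting a little less hand-wavy, here is a minimal sketch of the partial-pooling weight implied by a random-intercept model; the variance values below are made up purely for illustration.

# weight given to a subject's own mean when estimating their intercept;
# the remaining weight goes to the group mean (this is the "shrinkage")
shrinkage_weight <- function(n_obs, var_subject, var_residual) {
  var_subject / (var_subject + var_residual / n_obs)
}

shrinkage_weight(n_obs = 10, var_subject = 1300, var_residual = 950)  # ~0.93
shrinkage_weight(n_obs = 3,  var_subject = 1300, var_residual = 950)  # ~0.80

The fewer observations a subject has, the more their estimate is pulled towards the group average.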

The Example

I am going to use the continuous variable Days as the only predictor; its coefficient represents the size of the effect of Time; its standard error represents the precision of the coefficient.

For each regression type, the coefficient and the standard error correspond to:

  • N-SLR and I-SLR: the intercept corresponds to the individual coefficient. The standard error corresponds to the square root of the sample variance, or mean squared error (MSE).

  • MM: the sum of the fixed and random effect corresponds to the individual coefficient. The standard error corresponds to the joint MSE of the variances of the fixed and the random effect.

Adding missing values to the dataset

# install.packages(c("lme4", "dplyr", "knitr", "ggplot2"))
library(lme4)  # load the libraries containing the functions that we need
library(dplyr)

data("sleepstudy")  # load the dataset

set.seed(7)  # set the seed so we sample the same rows every time
# draw 18 rows (weighted by Reaction) and remove them from the dataset
missing <- sample_n(sleepstudy, 18, weight = Reaction) %>%
  arrange(Subject)
sleepstudy <- sleepstudy %>%
  anti_join(missing, by = c("Reaction", "Days", "Subject"))
knitr::kable(head(sleepstudy))
| Reaction | Days | Subject |
|---------:|-----:|:--------|
| 249.5600 |    0 | 308     |
| 250.8006 |    2 | 308     |
| 356.8519 |    4 | 308     |
| 414.6901 |    5 | 308     |
| 290.1486 |    7 | 308     |
| 430.5853 |    8 | 308     |

As shown above, some of the data is missing (for Subject 308, Days 1, 3 and 6 are missing).
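If you want to see how the missingness is spread across subjects, a quick optional check using the objects created above:

# number of remaining observations per subject (10 means no days were dropped)
sleepstudy %>%
  group_by(Subject) %>%
  summarise(N = n())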

N-SLR

First, we run the standard linear regression, entering Subject as a predictor and removing the global intercept.

lm_nslr <- lm(Reaction ~ Days + Subject - 1, data=sleepstudy)
coef <- summary(lm_nslr)$coefficients
# drop the Days row ([-1, ]) and keep the per-subject coefficients and standard errors
coef.lm_nslr <- data.frame(Subject=gsub(pattern="Subject",
                                        replacement="",
                                        x=rownames(coef[-1,])),
                           Intercept=coef[-1,1],
                           St.er.int=coef[-1,2],
                           Method="n-slr")
rownames(coef.lm_nslr) <- NULL
knitr::kable(head(coef.lm_nslr))
| Subject | Intercept | St.er.int | Method |
|:--------|----------:|----------:|:-------|
| 308     |  290.0047 |  13.10764 | n-slr  |
| 309     |  170.5242 |  11.62282 | n-slr  |
| 310     |  187.2806 |  10.47325 | n-slr  |
| 330     |  263.0426 |  10.85180 | n-slr  |
| 331     |  263.2829 |  11.12269 | n-slr  |
| 332     |  263.5814 |  10.47325 | n-slr  |

And obtain individual coefficients and standard errors for each subject.

I-SLR

Then, we subset the dataset inside a for loop to run an individual linear regression for each subject.

coef.lm_islr <- data.frame()
ids <- unique(sleepstudy$Subject)
for (i in 1:length(ids))
{
  d.s <- subset(sleepstudy, Subject == ids[i])
  if (is.nan(mean(d.s$Days, na.rm = TRUE)))
  {
    # no rows left for this subject: store NA instead of fitting a regression
    coef.lm_islr <- rbind(coef.lm_islr, data.frame(Subject = ids[i],
                          Intercept = NA, St.er.int = NA, Method = "i-slr"))
  }
  else
  {
    # fit a regression on this subject's data only and keep the intercept and its standard error
    lm_islr <- lm(Reaction ~ Days, data = d.s)
    coef <- summary(lm_islr)$coefficients
    coef.lm_islr <- rbind(coef.lm_islr, data.frame(Subject = ids[i],
                          Intercept = coef[1, 1], St.er.int = coef[1, 2], Method = "i-slr"))
  }
}
knitr::kable(head(coef.lm_islr))
| Subject | Intercept | St.er.int | Method |
|:--------|----------:|----------:|:-------|
| 308     |  251.4847 | 49.133461 | i-slr  |
| 309     |  207.0660 |  6.378284 | i-slr  |
| 310     |  203.4842 |  7.241315 | i-slr  |
| 330     |  288.6689 | 14.061765 | i-slr  |
| 331     |  284.7526 | 18.185512 | i-slr  |
| 332     |  264.2516 | 35.792017 | i-slr  |

Just as above, we obtain individual coefficients and standard errors for each subject.

Mixed Model

When running a mixed model, we do not need to trick the formula: we simply add Subject as a random effect and extract the individual coefficients.

lm_mm <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
coef <- coef(lm_mm)$Subject$'(Intercept)'  # per-subject intercepts (fixed + random effect)
err <- ranef(lm_mm)$Subject$'(Intercept)'  # per-subject random-effect deviations, used here as the error measure
coef.mm <- data.frame(Subject=rownames(coef(lm_mm)$Subject),
                      Intercept=coef, St.er.int=err, Method="mm")
knitr::kable(head(coef.mm))
| Subject | Intercept |  St.er.int | Method |
|:--------|----------:|-----------:|:-------|
| 308     |  286.1487 |  32.914509 | mm     |
| 309     |  177.4405 | -75.793661 | mm     |
| 310     |  191.7955 | -61.438648 | mm     |
| 330     |  262.3964 |   9.162275 | mm     |
| 331     |  262.6362 |   9.402012 | mm     |
| 332     |  262.9750 |   9.740829 | mm     |
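As a side note, lme4 can also report how uncertain each subject's random effect is; a minimal sketch, not used for the tables in this post:

# conditional (posterior) uncertainty of the random intercepts, one row per subject
re <- as.data.frame(ranef(lm_mm, condVar = TRUE))
head(re)  # the condsd column (if present in your lme4 version) is the conditional SD per subject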

Lastly, we combine the three sets of individual coefficients, each calculated with a different method, into the same dataset.

all_coef <- rbind(coef.lm_nslr, coef.lm_islr, coef.mm)

Plotting the Individual Coefficients

To get a grasp of the reliability of the 3 sets of individual coefficients, we are going to plot them against the real values, the actual measurements of Reaction. For this example, we are going to focus only on the Intercept, so we are plotting it against the baseline Reaction at Day 0 (of which the Intercept is an estimate).

bs_data <- sleepstudy %>%
  filter(Days==0) %>%
  left_join(all_coef, by="Subject")
gp_data <- sleepstudy %>%
  group_by(Subject) %>%
  summarise(N=n())
m.er.int <- all_coef %>%
  group_by(Method) %>%
  summarise(m.er.int=mean(St.er.int))
# flag each subject's error relative to the mean error of its method
bs_data_p <- bs_data %>%
  left_join(gp_data, by="Subject") %>%
  left_join(m.er.int, by="Method") %>%
  mutate(er.st.int=ifelse(St.er.int > m.er.int+1.9, "above 2 std. er",
                   ifelse(St.er.int > m.er.int & St.er.int < m.er.int+1.9, "within 1 std. err",
                   ifelse(St.er.int < m.er.int & St.er.int > m.er.int-1.9, "within 1 std. err",
                   ifelse(St.er.int < m.er.int-1.9, "below 2 std. er", NA))))) %>%
  mutate(er.st.int=factor(er.st.int, levels=c("above 2 std. er", "within 1 std. err", "below 2 std. er"))) %>%
  mutate(Method=factor(Method, levels=c("n-slr", "i-slr", "mm")))
knitr::kable(head(bs_data_p))
| Reaction | Days | Subject | Intercept |  St.er.int | Method |  N | m.er.int | er.st.int         |
|---------:|-----:|:--------|----------:|-----------:|:-------|---:|---------:|:------------------|
| 249.5600 |    0 | 308     |  290.0047 |  13.107638 | n-slr  |  6 | 11.01682 | within 1 std. err |
| 249.5600 |    0 | 308     |  251.4847 |  49.133461 | i-slr  |  6 | 15.23288 | above 2 std. er   |
| 249.5600 |    0 | 308     |  286.1487 |  32.914509 | mm     |  6 |  0.00000 | above 2 std. er   |
| 222.7339 |    0 | 309     |  170.5242 |  11.622820 | n-slr  |  8 | 11.01682 | within 1 std. err |
| 222.7339 |    0 | 309     |  207.0660 |   6.378284 | i-slr  |  8 | 15.23288 | within 1 std. err |
| 222.7339 |    0 | 309     |  177.4405 | -75.793661 | mm     |  8 |  0.00000 | below 2 std. er   |

The plots below present the individual coefficients as individual points; the error bars represent the standard errors. At a glance, the 3 side-by-side panels illustrate the core difference between standard regression and mixed models.

In non-pooled standard regression, despite the fact that there are individual coefficients, the standard errors are very close to each other, and do not vary based on, say, the number of available observations.

In the individual standard regressions, we have 3 Subjects that stand out because their error is above 2 standard errors from the mean error. The second plot shows that there is a correlation between the number of observations and the magnitude of the error. However, in this case the standard error measures the precision of one Subject's estimate relative to that same Subject's data, not to the entire sample, so we cannot tell whether this is due to missing values or to that Subject performing in a very disparate way.

In the case of the mixed model, all the observations are marked at different levels, meaning that they will each take on a different weight when calculating the fixed effects. A couple of Subjects get a very long bar even though they did not in the individual standard regressions, meaning that their performance might be consistent within that subject, but highly atypical when taking everybody else into account.
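If you prefer a numeric check to eyeballing the error bars, here is a quick optional summary of how the per-subject errors spread within each method, using the all_coef data frame built above (I am not reporting the numbers here, just the recipe):

# spread of the per-subject errors within each method
all_coef %>%
  group_by(Method) %>%
  summarise(min.err = min(St.er.int, na.rm = TRUE),
            max.err = max(St.er.int, na.rm = TRUE),
            sd.err  = sd(St.er.int, na.rm = TRUE))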

library(RColorBrewer)  # prepare the fill colours
set.seed(7)
n <- length(unique(sleepstudy$Subject))
fill <- grDevices::colors()  # sample one random base-R colour per Subject
fill1 <- sample(fill, n)

Plotting on the mean

library(ggplot2)

ggplot(data=bs_data_p,
       aes(x=Reaction, y=Intercept, fill=Subject)) +
  facet_grid(~Method) +
  geom_errorbar(aes(x=Reaction,
                    ymin=Intercept-St.er.int,
                    ymax=Intercept+St.er.int,
                    col=er.st.int),
                position=position_dodge(2)) +
  geom_jitter(aes(size=N), shape=21) +
  scale_radius() +
  scale_fill_manual(values=fill1) +
  scale_color_manual(values=c("firebrick1", "black", "darkorange")) +
  guides(fill="none") +
  coord_cartesian(ylim=c(150, 350)) +
  theme_bw()

On N

ggplot(bs_data_p,
       aes(x=N, y=Intercept, fill=Subject)) +
  geom_errorbar(position=position_dodge(5),
                aes(ymin=Intercept-St.er.int,
                    ymax=Intercept+St.er.int,
                    col=er.st.int, group=Subject)) +
  geom_point(size=3, position=position_dodge(5), shape=21) +
  guides(fill="none") +
  scale_fill_manual(values=fill1) +
  scale_color_manual(values=c("firebrick1", "black", "darkorange")) +
  scale_x_continuous(breaks=c(5:10)) +
  facet_wrap(~Method) +
  theme_bw()

If we consider the average intercept from each regression type:

# average of the per-subject intercepts under each method
knitr::kable(rbind(`n-slr`=mean(lm_nslr$coef[-1]),
                   `i-slr`=mean(coef.lm_islr$Intercept),
                   `mm`=mean(coef.mm$Intercept)))
|       | Mean intercept |
|:------|---------------:|
| n-slr |       253.2050 |
| i-slr |       252.0537 |
| mm    |       253.2341 |

In this (artificial) example, the difference is small. However, we may observe that both the non-pooled (n-slr) and individual (i-slr) regressions underestimate the intercept. This may be because standard errors below the average error are absent or small in n-slr and i-slr, as shown by the plots. In a real-life scenario, this might lead one to believe that Subjects had faster reactions on average at the beginning of the experiment, thus influencing the estimate of how much slower they become over the course of the experiment. The mixed model would instead weight down the faster Subjects, proportionally to how many observations are available.
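To relate the mm average back to the model itself, the overall intercept of the mixed model can also be read directly from the fixed effects (a quick check, using the lm_mm fit from above):

# fixed-effect (population-level) intercept of the mixed model
fixef(lm_mm)["(Intercept)"]

Since the random effects average out to roughly zero, this value should sit close to the mm mean in the table above.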

I hope that this post might serve as an inspiration to others, and that when you see these plots, this is not what you will see.
