---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: '0.8'
    jupytext_version: '1.4.1'
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---
# Linear Mixed Effects Regressions

After assembling a ground motion database with gmprocess, linear mixed effects models can be used to analyze trends in the data.
In this tutorial, we will read in some sample ground motion data, compute residuals relative to a ground motion model, and perform a linear, mixed effects regression to estimate the between- and within-event terms.

First, we import the necessary packages:

```{code-cell} ipython3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

from openquake.hazardlib import valid
from openquake.hazardlib.imt import SA
from openquake.hazardlib.contexts import simple_cmaker
from gmprocess.utils.constants import DATA_DIR
```

This tutorial uses the [statsmodels](https://www.statsmodels.org/stable/install.html) package to create a mixed effects regression from spectral accelerations output from `gmprocess`.

We are using a ground motion dataset of earthquakes from the 2019 Ridgecrest, California Earthquake Sequence.

```{code-cell} ipython3
:tags: [remove-stderr]
# Path to example data
data_path = DATA_DIR / "lme" / "SA_rotd50.0_2020.03.31.csv"
df = pd.read_csv(data_path)
```

We will use the ASK14 ground motion model (GMM) and spectral acceleration for an oscillator period of 1.0 seconds.

```{code-cell} ipython3
gsim = valid.gsim('''[AbrahamsonEtAl2014]''')
imt = SA(1.0)
```

To evaluate the GMM, we must make some simple assumptions regarding the rupture properties (dip, rake, width, ztor) and site properties (Vs30, Z1).

```{code-cell} ipython3
predicted_data = []

cmaker = simple_cmaker([gsim], [str(imt)], mags=["%2.f" % mag for mag in df.EarthquakeMagnitude])
n = df.shape[0]
ctx = cmaker.new_ctx(n)
ctx["mag"] = df.EarthquakeMagnitude
ctx["dip"] = 90.0
ctx["rake"] = 0.0
ctx["width"] = 10**(-0.76 + 0.27 * df.EarthquakeMagnitude)
ctx["ztor"] = df.EarthquakeDepth
ctx["vs30"] = 760.0
ctx["vs30measured"] = False
ctx["rrup"] = df.RuptureDistance
ctx["rjb"] = df.JoynerBooreDistance
ctx["z1pt0"] = 48.0
ctx["rx"] = -1
ctx["ry0"] = -1

# Evaluate the GMM.
mean = cmaker.get_mean_stds([ctx])[0][0][0]
# Convert from ln(g) to %g
predicted_data = 100.0 * np.exp(mean)
```

Now add the model residuals to the dataframe:

```{code-cell} ipython3
resid_col = 'SA_1_ASK14_residuals'
df[resid_col] = np.log(df['SA(1.000)']) - np.log(predicted_data)
```

For the linear mixed effects regression, we use the 'EarthquakeId' column to group the data by event to compute the between-event and within-event terms.

```{code-cell} ipython3
mdf = smf.mixedlm('%s ~ 1' % resid_col, df, groups=df['EarthquakeId']).fit()
print(mdf.summary())
```

The model yields a bias (intercept) of 0.828. We can also look at the between-event terms with in the `random_effects` attribute, which is a dictionary with keys corresponding to the random effect grouping.

```{code-cell} ipython3
re_array = [float(re.iloc[0]) for group, re in mdf.random_effects.items()]
plt.plot(re_array, 'o')
plt.xlabel('Index')
plt.ylabel('Random effect');
```

and the within-event residuals

```{code-cell} ipython3
plt.plot(df['EpicentralDistance'], mdf.resid, 'o')
plt.xlabel('Epicentral Distance, km')
plt.ylabel('Within-event residual');
```

It is useful to merge the random effect terms with the original dataframe to look at the between-event terms as a function of other parameters, such as hypocentral depth.

```{code-cell} ipython3
btw_event_terms = pd.DataFrame(mdf.random_effects).T
df = df.merge(btw_event_terms, left_on='EarthquakeId', right_index=True)
df_events = df.drop_duplicates(subset=['EarthquakeId'])

fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes.scatter(df_events['EarthquakeDepth'], df_events['Group'])
axes.set_xlabel('Hypocentral Depth (km)')
axes.set_ylabel('Event term (T = 1 s)')
axes.axhline(0, ls='--', c='k')
axes.set_ylim(-2.5, 2.5);
```