Parameter Sweep Tutorial¶
This tutorial demonstrates how to run the M1 likelihood model using multiple values of the model’s parameters.
Introduction¶
The models in pfdf allow users to specify calibration parameters. By default, the models use the parameters described in the scientific literature, but advanced users may wish to use different values in some cases. One anticipated use case is testing multiple parameter values in order to calibrate models to new regions. For example, the models of Staley et al. (2017) were calibrated in southern California, but these models are often applied in other regions, such as Arizona and Colorado. As such, researchers may be interested in running these models with a suite of parameters, and pfdf provides support for this.
In this tutorial, we will examine how to run assessment models with multiple parameters, using the M1 model of Staley et al. (2017) as an example. We'll first run the model using multiple values of a single parameter, and then consider the more advanced case of varying multiple parameters simultaneously.
Prerequisites¶
Install pfdf¶
To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks that this is the case:
import check_installation
pfdf and tutorial resources are installed
Previous Tutorials¶
You must run the Preprocessing Tutorial before this one. This is because we’ll use the preprocessed datasets to derive a stream segment network for this tutorial. The following line checks the workspace for the preprocessed datasets:
from tools import workspace
workspace.check_preprocessed()
The preprocessed datasets are present
We also strongly recommend completing the Hazard Assessment Tutorial before this one. This is because this tutorial assumes familiarity with many of the concepts introduced in that tutorial.
Example Network¶
Next, we’ll build an example stream segment network. This process is explored in detail in the Hazard Assessment Tutorial.
from tools import examples
segments = examples.build_segments()
print(segments)
Segments:
Total Segments: 696
Local Networks: 41
Located Basins: False
Raster Metadata:
Shape: (1261, 1874)
CRS("NAD83")
Transform(dx=9.259259269219641e-05, dy=-9.25925927753796e-05, left=-117.99879629663243, top=34.23981481425213, crs="NAD83")
BoundingBox(left=-117.99879629663243, bottom=34.123055554762374, right=-117.82527777792725, top=34.23981481425213, crs="NAD83")
Imports¶
Finally, we'll import the s17 module, which implements the Staley et al. (2017) models. We'll also import numpy to help work with parameter arrays:
from pfdf.models import s17
import numpy as np
Getting Started¶
Model Variables¶
We'll start by computing the (T)errain, (F)ire, and (S)oil variables for the M1 model. As discussed in the Hazard Assessment Tutorial, the final T, F, and S outputs are 1D arrays with one element per stream segment.
from pfdf.raster import Raster
from pfdf import severity, watershed
# Load datasets
dem = Raster.from_file('preprocessed/dem.tif')
dnbr = Raster.from_file('preprocessed/dnbr.tif')
kf = Raster.from_file('preprocessed/kf.tif')
barc4 = Raster.from_file('preprocessed/barc4.tif')
# Determine watershed characteristics
moderate_high = severity.mask(barc4, ["moderate", "high"])
conditioned = watershed.condition(dem)
flow = watershed.flow(conditioned)
slopes = watershed.slopes(conditioned, flow)
# Compute M1 variables
T, F, S = s17.M1.variables(segments, moderate_high, slopes, dnbr, kf, omitnan=True)
In the Hazard Assessment Tutorial, we saw how to run the M1 model on these variables using the standard calibration parameters from the literature. In this tutorial, we'll still use these variables, but we'll run the model with different calibration parameters.
Design Storms¶
We’ll also define some example design storms for the model. For simplicity, we’ll only use design storms for 15-minute rainfall durations. As a reminder, the design storms are peak rainfall accumulations (in millimeters) over a 15-minute interval:
R15 = [5, 6, 8.75, 10]
Testing One Parameter¶
As a reminder, the M1 model uses 4 calibrated coefficients in its logistic regression model. These are:
| Coefficient | Description |
| --- | --- |
| B | Model intercept |
| Ct | Terrain variable coefficient |
| Cf | Fire variable coefficient |
| Cs | Soil variable coefficient |
Let's say we'd like to calibrate the model's Ct parameter for our dataset. One approach could be to run the model using multiple values of Ct. If we have a database of known debris-flow events, we could compare the database to the model results to try to determine an optimal Ct value.
Here, we'll demonstrate how to run the model using multiple Ct values. (Comparing results to a database is beyond the scope of this tutorial). For brevity, we'll limit our investigation to 15-minute rainfall durations. We'll start by getting the standard B, Cf, and Cs values from the literature:
B, _, Cf, Cs = s17.M1.parameters(durations=15)
print(B)
print(Cf)
print(Cs)
[-3.63]
[0.67]
[0.7]
Next, we'll sample every Ct value from 0.01 to 1 in steps of 0.01. Examining the shape, we note we've sampled 100 possible values:
Ct = np.arange(0.01, 1.01, 0.01)
print(Ct.shape)
(100,)
Finally, we’ll run the likelihood model using our sampled parameters:
likelihoods = s17.likelihood(R15, B, Ct, T, Cf, F, Cs, S)
The output is a 3D numpy array. We previously discussed that each row holds likelihoods for a stream segment, and each column is a design storm. However, the third dimension is new. Here, each element along the third dimension holds results for a unique set of calibration parameters. We will refer to these sets as model runs.
For example, examining the shape of the output array, we find it has 100 elements along the third dimension. Each of these elements corresponds to one of our 100 sampled Ct values:
print(likelihoods.shape)
print((segments.size, len(R15), Ct.size))
(696, 4, 100)
(696, 4, 100)
We can now go on to compare these results to a database of debris-flow events, or use the results for other research purposes.
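Before moving on, note that you can inspect an individual run by slicing along the third dimension. As a sketch, the following selects the run whose sampled Ct is closest to a target value (the target of 0.4 here is purely illustrative, not a recommended value):

# Select the run whose sampled Ct value is closest to an illustrative target
target = 0.4
run = int(np.argmin(np.abs(Ct - target)))

# Likelihoods for that run: one row per segment, one column per design storm
selected = likelihoods[:, :, run]
print(selected.shape)  # (696, 4)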
Testing Multiple Parameters¶
The previous example only sampled one parameter. However, we could instead choose to test multiple values of multiple parameters simultaneously. When this is the case, all parameters with multiple values should be vectors with the same number of elements. Each indexed set of parameter values (the i-th element from each vector) is then used for a distinct model run. Any scalar parameters will use the same value for each run.
For example, let's say we've decided to sample the Ct and Cf parameters simultaneously. We'll generate 1000 random values of Ct with a mean of 0.4 and standard deviation of 0.25. We'll also generate 1000 values of Cf with a mean of 0.67 and a standard deviation of 0.3:
# Seed the random number generator to make this example reproducible
rng = np.random.default_rng(seed=123456789)
# Sample the parameters from normal distributions
Ct = rng.normal(loc=0.4, scale=0.25, size=1000)
Cf = rng.normal(loc=0.67, scale=0.3, size=1000)
We can now run the model as usual. Note that we're using the scalar B and Cs values from the literature:
likelihoods = s17.likelihood(R15, B, Ct, T, Cf, F, Cs, S)
Once again, the output is a 3D array. Examining the shape, we note there are 1000 elements along the third dimension:
print(likelihoods.shape)
(696, 4, 1000)
Here, each element along the third dimension holds the likelihoods for one of the 1000 tested (Ct, Cf) pairs. For example, element 0 holds the results for (Ct[0], Cf[0]), element 1 holds results for (Ct[1], Cf[1]), and so on.
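With this many runs, it is often useful to summarize the spread of results across the parameter samples. As a minimal sketch, the following computes the median and a 5th-95th percentile range of likelihood across the 1000 runs for each segment and design storm:

# Summarize likelihoods across the 1000 (Ct, Cf) runs along the third dimension
median = np.median(likelihoods, axis=2)
low, high = np.quantile(likelihoods, [0.05, 0.95], axis=2)

# Each summary has one row per segment and one column per design storm
print(median.shape, low.shape, high.shape)  # (696, 4) (696, 4) (696, 4)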
Important!¶
This tutorial used random sampling for the sake of brevity, but this is often a poor sampling method in practice, as it can result in undersampled regions of the parameter space. Instead, we recommend users consider more sophisticated sampling methods, such as Latin Hypercube Sampling.
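As a rough illustration, scipy's quasi-Monte Carlo module can draw a Latin Hypercube sample of (Ct, Cf) pairs. Note that scipy is not required by this tutorial, and the parameter bounds below are arbitrary placeholders rather than recommendations:

from scipy.stats import qmc

# Draw 1000 Latin Hypercube samples on the unit square, then rescale them
# to (arbitrary) bounds for Ct and Cf
sampler = qmc.LatinHypercube(d=2, seed=123456789)
samples = qmc.scale(sampler.random(n=1000), l_bounds=[0.1, 0.3], u_bounds=[0.9, 1.0])

# Run the model with the sampled (Ct, Cf) pairs and the scalar B and Cs values
Ct, Cf = samples[:, 0], samples[:, 1]
likelihoods = s17.likelihood(R15, B, Ct, T, Cf, F, Cs, S)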
Conclusion¶
In this tutorial, we learned how to run a hazard assessment model with new parameter values, using the M1 likelihood model as an example. We saw how to run the model for (1) a single sampled parameter, and (2) multiple parameters sampled simultaneously.