pfdf 3.0.0 Release Notes
This release adds the data package, which provides routines to download commonly used datasets from the internet. The release also provides pfdf tutorials as Jupyter Notebooks, improves array broadcasting in the gartner2014 and staley2017 modules, and introduces the RasterMetadata class (which manages raster metadata without loading data arrays into memory).
Backwards Compatibility
The updates to array broadcasting in the gartner2014 and staley2017 modules may break backwards compatibility in some cases. Please read the breaking changes section for details.
Updates
Major Additions
Added the data package, which provides routines to download commonly used datasets from the internet
Added the RasterMetadata class, which manages raster metadata without loading raster data arrays into memory
Tutorials
Converted the tutorials to Jupyter Notebooks
Added a tutorial on the data package
Split the raster tutorial into three tutorials: an introduction, raster properties, and raster factories
Added a tutorial on the RasterMetadata class
Installation
pfdf can now be installed from the official USGS package registry. This is the recommended way to install pfdf.
Projection Package
Added the projection.crs module, which provides utilities for working with pyproj.CRS objects
Added the match_crs and remove_crs methods to the BoundingBox and Transform classes
Raster
Added the from_url factory, which builds Raster objects for datasets accessed via web URLs
save now returns the path to the saved file
Added the dtype, field_casting, and operation options to from_points and from_polygons
Pinned rasterized vector feature dtypes to int32 for integer fields, and float64 for floating-point fields
Added __getitem__, which allows indexing into Raster objects
Added copy option to Raster.fill and Raster.set_range
repr now only includes non-default raster names
Segments
Added the located_basins property, which indicates if the object has pre-located outlet basins
save now returns the path to the saved file
repr now includes more info on the network
Models
Improved broadcasting in the staley2017 models. Output arrays are now (Segments x Accumulations/Probabilities x Parameter Runs)
Improved broadcasting in the gartner2014 models. Output arrays are now (Segments x Rainfall Intensities x Parameter Runs x Confidence Intervals).
Users can now disable confidence interval calculations in the gartner2014 models
utils.intensity.from_accumulation now broadcasts along the final array dimension by default.
Added the dim option to utils.intensity.from_accumulation to select the broadcasting dimension.
Exceptions
Added 9 new exceptions pertaining to download errors from third-party data providers
Added 2 new exceptions for when a dataset does not overlap a required bounding box
Bug Fixes
Fixed a bug where Raster.from_points and Raster.from_polygons would ignore an entire MultiPoint or MultiPolygon feature if a single point or polygon was outside a queried bounding box
Fixed a bug that prevented Raster.from_points and Raster.from_polygons from rasterizing features in geodatabases and other structured directories
Fixed a bug where some BoundingBox and Transform properties were returned as numpy arrays instead of floats
Fixed a bug wherein Raster.from_file would ignore the driver option
Fixed a bug wherein Raster.set_range and Raster.fill would fail when filling a float32 raster with NaN NoData
Fixed a bug that prevented loading float32 rasters from file with default NoData
Removed the unused y option from projection.Transform.reproject
For Developers
Added slow and web markers
Added quicktest and webtest developer scripts
Added developer scripts to build and run tutorials
Pipeline now runs tutorials and web tests on merge requests
Added a scheduled pipeline for daily builds. The daily builds do not use poetry.lock, instead building from the most up-to-date dependencies.
pyproject.toml now conforms to PEP 508 and poetry 2+
Improved pipeline implementation of multiple Python builds
API Breaking Changes
Changes to model broadcasting in the gartner2014 and staley2017 modules will break backwards compatibility in some cases. This section details these changes and suggests possible fixes.
gartner2014
Formerly, output arrays from the gartner2014 module had dimensions of (Segments x Parameter Runs). In this formulation, rainfall intensities and confidence interval parameters were grouped with the model coefficients to characterize parameter runs. In v3.0.0, rainfall intensities and confidence intervals have been separated into their own dimensions. As such, output arrays can now be 4D with dimensions of (Segments x Rainfall Intensities x Parameter Runs x Confidence Intervals).
This change is unlikely to affect most users, as running the model with multiple coefficients and/or confidence intervals is rare. The most common case is to run the model over a stream segment network for multiple rainfall intensities, and this will still return a 2D array of model results.
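The new layout can be pictured with plain NumPy broadcasting. The following is only a sketch under stated assumptions: it does not call pfdf, the array sizes and variable names are invented, and the arithmetic is a placeholder, not the gartner2014 equations. It shows how inputs placed on separate axes broadcast to a 4D result, and how the common case reduces to 2D once the singleton dimensions are dropped. Whether pfdf removes singleton dimensions in exactly this way is an assumption based on the paragraph above.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_segments, n_intensities, n_runs, n_cis = 5, 3, 2, 4

# Place each input on its own axis so NumPy can broadcast them together
terrain = np.linspace(1.0, 5.0, n_segments).reshape(-1, 1, 1, 1)
intensity = np.array([16.0, 20.0, 24.0]).reshape(1, -1, 1, 1)
coefficient = np.array([0.5, 0.6]).reshape(1, 1, -1, 1)
ci_factor = np.linspace(0.9, 1.1, n_cis).reshape(1, 1, 1, -1)

# Placeholder arithmetic (NOT the gartner2014 model)
full = terrain * intensity * coefficient * ci_factor
full_shape = full.shape  # (Segments, Intensities, Runs, CIs)

# Common case: a single parameter run and a single confidence interval
common = terrain * intensity * coefficient[:, :, :1, :] * ci_factor[:, :, :, :1]
common_2d = common.squeeze(axis=(2, 3))  # drop the two singleton dimensions
common_shape = common_2d.shape  # (Segments, Intensities)
```

Code that previously indexed a (Segments x Parameter Runs) array should check the number of dimensions of the output before indexing, since the position of the parameter-run axis depends on which inputs have more than one value.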
staley2017
Formerly, output arrays from the likelihood and accumulation functions had dimensions of (Segments x Durations x Accumulations/Probabilities). The order of these dimensions has been altered to (Segments x Accumulations/Probabilities x Durations) in order to simplify broadcasting for the most common use cases.
This change is most likely to affect users of the accumulation function (often referred to as rainfall thresholds), as this function is often run with multiple sets of model coefficients. This change will also affect users of the likelihood function who set keepdims=True. The following is a suggested fix for common use cases:
Accumulation
Change this:
accumulations = s17.accumulation(...)
for s in segments:
    for d in durations:
        for p in probabilities:
            accumulations[s, d, p]
to this:
accumulations = s17.accumulation(...)
for s in segments:
    for p in probabilities:
        for d in durations:
            accumulations[s, p, d]
Likelihood
Change this:
likelihoods = s17.likelihood(..., keepdims=True)
for s in segments:
    for r in R15:
        likelihoods[s, :, r]
to this:
likelihoods = s17.likelihood(..., keepdims=True)
for s in segments:
    for r in R15:
        likelihoods[s, r, :]
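Where rewriting every index expression is impractical, one possible alternative is to move the Durations axis back to its former position before passing the array to existing code. The sketch below assumes a 3D output and uses a stand-in NumPy array with hypothetical sizes rather than a real call to s17.accumulation.

```python
import numpy as np

# Stand-in for a v3.0.0 result with shape (Segments x Probabilities x Durations).
# The sizes are hypothetical; a real array would come from s17.accumulation.
n_segments, n_probabilities, n_durations = 4, 2, 3
new_order = np.arange(n_segments * n_probabilities * n_durations, dtype=float)
new_order = new_order.reshape(n_segments, n_probabilities, n_durations)

# Move the final (Durations) axis to position 1 to recover the former
# (Segments x Durations x Probabilities) layout
old_order = np.moveaxis(new_order, -1, 1)

# The same value is now reachable with the pre-3.0.0 index order
s, d, p = 1, 2, 0
same_value = old_order[s, d, p] == new_order[s, p, d]
```

np.moveaxis returns a view, so no data is copied. Updating the index order as shown above is still the cleaner long-term fix.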
intensity.from_accumulation
The utils.intensity.from_accumulation function has been updated to accommodate the new dimension order in the staley2017 model. Formerly, this function always broadcast durations along the second dimension of an array. Now, this function broadcasts durations along the final array dimension, regardless of dimensional index.
This change is unlikely to affect most users, as the dimensions of the accumulation arrays from the staley2017 module have shifted in a like manner.
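The difference between broadcasting along the final dimension and along the second dimension can be sketched in plain NumPy. The function to_intensity below is a hypothetical stand-in, not the pfdf implementation. It assumes durations are given in minutes and intensities are wanted per hour, and its dim argument only mimics the idea of the new dim option.

```python
import numpy as np

def to_intensity(accumulations, durations, dim=-1):
    """Hypothetical stand-in: divide accumulations by durations (in minutes)
    along the chosen dimension and scale the result to a per-hour rate."""
    accumulations = np.asarray(accumulations, dtype=float)
    durations = np.asarray(durations, dtype=float)
    # Reshape durations so they line up with the requested dimension
    shape = [1] * accumulations.ndim
    shape[dim] = durations.size
    return accumulations * 60.0 / durations.reshape(shape)

durations = np.array([15.0, 30.0, 60.0])

# New layout: (Segments x Probabilities x Durations), durations on the final axis
new_layout = np.ones((4, 2, 3))
default_result = to_intensity(new_layout, durations)

# Former layout: (Segments x Durations x Probabilities), durations on axis 1
old_layout = np.ones((4, 3, 2))
legacy_result = to_intensity(old_layout, durations, dim=1)
```

With an accumulation of 1 in every cell, both calls yield rates of 4, 2, and 1 per hour along their respective Durations axes. Arrays that still use the former layout need the broadcasting dimension selected explicitly.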