RasterMetadata Tutorial¶
This tutorial examines the RasterMetadata
class, which can be used to manage raster metadata, without loading data values to memory.
Introduction¶
Sometimes, you may wish to query a raster’s metadata, without actually loading the raster’s data array into memory. The following examples describe common cases, and there are many more:
Example 1: You want to check a raster’s bounding box to determine if it intersects your area of interest - but if it doesn’t intersect, then you don’t want to load the raster into memory.
Example 2: You want to rasterize a Polygon dataset, but the dataset is quite large and you think the rasterized output might occupy too much memory. As such, you’d like to check the memory footprint for several possible resolutions, without actually rasterizing the dataset.
Example 3: You’re considering reprojecting a raster, but want to inspect the transform of the reprojected dataset first. Since reprojecting a data array can be computationally expensive, you’d like to avoid that before inspecting the new transform.
The RasterMetadata
class allows you to implement all these tasks. The class records and manages a raster dataset’s metadata - both spatial and data array metadata - without loading the actual data array into memory. This can be useful for advanced users who are concerned with memory and/or computational efficiency. In this tutorial, we’ll examine the RasterMetadata
class and its key methods.
Prerequisites¶
Install pfdf¶
To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks this is the case:
import check_installation
pfdf and tutorial resources are installed
Example Files¶
Finally, we’ll save some example files to use in the tutorial. The raster dataset is a 50x75 grid of random values between 0 and 100 with a border of -1 NoData values along the edges. The raster is projected in EPSG:26911 with a 10 meter resolution. We’ll also save an example Polygon feature collection.
from tools import examples
examples.build_raster()
examples.build_polygons()
Built example raster
Built example polygon collection
RasterMetadata Objects¶
RasterMetadata
objects are designed to manage raster metadata without the computational expense of loading and manipulating data arrays. This includes both spatial metadata, and data grid metadata. For example, even though a data array is not loaded into memory, you can still access the array’s metadata properties, such as the shape, dtype, memory footprint, etc.
The class broadly parallels the Raster
class. For example, you can create RasterMetadata
objects from existing data sources using factory methods, just like the Raster
class. These factories have the same names as the Raster
class and have nearly identical options. One exception is the RasterMetadata
constructor, which is quite different from the Raster
constructor. Please consult the API for more details.
Similarly, the RasterMetadata
class includes methods for each preprocessing method in the Raster
class. You can use these methods to determine the metadata for a raster dataset resulting from the associated preprocessing method. The class also includes several methods not included in the Raster
class, which advanced users can use to implement custom metadata routines.
Unlike Raster
objects, RasterMetadata
objects are immutable. This means you cannot alter a RasterMetadata
object once it has been created. As such, any methods that update metadata fields return a new RasterMetadata
object. This differs from the Raster
class, which includes some methods that alter Raster
objects in-place. This is most prevalent for raster preprocessing methods.
For example, calling Raster.reproject
will alter the data grid in-place, and will not return any output. By contrast, calling RasterMetadata.reproject
will return the RasterMetadata
object for the reprojected dataset as output, but the calling object will not be altered.
You can import the RasterMetadata
class from the pfdf.raster
namespace. We’ll also import the Raster
class to help implement parts of the tutorial:
from pfdf.raster import RasterMetadata, Raster
Properties¶
RasterMetadata
objects support all the same metadata properties and methods as Raster
objects. For example, let’s load the RasterMetadata
object for our example file:
metadata = RasterMetadata.from_file('examples/raster.tif')
We can inspect various metadata fields using the usual Raster
properties. For example:
# Data array metadata
print(metadata.nodata)
print(metadata.shape)
print(metadata.dtype)
print(metadata.nbytes)
# CRS
print(metadata.crs.name)
print(metadata.utm_zone.name)
# Transform
print(metadata.transform)
print(metadata.resolution('meters'))
print(metadata.pixel_area('meters'))
# Bounding Box
print(metadata.bounds)
print(metadata.center)
-999
(50, 75)
int16
7500
NAD83 / UTM zone 11N
WGS 84 / UTM zone 10S
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
(10.0, 10.0)
100.0
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
(375.0, -250.0)
However, there are three Raster
properties that RasterMetadata
does not support. These are values
, data_mask
and nodata_mask
- the class does not include these properties because there is no data array loaded into memory:
try:
metadata.values
metadata.data_mask
metadata.nodata_mask
except Exception:
print('Not supported because there is no data array in memory')
Not supported because there is no data array in memory
Factory Methods¶
You can use factory methods to build RasterMetadata
objects from various data sources. The class uses the same methods as the Raster
class including:
from_file
: Loads metadata from a raster in the local filesystemfrom_url
: Loads metadata from a raster accessed via a web URLfrom_array
: Builds metadata for a numpy arrayfrom_points
: Determines metadata for a rasterized Point/MultiPoint collectionfrom_polygons
: Determines metadata for a rasterized Polygon/MultiPolygon collection
These factories broadly use the same options as their associated Raster
factories. This allows you to determine the metadata for the Raster
object created with the same options.
Saved Datasets¶
For example, you can load the metadata for a raster file using from_file
:
RasterMetadata.from_file('examples/raster.tif')
RasterMetadata:
Shape: (50, 75)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
and you can use the bounds
option to determine the metadata for the raster that would be loaded when constrained to the indicated bounding box:
RasterMetadata.from_file('examples/raster.tif', bounds=[0, -250, 370, 0, 26911])
RasterMetadata:
Shape: (25, 37)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-250.0, right=370.0, top=0.0, crs="NAD83 / UTM zone 11N")
Note that the shape and bounding box in this second object are different from the metadata for the full file.
Rasterizing Vector Features¶
Analogously, you can use the from_points
and from_polygons
factories to determine the metadata for the dataset that would arise from rasterizing vector features. For example:
RasterMetadata.from_polygons('examples/polygons.geojson')
RasterMetadata:
Shape: (10, 10)
Dtype: bool
NoData: False
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=0.0, right=100.0, top=100.0, crs="NAD83 / UTM zone 11N")
Again, the options broadly parallel the Raster
factories, so you can determine how different options would affect the rasterized array:
RasterMetadata.from_polygons('examples/polygons.geojson', resolution=30)
RasterMetadata:
Shape: (4, 4)
Dtype: bool
NoData: False
CRS("NAD83 / UTM zone 11N")
Transform(dx=30.0, dy=-30.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-20.0, right=120.0, top=100.0, crs="NAD83 / UTM zone 11N")
RasterMetadata.from_polygons('examples/polygons.geojson', field='my-data')
RasterMetadata:
Shape: (10, 10)
Dtype: int32
NoData: -2147483648
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=0.0, right=100.0, top=100.0, crs="NAD83 / UTM zone 11N")
require_overlap¶
The from_file
and from_url
methods both include the require_overlap
option, which is not included in the parallel Raster
factories. This option is intended to help users test whether a dataset intersects an area of interest.
By default, the factories will raise an error if the saved dataset does not intersect the bounding box, just like the Raster
class. However, you can use require_overlap
option to disable this behavior and return a metadata object with 0 values in the shape instead. This can be useful when searching through multiple files/URLs for a dataset that intersects an area of interest.
For example, let’s mimic a search through a DEM tileset. Suppose we want to load data near the town of Golden, Colorado. Specifically, we want data from 39.73 to 39.79 N, and 105.19 to 105.24 W:
url = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/historical/n42w102/USGS_13_n42w102_20220415.tif'
bounds = {'left': -105.24, 'right': -105.19, 'bottom': 39.73, 'top': 39.79, 'crs': 4326}
The following would fail, because the queried tile does not intersect the bounding box:
try:
RasterMetadata.from_url(url, bounds=bounds)
except Exception:
print('Error because the file did not intersect the bounding box')
Error because the file did not intersect the bounding box
Whereas the following would succeed, returning a metadata object whose shape includes 0 values:
metadata = RasterMetadata.from_url(url, bounds=bounds, require_overlap=False)
print(metadata.shape)
(0, 0)
This behavior can be useful for search routines. For example, you could use something like the following to find an intersecting tile:
tiles = [url] # This would have many more URLs in it
for tile in tiles:
metadata = RasterMetadata.from_url(url, bounds=bounds, require_overlap=False)
if 0 in metadata.shape:
continue
Other Creation Methods¶
There are other ways to obtain RasterMetadata
objects, beyond the factory methods. The most common way is to query the metadata
property of a Raster
object. For example:
raster = Raster.from_file('examples/raster.tif')
metadata = raster.metadata
print(metadata)
RasterMetadata:
Shape: (50, 75)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
You can also create RasterMetadata
objects using the constructor. This can allow advanced users to build custom RasterMetadata
objects from scratch. Please read the API if you are interested in this topic.
Preprocessing¶
You can also use the RasterMetadata
class to determine the metadata of rasters that would result from various preprocessing operations. Currently, the class supports the following methods:
fill,
buffer,
clip, and
reproject
The find
and set_range
methods are not explicitly supported, because they primarily act on a raster’s data array.
When calling one of these preprocessing methods, the metadata for the resulting dataset is returned as output. For example, let’s load the metadata for our example dataset:
metadata = RasterMetadata.from_file('examples/raster.tif')
print(metadata)
RasterMetadata:
Shape: (50, 75)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
We can use the reproject
command to determine how reprojection options would alter the dataset:
metadata.reproject(crs=4326)
RasterMetadata:
Shape: (50, 75)
Dtype: int16
NoData: -999
CRS("WGS 84")
Transform(dx=8.959056632837322e-05, dy=-9.019459346488628e-05, left=-121.48874389822696, top=0.0, crs="WGS 84")
BoundingBox(left=-121.48874389822696, bottom=-0.004509729673244314, right=-121.48202460575233, top=0.0, crs="WGS 84")
The clip
command shows how clipping would alter the dataset:
metadata.clip(bounds = [0, -250, 500, 0])
RasterMetadata:
Shape: (25, 50)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-250.0, right=500.0, top=0.0, crs="NAD83 / UTM zone 11N")
The buffer
command shows the effects of buffering the bounding box:
metadata.buffer(3, 'kilometers')
RasterMetadata:
Shape: (650, 675)
Dtype: int16
NoData: -999
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=-3000.0, top=3000.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=-3000.0, bottom=-3500.0, right=3750.0, top=3000.0, crs="NAD83 / UTM zone 11N")
and fill
returns metadata lacking a NoData value:
metadata.fill()
RasterMetadata:
Shape: (50, 75)
Dtype: int16
NoData: None
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
Conclusion¶
In this tutorial, we introduced the RasterMetadata
class and some of its key commands. This included:
metadata properties,
factory methods,
how to obtain a metadata object from a raster object, and
preprocessing methods.
We note that this tutorial is only introduction, so the RasterMetadata
class includes more advanced commands not explored here. Please read the API for a complete guide to the class.