RasterMetadata Tutorial

This tutorial examines the RasterMetadata class, which can be used to manage raster metadata, without loading data values to memory.

Introduction

Sometimes, you may wish to query a raster’s metadata, without actually loading the raster’s data array into memory. The following examples describe common cases, and there are many more:

Example 1: You want to check a raster’s bounding box to determine if it intersects your area of interest - but if it doesn’t intersect, then you don’t want to load the raster into memory.

Example 2: You want to rasterize a Polygon dataset, but the dataset is quite large and you think the rasterized output might occupy too much memory. As such, you’d like to check the memory footprint for several possible resolutions, without actually rasterizing the dataset.

Example 3: You’re considering reprojecting a raster, but want to inspect the transform of the reprojected dataset first. Since reprojecting a data array can be computationally expensive, you’d like to avoid that before inspecting the new transform.

The RasterMetadata class allows you to implement all these tasks. The class records and manages a raster dataset’s metadata - both spatial and data array metadata - without loading the actual data array into memory. This can be useful for advanced users who are concerned with memory and/or computational efficiency. In this tutorial, we’ll examine the RasterMetadata class and its key methods.

Prerequisites

Install pfdf

To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks this is the case:

import check_installation
pfdf and tutorial resources are installed

Example Files

Finally, we’ll save some example files to use in the tutorial. The raster dataset is a 50x75 grid of random values between 0 and 100 with a border of -1 NoData values along the edges. The raster is projected in EPSG:26911 with a 10 meter resolution. We’ll also save an example Polygon feature collection.

from tools import examples
examples.build_raster()
examples.build_polygons()
Built example raster
Built example polygon collection

RasterMetadata Objects

RasterMetadata objects are designed to manage raster metadata without the computational expense of loading and manipulating data arrays. This includes both spatial metadata, and data grid metadata. For example, even though a data array is not loaded into memory, you can still access the array’s metadata properties, such as the shape, dtype, memory footprint, etc.

The class broadly parallels the Raster class. For example, you can create RasterMetadata objects from existing data sources using factory methods, just like the Raster class. These factories have the same names as the Raster class and have nearly identical options. One exception is the RasterMetadata constructor, which is quite different from the Raster constructor. Please consult the API for more details.

Similarly, the RasterMetadata class includes methods for each preprocessing method in the Raster class. You can use these methods to determine the metadata for a raster dataset resulting from the associated preprocessing method. The class also includes several methods not included in the Raster class, which advanced users can use to implement custom metadata routines.

Unlike Raster objects, RasterMetadata objects are immutable. This means you cannot alter a RasterMetadata object once it has been created. As such, any methods that update metadata fields return a new RasterMetadata object. This differs from the Raster class, which includes some methods that alter Raster objects in-place. This is most prevalent for raster preprocessing methods.

For example, calling Raster.reproject will alter the data grid in-place, and will not return any output. By contrast, calling RasterMetadata.reproject will return the RasterMetadata object for the reprojected dataset as output, but the calling object will not be altered.

You can import the RasterMetadata class from the pfdf.raster namespace. We’ll also import the Raster class to help implement parts of the tutorial:

from pfdf.raster import RasterMetadata, Raster

Properties

RasterMetadata objects support all the same metadata properties and methods as Raster objects. For example, let’s load the RasterMetadata object for our example file:

metadata = RasterMetadata.from_file('examples/raster.tif')

We can inspect various metadata fields using the usual Raster properties. For example:

# Data array metadata
print(metadata.nodata)
print(metadata.shape)
print(metadata.dtype)
print(metadata.nbytes)

# CRS
print(metadata.crs.name)
print(metadata.utm_zone.name)

# Transform
print(metadata.transform)
print(metadata.resolution('meters'))
print(metadata.pixel_area('meters'))

# Bounding Box
print(metadata.bounds)
print(metadata.center)
-999
(50, 75)
int16
7500
NAD83 / UTM zone 11N
WGS 84 / UTM zone 10S
Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
(10.0, 10.0)
100.0
BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")
(375.0, -250.0)

However, there are three Raster properties that RasterMetadata does not support. These are values, data_mask and nodata_mask - the class does not include these properties because there is no data array loaded into memory:

try:
    metadata.values
    metadata.data_mask
    metadata.nodata_mask
except Exception:
    print('Not supported because there is no data array in memory')
Not supported because there is no data array in memory

Factory Methods

You can use factory methods to build RasterMetadata objects from various data sources. The class uses the same methods as the Raster class including:

  • from_file: Loads metadata from a raster in the local filesystem

  • from_url: Loads metadata from a raster accessed via a web URL

  • from_array: Builds metadata for a numpy array

  • from_points: Determines metadata for a rasterized Point/MultiPoint collection

  • from_polygons: Determines metadata for a rasterized Polygon/MultiPolygon collection

These factories broadly use the same options as their associated Raster factories. This allows you to determine the metadata for the Raster object created with the same options.

Saved Datasets

For example, you can load the metadata for a raster file using from_file:

RasterMetadata.from_file('examples/raster.tif')
RasterMetadata:
    Shape: (50, 75)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")

and you can use the bounds option to determine the metadata for the raster that would be loaded when constrained to the indicated bounding box:

RasterMetadata.from_file('examples/raster.tif', bounds=[0, -250, 370, 0, 26911])
RasterMetadata:
    Shape: (25, 37)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-250.0, right=370.0, top=0.0, crs="NAD83 / UTM zone 11N")

Note that the shape and bounding box in this second object are different from the metadata for the full file.

Rasterizing Vector Features

Analogously, you can use the from_points and from_polygons factories to determine the metadata for the dataset that would arise from rasterizing vector features. For example:

RasterMetadata.from_polygons('examples/polygons.geojson')
RasterMetadata:
    Shape: (10, 10)
    Dtype: bool
    NoData: False
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=0.0, right=100.0, top=100.0, crs="NAD83 / UTM zone 11N")

Again, the options broadly parallel the Raster factories, so you can determine how different options would affect the rasterized array:

RasterMetadata.from_polygons('examples/polygons.geojson', resolution=30)
RasterMetadata:
    Shape: (4, 4)
    Dtype: bool
    NoData: False
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=30.0, dy=-30.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-20.0, right=120.0, top=100.0, crs="NAD83 / UTM zone 11N")
RasterMetadata.from_polygons('examples/polygons.geojson', field='my-data')
RasterMetadata:
    Shape: (10, 10)
    Dtype: int32
    NoData: -2147483648
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=100.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=0.0, right=100.0, top=100.0, crs="NAD83 / UTM zone 11N")

require_overlap

The from_file and from_url methods both include the require_overlap option, which is not included in the parallel Raster factories. This option is intended to help users test whether a dataset intersects an area of interest.

By default, the factories will raise an error if the saved dataset does not intersect the bounding box, just like the Raster class. However, you can use require_overlap option to disable this behavior and return a metadata object with 0 values in the shape instead. This can be useful when searching through multiple files/URLs for a dataset that intersects an area of interest.

For example, let’s mimic a search through a DEM tileset. Suppose we want to load data near the town of Golden, Colorado. Specifically, we want data from 39.73 to 39.79 N, and 105.19 to 105.24 W:

url = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/historical/n42w102/USGS_13_n42w102_20220415.tif'
bounds = {'left': -105.24, 'right': -105.19, 'bottom': 39.73, 'top': 39.79, 'crs': 4326}

The following would fail, because the queried tile does not intersect the bounding box:

try:
    RasterMetadata.from_url(url, bounds=bounds)
except Exception:
    print('Error because the file did not intersect the bounding box')
Error because the file did not intersect the bounding box

Whereas the following would succeed, returning a metadata object whose shape includes 0 values:

metadata = RasterMetadata.from_url(url, bounds=bounds, require_overlap=False)
print(metadata.shape)
(0, 0)

This behavior can be useful for search routines. For example, you could use something like the following to find an intersecting tile:

tiles = [url]  # This would have many more URLs in it
for tile in tiles:
    metadata = RasterMetadata.from_url(url, bounds=bounds, require_overlap=False)
    if 0 in metadata.shape:
        continue

Other Creation Methods

There are other ways to obtain RasterMetadata objects, beyond the factory methods. The most common way is to query the metadata property of a Raster object. For example:

raster = Raster.from_file('examples/raster.tif')
metadata = raster.metadata
print(metadata)
RasterMetadata:
    Shape: (50, 75)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")

You can also create RasterMetadata objects using the constructor. This can allow advanced users to build custom RasterMetadata objects from scratch. Please read the API if you are interested in this topic.

Preprocessing

You can also use the RasterMetadata class to determine the metadata of rasters that would result from various preprocessing operations. Currently, the class supports the following methods:

  • fill,

  • buffer,

  • clip, and

  • reproject

The find and set_range methods are not explicitly supported, because they primarily act on a raster’s data array.

When calling one of these preprocessing methods, the metadata for the resulting dataset is returned as output. For example, let’s load the metadata for our example dataset:

metadata = RasterMetadata.from_file('examples/raster.tif')
print(metadata)
RasterMetadata:
    Shape: (50, 75)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")

We can use the reproject command to determine how reprojection options would alter the dataset:

metadata.reproject(crs=4326)
RasterMetadata:
    Shape: (50, 75)
    Dtype: int16
    NoData: -999
    CRS("WGS 84")
    Transform(dx=8.959056632837322e-05, dy=-9.019459346488628e-05, left=-121.48874389822696, top=0.0, crs="WGS 84")
    BoundingBox(left=-121.48874389822696, bottom=-0.004509729673244314, right=-121.48202460575233, top=0.0, crs="WGS 84")

The clip command shows how clipping would alter the dataset:

metadata.clip(bounds = [0, -250, 500, 0])
RasterMetadata:
    Shape: (25, 50)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-250.0, right=500.0, top=0.0, crs="NAD83 / UTM zone 11N")

The buffer command shows the effects of buffering the bounding box:

metadata.buffer(3, 'kilometers')
RasterMetadata:
    Shape: (650, 675)
    Dtype: int16
    NoData: -999
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=-3000.0, top=3000.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=-3000.0, bottom=-3500.0, right=3750.0, top=3000.0, crs="NAD83 / UTM zone 11N")

and fill returns metadata lacking a NoData value:

metadata.fill()
RasterMetadata:
    Shape: (50, 75)
    Dtype: int16
    NoData: None
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=0.0, top=0.0, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=0.0, bottom=-500.0, right=750.0, top=0.0, crs="NAD83 / UTM zone 11N")

Conclusion

In this tutorial, we introduced the RasterMetadata class and some of its key commands. This included:

  • metadata properties,

  • factory methods,

  • how to obtain a metadata object from a raster object, and

  • preprocessing methods.

We note that this tutorial is only introduction, so the RasterMetadata class includes more advanced commands not explored here. Please read the API for a complete guide to the class.