Raster Intro Tutorial

This tutorial provides a brief introduction to the Raster class, which facilitates working with raster datasets.

Introduction

Raster datasets are fundamental to pfdf - many routines require rasters as input, and many produce new rasters as output. In brief, a raster dataset is a rectangular grid composed of pixels, which are rectangular cells with assigned data values. The pixels are regularly spaced along the X and Y axes, and each axis may use its own spacing interval. A raster is usually associated with some spatial metadata, which locates the raster’s pixels in space. Some rasters will also have a NoData value - when this is the case, pixels equal to the NoData value represent missing data.

A raster’s spatial metadata consists of a coordinate reference system (CRS) and an affine transformation matrix (also known as the transform). The transform converts the data grid’s column indices to spatial coordinates, and the CRS specifies the location of these coordinates on the Earth’s surface. A transform defines a raster’s resolution and alignment (the location of pixel edges) and takes the form:

\[\begin{split} \begin{vmatrix} dx & 0 & \mathrm{left}\\ 0 & dy & \mathrm{top} \end{vmatrix} \end{split}\]

Here dx and dy are the change in spatial coordinate when incrementing one column or row, and their absolute values define the raster’s resolution. Meanwhile, left and top indicate the spatial coordinates of the data grid’s left and top edges, which defines the raster’s alignment. The two remaining coefficients can be used to implement shear transforms, but pfdf only supports rectangular pixels, so these will always be 0 for our purposes.

To facilitate working with these datasets, pfdf provides the Raster class. In brief, the class provides routines to

  • Load and build rasters from a variety of sources,

  • Manage data values,

  • Manage spatial metadata,

  • Preprocess datasets, and

  • Save rasters to file.

This tutorial provides a brief introduction to the Raster class, sufficient for implementing a basic hazard assessment. You can also find more detailed discussions in the Raster Properties, Raster Factories, and Preprocessing tutorials.

Prerequisites

Install pfdf

To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks this is the case:

import check_installation
pfdf and tutorial resources are installed

Clean Workspace

Next, we’ll clean our workspace of any example datasets created by this tutorial.

from tools import workspace
workspace.remove_examples()
Cleaned workspace of example files

Imports

Then, we’ll import the Raster class from pfdf, and some small tools to help run the tutorial. We’ll also use numpy to work with raster data grids. (Note: Importing Raster can take a bit, as Python needs to compile the numba library to do so).

from pfdf.raster import Raster
from tools import print_path
import numpy as np

Raster Objects

The Raster class is used to create and manipulate Raster objects. Each Raster object holds the data grid for a raster dataset, along with associated metadata. Here, we’ll use the from_file command to create a new Raster object from the dNBR raster dataset included in the data folder. We’ll discuss this command in a later section, but for now, just know that it’s creating a Raster object from our example dataset:

raster = Raster.from_file('data/dnbr.tif')

Printing the raster to the console, we can read a summary of the raster’s data grid and spatial metadata:

print(raster)
Raster:
    Shape: (1280, 1587)
    Dtype: int16
    NoData: -32768
    CRS("NAD83 / UTM zone 11N")
    Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
    BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")

We’ll introduce these properties over the next few sections.

Raster Properties

Raster objects have a variety of properties that return information of the raster’s data grid and metadata. This section only introduces a few common properties, but you can find a more detailed discussion in the Raster Properties tutorial.

Data Grid

You can use the values property to return a Raster object’s data grid as a numpy array. Here, we observe that our dNBR data array has a buffer of -32768 NoData pixels along the edges:

print(raster.values)
[[-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 ...
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]]

Inspecting a portion of the array away from the edges, we find the example dataset consists of integers that are mostly under 1000:

print(raster.values[600:700, 600:700])
[[692 697 697 ... 699 726 699]
 [728 728 728 ... 699 699 726]
 [728 728 728 ... 699 714 726]
 ...
 [651 651 651 ... 659 659 590]
 [651 651 624 ... 641 590 576]
 [651 651 624 ... 635 576 573]]

The raster values are read-only. This means they’ll work fine for most mathematical routines, but you’ll need to make a copy if you want to alter the data elements directly. For example:

# Most routines are fine
median = np.median(raster.values)
# But this will fail because it attempts to alter array elements
try:
    raster.values[0,:] = 0
except Exception as error:
    print('Failed because we attempted to alter the array directly')
Failed because we attempted to alter the array directly
# This is fine because we copied the array first
values = raster.values.copy()
values[0,:] = 0  # Replaces the top row with 0 values
print(values)
[[     0      0      0 ...      0      0      0]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 ...
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]
 [-32768 -32768 -32768 ... -32768 -32768 -32768]]

Array Metadata

Raster objects have properties to report the data array’s metadata. Some useful properties include:

  • shape: The shape of the data array

  • dtype: The data type of the dataset

  • nodata: The NoData value

  • nbytes: The size of the array in bytes.

For example, inspecting these properties for our example raster, we find the data grid is 1280 x 1587 pixels, uses a 16-bit integer data type, has a NoData value of -32768, and uses 4 MB of memory:

print(raster.shape)
print(raster.dtype)
print(raster.nodata)
print(raster.nbytes)
(1280, 1587)
int16
-32768
4062720

Spatial Metadata

Other properties return the raster’s spatial metadata. The most commonly used properties include:

You can learn more about these metadata objects in the Spatial Metadata tutorial.

Inspecting our example Raster, we find it is projected in EPSG:26911, and has a resolution of 10 CRS units (in this case, meters).

raster.crs
<Projected CRS: EPSG:26911>
Name: NAD83 / UTM zone 11N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 11N
- method: Transverse Mercator
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
raster.transform
Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")

Examining its bounding box, we find the raster spans from roughly 408022.1201 to 423892.1201 along the X axis, and from 3776255.5413 to 3789055.5413 along the Y axis:

raster.bounds
BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")

Note that the bounding box coordinates are defined relative to the raster’s CRS, in this case EPSG:26911.

Raster Factories

To create a Raster object, you should use a Raster factory method. These methods build new Raster objects from different types of data sources. The factories follow the naming convention from_<type>, where <type> is a particular type of data source. Some common factories include:

  • from_file: Loads a raster from the local filesystem,

  • from_url: Loads a raster from a web URL

  • from_array: Builds a raster from a numpy array

  • from_points: Builds a raster from a collection of Point or MultiPoint features

  • from_polygons: Builds a raster from a collection of Polygon or MultiPolygon features

Each factory includes options for building Raster objects from the associated data source. For example, from_file includes an option to only load data in an area of interest, and from_url includes options for connecting to the remote server. You can find a detailed discussion of these factories in the Raster Factory Tutorial.

Saving Rasters

It’s often useful to save a Raster object to file, particularly when an analytical routine produces a new raster dataset as output. You can save Raster objects using the save command. This command takes a file name or path as input, and returns the path to the saved file as output. For example:

path = raster.save('examples/my-raster.tif')
print_path(path)
.../examples/my-raster.tif

By default, the save command will not allow you to overwrite existing files. For example, calling the save command a second time with the same file name will fail because the file already exists:

try:
    raster.save('examples/my-raster.tif')
except FileExistsError:
    print('Failed because the file already exists')
Failed because the file already exists

You can permit overwriting by setting overwrite=True:

raster.save('examples/my-raster.tif', overwrite=True)
print('overwrote the existing file')
overwrote the existing file

Conclusion

In this tutorial, we’ve introduced the Raster class, which facilitates working with raster datasets. We’ve learned how to access a Raster object’s data array, and examined properties with important metadata. We’ve learned that Raster objects are created using dedicated factory methods, and we’ve shown how to save raster datasets to file.

This tutorial was deliberately brief, and later tutorials examine the class in greater detail. As a reminder, you can learn more about Raster objects in the Raster Properties, Raster Factories, and Preprocessing tutorials. In the next tutorial, we’ll learn how to use the data package to download commonly used datasets (many of which are rasters) from the internet.