Raster Intro Tutorial¶
This tutorial provides a brief introduction to the Raster
class, which facilitates working with raster datasets.
Introduction¶
Raster datasets are fundamental to pfdf - many routines require rasters as input, and many produce new rasters as output. In brief, a raster dataset is a rectangular grid composed of pixels, which are rectangular cells with assigned data values. The pixels are regularly spaced along the X and Y axes, and each axis may use its own spacing interval. A raster is usually associated with some spatial metadata, which locates the raster’s pixels in space. Some rasters will also have a NoData value - when this is the case, pixels equal to the NoData value represent missing data.
A raster’s spatial metadata consists of a coordinate reference system (CRS) and an affine transformation matrix (also known as the transform). The transform converts the data grid’s column indices to spatial coordinates, and the CRS specifies the location of these coordinates on the Earth’s surface. A transform defines a raster’s resolution and alignment (the location of pixel edges) and takes the form:
Here dx and dy are the change in spatial coordinate when incrementing one column or row, and their absolute values define the raster’s resolution. Meanwhile, left and top indicate the spatial coordinates of the data grid’s left and top edges, which defines the raster’s alignment. The two remaining coefficients can be used to implement shear transforms, but pfdf only supports rectangular pixels, so these will always be 0 for our purposes.
To facilitate working with these datasets, pfdf provides the Raster
class. In brief, the class provides routines to
Load and build rasters from a variety of sources,
Manage data values,
Manage spatial metadata,
Preprocess datasets, and
Save rasters to file.
This tutorial provides a brief introduction to the Raster
class, sufficient for implementing a basic hazard assessment. You can also find more detailed discussions in the Raster Properties, Raster Factories, and Preprocessing tutorials.
Prerequisites¶
Install pfdf¶
To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks this is the case:
import check_installation
pfdf and tutorial resources are installed
Clean Workspace¶
Next, we’ll clean our workspace of any example datasets created by this tutorial.
from tools import workspace
workspace.remove_examples()
Cleaned workspace of example files
Imports¶
Then, we’ll import the Raster
class from pfdf, and some small tools to help run the tutorial. We’ll also use numpy
to work with raster data grids. (Note: Importing Raster
can take a bit, as Python needs to compile the numba library to do so).
from pfdf.raster import Raster
from tools import print_path
import numpy as np
Raster Objects¶
The Raster
class is used to create and manipulate Raster
objects. Each Raster
object holds the data grid for a raster dataset, along with associated metadata. Here, we’ll use the from_file
command to create a new Raster
object from the dNBR raster dataset included in the data folder. We’ll discuss this command in a later section, but for now, just know that it’s creating a Raster
object from our example dataset:
raster = Raster.from_file('data/dnbr.tif')
Printing the raster to the console, we can read a summary of the raster’s data grid and spatial metadata:
print(raster)
Raster:
Shape: (1280, 1587)
Dtype: int16
NoData: -32768
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
We’ll introduce these properties over the next few sections.
Raster Properties¶
Raster
objects have a variety of properties that return information of the raster’s data grid and metadata. This section only introduces a few common properties, but you can find a more detailed discussion in the Raster Properties tutorial.
Data Grid¶
You can use the values
property to return a Raster
object’s data grid as a numpy array. Here, we observe that our dNBR data array has a buffer of -32768 NoData pixels along the edges:
print(raster.values)
[[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
...
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]]
Inspecting a portion of the array away from the edges, we find the example dataset consists of integers that are mostly under 1000:
print(raster.values[600:700, 600:700])
[[692 697 697 ... 699 726 699]
[728 728 728 ... 699 699 726]
[728 728 728 ... 699 714 726]
...
[651 651 651 ... 659 659 590]
[651 651 624 ... 641 590 576]
[651 651 624 ... 635 576 573]]
The raster values are read-only. This means they’ll work fine for most mathematical routines, but you’ll need to make a copy if you want to alter the data elements directly. For example:
# Most routines are fine
median = np.median(raster.values)
# But this will fail because it attempts to alter array elements
try:
raster.values[0,:] = 0
except Exception as error:
print('Failed because we attempted to alter the array directly')
Failed because we attempted to alter the array directly
# This is fine because we copied the array first
values = raster.values.copy()
values[0,:] = 0 # Replaces the top row with 0 values
print(values)
[[ 0 0 0 ... 0 0 0]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
...
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]]
Array Metadata¶
Raster
objects have properties to report the data array’s metadata. Some useful properties include:
shape
: The shape of the data arraydtype
: The data type of the datasetnodata
: The NoData valuenbytes
: The size of the array in bytes.
For example, inspecting these properties for our example raster, we find the data grid is 1280 x 1587 pixels, uses a 16-bit integer data type, has a NoData value of -32768, and uses 4 MB of memory:
print(raster.shape)
print(raster.dtype)
print(raster.nodata)
print(raster.nbytes)
(1280, 1587)
int16
-32768
4062720
Spatial Metadata¶
Other properties return the raster’s spatial metadata. The most commonly used properties include:
crs
: The coordinate reference system as a pyproj.crs object,transform
: The affine transform as a pfdf.projection.Transform object, andbounds
: The bounding box as a pfdf.projection.BoundingBox object
You can learn more about these metadata objects in the Spatial Metadata tutorial.
Inspecting our example Raster
, we find it is projected in EPSG:26911, and has a resolution of 10 CRS units (in this case, meters).
raster.crs
<Projected CRS: EPSG:26911>
Name: NAD83 / UTM zone 11N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 11N
- method: Transverse Mercator
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
raster.transform
Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
Examining its bounding box, we find the raster spans from roughly 408022.1201 to 423892.1201 along the X axis, and from 3776255.5413 to 3789055.5413 along the Y axis:
raster.bounds
BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
Note that the bounding box coordinates are defined relative to the raster’s CRS, in this case EPSG:26911.
Raster Factories¶
To create a Raster
object, you should use a Raster
factory method. These methods build new Raster
objects from different types of data sources. The factories follow the naming convention from_<type>
, where <type>
is a particular type of data source. Some common factories include:
from_file
: Loads a raster from the local filesystem,from_url
: Loads a raster from a web URLfrom_array
: Builds a raster from a numpy arrayfrom_points
: Builds a raster from a collection of Point or MultiPoint featuresfrom_polygons
: Builds a raster from a collection of Polygon or MultiPolygon features
Each factory includes options for building Raster
objects from the associated data source. For example, from_file
includes an option to only load data in an area of interest, and from_url
includes options for connecting to the remote server. You can find a detailed discussion of these factories in the Raster Factory Tutorial.
Saving Rasters¶
It’s often useful to save a Raster
object to file, particularly when an analytical routine produces a new raster dataset as output. You can save Raster
objects using the save
command. This command takes a file name or path as input, and returns the path to the saved file as output. For example:
path = raster.save('examples/my-raster.tif')
print_path(path)
.../examples/my-raster.tif
By default, the save
command will not allow you to overwrite existing files. For example, calling the save
command a second time with the same file name will fail because the file already exists:
try:
raster.save('examples/my-raster.tif')
except FileExistsError:
print('Failed because the file already exists')
Failed because the file already exists
You can permit overwriting by setting overwrite=True
:
raster.save('examples/my-raster.tif', overwrite=True)
print('overwrote the existing file')
overwrote the existing file
Conclusion¶
In this tutorial, we’ve introduced the Raster
class, which facilitates working with raster datasets. We’ve learned how to access a Raster
object’s data array, and examined properties with important metadata. We’ve learned that Raster
objects are created using dedicated factory methods, and we’ve shown how to save raster datasets to file.
This tutorial was deliberately brief, and later tutorials examine the class in greater detail. As a reminder, you can learn more about Raster
objects in the Raster Properties, Raster Factories, and Preprocessing tutorials. In the next tutorial, we’ll learn how to use the data
package to download commonly used datasets (many of which are rasters) from the internet.