Raster Properties Tutorial¶
This tutorial introduces the Raster
class and examines routines to manage data values and spatial metadata.
Introduction¶
Raster datasets are fundamental to pfdf - many routines require rasters as input, and many produce new rasters as output. In brief, a raster dataset is a rectangular grid of data values. The individual values (often called pixels) are regularly spaced along the X and Y axes, and each axis may use its own spacing interval. A raster is usually associated with some spatial metadata, which locates the raster’s pixels in space. Some rasters will also have a NoData value - when this is the case, pixels equal to the NoData value represent missing data.
A raster’s spatial metadata consists of a coordinate reference system (CRS) and an affine transformation matrix (also known as the transform). The transform converts the data grid’s column indices to spatial coordinates, and the CRS specifies the location of these coordinates on the Earth’s surface. A transform defines a raster’s resolution and alignment (the location of pixel edges) and takes the form:
Here dx and dy are the change in spatial coordinate when incrementing one column or row, and their absolute values define the raster’s resolution. Meanwhile, left and top indicate the spatial coordinates of the data grid’s left and top edges, which defines the raster’s alignment. The two remaining coefficients can be used to implement shear transforms, but pfdf only supports rectangular pixels, so these will always be 0 for our purposes.
In this tutorial, we’ll learn how to use Raster
objects to manage data values and spatial metadata. Other routines are explored later in the Raster Factories and Preprocessing tutorials.
Prerequisites¶
Install pfdf¶
To run this tutorial, you must have installed pfdf 3+ with tutorial resources in your Jupyter kernel. The following line checks this is the case:
import check_installation
pfdf and tutorial resources are installed
Imports¶
We’ll next import the Raster
class from pfdf. We’ll also use numpy
to work with raster data grids.
from pfdf.raster import Raster
import numpy as np
Raster Object¶
We’ll start by using the from_file
command to create a Raster
object for an example dataset. (You can learn more about this command in the Raster Factories Tutorial. Here, we’ll specifically load the dNBR raster used in the main tutorial series:
raster = Raster('data/dnbr.tif')
Printing the object to the console, we find a summary of the data grid and spatial metadata:
print(raster)
Raster:
Shape: (1280, 1587)
Dtype: int16
NoData: -32768
CRS("NAD83 / UTM zone 11N")
Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
Data Grid¶
You can use the values
property to return a Raster
object’s data grid:
raster.values
array([[-32768, -32768, -32768, ..., -32768, -32768, -32768],
[-32768, -32768, -32768, ..., -32768, -32768, -32768],
[-32768, -32768, -32768, ..., -32768, -32768, -32768],
...,
[-32768, -32768, -32768, ..., -32768, -32768, -32768],
[-32768, -32768, -32768, ..., -32768, -32768, -32768],
[-32768, -32768, -32768, ..., -32768, -32768, -32768]], dtype=int16)
Raster
objects represent their data grids as numpy arrays, so provide several properties determined by the array. For example, you can use the shape
property to return the array shape (nrows x ncols), size
to return the number of pixels. dtype
to return the data type, and nbytes
to return the memory consumed by the array. Users who prefer rasterio’s syntax can also use height
and width
to return the number of rows and columns, respectively:
print(f'shape = {raster.shape}')
print(f'height = {raster.height}')
print(f'width = {raster.width}')
print(f'size = {raster.size}')
print(f'dtype = {raster.dtype}')
print(f'nbytes = {raster.nbytes}')
shape = (1280, 1587)
height = 1280
width = 1587
size = 2031360
dtype = int16
nbytes = 4062720
The values
property returns a read-only view of the Raster
object’s data grid. Most routines will work as normal, but you’ll need to make a copy if you want to alter array elements directly:
# Most routines work as normal
median = np.median(raster.values)
# But this will fail because it attempts to alter array elements
try:
rasters.values[0,:] = 0
except Exception:
print('Failed because we attempted to change the array')
Failed because we attempted to change the array
# This is fine because we copied the array first
values = raster.values.copy()
values[0,:] = 0
print(values)
[[ 0 0 0 ... 0 0 0]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
...
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]
[-32768 -32768 -32768 ... -32768 -32768 -32768]]
NoData Values¶
You can use the nodata
property to return a raster’s NoData value:
print(raster.nodata)
-32768
The nodata_mask
property will return a boolean array indicating the locations of NoData values in the data grid. Here, True
values indicate NoData pixels, and False
values indicate data pixels. Inspecting the NoData mask for the example dataset, we can observe locations of NoData pixels along the data grid’s edges:
raster.nodata_mask
array([[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True]])
Alternatively, you can use the data_mask
property to return the inverse mask, wherein True
indicates data pixels and False
is NoData:
raster.data_mask
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])
These masks can be useful for manipulating and/or visualizing raster data values after processing.
CRS¶
Several other properties return a raster’s spatial metadata. The crs
returns the raster’s coordinate reference system as a pyproj.CRS object, crs_units
reports the CRS’s coordinate units along the X and Y axes, and utm_zone
returns the CRS of the best UTM zone for the raster’s center point:
raster.crs
<Projected CRS: EPSG:26911>
Name: NAD83 / UTM zone 11N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 11N
- method: Transverse Mercator
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
raster.crs_units
('metre', 'metre')
raster.utm_zone
<Projected CRS: EPSG:32611>
Name: WGS 84 / UTM zone 11N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 120°W and 114°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; British Columbia (BC); Northwest Territories (NWT); Nunavut. Mexico. United States (USA).
- bounds: (-120.0, 0.0, -114.0, 84.0)
Coordinate Operation:
- name: UTM zone 11N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Transform¶
You can use the transform
property to return a raster’s Transform
object. This object manages the affine transform, and you can learn more in the Spatial Metadata Tutorial:
raster.transform
Transform(dx=10.0, dy=-10.0, left=408022.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
You can also use the resolution
method to return the resolution along the X and Y axes, and pixel_area
to return the area of a single pixel:
print(raster.resolution())
print(raster.pixel_area())
(10.0, 10.0)
100.0
By default, these commands return values in meters, but you can use the units
option to select other units:
resolution = raster.resolution(units='feet')
area = raster.pixel_area(units='feet')
print(resolution)
print(area)
(32.808398950131235, 32.808398950131235)
1076.3910416709723
You can find a list of supported units here: Supported Units
Bounding Box¶
You can use the bounds
property to return a raster’s BoundingBox
object. This object manages the raster’s bounding box, and you can learn more in the Spatial Metadata Tutorial:
raster.bounds
BoundingBox(left=408022.12009999994, bottom=3776255.5413000006, right=423892.12009999994, top=3789055.5413000006, crs="NAD83 / UTM zone 11N")
You can also use the left
, right
, top
, and bottom
properties to return the coordinates of specific edges, and the center
property to return the (X, Y) coordinate of the raster’s center point:
print(f'left = {raster.left}')
print(f'right = {raster.right}')
print(f'bottom = {raster.bottom}')
print(f'top = {raster.top}')
print(f'center = {raster.center}')
left = 408022.12009999994
right = 423892.12009999994
bottom = 3776255.5413000006
top = 3789055.5413000006
center = (415957.12009999994, 3782655.5413000006)
Conclusion¶
In this tutorial, we’ve introduced raster datasets, and learned how to use the Raster
class to manage their data grids and spatial metadata. In the next tutorial, we’ll learn how to load and build Raster
objects from a variety of different data sources.