Compute features and perform Machine Learning

In this example, we load a SENTINEL image and compute some features on this image :

Gaussian filters of RGB NIR bands
NDVI and NDVI filtered with Gaussian & Laplacian

Then, we load a Field data tif image that contains the different classes (only two in our situation).
We extract two numpy that are required in a random forest (with scikit)
We apply the model to the entire image

import rastereasy

1) Process sentinel image

1.1) Read image

name_im='./data/demo/sentinel.tif'
names = {"CO" : 1,"B": 2,"G":3,"R":4,"RE1":5,"RE2":6,"RE3":7,"NIR":8,"WA":9,"SWIR1":10,"SWIR2":11,"SWIR3":12}
image=rastereasy.open(name_im, names=names)
image.info()
image.colorcomp(bands=['R','G','B'], extent='pixel',title='original image')

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 12
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: int16
- Projection system: EPSG:32637
- Nodata: -32768.0

- Given names for spectral bands: 
   {'CO': 1, 'B': 2, 'G': 3, 'R': 4, 'RE1': 5, 'RE2': 6, 'RE3': 7, 'NIR': 8, 'WA': 9, 'SWIR1': 10, 'SWIR2': 11, 'SWIR3': 12}

../_images/f8711e3aa090968383f8239f3d980be5584bc3eb9aa9131a463a61417c291100.png

1.2) Extract R,G,B,NIR

rgbnir=image.select_bands(['R','G','B','NIR'])
rgbnir.info()
rgbnir.colorcomp(bands=['NIR','G','B'], title = 'color comp NIR image')

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 4
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: int16
- Projection system: EPSG:32637
- Nodata: -32768.0

- Given names for spectral bands: 
   {'R': 1, 'G': 2, 'B': 3, 'NIR': 4}

../_images/87e900aff496637cbf8b699d774aa58ce81a1f3abaa24f97c59373ca413dab9c.png

1.3) Compute gaussian filters with `sigma=10` and `sigma=20`

# sigma = 10
rgbnir_10=rgbnir.filter(method="gaussian",sigma=10)
rgbnir_10.colorcomp(bands=['R','G','B'], title = 'Gaussian filter (10) on RGB image')
rgbnir_10.colorcomp(bands=['NIR','G','B'], title = 'Gaussian filter (10) on NIR-G-B image')

../_images/ffce9f1271bf1587b23b1fb674f4a308c40cc03fbbab1037311b931d1de117fd.png

../_images/c06cf81ce433f946437580d5139010ac818a46a452c15a476166a7264b5e6edf.png

# sigma = 20
rgbnir_20=rgbnir.filter(method="gaussian",sigma=20)
rgbnir_20.colorcomp(bands=['R','G','B'], title = 'Gaussian filter (20) on RGB image')
rgbnir_20.colorcomp(bands=['NIR','G','B'], title = 'Gaussian filter (20) on NIR-G-B image')

../_images/89cf7d0993cbe282cbe23457ec3f4062acd42f5727f4b0bbde966606e92cd6b2.png

../_images/f2bf1fb655c112d612fcab8ceb2a48e0734dc47355d35e63604756fab082622e.png

1.4) Compute NDVI and two filtered version (gaussian `sigma=10` and `sigma=20`)

Extract R and NIR bands

r=rgbnir.select_bands('R')
nir=rgbnir.select_bands('NIR')

Compute and visu NDVI

ndvi=(nir-r)/(nir+r)
ndvi.visu(cmap='Greens', title="NDVI")
ndvi.info()

../_images/15774a879a8216cb7075a307647086db73d669002649de31f2cfe264f39d2930.png

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 1
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: float64
- Projection system: EPSG:32637
- Nodata: nan

- Given names for spectral bands: 
   {'NIR': 1}

# As we can see, the name associated is the name of the 1st band involved in the operation
# For more consistency, we can change the name with :
ndvi.change_names({'NDVI':1})
ndvi.info()

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 1
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: float64
- Projection system: EPSG:32637
- Nodata: nan

- Given names for spectral bands: 
   {'NDVI': 1}

Gaussian filter of NDVI

# sigma = 10 and 20
ndvi_10=ndvi.filter(method="gaussian",sigma=10)
ndvi_20=ndvi.filter(method="gaussian",sigma=20)
ndvi_10.visu(cmap='Greens', title="NDVI (sigma=10)")
ndvi_20.visu(cmap='Greens', title="NDVI (sigma=20)")

../_images/8be84b8da414d78a0ea35e9ded34dc88b8926b65ed2cb899512f3e321154d05a.png

../_images/556632fab10ed2972cc07c27cdc243955c25e94bc1c8be26fd7f6ad8e90de99c.png

Laplacian filter of NDVI

laplace_1=ndvi.filter(method = 'laplace')
laplace_1.visu()
laplace_2=ndvi.filter(method = 'laplace', pre_smooth_sigma=10)
laplace_2.visu()

../_images/adc81d22f2313be388acf7d6e894ba5e14b59b9d05a0ba35cd039808b22d1a04.png

../_images/7c21f29ee3d0daae44dbfaa304de189e62997afd455cf5868350ba791586d45e.png

2) Stack all features

We can now stack all features, in rgbnir, rgbnir_10, rgbnir_20, ndvi, ndvi_10, ndvi_20, laplace_1, laplace_2

im_features=rgbnir.stack(rgbnir_10).stack(rgbnir_20).stack(ndvi).stack(ndvi_10).stack(ndvi_20).stack(laplace_1).stack(laplace_2)
im_features.info()

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 17
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: float64
- Projection system: EPSG:32637
- Nodata: -32768.0

- Given names for spectral bands: 
   {'R_1_1_1': 1, 'G_1_1_1': 2, 'B_1_1_1': 3, 'NIR_1_1_1': 4, 'R_2_1_1': 5, 'G_2_1_1': 6, 'B_2_1_1': 7, 'NIR_2_1_1': 8, 'R_1_1': 9, 'G_1_1': 10, 'B_1_1': 11, 'NIR_1_1': 12, 'NDVI_1_1': 13, 'NDVI_2_1': 14, 'NDVI_1': 15, 'NDVI_2': 16, 'NDVI': 17}

As we can see, names are adapted. For more clarity, we can

change the names of each individual features (ex: ndvi_10.change_names({'NDVI_10':1}) to make the naming of stacks
Reformat names with im_features.reset_names()

We decide here the second solution

im_features.reset_names()
im_features.info()

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 17
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: float64
- Projection system: EPSG:32637
- Nodata: -32768.0

- Given names for spectral bands: 
   {'1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, '10': 10, '11': 11, '12': 12, '13': 13, '14': 14, '15': 15, '16': 16, '17': 17}

3) Standardize the features

It is in general useful to standardize data for ML algorithms

im_features_std,scaler  = im_features.standardize()
# Check that mean is 0 and std 1
print(im_features_std.mean(axis='pixel'))
print(im_features_std.std(axis='pixel'))

[ 1.25112365e-16 -7.51185780e-17 -5.80371307e-17 -5.67297320e-17
  2.00537897e-14  3.86009447e-15  2.83656618e-14 -2.11864517e-14
 -4.87677312e-14  8.21022468e-14  6.18409217e-14  4.41988277e-14
 -3.19445860e-13  9.11597908e-15  1.28763418e-14  2.05613304e-19
  1.24913413e-17]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

4) Read the classes data

name_data_classes = './data/demo/field_vegetation.tif'
ground_truth = rastereasy.open(name_data_classes)
ground_truth.visu(extent='pixel',colorbar=True)
ground_truth.info()

../_images/1f6d53912f9be33b7b7a27522060f0348bf0d503d08750c87e70d660e6420060.png

- Size of the image:
   - Rows (height): 1000
   - Cols (width): 1000
   - Bands: 1
- Spatial resolution: 10.0  meters / degree (depending on projection system)
- Central point latitude - longitude coordinates: (7.04099599, 38.39058840)
- Driver: GTiff
- Data type: uint8
- Projection system: EPSG:32637
- Nodata: 2.0

- Given names for spectral bands: 
   {'1': 1}

Here, we have 3 labels : 0,1,2:

0 corresponds to bare soils
1 corresponds to vegetation
2 corresponds to no data

We then have to extract only labels with 0 and 1 for the training (and not care with label 2).

To do this, we have the function rastereasy.extract_numpy_tables

4) Extract two numpy that are required in a scikit ML technique

scikitrequires (X,y) numpy tables:

X being the data, of size (N x f) (N: number of data and f: number of features)
y being the outputs, of size (N,) that contains the outputs (values, name of the class, …)

help(rastereasy.extract_numpy_tables)

Help on function extract_numpy_tables in module rastereasy:

extract_numpy_tables(data, outputs, label=None)
    Extract paired NumPy tables (features and outputs) for machine learning.

    This function converts raster data stored in `Geoimage` objects into two
    NumPy arrays: one containing the input features (X), and one containing
    the corresponding outputs/labels (y). Optionally, it can filter the
    dataset to only include samples corresponding to one or multiple label
    values.

    Parameters
    ----------
    data : Geoimage
        A `rastereasy.Geoimage` containing the input features
        (e.g., multispectral bands).
    outputs : Geoimage
        A `rastereasy.Geoimage` containing the outputs/labels
        (e.g., classes or quantitative values).
    label : int or list of int, optional (default=None)
        If provided, only the samples corresponding to this label value
        (or list of values) in `outputs` will be extracted.

    Returns
    -------
    X : numpy.ndarray
        Input feature table of shape (N, f), where
        N is the number of extracted samples and
        f is the number of features (bands).
    y : numpy.ndarray
        Output array of shape (N, ), containing the labels/outputs
        associated with each sample.

    Examples
    --------
    >>> # Extract all data/labels from two Geoimages
    >>> X, y = extract_numpy_tables(data, labels)

    >>> # Extract only the samples where label == 1
    >>> X, y = extract_numpy_tables(data, outputs, label=1)

    >>> # Extract only the samples with labels in [1, 3, 5]
    >>> X, y = extract_numpy_tables(data, outputs, label=[1, 3, 5])

# We are interested in labels 0 and 1 for training
X,y=rastereasy.extract_numpy_tables(im_features_std, ground_truth, label=[0,1])
print(X.shape)
print(y.shape)

(50000, 17)
(50000,)

5) Train our favorite ML algorithm

Now we can apply a ML algorithm on data. Here we use a random forest

Training

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

RandomForestClassifier(max_depth=2, random_state=0)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

6) Apply our favorite ML algorithm on a Geoimage

To apply the trained model on a Geoimage, we have the fonction predict

help(im_features_std.predict)

Help on method predict in module rastereasy:

predict(model, bands=None) method of rastereasy.Geoimage instance
    Apply a pre-trained machine learning model to the image.

    This method applies a machine learning model (such as one created by kmeans())
    to the image data, creating a new classified or transformed image.

    Parameters
    ----------
    model : scikit model or tuple
        If tuple, it must containi (ml_model, scaler) where:
        - ml_model: A trained scikit-learn model with a predict() method
        - scaler: The scaler used for standardization (or None if not used)
    bands : list of str or None, optional
        List of bands to use as input for the model. If None, all bands are used.
        Default is None.

    Returns
    -------
    Geoimage
        A new Geoimage containing the model output

    Examples
    --------
    >>> # Train a model on one image and apply to another
    >>> classified, model = reference_image.kmeans(n_clusters=5)
    >>> new_classified = target_image.predict(model)
    >>> new_classified.visu(colorbar=True, cmap='viridis')
    >>>
    >>> # Train on specific bands and apply to the same bands
    >>> _, model = image.kmeans(bands=["NIR", "Red"], n_clusters=3)
    >>> result = image.predict(model, bands=["NIR", "Red"])
    >>> result.save("classified.tif")
    >>>
    >>> # Apply a RF model trained of other data to a Geoimage
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> clf = RandomForestClassifier(max_depth=2, random_state=0)
    >>> clf.fit(X, y)
    >>> result = image.predict(clf)


    Notes
    -----
    - The model must have been trained on data with the same structure as what it's being applied to (e.g., same number of bands)
    - If a scaler was used during training, it will be applied before prediction
    - This method is useful for:
    - Applying a classification model to new images
    - Ensuring consistent classification across multiple scenes
    - Time-series analysis with consistent classification

# As we can see in the help, we can apply it directly on the image with standardized bands
vegetation = im_features_std.predict(clf)
vegetation.visu()

../_images/314f95404b15619c581af961ffdecf85d980c29fa2f2223575db19f6120a84b2.png

# We can apply it directly on the image without standardized bands, and give the scaler issued from standardization
vegetation = im_features.predict((clf,scaler))
vegetation.visu()