Welcome to mftrees’s documentation!

Training a Model

The first step in training a model is to generate training data from a source imagery mosaic, any additional augment layers, and a target map. This is done with the mft.features program, which outputs a .npz file containing the generated training features, along with extra metadata parameters that are passed through to subsequent steps in the modelling process.
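Before moving on to training, it can be useful to confirm what the .npz file contains. The sketch below relies on the fact that a .npz file is a zip archive holding one .npy member per array; it uses only the standard library and does not load any data. The helper name list_npz_arrays is hypothetical and is not part of mftrees.

```python
import zipfile


def list_npz_arrays(path):
    """Return the names of the arrays stored in an .npz archive.

    Hypothetical helper, not part of mftrees. It only reads the
    archive's table of contents, so no array data is loaded.
    """
    suffix = ".npy"
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as a "<name>.npy" member of the zip archive.
        return [
            name[: -len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        ]
```

Calling list_npz_arrays on the file written by mft.features should include 'arr_0', which the mft.train help text names as the input feature matrix. The names of any metadata entries are not documented here, so inspect the returned list rather than assuming them.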

The relevant parameters for mft.features are listed below.

mft.features

MOSAIC_FILE: An image (likely VRT) to chip and compute training features from

mft.features [OPTIONS] MOSAIC_FILE

Options

-t, --target-map <target_map>

A lower-resolution georeferenced target image that controls the chipping behavior and supplies the training data values

--bins <bins>

Number of frequency bins to use for spectra generation

--pixel-size <pixel_size>

Rescaled pixel size

-o, --out <out>
-a, --augment-file <augment_file>

Arguments

MOSAIC_FILE

Required argument
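As a minimal sketch of an invocation, the snippet below assembles an mft.features command line from Python using only the options listed above. The file names are hypothetical. The listing gives no description for --out or --augment-file, so treating them as the output .npz path and the augment layer image is an assumption.

```python
import shlex


def features_command(mosaic_file, target_map, out,
                     augment_file=None, bins=None, pixel_size=None):
    """Build an argv list for mft.features from the documented options."""
    # Assumption: --out is the path of the .npz file to write.
    cmd = ["mft.features", "--target-map", target_map, "--out", out]
    if augment_file is not None:
        # Assumption: --augment-file points at the extra augment layers.
        cmd += ["--augment-file", augment_file]
    if bins is not None:
        cmd += ["--bins", str(bins)]
    if pixel_size is not None:
        cmd += ["--pixel-size", str(pixel_size)]
    cmd.append(mosaic_file)  # MOSAIC_FILE is the required positional argument
    return cmd


# Hypothetical file names, for illustration only.
cmd = features_command("mosaic.vrt", "target.tif", "features.npz",
                       augment_file="augment.tif")
print(shlex.join(cmd))
# prints: mft.features --target-map target.tif --out features.npz --augment-file augment.tif mosaic.vrt
```

The resulting list can be passed to subprocess.run(cmd, check=True) once mftrees is installed; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.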

The next step is to compute a manifold embedding and train an xgboost regressor. These steps are accomplished using the mft.train program. This program outputs a model as a .joblib package that can then be applied to new data to make predictions.

The relevant parameters for mft.train are listed below.

mft.train

TRAINING_FILE: NumPy serialized file where ‘arr_0’ is the input feature matrix

mft.train [OPTIONS] TRAINING_FILE

Options

--embed, --no-embed

Transform features via sampled spectral embedding prior to fit

--n-components <n_components>

Number of features to use for Nystroem extension

--n-boosting-stages <n_boosting_stages>

Max number of Gradient Boosting Stages

-c, --n-clusters <n_clusters>

Number of k-means clusters

-d <d>

Number of output dimensions

-of <of>

Output filename for the .npz features

-s, --seed <seed>

Random seed for the test/train partition

-lr, --learning-rate <learning_rate>

Learning rate for xgboost

--gpu
--hist
--approx
--tree-depth <tree_depth>

Max tree depth in ensemble

--augments-only

Use only augment values for fitting clustered data

--max-projection-samples <max_projection_samples>

Max number of approximated features to use for Spectral Embedding

Arguments

TRAINING_FILE

Required argument
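A training run could be assembled along the following lines. This is a sketch that uses only a few of the options listed above. The numeric values and the file name are arbitrary placeholders, not recommended settings. The listing does not show an option for naming the .joblib output, so the sketch leaves that out.

```python
import shlex


def train_command(training_file, embed=True, seed=None,
                  learning_rate=None, tree_depth=None,
                  n_boosting_stages=None):
    """Build an argv list for mft.train from a subset of its options."""
    # --embed / --no-embed is a paired boolean switch.
    cmd = ["mft.train", "--embed" if embed else "--no-embed"]
    valued_options = [
        ("--seed", seed),
        ("--learning-rate", learning_rate),
        ("--tree-depth", tree_depth),
        ("--n-boosting-stages", n_boosting_stages),
    ]
    for flag, value in valued_options:
        if value is not None:  # only pass options that were set explicitly
            cmd += [flag, str(value)]
    cmd.append(training_file)  # TRAINING_FILE is the required positional argument
    return cmd


# Placeholder values, chosen only to show the shape of the command.
train_cmd = train_command("features.npz", seed=42,
                          learning_rate=0.1, tree_depth=6)
print(shlex.join(train_cmd))
# prints: mft.train --embed --seed 42 --learning-rate 0.1 --tree-depth 6 features.npz
```

Fixing --seed makes the test/train partition repeatable between runs, which helps when comparing different learning rates or tree depths.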

mft.histmatch

Histogram match a georeferenced raster to a reference

mft.histmatch [OPTIONS] IMG_PATH

Options

-o, --out_path <out_path>

Classification output GeoTIFF

-r, --ref_path <ref_path>

Reference mosaic used for baselayer matching

Arguments

IMG_PATH

Required argument
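For illustration, a histogram-matching call might be put together as follows. Only the option names come from the listing above; note that they use underscores (--out_path, --ref_path), unlike the hyphenated options of the other programs. The file names are hypothetical.

```python
import shlex

# Hypothetical file names; only the option names come from the listing above.
img_path = "new_mosaic.tif"
histmatch_cmd = [
    "mft.histmatch",
    "--ref_path", "training_mosaic.tif",    # reference raster to match against
    "--out_path", "new_mosaic_matched.tif", # where the matched raster is written
    img_path,                               # IMG_PATH, the required positional argument
]
print(shlex.join(histmatch_cmd))
# prints: mft.histmatch --ref_path training_mosaic.tif --out_path new_mosaic_matched.tif new_mosaic.tif
```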

mft.predict

MODEL_FILE: joblib-serialized carbon estimation model

mft.predict [OPTIONS] MODEL_FILE

Options

--mosaic-file <mosaic_file>

Preprocessed image mosaic file as a GeoTIFF

-a, --augment-file <augment_file>

Preprocessed augmentation data file as a GeoTIFF

-o, --out <out>

Classification output GeoTIFF

--blm, --no-blm

Base-layer match the mosaic to the reference

--reference <reference>

Reference mosaic used for baselayer matching

Arguments

MODEL_FILE

Required argument
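A prediction run could be sketched as below, again using only the options listed above. Pairing --reference with --blm is an assumption based on the option descriptions, which both refer to base layer matching. All file names are hypothetical.

```python
import shlex


def predict_command(model_file, mosaic_file, out,
                    augment_file=None, reference=None):
    """Build an argv list for mft.predict from the documented options."""
    cmd = ["mft.predict", "--mosaic-file", mosaic_file, "--out", out]
    if augment_file is not None:
        cmd += ["--augment-file", augment_file]
    if reference is not None:
        # Assumption: base layer matching is enabled together with a reference.
        cmd += ["--blm", "--reference", reference]
    else:
        cmd.append("--no-blm")
    cmd.append(model_file)  # MODEL_FILE is the required positional argument
    return cmd


# Hypothetical file names, for illustration only.
predict_cmd = predict_command("model.joblib", "new_mosaic.tif", "prediction.tif",
                              reference="training_mosaic.tif")
print(shlex.join(predict_cmd))
# prints: mft.predict --mosaic-file new_mosaic.tif --out prediction.tif --blm --reference training_mosaic.tif model.joblib
```

If the model was trained with augment layers, the matching augment GeoTIFF for the new area would presumably need to be supplied through --augment-file as well; the listing does not state this explicitly.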
