Welcome to mftrees’s documentation!
Training a Model
The first step in training a model is to generate training data from a source imagery mosaic, extra augment layers, and a target map. This is done with the mft.features program, which outputs a .npz file containing the generated training features along with extra metadata parameters that are passed through to subsequent steps in the modelling process.
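As a rough illustration of the chipping step, the sketch below splits a multi-band mosaic into square chips, one per pixel of a coarser target map, and writes features and targets to a .npz file. It assumes the mosaic and target map are already co-registered and that their pixel-size ratio (`factor`, a hypothetical parameter) is an integer. The per-band means are placeholders rather than the features mft.features actually computes, and mftrees may store targets and metadata differently.

```python
import io

import numpy as np


def chip_mosaic(mosaic, factor):
    """Split a (bands, H, W) mosaic into factor x factor chips, one per coarse target pixel.

    `factor` is the (assumed integer) ratio of target-map pixel size to mosaic
    pixel size; partial chips along the right and bottom edges are dropped.
    """
    bands, h, w = mosaic.shape
    th, tw = h // factor, w // factor
    trimmed = mosaic[:, : th * factor, : tw * factor]
    # (bands, th, factor, tw, factor) -> (th, tw, bands, factor, factor)
    chips = trimmed.reshape(bands, th, factor, tw, factor).transpose(1, 3, 0, 2, 4)
    return chips.reshape(th * tw, bands, factor, factor)


# Toy inputs: a 3-band 40x60 mosaic and a target map 10x coarser (4x6 pixels).
rng = np.random.default_rng(0)
mosaic = rng.random((3, 40, 60))
target = rng.random((4, 6))

chips = chip_mosaic(mosaic, factor=10)
# Placeholder features (per-band chip means), NOT the real mft.features output.
features = chips.mean(axis=(2, 3))
targets = target.ravel()  # row-major order matches the chip order above

# np.savez stores positional arrays under the keys arr_0, arr_1, ...
buf = io.BytesIO()
np.savez(buf, features, targets)
buf.seek(0)
loaded = np.load(buf)
```

Because `np.savez` names positional arrays `arr_0`, `arr_1`, and so on, the feature matrix lands under `arr_0`, the key that mft.train documents for its input feature matrix.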
The parameters of mft.features are documented below.
mft.features
MOSAIC_FILE: An image (likely a VRT) to chip and compute training features from.
mft.features [OPTIONS] MOSAIC_FILE
Options
-t, --target-map <target_map>
    A lower-resolution georeferenced target image that controls the chipping behavior and supplies the training data values.
--bins <bins>
    Number of frequency bins to use for spectra generation.
--pixel-size <pixel_size>
    Rescaled pixel size.
-o, --out <out>
-a, --augment-file <augment_file>
Arguments
MOSAIC_FILE
    Required argument.
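The `--bins` option sets how many frequency bins are used for spectra generation. One common way to turn an image chip into a fixed-length spectral feature is a radially averaged power spectrum; the sketch below assumes that approach, which may differ from what mft.features does internally. `radial_spectrum` is a hypothetical helper, not part of mftrees.

```python
import numpy as np


def radial_spectrum(chip, bins):
    """Radially averaged power spectrum of a 2-D chip, reduced to `bins` frequency bins."""
    chip = chip - chip.mean()  # remove the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(chip))) ** 2
    rows, cols = chip.shape
    fy = np.fft.fftshift(np.fft.fftfreq(rows))
    fx = np.fft.fftshift(np.fft.fftfreq(cols))
    fyy, fxx = np.meshgrid(fy, fx, indexing="ij")
    radius = np.hypot(fyy, fxx)  # radial spatial frequency of each FFT cell
    edges = np.linspace(0.0, radius.max(), bins + 1)
    # Assign each cell to a bin; the maximum radius is folded into the last bin.
    which = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, bins - 1)
    total = np.bincount(which, weights=power.ravel(), minlength=bins)
    count = np.bincount(which, minlength=bins)
    return total / np.maximum(count, 1)  # mean power per bin


rng = np.random.default_rng(1)
chip = rng.random((32, 32))
spectrum = radial_spectrum(chip, bins=8)
```

A larger `bins` value gives a finer description of how texture energy is distributed across spatial frequencies, at the cost of a longer feature vector per chip.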
The next step is to compute a manifold embedding and train an XGBoost regressor, both of which are done with the mft.train program. It outputs the fitted model as a .joblib package that can then be applied to new data to make predictions.
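The mft.train options refer to a sampled spectral embedding with a Nystroem extension. A minimal sketch of that idea follows, assuming an RBF affinity with a hypothetical `gamma` and uniformly sampled landmarks: eigendecompose the degree-normalized affinities among a small landmark set, then extend the eigenvectors to every sample. This is a rough, non-orthogonalized Nystroem approximation written for illustration, not mftrees' implementation; `n_landmarks` is only assumed to correspond loosely to `--n-components`, and `n_dims` to `-d`.

```python
import numpy as np


def rbf_affinity(a, b, gamma):
    """RBF affinities exp(-gamma * ||a_i - b_j||^2) between rows of a and rows of b."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystroem_spectral_embedding(X, n_landmarks, n_dims, gamma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    W = rbf_affinity(X[idx], X[idx], gamma)  # landmark-landmark block (m x m)
    C = rbf_affinity(X, X[idx], gamma)       # sample-landmark block (n x m)

    # Approximate degrees of the full affinity matrix K ~ C W^+ C^T, i.e. K @ 1.
    deg = C @ (np.linalg.pinv(W) @ C.sum(axis=0))
    deg = np.maximum(deg, 1e-12)
    deg_l = deg[idx]

    # Symmetric normalization D^-1/2 K D^-1/2 applied to both Nystroem blocks.
    Wn = W / np.sqrt(np.outer(deg_l, deg_l))
    Cn = C / np.sqrt(np.outer(deg, deg_l))

    vals, vecs = np.linalg.eigh(Wn)         # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]  # largest first

    # Nystroem extension u(x) = (1 / lambda) * sum_j k(x, l_j) u_j,
    # skipping the leading (near-trivial) eigenvector.
    keep = slice(1, n_dims + 1)
    return Cn @ vecs[:, keep] / vals[keep]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
embedding = nystroem_spectral_embedding(X, n_landmarks=50, n_dims=3)
```

The point of the landmark sample is cost: only an m x m eigenproblem is solved, and every other sample is placed by the extension formula, so the full n x n affinity matrix is never formed.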
The parameters of mft.train are documented below.
mft.train
TRAINING_FILE: NumPy-serialized (.npz) file in which ‘arr_0’ is the input feature matrix.
mft.train [OPTIONS] TRAINING_FILE
Options
--embed, --no-embed
    Transform features via sampled spectral embedding prior to fitting.
--n-components <n_components>
    Number of features to use for the Nystroem extension.
--n-boosting-stages <n_boosting_stages>
    Maximum number of gradient boosting stages.
-c, --n-clusters <n_clusters>
    Number of k-means clusters.
-d <d>
    Number of output dimensions.
-of <of>
    Output filename for the .npz features.
-s, --seed <seed>
    Random seed for the test/train partition.
-lr, --learning-rate <learning_rate>
    Learning rate for XGBoost.
--gpu
--hist
--approx
--tree-depth <tree_depth>
    Maximum tree depth in the ensemble.
--augments-only
    Use only augment values when fitting clustered data.
--max-projection-samples <max_projection_samples>
    Maximum number of approximated features to use for the spectral embedding.
Arguments
TRAINING_FILE
    Required argument.
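TRAINING_FILE is documented to hold the feature matrix under ‘arr_0’, and `--seed` seeds the test/train partition. The sketch below shows one way to load such a file and make a reproducible split with NumPy. Reading targets from `arr_1` and the 20% test fraction are assumptions, and `load_training` and `partition` are hypothetical helpers rather than mftrees functions.

```python
import io

import numpy as np


def load_training(file):
    data = np.load(file)
    features = data["arr_0"]  # documented: 'arr_0' is the input feature matrix
    targets = data["arr_1"]   # ASSUMPTION: targets saved as the second positional array
    return features, targets


def partition(features, targets, test_fraction=0.2, seed=0):
    """Seeded random test/train split; the same seed always yields the same split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test, train = order[:n_test], order[n_test:]
    return features[train], features[test], targets[train], targets[test]


# Round-trip a toy training file through an in-memory buffer.
buf = io.BytesIO()
np.savez(buf, np.arange(50.0).reshape(25, 2), np.arange(25.0))
buf.seek(0)
X, y = load_training(buf)
X_train, X_test, y_train, y_test = partition(X, y, test_fraction=0.2, seed=42)
```

Permuting row indices once and slicing both arrays with the same index sets keeps each feature row paired with its target.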
mft.histmatch
Histogram-match a georeferenced raster to a reference.
mft.histmatch [OPTIONS] IMG_PATH
Options
-o, --out_path <out_path>
    Output GeoTIFF for the histogram-matched raster.
-r, --ref_path <ref_path>
    Reference mosaic used for base-layer matching.
Arguments
IMG_PATH
    Required argument.
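Histogram matching usually works by mapping each source value to the reference value at the same quantile of the cumulative distribution. A band-by-band NumPy sketch of that classic technique follows. It ignores nodata masks and georeferencing, which a real tool has to handle, and it is not necessarily how mft.histmatch is implemented; `match_histogram` and `match_bands` are hypothetical names.

```python
import numpy as np


def match_histogram(source, reference):
    """Map source values onto the reference distribution by matching empirical CDFs."""
    src = source.ravel()
    _, s_index, s_counts = np.unique(src, return_inverse=True, return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # For each source quantile, interpolate the reference value at the same quantile.
    mapped = np.interp(s_cdf, r_cdf, r_values)
    return mapped[s_index].reshape(source.shape)


def match_bands(image, reference):
    """Histogram-match each band of a (bands, rows, cols) image to the same band of reference."""
    return np.stack([match_histogram(image[b], reference[b]) for b in range(image.shape[0])])


rng = np.random.default_rng(2)
image = rng.integers(0, 50, size=(3, 16, 16)).astype(float)
reference = rng.integers(100, 200, size=(3, 20, 20)).astype(float)
matched = match_bands(image, reference)
```

The mapping is monotonic, so the relative ordering of pixel values within each band is preserved while their distribution takes on the reference's shape.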
mft.predict
MODEL_FILE: joblib-serialized carbon estimation model.
mft.predict [OPTIONS] MODEL_FILE
Options
--mosaic-file <mosaic_file>
    Preprocessed image mosaic file, as a GeoTIFF.
-a, --augment-file <augment_file>
    Preprocessed augmentation data file, as a GeoTIFF.
-o, --out <out>
    Classification output GeoTIFF.
--blm, --no-blm
    Base-layer match the mosaic to the reference.
--reference <reference>
    Reference mosaic used for base-layer matching.
Arguments
MODEL_FILE
    Required argument.
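mft.predict applies a trained model to a mosaic, plus optional augment layers, to produce an output GeoTIFF. The sketch below assumes predictions are made per chip, mirroring the chipping used for training, and uses a hypothetical `ChipMeanModel` in place of the real model; a joblib-serialized MODEL_FILE would normally be loaded with `joblib.load`. `predict_grid` and the `featurize` callback are also hypothetical, and reading and writing GeoTIFFs is omitted.

```python
import numpy as np


class ChipMeanModel:
    """Hypothetical stand-in for a fitted regressor: predicts the mean of each feature row."""

    def predict(self, X):
        return X.mean(axis=1)


def predict_grid(model, layers, factor, featurize):
    """Chip (bands, H, W) layers into factor x factor blocks, featurize each, and predict.

    Returns an (H // factor, W // factor) map with one prediction per chip.
    """
    bands, h, w = layers.shape
    th, tw = h // factor, w // factor
    trimmed = layers[:, : th * factor, : tw * factor]
    chips = trimmed.reshape(bands, th, factor, tw, factor).transpose(1, 3, 0, 2, 4)
    chips = chips.reshape(th * tw, bands, factor, factor)
    X = np.stack([featurize(c) for c in chips])  # (n_chips, n_features)
    return np.asarray(model.predict(X)).reshape(th, tw)


mosaic = np.arange(2 * 20 * 30, dtype=float).reshape(2, 20, 30)
augments = np.ones((1, 20, 30))  # e.g. one extra augment band
layers = np.concatenate([mosaic, augments], axis=0)
pred = predict_grid(ChipMeanModel(), layers, factor=10, featurize=lambda c: c.mean(axis=(1, 2)))
```

Stacking augment bands onto the mosaic before chipping keeps every chip's imagery and augment values aligned, so each feature row describes a single location.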