Modules

Main Script

Train a U-Net based on COSMO-1e input data.

This script loads and preprocesses the input data, then trains a U-Net to predict the target species from COSMO-1e input data. The model is trained on the training data and evaluated on the validation data, and the best model is saved. The saved model then produces predictions for the validation data, which are written to a report. Finally, an R script converts the report to an HTML file.

Modules:

  • socket: Provides a way to get the hostname of the computer the script is running on.

  • subprocess: Provides a way to run the R script to convert the report to an HTML file.

  • git: Provides a way to get the SHA of the current git commit.

  • pyprojroot: Provides a way to get the root directory of the project.

  • tensorflow: Provides a way to set up the TensorFlow backend and random seed.

First-party Modules:

  • aldernet.utils.load_data: Provides a way to load the training and validation data.

  • aldernet.utils.save_predictions_and_generate_report: Provides a way to save the predictions and generate a report with them.

  • aldernet.utils.setup_output_directory: Provides a way to set up the output directory.

  • aldernet.utils.tf_setup: Provides a way to set up the TensorFlow backend.

  • aldernet.utils.train_and_evaluate_model: Provides a way to train and evaluate the model.

Variables:

  • repo: The git repository object for the project.

  • sha: The SHA of the current git commit.

  • hostname: The hostname of the computer the script is running on.

  • settings: A dictionary containing the settings for the script. The keys are:
    • “input_species”: The species to use for input data.

    • “target_species”: The species to predict.

    • “retrain_model”: Whether to retrain the model or use a saved model.

    • “tune_with_ray”: Whether to use Ray for hyperparameter tuning.

    • “zoom”: The zoom level to use for the input data.

    • “noise_dim”: The dimension of the noise vector to use for the generator.

    • “epochs”: The number of epochs to train the model for.

    • “shuffle”: Whether to shuffle the training data before each epoch.

    • “add_weather”: Whether to add weather data to the input data.

    • “conv”: Whether to use convolutional layers instead of dense layers.

    • “members”: The number of ensemble members to use.

    • “device”: A dictionary containing the device to use for training. The keys are:
      • “gpu”: The number of the GPU to use for training.

  • data_train: The training data.

  • data_valid: The validation data.

  • run_path: The path to the output directory for the script.

  • best_model: The best model obtained during training.
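The variables listed above suggest the overall control flow of the main script. The following is a minimal sketch of that flow, not the script itself: the setting values are placeholders, the order of the calls is an assumption, and the third-party imports (git, pyprojroot, aldernet) are deferred into main() so that the block can be loaded without them.

```python
import socket

# The keys come from the list above; every value is an illustrative
# placeholder and NOT a project default.
settings = {
    "input_species": "<input species>",
    "target_species": "<target species>",
    "retrain_model": True,
    "tune_with_ray": False,
    "zoom": "<zoom level>",
    "noise_dim": 100,
    "epochs": 10,
    "shuffle": True,
    "add_weather": False,
    "conv": True,
    "members": 1,
    "device": {"gpu": 0},
}


def main(settings):
    """Run the pipeline once. The call order is assumed, not confirmed."""
    # Imported lazily so that loading this sketch only needs the stdlib.
    import git
    from pyprojroot import here

    from aldernet import utils

    # Record which commit and which machine produced the run.
    repo = git.Repo(here(), search_parent_directories=True)
    sha = repo.head.object.hexsha
    hostname = socket.gethostname()

    utils.tf_setup()
    run_path = utils.setup_output_directory(settings)
    data_train, data_valid = utils.load_data(hostname, settings)
    best_model = utils.train_and_evaluate_model(
        run_path, settings, data_train, data_valid, sha
    )
    utils.save_predictions_and_generate_report(settings, best_model, data_valid)
    return best_model
```

Calling main(settings) requires the aldernet package, its data, and real values in place of the placeholders.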

Utility Functions

Provide all required functions for the training script.

Functions:

  • build_unet(conv: bool=True) -> tf.keras.Model: Builds a U-Net model for predicting pollen concentrations.

  • cbr(filters, name=None) -> Sequential: Creates a convolutional block with batch normalization and leaky ReLU activation.

  • compile_generator(height, width, weather_features, noise_dim, filters): Compiles a generator model using TensorFlow with the specified parameters.

  • create_batcher(data: xarray.Dataset, batch_size: int, add_weather: bool, shuffle: bool) -> Batcher: Creates a batcher object for the given data and parameters.

  • create_optimizer(learning_rate: float, beta_1: float, beta_2: float) -> tf.keras.optimizers.Adam: Creates an optimizer object for the model.

  • define_filters(zoom: str) -> List[int]: Define the filters based on the provided zoom.

  • down(filters, name=None) -> Sequential: Creates a downsampling block with a convolutional layer, batch normalization, and leaky ReLU activation.

  • generate_report(df: pd.DataFrame, settings: Dict[str, Union[str, int, float, bool]]) -> None: Generate a report based on the provided dataframe and settings.

  • get_callbacks(run_path: str, sha: str) -> List[ray.tune.callback.TrialCallback]: Get the callbacks based on the provided run path and SHA.

  • get_scheduler(settings: Dict[str, Union[str, int, float, bool]]) -> ray.tune.schedulers.asha.ASHAScheduler: Get the ASHAScheduler based on the provided settings.

  • get_tune_config(run_path: str) -> Dict[str, Union[str, Dict[str, str], float]]: Get the Tune configuration based on the provided run path.

  • load_data(hostname: str, settings: Dict[str, Union[str, int, float, bool]]) -> Tuple[xr.Dataset, xr.Dataset]: Load data based on the provided hostname and settings.

  • load_pretrained_model() -> keras.engine.sequential.Sequential: Load a pre-trained model.

  • predict_season(best_model: tf.keras.Model, data_valid: xr.Dataset, noise_dim: int, add_weather: bool) -> np.ndarray: Predicts the pollen season with the best model and returns a NumPy array of predicted values.

  • prepare_generator(run_path: str, settings: Dict[str, Union[str, int, float, bool]], data_train: xr.Dataset) -> keras.engine.training.Model: Prepare the generator based on the provided settings and data.

  • read_scaling_data() -> Tuple[float, float]: Reads scaling data from a file and returns a tuple containing the center and scale.

  • rsync_mlruns(run_path: str) -> None: Rsync the mlruns based on the provided run path.

  • save_generator_summary_and_plot(run_path: str, generator: keras.engine.training.Model) -> None: Save the generator summary and plot based on the provided run path and generator.

  • save_predictions_and_generate_report(settings: Dict[str, Union[str, int, float, bool]], best_model: keras.engine.sequential.Sequential, data_valid: xr.Dataset) -> None: Save the predictions and generate the report based on the provided settings, best_model, and data_valid.

  • setup_directories(run_path: str, tune_trial: str) -> None: Creates directories for visualizations.

  • setup_output_directory(settings: Dict[str, Union[str, int, float, bool]]) -> str: Set up the output directory based on the provided settings.

  • train_and_evaluate_model(run_path: str, settings: Dict[str, Union[str, int, float, bool]], data_train: xr.Dataset, data_valid: xr.Dataset, sha: str) -> keras.engine.sequential.Sequential: Train and evaluate the model based on the provided settings, data, and run path.

  • train_epoch(generator: tf.keras.Model, optimizer_gen: tf.keras.optimizers.Optimizer, data_train: Batcher, noise_dim: int, add_weather: bool, center: float, scale: float, epoch: tf.Variable, step: tf.Variable, run_path: str, tune_trial: str) -> np.ndarray: Trains the model for one epoch and returns a NumPy array of loss values.

  • train_model_simple(data_train: xr.Dataset, data_valid: xr.Dataset, epochs: int, add_weather: bool, conv: bool=True) -> tf.keras.Model: Trains a simple model for predicting pollen concentrations and returns the trained model.

  • train_model(config: Dict[str, float], generator: tf.keras.Model, data_train: Batcher, data_valid: Batcher, run_path: str, noise_dim: int, add_weather: bool, shuffle: bool) -> None: Trains the model using a configuration dictionary and saves the best model.

  • train_step(generator, optimizer_gen, input_train, target_train, weather_train, noise_dim, add_weather): Performs one training step for the generator of a NNet to generate images of pollen concentrations in the air given an input image of trees and weather data.

  • train_with_ray_tune(run_path: str, settings: Dict[str, Union[str, int, float, bool]], data_train: xr.Dataset, data_valid: xr.Dataset, sha: str) -> keras.engine.sequential.Sequential: Train the model using Ray Tune based on the provided settings, data, and run path.

  • train_without_ray_tune(settings: Dict[str, Union[str, int, float, bool]], data_train: xr.Dataset, data_valid: xr.Dataset) -> keras.engine.sequential.Sequential: Train the model without Ray Tune based on the provided settings and data.

  • up(filters, name=None) -> Sequential: Creates an upsampling block with a transposed convolutional layer, batch normalization, and leaky ReLU activation.

  • validate_epoch(generator: tf.keras.Model, data_valid: Batcher, noise_dim: int, add_weather: bool, center: float, scale: float, epoch: tf.Variable, step_valid: int, run_path: str, tune_trial: str) -> Tuple[np.ndarray, int]: Validates the model for one epoch and returns a tuple containing a NumPy array of loss values and the step number.

Attributes:

All functions and classes defined in this script.

COPYRIGHT (c) 2022 MeteoSwiss, contributors listed in AUTHORS. Distributed under the terms of the BSD 3-Clause License. SPDX-License-Identifier: BSD-3-Clause

This docstring was autogenerated by GPT-4.

aldernet.utils.build_unet(conv=True)

Build a U-Net model.

Args:

conv (bool): Whether to use convolutional blocks.

Returns:

tf.keras.Model: U-Net model.

aldernet.utils.cbr(filters, name=None) → Sequential

Construct a Convolution-BatchNorm-ReLU block using the Keras Sequential API.

Args:

  • filters (int): Number of filters in the Conv2D layer.
  • name (str): Optional name for the block.

Returns:

Sequential: A Keras Sequential model representing the CBR block.

aldernet.utils.compile_generator(height, width, weather_features, noise_dim, filters)

Compile a generator model for image-to-image translation using the Keras API.

Args:

  • height (int): Height of the input images.
  • width (int): Width of the input images.
  • weather_features (int): Number of weather features in the weather input.
  • noise_dim (int): Dimensionality of the noise input.
  • filters (List[int]): Number of filters in each layer of the generator.

Returns:

Model: A Keras Model representing the compiled generator model.

aldernet.utils.create_batcher(data, batch_size, add_weather, shuffle)

Create a Batcher object for the given data.

Args:

  • data (Tuple[Tensor]): A tuple of Tensors representing the input and target data.
  • batch_size (int): Batch size for training.
  • add_weather (bool): Whether to add weather data to the input.
  • shuffle (bool): Whether to shuffle the data during training.

Returns:

Batcher: A Batcher object for the given data.

aldernet.utils.create_optimizer(learning_rate, beta_1, beta_2)

Create and return an Adam optimizer with the given hyperparameters.

Args:

  • learning_rate (float): The learning rate for the optimizer.
  • beta_1 (float): The exponential decay rate for the first-moment estimates.
  • beta_2 (float): The exponential decay rate for the second-moment estimates.

Returns:

tf.keras.optimizers.Optimizer: An instance of the Adam optimizer.
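To make the roles of learning_rate, beta_1, and beta_2 concrete, here is a plain-Python sketch of one textbook Adam update on a scalar parameter. It is only an illustration of the algorithm: adam_step is a hypothetical helper, the default values are illustrative, and the Keras optimizer that create_optimizer returns may differ in numerical details such as how epsilon is applied.

```python
import math


def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one textbook Adam update; t is the 1-based step counter."""
    m = beta_1 * m + (1.0 - beta_1) * grad         # first-moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta_1 ** t)                # bias-corrected moments
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# Minimise f(x) = x**2 starting from x = 1.0; the gradient is 2 * x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
```

On the first step the bias correction cancels the decay factors, so the parameter moves by almost exactly learning_rate in the direction opposite to the gradient, regardless of the gradient's magnitude.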

aldernet.utils.define_filters(zoom)

Define the filters for the generator model based on the specified zoom level.

Args:

zoom (str): The zoom level.

Returns:

list: A list containing the filter values for the generator model.

aldernet.utils.down(filters, name=None) → Sequential

Construct a down-sampling block using the Keras Sequential API.

Args:

  • filters (int): Number of filters in the Conv2D layer.
  • name (str): Optional name for the block.

Returns:

Sequential: A Keras Sequential model representing the down-sampling block.

aldernet.utils.generate_report(df, settings)

Generate a report summarizing the experiment results.

Args:

  • df (pandas.core.frame.DataFrame): A dataframe containing the predicted and observed values.
  • settings (dict): Dictionary of settings.

aldernet.utils.get_callbacks(run_path, sha)

Get the MLflow logger callback for logging the experiment results.

Args:

  • run_path (str): Path to the output directory for storing results.
  • sha (str): A string containing the Git commit hash for the current code.

Returns:

list: A list containing the MLflow logger callback.

aldernet.utils.get_runtime_env()

Get the runtime environment for Ray Tune.

Returns:

dict: A dictionary containing the working directory and excluded files and directories.

aldernet.utils.get_scheduler(settings)

Get the scheduler for Ray Tune.

Args:

settings (dict): Dictionary of settings.

Returns:

ray.tune.schedulers.AsyncHyperBandScheduler: The scheduler for hyperparameter tuning.

aldernet.utils.get_tune_config()

Get the configuration settings for Ray Tune.

Args:

None

Returns:

dict: A dictionary containing the learning rate, beta values, and MLflow settings.

aldernet.utils.load_data(hostname, settings)

Load training and validation data for the specified hostname and zoom level.

Args:

  • hostname (str): Name of the host to load data from.
  • settings (dict): Dictionary of settings, including zoom level.

Returns:

tuple: A tuple of two xarray datasets containing the training and validation data.

Raises:

ValueError: If the hostname is not recognized.

aldernet.utils.load_pretrained_model()

Load a pretrained neural network model.

Returns:

tensorflow.python.keras.engine.functional.Functional: The pretrained neural network model.

aldernet.utils.predict_season(best_model, data_valid, noise_dim, add_weather)

Use the trained best_model to predict the full pollen season for data_valid.

Args:

  • best_model (keras.Model): The trained U-Net model.
  • data_valid (xarray.Dataset): The validation data.
  • noise_dim (int): The dimensionality of the noise vector.
  • add_weather (bool): Whether or not to include weather data.

Returns:

predictions (ndarray): The predicted pollen season.

aldernet.utils.prepare_generator(run_path, settings, data_train)

Prepare the generator model for training the NNet.

Args:

  • run_path (str): Path to the output directory for storing results.
  • settings (dict): Dictionary of settings.
  • data_train (xarray.Dataset): Training data.

Returns:

tensorflow.python.keras.engine.functional.Functional: The compiled generator model.

aldernet.utils.read_scaling_data()

Read and return the center and scale values from the scaling.txt file.

Returns:

Tuple[float, float]: A tuple of center and scale values.
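As an illustration of how a center and scale pair is typically used, the sketch below reads two values from a text file and applies them as (x - center) / scale, together with the inverse transform. Both the file layout (two whitespace-separated floats) and the direction of the transform are assumptions; the actual format of scaling.txt is not documented here, and all helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_center_and_scale(path):
    """Read two whitespace-separated floats (center, scale) from a text file."""
    center, scale = (float(value) for value in Path(path).read_text().split())
    return center, scale


def standardize(values, center, scale):
    """Shift by the center and divide by the scale."""
    return [(value - center) / scale for value in values]


def unstandardize(values, center, scale):
    """Invert standardize() to get back to the original units."""
    return [value * scale + center for value in values]


# Write a throwaway file in the assumed layout, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    scaling_file = Path(tmp) / "scaling.txt"
    scaling_file.write_text("2.0 4.0\n")
    center, scale = read_center_and_scale(scaling_file)

scaled = standardize([2.0, 6.0, 10.0], center, scale)
restored = unstandardize(scaled, center, scale)
```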

aldernet.utils.rsync_mlruns(run_path)

Copy the MLflow run data to the current working directory.

Args:

run_path (str): Path to the output directory for storing results.

aldernet.utils.save_generator_summary_and_plot(run_path, generator)

Save a summary of the generator model and a visualization of its architecture.

Args:

  • run_path (str): Path to the output directory for storing results.
  • generator (tensorflow.python.keras.engine.functional.Functional): The generator model to summarize and plot.

aldernet.utils.save_predictions_and_generate_report(settings, best_model, data_valid)

Save the predicted and observed values to a CSV file and generate a report.

Args:

  • settings (dict): Dictionary of settings.
  • best_model (tensorflow.python.keras.engine.functional.Functional): The trained neural network model.
  • data_valid (xarray.Dataset): Validation data.

aldernet.utils.setup_directories(run_path, tune_trial)

Set up the directory structure for saving visualization images.

Args:

  • run_path (str): Path to the run directory.
  • tune_trial (str): Trial ID for hyperparameter tuning.

Returns:

None

aldernet.utils.setup_output_directory(settings)

Create a new output directory and return the path to the new directory.

Args:

settings (dict): Dictionary of settings.

Returns:

str: The path to the new output directory.
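One common way to create a new output directory per run is to derive its name from a timestamp. The sketch below shows that pattern with the standard library only; make_run_directory, the base directory, and the naming scheme are hypothetical, and nothing here is taken from the actual implementation of setup_output_directory.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_run_directory(base_dir, prefix="run"):
    """Create a fresh, timestamped directory under base_dir and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    run_path = Path(base_dir) / f"{prefix}_{stamp}"
    # exist_ok=False makes a name collision fail loudly instead of
    # silently reusing an earlier run's directory.
    run_path.mkdir(parents=True, exist_ok=False)
    return str(run_path)


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    run_path = make_run_directory(base)
    created = Path(run_path).is_dir()
```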

aldernet.utils.tf_setup()

Set up TensorFlow to use GPU memory growth.
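With memory growth enabled, TensorFlow allocates GPU memory as it is needed instead of reserving nearly all of it at start-up. A minimal sketch of how this is usually switched on with the public TensorFlow 2.x API follows; enable_gpu_memory_growth is a hypothetical name, and whether tf_setup does exactly this, or more, is an assumption.

```python
def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU and return how many were found."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This has to happen before the GPUs are initialised,
        # that is, before any TensorFlow op has been executed.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

On a machine without a GPU the function simply returns 0.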

aldernet.utils.train_and_evaluate_model(run_path, settings, data_train, data_valid, sha)

Train and evaluate a neural network model using the specified data and settings.

Args:

  • run_path (str): Path to the output directory for storing results.
  • settings (dict): Dictionary of settings.
  • data_train (xarray.Dataset): Training data.
  • data_valid (xarray.Dataset): Validation data.
  • sha (str): A string containing the Git commit hash for the current code.

Returns:

tensorflow.python.keras.engine.functional.Functional: The trained neural network model.

aldernet.utils.train_epoch(generator, optimizer_gen, data_train, noise_dim, add_weather, center, scale, epoch, step, run_path, tune_trial)

Train the generator for one epoch and return the loss values.

Args:

  • generator (tf.keras.Model): The generator model.
  • optimizer_gen (tf.keras.optimizers.Optimizer): The generator optimizer.
  • data_train (Batcher): The training data batcher.
  • noise_dim (int): The dimensionality of the noise vector.
  • add_weather (bool): If True, the weather data is included in training.
  • center (float): The center value for the scaling.
  • scale (float): The scale value for the scaling.
  • epoch (tf.Variable): The current epoch.
  • step (tf.Variable): The current step.
  • run_path (str): The path to the directory where the output files will be stored.
  • tune_trial (str): The name of the current Tune trial.

Returns:

np.ndarray: An array of loss values.

aldernet.utils.train_model(config, generator, data_train, data_valid, run_path, noise_dim, add_weather, shuffle)

Train a model on the training data, with validation on the validation data.

Args:

  • config (Dict[str, Any]): Configuration dictionary.
  • generator (tf.keras.Model): Generator model.
  • data_train (xr.Dataset): Training data.
  • data_valid (xr.Dataset): Validation data.
  • run_path (str): Path to save the model checkpoints.
  • noise_dim (int): Dimension of the noise input.
  • add_weather (bool): Whether to add weather to the input data.
  • shuffle (bool): Whether to shuffle the training data.

Returns:

None

aldernet.utils.train_model_simple(data_train, data_valid, epochs, add_weather, conv=True)

Train a simple U-Net model on data_train and validate on data_valid.

Args:

  • data_train (Batcher): Training data.
  • data_valid (Batcher): Validation data.
  • epochs (int): Number of epochs to train the model for.
  • add_weather (bool): Whether or not to include weather data.
  • conv (bool, optional): Whether to use convolutional layers (default is True).

Returns:

model (keras.Model): The trained U-Net model.

aldernet.utils.train_step(generator, optimizer_gen, input_train, target_train, weather_train, noise_dim, add_weather)

Perform one training step of the generator of a NNet using the given inputs.

Args:

  • generator (Model): The Keras Model representing the generator.
  • optimizer_gen (Optimizer): The optimizer for the generator.
  • input_train (Tensor): The input tensor for the generator.
  • target_train (Tensor): The target tensor for the generator.
  • weather_train (Tensor): The weather tensor for the generator.
  • noise_dim (int): Dimensionality of the noise input.
  • add_weather (bool): Whether to add the weather input to the generator.

Returns:

float: The loss of the generator on the given inputs.

aldernet.utils.train_with_ray_tune(run_path, settings, data_train, data_valid, sha)

Train a neural network model using Ray Tune for hyperparameter tuning.

Args:

  • run_path (str): Path to the output directory for storing results.
  • settings (dict): Dictionary of settings.
  • data_train (xarray.Dataset): Training data.
  • data_valid (xarray.Dataset): Validation data.
  • sha (str): A string containing the Git commit hash for the current code.

Returns:

tensorflow.python.keras.engine.functional.Functional: The trained neural network model.

aldernet.utils.train_without_ray_tune(settings, data_train, data_valid)

Train a neural network model without using Ray Tune for hyperparameter tuning.

Args:

  • settings (dict): Dictionary of settings.
  • data_train (xarray.Dataset): Training data.
  • data_valid (xarray.Dataset): Validation data.

Returns:

tensorflow.python.keras.engine.functional.Functional: The trained neural network model.

aldernet.utils.up(filters, name=None) → Sequential

Construct an up-sampling block using the Keras Sequential API.

Args:

  • filters (int): Number of filters in the Conv2DTranspose layer.
  • name (str): Optional name for the block.

Returns:

Sequential: A Keras Sequential model representing the up-sampling block.

aldernet.utils.validate_epoch(generator, data_valid, noise_dim, add_weather, center, scale, epoch, step_valid, run_path, tune_trial)

Evaluate the generator for one epoch on validation data and return the loss.

Args:

  • generator (tf.keras.Model): The generator model.
  • data_valid (Batcher): The validation data batcher.
  • noise_dim (int): The dimensionality of the noise vector.
  • add_weather (bool): If True, the weather data is included in the model input.
  • center (float): The center value for the scaling.
  • scale (float): The scale value for the scaling.
  • epoch (tf.Variable): The current epoch.
  • step_valid (int): The current validation step.
  • run_path (str): The path to the directory where the output files will be stored.
  • tune_trial (str): The name of the current Tune trial.

Returns:

Tuple[np.ndarray, int]: A tuple of loss values and the current validation step.

aldernet.utils.write_png(image, path, pretty)

Write an image in PNG format to a file.

Args:

  • image (Tensor): A 3-tuple of Tensors representing the input, target, and predicted images.
  • path (str): Path to the output file.
  • pretty (bool): Whether to create a pretty visualization of the image.

Returns:

None

Data Preparation

Helper functions to import and pre-process zarr archives.

class aldernet.data.data_utils.Batcher(data, batch_size, add_weather, shuffle=True)

Generates data for Keras.

on_epoch_end()

Update indexes after each epoch.
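The on_epoch_end hook, combined with an index list, is the usual pattern for Keras-style data generators: batches are looked up through the index list, and the list is reshuffled after every epoch. The following pure-Python sketch illustrates that pattern. SimpleBatcher is a hypothetical stand-in that handles neither xarray data nor weather inputs, and it is not the actual Batcher implementation.

```python
import math
import random


class SimpleBatcher:
    """Serve fixed-size batches of samples through a shuffled index list."""

    def __init__(self, samples, batch_size, shuffle=True, seed=None):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.indexes = list(range(len(self.samples)))
        self.on_epoch_end()  # shuffle once before the first epoch

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, batch_index):
        if not 0 <= batch_index < len(self):
            raise IndexError(batch_index)
        start = batch_index * self.batch_size
        chosen = self.indexes[start:start + self.batch_size]
        return [self.samples[i] for i in chosen]

    def on_epoch_end(self):
        # Update the indexes so the next epoch sees a new sample order.
        if self.shuffle:
            self._rng.shuffle(self.indexes)


batcher = SimpleBatcher(range(10), batch_size=4, shuffle=True, seed=0)
epoch = [sample for batch in batcher for sample in batch]
```

Each epoch still visits every sample exactly once; only the order, and therefore the composition of each batch, changes when on_epoch_end is called.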

class aldernet.data.data_utils.Params

Retrieve selected weather parameters.

class aldernet.data.data_utils.Stations

Retrieve the measurement stations.