ampycloud package

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: highest-level init magic

Subpackages

Submodules

ampycloud.cluster module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: clustering tools

ampycloud.cluster.agglomerative_cluster(data: ndarray, n_clusters: int | None = None, metric: str = 'euclidean', linkage: str = 'single', distance_threshold: int | float = 1) → tuple

Function that wraps around sklearn.cluster.AgglomerativeClustering.

Parameters:

data (ndarray) – array of [x, y] pairs to run the clustering on.
n_clusters (int, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to None.
metric (str, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to ‘euclidean’.
linkage (str, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to ‘single’.
distance_threshold (int|float, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to 1.

Returns:

int, ndarray – number of clusters found, and corresponding clustering labels for each data point.
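For intuition, combining linkage='single' with a distance_threshold yields clusters that are the connected components of the graph linking every pair of points closer than the threshold. The function single_linkage_labels below is a hypothetical, dependency-free sketch of that idea. It is not ampycloud's wrapper, and its label ids may be numbered differently from scikit-learn's.

```python
import math


def single_linkage_labels(points, distance_threshold=1.0):
    """Hypothetical sketch: single-linkage clusters as connected components of the
    graph that links points closer than distance_threshold."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < distance_threshold:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]
    return len(roots), labels


# The first three points are chained together even though points 0 and 2 are 1.2 apart.
print(single_linkage_labels([(0, 0), (0, 0.5), (0, 1.2), (5, 5)], distance_threshold=1))
```

This chaining behavior is characteristic of single linkage. Two hits can end up in the same cluster without being close to each other, as long as a path of close hits connects them.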

ampycloud.cluster.clusterize(data: ndarray, algo: str | None = None, **kwargs: dict) → tuple | None

Umbrella clustering routine, that provides a single access point to the different clustering algorithms.

Parameters:

data (ndarray) – array of [x, y] arrays to clusterize.
algo (str, optional) – clustering algorithm, that must be one of [None, ‘agglomerative’]. Defaults to None.
kwargs (dict, optional) – keyword arguments to be fed to the underlying clustering function.

Returns:

int, ndarray – the number of clusters identified, and the associated labels for each data point.

ampycloud.core module

Distributed under the terms of the BSD-3-Clause license.

SPDX-License-Identifier: BSD-3-Clause

Module contains: core ampycloud routines. All functions meant to be used directly by users are defined here.

ampycloud.core.copy_prm_file(save_loc: str = './', which: str = 'defaults') → None

Create a local copy of a specific ampycloud parameter file.

Parameters:

save_loc (str, optional) – location to save the YML file to. Defaults to ‘./’.
which (str, optional) – name of the parameter file to copy. Defaults to ‘defaults’.

Example

import ampycloud
ampycloud.copy_prm_file(save_loc='.', which='default')

Note

There is also a high-level entry point that allows users to get a local copy of the ampycloud parameter files directly from the command line:

ampycloud_copy_prm_file -which=default

ampycloud.core.set_prms(pth: str | Path) → None

Sets the dynamic (i.e. scientific) ampycloud parameters from a suitable YAML file.

Parameters:: pth (str|Path) – path+filename to a YAML parameter file for ampycloud.

Note

It is recommended to first get a copy of the default ampycloud parameter file using copy_prm_file(), and edit its content as required.

Doing so should ensure full compliance with the default structure of dynamic.AMPYCLOUD_PRMS.

Warning

This is NOT a thread-safe way of setting parameters. If you plan on running concurrent ampycloud evaluations, parameters should be fed directly to run().

Example

import ampycloud
ampycloud.copy_prm_file(save_loc='.', which='default')
ampycloud.set_prms('./ampycloud_default_prms.yml')

ampycloud.core.reset_prms(which: str | list | None = None) → None

Reset the ampycloud dynamic (i.e. scientific) parameters to their default values.

Parameters:: which (str|list, optional) – (list of) names of parameters to reset specifically. If not set (by default), all parameters will be reset.

Example

import ampycloud
from ampycloud import dynamic

# Change a parameter
dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
# Reset them
ampycloud.reset_prms()
print('Back to the default value:', dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'])

ampycloud.core.run(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | datetime | None = None) → CeiloChunk

Runs the ampycloud algorithm on a given dataset.

Parameters:

data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
prms (dict, optional) – a (nested) dict of parameters to adjust for this specific run. This is meant as a thread-safe way of adjusting parameters for different runs. Any unspecified parameter will be taken from dynamic.AMPYCLOUD_PRMS at init time.
geoloc (str, optional) – the name of the geographic location where the data was taken. Defaults to None.
ref_dt (str|datetime.datetime, optional) – reference date and time of the observations, corresponding to Delta t = 0. Defaults to None. Note that if a datetime instance is specified, it will be converted to str almost immediately via str(ref_dt).

Returns:

data.CeiloChunk – the data chunk with all the processing outcome bundled cleanly.

All that is required to run the ampycloud algorithm is a properly formatted dataset. At the moment, specifying geoloc and ref_dt serves no purpose other than to enhance plots (should they be created). There are no special requirements for geoloc and ref_dt: as long as they are strings, you can set them to whatever you please.

Important

ampycloud treats Vertical Visibility hits no differently than any other hit. Hence, it is up to the user to adjust the Vertical Visibility hit heights (and/or ignore some of them, for example) prior to feeding them to ampycloud, so that they can be used as cloud hits.

Important

ampycloud uses the dt and ceilo values to decide if two hits are simultaneous, or not. It is thus important that the values of dt be sufficiently precise to distinguish between different measurements. Essentially, each measurement (which may consist of several hits) should be associated with a unique (ceilo; dt) set of values. Failure to do so may result in incorrect estimations of the cloud layer densities. See data.CeiloChunk.max_hits_per_layer for more details.

All the scientific parameters of the algorithm are set dynamically in the dynamic module. From within a Python session all these parameters can be changed directly. For example, to change the Minimum Sector Altitude (to be specified in ft aal), one would do:

from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MSA'] = 5000

Alternatively, the scientific parameters can also be defined and fed to ampycloud via a YAML file. See set_prms() for details.

Caution

By default, the function run() will use the parameter values set in dynamic.AMPYCLOUD_PRMS, which is not thread safe. Users interested in running multiple concurrent ampycloud calculations with distinct sets of parameters within the same Python session are thus urged to feed the required parameters directly to run() via the prms keyword argument, which expects a (nested) dictionary with keys compatible with dynamic.AMPYCLOUD_PRMS.

Examples:

# Define only the parameters that are non-default. To adjust the MSA, use:
prms = {'MSA': 10000}

# Or to adjust some other algorithm parameters:
prms = {'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'BIC', 'min_prob': 1.0}}}
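A prms dictionary only needs the keys being changed. Every other value is meant to come from dynamic.AMPYCLOUD_PRMS, where, for instance, min_prob sits inside LAYERING_PRMS['gmm_kwargs']. One way to picture this is a recursive overlay of the overrides onto a copy of the defaults. The helper merge_prms below is hypothetical and is not ampycloud's internal code. Raising KeyError on unknown keys is likewise an assumption of this sketch.

```python
import copy


def merge_prms(defaults: dict, overrides: dict) -> dict:
    """Hypothetical sketch: overlay a nested override dict onto a copy of the defaults."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, val in overrides.items():
        if key not in merged:
            # Assumption of this sketch: unknown keys are treated as errors.
            raise KeyError(f'Unknown parameter: {key}')
        if isinstance(val, dict) and isinstance(merged[key], dict):
            # Recurse so that sibling keys at deeper levels are preserved.
            merged[key] = merge_prms(merged[key], val)
        else:
            merged[key] = copy.deepcopy(val)
    return merged


# A small subset of the defaults listed under dynamic.AMPYCLOUD_PRMS.
defaults = {'MSA': None,
            'LAYERING_PRMS': {'gmm_kwargs': {'min_prob': 1.0, 'mode': 'delta', 'scores': 'BIC'},
                              'min_okta_to_split': 2}}
run_prms = merge_prms(defaults, {'MSA': 10000,
                                 'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'AIC'}}})
print(run_prms['LAYERING_PRMS'])
```

Working on a deep copy is what makes this kind of per-run override safe to use alongside other runs, since the shared defaults are never modified.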

The data.CeiloChunk instance returned by this function contains all the information associated with the ampycloud algorithm, including the raw data and slicing/grouping/layering info. Its method data.CeiloChunk.metar_msg() provides direct access to the resulting METAR-like message. Users that require the height, okta amount, and/or exact sky coverage fraction of layers can get them via the data.CeiloChunk.layers class property.

Example

In the following example, we create the canonical mock dataset of ampycloud, run the algorithm on it, and fetch the resulting METAR-like message:

from datetime import datetime
import ampycloud
from ampycloud.utils import mocker

# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()

# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal.
chunk = ampycloud.run(mock_data, prms={'MSA':10000},
                      geoloc='Mock data', ref_dt=datetime.now())

# Get the resulting METAR message
print(chunk.metar_msg())

# Display the full information available for the layers found
print(chunk.layers)

ampycloud.core.metar(data: DataFrame) → str

Run the ampycloud algorithm on a dataset and extract a METAR report of the cloud layers.

Parameters:: data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
Returns:: str – the METAR-like message.

Example:

import ampycloud
from ampycloud.utils import mocker

# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()

# Compute the METAR message
msg = ampycloud.metar(mock_data)
print(msg)

ampycloud.core.demo() → tuple

Run the ampycloud algorithm on a demonstration dataset.

Returns:: pandas.DataFrame, data.CeiloChunk – the mock dataset used for the demonstration, and the data.CeiloChunk instance.

ampycloud.data module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: data classes

class ampycloud.data.AbstractChunk(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None)

Bases: ABC

Abstract parent class for data chunk classes.

DATA_COLS = {'ceilo': string[python], 'dt': <class 'float'>, 'height': <class 'float'>, 'type': <class 'int'>}

required data columns

Type:: dict

abstract __init__(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None) → None: Init routine for abstract class.

property msa: float: The Minimum Sector Altitude set when initializing this specific instance, in ft aal.

property msa_hit_buffer: float: The Minimum Sector Altitude hit buffer set when initializing this specific instance, in ft.

property data: DataFrame: The data of the chunk, as a pandas DataFrame.

property geoloc: str | None: The name of the geographic location of the observations.

property ref_dt: str | None: The reference date and time for the data, i.e. Delta t = 0.

property prms: dict: The dictionnary of ampycloud parameters set at the init of this class instance.

class ampycloud.data.CeiloChunk(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None)

Bases: AbstractChunk

Child class for timeseries of ceilometer hits, referred to as data ‘chunks’.

This class essentially gathers all the data and processing methods under one roof.

Warning

Some of these methods are actually intended to be used in order … Some safety mechanisms have been put in place to ensure this actually happens, but still …

You’ve been warned.

__init__(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None) → None

CeiloChunk init method.

Parameters:

data (pd.DataFrame) – the input data. See above for details.
prms (dict, optional) – dictionary of ampycloud algorithm parameters.
geoloc (str, optional) – name of the geolocation of the observations.
ref_dt (str, optional) – reference date and time of the observations, corresponding to Delta t = 0. Defaults to None.

The input data is required to be a pandas DataFrame with 4 columns described in CeiloChunk.DATA_COLS, i.e. :

['ceilo', 'dt', 'height', 'type']

Specifically:

ceilo: contains names/ids of the ceilometer associated with the measurements, as str. This is important for deriving correct sky coverage percentages when combining data from more than 1 ceilometer, in case ceilometers report multiple hits at the same time.

dt: time delta between the (planned) METAR issuance time and the hit time in s, as float. This should typically be a negative number (because METARs are assembled using existing, i.e. past, ceilometer observations).

height: cloud base hit height in ft above aerodrome level (aal), as float. The cloud base height computed by the ceilometer.

type: cloud hit type, as int. A value n>0 indicates that the hit is the n-th (from the ground) that was reported by this specific ceilometer for this specific timestep. A value of n=-1 indicates that the cloud hit corresponds to a Vertical Visibility hit.

Note

For now, geoloc and ref_dt serve no purpose other than improving the diagnostic plots. This is also why ref_dt is a str, such that users can specify it however they please.

data_rescaled(dt_mode: str | None = None, height_mode: str | None = None, dt_kwargs: dict | None = None, height_kwargs: dict | None = None) → DataFrame

Returns a copy of the data, rescaled according to the provided parameters.

Parameters:

dt_mode (str, optional) – scaling rule for the time deltas. Defaults to None.
height_mode (str, optional) – scaling rule for the heights. Defaults to None.
dt_kwargs (dict, optional) – dict of arguments to be fed to the chosen dt scaling routine. Defaults to None.
height_kwargs (dict, optional) – dict of arguments to be fed to the chosen height scaling routine. Defaults to None.

Returns:

pd.DataFrame – a copy of the data, rescaled.

Note

The kwargs approach was inspired by the reply from Jonathan Eunice on SO.

property ceilos: list

The list of all ceilometers included in the data chunk.

Returns:: list of str – the list of ceilo names.

property max_hits_per_layer: int

The maximum number of ceilometer hits possible for a given layer, given the chunk data.

Returns:: int – the max number of ceilometer hits for a layer. Divide by len(self.ceilos) to get the average max number of hits per ceilometer per layer (remember: not all ceilometers may have the same number of timestamps over the chunk time period !).

This is the total number of unique timesteps from all ceilometers considered.

Note

This value assumes that a layer can contain only 1 hit per ceilometer per timestep, i.e. 2 simultaneous hits from a given ceilometer can never belong to the same cloud layer.

metarize(which: str = 'slices') → None

Assembles a pandas.DataFrame of slice/group/layer METAR properties of interest.

Parameters:: which (str, optional) – whether to process ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘slices’.

The pandas.DataFrame generated by this method is subsequently available via the appropriate class property CeiloChunk.slices, CeiloChunk.groups, or CeiloChunk.layers, depending on the value of the argument which.

The slice/group/layer parameters computed/derived by this method include:

n_hits (int): duplicate-corrected number of hits

perc (float): sky coverage percentage (between 0-100)

okta (int): okta count

height_base (float): base height

height_mean (float): mean height

height_std (float): height standard deviation

height_min (float): minimum height

height_max (float): maximum height

thickness (float): thickness

fluffiness (float): fluffiness (expressed in height units, i.e. ft)

code (str): METAR-like code

significant (bool): whether the layer is significant according to the ICAO rules. See icao.significant_cloud() for details.

cluster_id (int): an ampycloud-internal identification number

isolated (bool): isolation status (for slices only)

ncomp (int): the number of subcomponents (for groups only)

Important

The value of n_hits is corrected for duplicate hits, to ensure a correct estimation of the sky coverage fraction. Essentially, two (or more) simultaneous hits from the same ceilometer are counted as one only. In other words, if a Type 1 and a Type 2 hit from the same ceilometer, at the same observation time, are included in a given slice/group/layer, they are counted as one hit only. This is a direct consequence of the fact that clouds have a single base height at any given time [citation needed].
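The duplicate correction amounts to counting unique (ceilo, dt) pairs rather than rows. The same count over the whole chunk gives max_hits_per_layer. The sketch below uses hypothetical names and represents hits as plain (ceilo, dt, height, type) tuples instead of a DataFrame. Expressing the coverage as 100 * n_hits / max_hits_per_layer is an assumption of this sketch about how the two quantities combine, not a statement of ampycloud's exact formula.

```python
def unique_timesteps(hits):
    """Set of (ceilo, dt) pairs: simultaneous hits from one ceilometer collapse to one."""
    return {(ceilo, dt) for ceilo, dt, _height, _type in hits}


def layer_coverage(layer_hits, all_hits):
    """Hypothetical sketch: duplicate-corrected n_hits and a coverage percentage."""
    n_hits = len(unique_timesteps(layer_hits))
    max_hits = len(unique_timesteps(all_hits))  # cf. CeiloChunk.max_hits_per_layer
    return n_hits, 100 * n_hits / max_hits


# Two ceilometers with two timesteps each; ceilo 'A' reports two hits at dt=-60 s.
all_hits = [('A', -60.0, 1000.0, 1), ('A', -60.0, 1200.0, 2), ('A', -30.0, 1050.0, 1),
            ('B', -50.0, 980.0, 1), ('B', -20.0, 5000.0, 1)]
# Suppose the Type 1 and Type 2 hits of 'A' at dt=-60 s ended up in the same layer.
layer = all_hits[:4]
print(layer_coverage(layer, all_hits))  # (3, 75.0)
```

The layer contains four rows, but only three distinct (ceilo, dt) measurements, out of four possible in the chunk.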

Note

The metarize function is modularized in private submethods defined above.

find_slices() → None: Identify general height slices in the chunk data. Intended as the first stage towards the identification of cloud layers.

Important

The “parameters” of this function are all set in self.prms[‘SLICING_PRMS’].

find_groups() → None: Identifies groups of coherent hits accross overlapping slices. Intended as the second stage towards the identification of cloud layers.

Important

The “parameters” of this function are all set in self.prms[‘GROUPING_PRMS’].

find_layers() → None: Identifies individual layers from a list of groups, splitting these in 2 or 3 (if warranted) significant cloud sub-layers. Intended as the third stage towards the identification of cloud layers.

Important

The “parameters” of this function are set in self.prms[‘LAYERING_PRMS’].

property n_slices: None | int

Returns the number of slices identified in the data.

Returns:: int – the number of slices

property slices: DataFrame: Returns a pandas.DataFrame with information regarding the different slices identified by the slicing step.

property n_groups: None | int

Returns the number of groups identified in the data.

Returns:: int – the number of groups

property groups: DataFrame: Returns a pandas.DataFrame with information regarding the different groups identified by the grouping algorithm.

property n_layers: None | int

Returns the number of layers identified in the data.

Returns:: int – the number of layers.

property layers: DataFrame: Returns a pandas.DataFrame with information regarding the different layers identified by the layering algorithm.

property clouds_above_msa_buffer: bool

Returns whether a number of hits exceeding the threshold for 1 okta is detected above MSA + MSA_HIT_BUFFER.

Returns:: bool – whether high clouds were detected.

metar_msg(which: str = 'layers') → str

Construct a METAR-like message for the identified cloud slices, groups, or layers.

Parameters:: which (str, optional) – whether to look at ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘layers’.
Returns:: str – the METAR-like message.

Important

The ICAO’s cloud layer selection rules applicable to METARs will be applied to create the resulting str ! See icao.significant_cloud() for details.

Caution

The Minimum Sector Altitude values set when the CeiloChunk instance was initialized will be applied ! If in doubt, the values used by this method are those set in the (parent) class attribute AbstractChunk.msa.

ampycloud.dynamic module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: dynamic (scientific) parameters, which can be altered during execution.

ampycloud.dynamic.get_default_prms() → dict: Extract the default ampycloud parameters from the YAML configuration file.

ampycloud.dynamic.AMPYCLOUD_PRMS = {'BASE_LVL_HEIGHT_PERC': 5, 'BASE_LVL_LOOKBACK_PERC': 100, 'EXCLUDE_FOR_BASE_HEIGHT_CALC': [], 'GROUPING_PRMS': {'dt_scale': 180, 'height_pad_perc': 10, 'height_scale_range': [100, 500]}, 'LAYERING_PRMS': {'gmm_kwargs': {'delta_mul_gain': 0.95, 'min_prob': 1.0, 'mode': 'delta', 'rescale_0_to_x': 100, 'scores': 'BIC'}, 'min_okta_to_split': 2}, 'LOWESS': {'frac': 0.35, 'it': 3}, 'MAX_HITS_OKTA0': 3, 'MAX_HOLES_OKTA8': 1, 'MIN_SEP_LIMS': [10000], 'MIN_SEP_VALS': [250, 1000], 'MPL_STYLE': 'base', 'MSA': None, 'MSA_HIT_BUFFER': 1500, 'SLICING_PRMS': {'distance_threshold': 0.2, 'dt_scale': 100000, 'height_scale_kwargs': {'min_range': 1000}, 'height_scale_mode': 'minmax-scale'}}

The ampycloud parameters, first set from a config file via get_default_prms()

Type:: dict

ampycloud.errors module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: custom error and warning classes

exception ampycloud.errors.AmpycloudError

Bases: Exception

The default error class for ampycloud, which is a child of the Exception class.

exception ampycloud.errors.AmpycloudWarning

Bases: Warning

The default warning class for ampycloud, which is a child of the Warning class.

ampycloud.fluffer module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: fluffiness-related tools

ampycloud.fluffer.get_fluffiness(pts, **kwargs)

Utility function to compute the fluffiness of a set of ceilometer hits.

Parameters:

pts (ndarray) – 2D array of [dt, height] ceilometer hits. None of them may have NaN heights.
**kwargs (optional) – additional arguments to be fed to statsmodels.nonparametric.lowess().

Returns:

float, ndarray – the fluffiness (in height units), and the (sorted) LOWESS-smoothed (dt, height) values.

The fluffiness is computed as 2 * mean(abs(y - lowess)), where lowess is the smooth LOWESS fit to the ceilometer hits.

The factor 2 stems from the fact that abs(y-lowess) corresponds to half(-ish) the slice thickness, which needs to be doubled in order to use the fluffiness to rescale the slice onto the 0 - 1 range.

To avoid LOWESS warnings, hits with identical x coordinates (i.e. time steps) are offset by a small factor (1e-5).
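Once the smooth fit is available, the fluffiness formula is straightforward. The standalone function below is hypothetical and takes the LOWESS-smoothed heights as an input rather than computing them. In practice those values would come from statsmodels' lowess (the default AMPYCLOUD_PRMS['LOWESS'] entry lists frac=0.35 and it=3). That call is left out here so the example stays dependency-free.

```python
def fluffiness(heights, smoothed):
    """2 * mean(|y - lowess|), given the hit heights and their smoothed values."""
    if not heights or len(heights) != len(smoothed):
        raise ValueError('heights and smoothed must be non-empty and of equal length')
    mean_abs_dev = sum(abs(y - s) for y, s in zip(heights, smoothed)) / len(heights)
    # Double it: the mean absolute deviation is roughly half the slice thickness.
    return 2 * mean_abs_dev


# Hits scattered by +/-100 ft around a flat smooth fit at 1000 ft.
print(fluffiness([1000, 1100, 900, 1000], [1000, 1000, 1000, 1000]))  # 100.0
```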

ampycloud.hardcoded module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: hardcoded data

ampycloud.hardcoded.REQ_DATA_COLS = {'ceilo': string[python], 'dt': <class 'float'>, 'height': <class 'float'>, 'type': <class 'int'>}

the columns & associated types required for the pandas DataFrame fed to ampycloud.

Type:: dict

ampycloud.icao module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: ICAO-related utilities

ampycloud.icao.significant_cloud(oktas: list) → list

Assesses which cloud layers in a list are significant, according to the ICAO rules.

Parameters:: oktas (list) – the okta count of different cloud layers. These are assumed to be sorted from the lowest to the highest cloud layer !
Returns:: list of bool – whether a given layer is significant, or not.

The ICAO rules applied are as follows:

first layer is always reported

second layer must be SCT or more (i.e. 3 oktas or more)

third layer must be BKN or more (i.e. 5 oktas or more)

no more than 3 layers reported (since ampycloud does not deal with CB/TCU)

Reference:: Sec. 4.5.4.3 e) & footnote #14 in Table A3-1, Meteorological Service for International Air Navigation, Annex 3 to the Convention on International Civil Aviation, ICAO, 20th edition, July 2018.
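One reading of these rules is a running count of the layers already reported. Under that reading, the next reported layer needs at least 3 oktas, the one after that at least 5, and nothing beyond the third is reported. The function below is a plain-Python sketch of that interpretation and is not ampycloud's implementation.

```python
def significant_cloud(oktas):
    """Sketch of the ICAO selection rules; oktas sorted from lowest to highest layer."""
    min_okta_for_rank = [0, 3, 5]  # 1st reported: any amount, 2nd: >= SCT, 3rd: >= BKN
    significant = []
    n_reported = 0
    for okta in oktas:
        is_sig = n_reported < len(min_okta_for_rank) and okta >= min_okta_for_rank[n_reported]
        significant.append(is_sig)
        if is_sig:
            n_reported += 1
    return significant


print(significant_cloud([1, 2, 3, 4, 5, 8]))  # [True, False, True, False, True, False]
```

The 2-okta layer is skipped because it would be the second reported layer but is not SCT. The 8-okta layer is dropped because three layers have already been reported.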

ampycloud.layer module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: layering tools

ampycloud.layer.scores2nrl(abics: ndarray) → ndarray

Converts AIC or BIC scores into probabilities, i.e. normalized relative likelihoods.

Parameters:: abics (ndarray) – scores.
Returns:: ndarray – probabilities of the different models.

Specifically, this function computes:

\[p_i = \frac{e^{-0.5(\textrm{abics}_i-\min(\textrm{abics}))}}{\sum_{j}e^{-0.5(\textrm{abics}_j-\min(\textrm{abics}))}}\]

Note

The smaller the BIC/AIC scores, the better, but the higher the probabilities (i.e. normalized relative likelihoods), the better !
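The formula translates into a few lines of Python. Subtracting the minimum score first keeps the largest exponent at 0, which avoids overflow for large BIC values. This standalone sketch works on lists rather than numpy arrays and is not ampycloud's own code.

```python
import math


def scores2nrl(abics):
    """Normalized relative likelihoods from AIC/BIC scores (lower score -> higher probability)."""
    best = min(abics)
    weights = [math.exp(-0.5 * (a - best)) for a in abics]
    total = sum(weights)
    return [w / total for w in weights]


print(scores2nrl([10.0, 12.0]))  # approximately [0.731, 0.269]
```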

ampycloud.layer.best_gmm(abics: ndarray, mode: str = 'delta', min_prob: float = 1.0, delta_mul_gain: float = 1.0) → int

Identify which Gaussian Mixture Model is most appropriate given AIC or BIC scores.

Parameters:

abics (ndarray) – the AICs or BICs scores, ordered from simplest to most complex model.
mode (str, optional) – one of [‘delta’, ‘prob’]. Defaults to ‘delta’.
min_prob (float, optional) – minimum model probability computed from the scores' relative likelihood, below which the other models will be considered. Set it to 1 to select the model with the lowest score, irrespective of its probability. Defaults to 1. This has no effect unless mode=’prob’.
delta_mul_gain (float, optional) – a smaller score will only be considered “valid” if it is smaller than delta_mul_gain*current_best_score. Defaults to 1. This has no effect unless mode=’delta’.

Returns:

int – index of the “most appropriate” model.

Model selection can be based on:

1. the normalized relative likelihood values (see scores2nrl()) of the AIC and/or BIC scores, or

2. the normalized absolute offsets between the AIC or BIC scores.

The mode defaults to ‘delta’, i.e. the normalized absolute offsets between the scores.

Note

The order of the scores does matter for this routine.

Starting with the first model as the “current best model”, the model n will become the “current best model” if:

mode=’prob’:

prob(abics[current_best_model]) < min_prob
AND
prob(abics[n]) > prob(abics[current_best_model])

mode=’delta’:

abics[n] < delta_mul_gain * abics[current_best_model]

The default arguments of this function lead to selecting the number of components with the smallest score.
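These selection rules can be sketched as a single pass over the scores that keeps track of the index of the current best model. This standalone illustration follows the rules as stated above. It redeclares a compact scores2nrl for the 'prob' mode and is not ampycloud's implementation.

```python
import math


def scores2nrl(abics):
    best = min(abics)
    weights = [math.exp(-0.5 * (a - best)) for a in abics]
    total = sum(weights)
    return [w / total for w in weights]


def best_gmm(abics, mode='delta', min_prob=1.0, delta_mul_gain=1.0):
    """Index of the 'most appropriate' model, scanning from simplest to most complex."""
    best = 0  # start with the first (simplest) model
    if mode == 'prob':
        probs = scores2nrl(abics)
        for n in range(1, len(abics)):
            if probs[best] < min_prob and probs[n] > probs[best]:
                best = n
    elif mode == 'delta':
        for n in range(1, len(abics)):
            if abics[n] < delta_mul_gain * abics[best]:
                best = n
    else:
        raise ValueError(f"mode must be 'delta' or 'prob', not {mode!r}")
    return best


print(best_gmm([100, 90, 95]))                         # 1: smallest score
print(best_gmm([100, 97, 80], delta_mul_gain=0.95))    # 2: 97 is not below 0.95 * 100
```

With delta_mul_gain below 1, a more complex model must improve on the current best score by a margin before it is preferred. This makes the selection more conservative than simply picking the minimum.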

ampycloud.layer.ncomp_from_gmm(vals: ndarray, ncomp_max: int = 3, min_sep: int | float = 0, layer_base_params: dict[str, int] | None = None, scores: str = 'BIC', rescale_0_to_x: float | None = None, random_seed: int = 42, **kwargs: dict) → tuple

Runs a Gaussian Mixture Model on 1-D data, to determine if it contains 1, 2, or 3 components.

Parameters:

vals (ndarray) – the data to process. If ndarray is 1-D, it will be reshaped to 2-D via .reshape(-1, 1).
ncomp_max (int, optional) – maximum number of Gaussian components to assess. Defaults to 3.
min_sep (int|float, optional) – minimum separation, in data units, required between the mean locations of two Gaussian components to consider them distinct. Defaults to 0. This complements any parameters fed to best_gmm(), which first decides how many components look “best”, after which these may get merged depending on min_sep. In other words, min_sep does not lead to re-running the GMM; it only merges the identified layers if required.
layer_base_params (dict[str, int], optional) – the relevant ampycloud parameters. Defaults to None.
scores (str, optional) – either ‘BIC’ or ‘AIC’, to use Bayesian Information Criterion or Akaike Information Criterion scores. Defaults to ‘BIC’.
rescale_0_to_x (float, optional) – if set, vals will be rescaled between 0 and this value before running the Gaussian Mixture Modelling. Defaults to None = no rescaling.
random_seed (int, optional) – used to reset temporarily the value of numpy.random.seed() to ensure repeatable results. Defaults to 42, because it is the Answer to the Ultimate Question of Life, the Universe, and Everything.
**kwargs (dict, optional) – these will be fed to best_gmm().

Returns:

int, ndarray, ndarray – number of (likely) components, array of component ids to which each hit most likely belongs, array of AIC/BIC scores.

The default values lead to selecting the number of components with the smallest BIC values.

Note

This function was inspired from the “1-D Gaussian Mixture Model” example from astroML: https://www.astroml.org/book_figures/chapter4/fig_GMM_1D.html

ampycloud.logger module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: logging utilities

ampycloud.logger.log_func_call(logger: Logger) → Callable

Intended as a decorator to log function calls.

Parameters:: logger (logging.Logger) – a logger to feed info to.

The first part of the message containing the function name is at the ‘INFO’ level. The second part of the message containing the argument values is at the ‘DEBUG’ level.

Note

Adapted from the similar dvas function, which itself was adapted from this post on SO, in particular the reply from Kfir Eisner and Peter Mortensen. See also this.
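A decorator with the behavior described above can be sketched with the standard logging and functools modules. It logs the function name at the INFO level and the argument values at the DEBUG level. This is an illustrative re-implementation, not the ampycloud or dvas source. The message wording and the logger name are assumptions.

```python
import functools
import logging


def log_func_call(logger):
    """Decorator factory: log each call of the wrapped function to `logger`."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            logger.info('Running %s', func.__qualname__)
            logger.debug('%s called with args=%r, kwargs=%r', func.__qualname__, args, kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


logger = logging.getLogger('ampycloud_sketch')  # hypothetical logger name


@log_func_call(logger)
def add(a, b=0):
    return a + b
```

Splitting the message across two levels means that production logs at INFO only record which functions ran. Setting the logger to DEBUG additionally exposes the argument values.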

ampycloud.scaler module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: data scaling tools

ampycloud.scaler.shift_and_scale(vals: ndarray, shift: int | float | None = None, scale: int | float = 1, mode: str = 'do') → ndarray

Shift (by a constant) and scale (by a constant) the data.

Parameters:

vals (ndarray) – values to (un-)shift-and-scale.
shift (int|float, optional) – amount to shift the data by. If not specified, it will be set to max(vals).
scale (int|float, optional) – the scaling value. Defaults to 1.
mode (str, optional) – whether to ‘do’ or ‘undo’ the shift-and-scale.

Returns:

np.ndarray – the (un-)shifted-and-scaled array.

This function converts x to (x-shift)/scale if mode='do', and to x * scale + shift if mode='undo'.
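The following list-based sketch implements the documented mapping and its round trip. Going by the documented default (shift = max(vals)), the shift used for 'do' should be passed explicitly when undoing. Otherwise, the default would be recomputed on the already-scaled values. This is a standalone illustration, not ampycloud's code.

```python
def shift_and_scale(vals, shift=None, scale=1, mode='do'):
    """(x - shift) / scale for mode='do', x * scale + shift for mode='undo'."""
    if shift is None:
        shift = max(vals)  # documented default; pass it explicitly when undoing
    if mode == 'do':
        return [(x - shift) / scale for x in vals]
    if mode == 'undo':
        return [x * scale + shift for x in vals]
    raise ValueError(f"mode must be 'do' or 'undo', not {mode!r}")


dts = [-600.0, -300.0, 0.0]
scaled = shift_and_scale(dts, shift=0.0, scale=300.0)
print(scaled)                                                   # [-2.0, -1.0, 0.0]
print(shift_and_scale(scaled, shift=0.0, scale=300.0, mode='undo'))  # [-600.0, -300.0, 0.0]
```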

ampycloud.scaler.minmax_scale(vals: ndarray, min_val: float | int | None = None, max_val: float | int | None = None, mode: str = 'do') → ndarray

Rescale the data onto a [0, 1] interval, possibly forcing a specific and/or minimum interval range.

Parameters:

vals (ndarray) – values to (un-)minmax-scale.
mode (str, optional) – whether to ‘do’ or ‘undo’ the scaling. Defaults to ‘do’.
min_val (int|float, optional) – value to be mapped to 0. If not set, will be min(vals). Defaults to None.
max_val (int|float, optional) – value to be mapped to 1. If not set, will be max(vals). Defaults to None.

Returns:

ndarray – The (un-)minmax-scaled array.
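The following is a standalone sketch of the min-max mapping with optional fixed bounds. Raising an error on a zero-width range is a choice of this sketch: a set of identical values cannot be stretched onto [0, 1]. For 'undo', the bounds used for 'do' should be passed explicitly.

```python
def minmax_scale(vals, min_val=None, max_val=None, mode='do'):
    """Map [min_val, max_val] onto [0, 1] ('do'), or back ('undo')."""
    lo = min(vals) if min_val is None else min_val
    hi = max(vals) if max_val is None else max_val
    span = hi - lo
    if span == 0:
        # Choice of this sketch: refuse to divide by a zero-width range.
        raise ValueError('Zero-width range: set min_val/max_val explicitly')
    if mode == 'do':
        return [(x - lo) / span for x in vals]
    if mode == 'undo':
        return [x * span + lo for x in vals]
    raise ValueError(f"mode must be 'do' or 'undo', not {mode!r}")


print(minmax_scale([2, 4, 6]))                              # [0.0, 0.5, 1.0]
print(minmax_scale([1000, 1500], min_val=0, max_val=2000))  # [0.5, 0.75]
```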

ampycloud.scaler.minrange2minmax(vals: ndarray, min_range: int | float = 0) → tuple

Transform a minimum range into a pair of min/max values.

Parameters:

vals (np.ndarray) – values to assess.
min_range (int|float, optional) – minimum range to meet. Defaults to 0.

Returns:

tuple – the min and max values of the data range of at least min_range in size.

Essentially, if max(vals)-min(vals) >= min_range, this function returns [min(vals), max(vals)]. Else, it returns [val_mid-min_range/2, val_mid+min_range/2], with val_mid=(max(vals)+min(vals))/2.
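The rule is easy to check by hand. Paired with a min-max scaling, it keeps a very thin set of values from being stretched over the full [0, 1] interval. The default SLICING_PRMS in dynamic.AMPYCLOUD_PRMS combine 'minmax-scale' with height_scale_kwargs={'min_range': 1000}, which suggests this is how a minimum height range is enforced when slicing. The function below is a standalone sketch, not ampycloud's code.

```python
def minrange2minmax(vals, min_range=0):
    """Return (min, max) spanning the data, widened symmetrically to at least min_range."""
    lo, hi = min(vals), max(vals)
    if hi - lo >= min_range:
        return lo, hi
    mid = (hi + lo) / 2
    return mid - min_range / 2, mid + min_range / 2


print(minrange2minmax([1000, 1200], min_range=1000))  # (600.0, 1600.0)
print(minrange2minmax([0, 5000], min_range=1000))     # (0, 5000)
```

With the widened bounds, hits between 1000 ft and 1200 ft would map to 0.4 to 0.6 rather than 0 to 1, preserving the information that the slice is thin.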

ampycloud.scaler.step_scale(vals: ndarray, steps: list, scales: list, mode: str = 'do') → ndarray

Scales values step-wise, with different constants between specific steps.

Parameters:

vals (ndarray) – values to scale.
steps (list) – the step edges. E.g. [8000, 14000].
scales (list) – the scaling values (i.e. divisors) for each step. E.g. [100, 500, 1000]. Must have len(scales) = len(steps)+1.
mode (str, optional) – whether to ‘do’ or ‘undo’ the scaling.

Returns:

ndarray – (un-)step-scaled values

Values are divided by scales[i] between steps[i-1] and steps[i]. Values below steps[0] are divided by scales[0], and values above steps[-1] by scales[-1].

Note that this function ensures that each step is properly offset to ensure that the scaled data is continuous (no gaps and no overlapping steps) !
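One way to obtain the continuity described above is to chain the segments, so that each step edge maps to the scaled value reached at the end of the previous segment. The sketch below takes that approach, using the example values from the parameter list. It assumes that values below steps[0] are simply divided by scales[0], starting from 0. This is a choice of this sketch, and ampycloud's exact offsets may differ.

```python
from bisect import bisect_right


def step_scale(vals, steps, scales, mode='do'):
    """Continuous step-wise scaling (standalone sketch, not ampycloud's code)."""
    if len(scales) != len(steps) + 1:
        raise ValueError('len(scales) must be equal to len(steps) + 1')
    # Scaled value reached at each step edge. Segments are chained, so the mapping is
    # continuous; assumption of this sketch: the first segment is scaled from 0.
    scaled_edges = []
    for i, edge in enumerate(steps):
        prev_edge = steps[i - 1] if i > 0 else 0
        prev_scaled = scaled_edges[-1] if i > 0 else 0
        scaled_edges.append(prev_scaled + (edge - prev_edge) / scales[i])

    out = []
    for v in vals:
        if mode == 'do':
            i = bisect_right(steps, v)  # index of the segment containing v
            out.append(v / scales[0] if i == 0
                       else scaled_edges[i - 1] + (v - steps[i - 1]) / scales[i])
        elif mode == 'undo':
            i = bisect_right(scaled_edges, v)
            out.append(v * scales[0] if i == 0
                       else steps[i - 1] + (v - scaled_edges[i - 1]) * scales[i])
        else:
            raise ValueError(f"mode must be 'do' or 'undo', not {mode!r}")
    return out


heights = [4000, 8000, 10000, 14000, 15000]
scaled = step_scale(heights, steps=[8000, 14000], scales=[100, 500, 1000])
print(scaled)  # [40.0, 80.0, 84.0, 92.0, 93.0]
```

Because every scale is positive, the scaled edges increase monotonically. This allows the 'undo' branch to locate the right segment with the same bisection used for 'do'.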

ampycloud.scaler.convert_kwargs(vals: ndarray, fct: str, **kwargs) → dict

Converts the user-input keywords such that they can be fed to the underlying scaling functions.

Parameters:

vals (np.ndarray) – the values to be processed.
fct (str) – the scaling mode, e.g. ‘shift-and-scale’, etc ….
**kwargs – dict of keyword arguments to be converted, if warranted.

Returns:

dict – the data-adjusted set of kwargs.

Note

This function was first introduced to accommodate the creation of a secondary axis on the ampycloud diagnostic plots. It is a buffer that allows one to separate “user” scaling keywords from the “deterministic” scaling keywords required to get a specific scaling, no matter the underlying dataset (as is required for plotting a secondary axis).

Essentially, this function allows one to feed either “user” or “deterministic” keywords to apply_scaling(), such that the former will be turned into the latter, and the latter will remain untouched.

ampycloud.scaler.apply_scaling(vals: ndarray, fct: str | None = None, **kwargs) → ndarray

Umbrella scaling routine, that gathers all the individual ones under a single entry point.

Parameters:

vals (ndarray) – values to scale.
fct (str, optional) – name of the scaling function to use. Can be one of [‘shift-and-scale’, ‘minmax-scale’, or ‘step-scale’]. Defaults to None = do nothing.
**kwargs – keyword arguments that will be fed to the underlying scaling function.

Returns:

ndarray – the scaled values.

ampycloud.version module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: ampycloud version

ampycloud.version.VERSION = '2.1.1'

the one-and-only place where the ampycloud version is set.

Type:: str

ampycloud.wmo module

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: WMO-related utilities

ampycloud.wmo.perc2okta(val: int | float | ndarray) → ndarray

Converts a sky coverage percentage into oktas.

Parameters:: val (int|float|ndarray) – the sky coverage percentage to convert, in percent.
Returns:: ndarray of int – the okta value(s).

One okta corresponds to 1/8 of the sky covered by clouds. The cases of 0 and 8 oktas are special, in that these indicate that the sky is covered at exactly 0%, respectively 100%. This implies that the 1 okta and 7 okta bins are larger than others.

Specifically:

0 okta == val=0

1 okta == 0 < val <= 1.5*100/8

2 oktas == 1.5*100/8 < val <= 2.5*100/8

…

7 oktas == 6.5*100/8 < val < 100

8 oktas == val=100

Reference:: Boers, R., de Haij, M. J., Wauben, W. M. F., Baltink, H. K., van Ulft, L. H., Savenije, M., and Long, C. N. (2010), Optimized fractional cloudiness determination from five ground-based remote sensing techniques, J. Geophys. Res., 115, D24116, doi:10.1029/2010JD014661.
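These bins can be computed without a lookup table. Bins 2 to 6 cover (k - 0.5) * 12.5 < val <= (k + 0.5) * 12.5, so a ceiling finds k. Clamping to [1, 7] then widens the 1-okta and 7-okta bins down to 0 % and up to 100 %. This scalar sketch is not ampycloud's implementation, which accepts arrays. Rejecting values outside [0, 100] is an assumption of the sketch.

```python
import math


def perc2okta(val):
    """Sky coverage percentage -> okta, following the bins listed above."""
    if not 0 <= val <= 100:
        raise ValueError('val must be a percentage between 0 and 100')  # sketch assumption
    if val == 0:
        return 0
    if val == 100:
        return 8
    # One okta is 100 / 8 = 12.5 %; bins 1 and 7 absorb the edges via the clamping.
    return min(7, max(1, math.ceil(val / 12.5 - 0.5)))


print([perc2okta(v) for v in (0, 1, 18.75, 18.76, 50, 99.9, 100)])  # [0, 1, 1, 2, 4, 7, 8]
```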

ampycloud.wmo.okta2code(val: int) → str | None

Convert an okta value to a METAR code.

Parameters:: val (int) – okta value between 0 and 9 (included).
Returns:: str – METAR code

Conversion is as follows:

0 okta => NCD

1-2 oktas => FEW

3-4 oktas => SCT

5-7 oktas => BKN

8 oktas => OVC

9 oktas => None
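The following standalone sketch implements the conversion table as written. Rejecting non-integer or out-of-range inputs is an assumption of the sketch, and the code is not ampycloud's.

```python
def okta2code(val):
    """Okta value (0-9) -> METAR cloud-amount code, following the table above."""
    if not isinstance(val, int) or not 0 <= val <= 9:
        raise ValueError('okta value must be an int between 0 and 9')  # sketch assumption
    if val == 9:
        return None
    return ('NCD', 'FEW', 'FEW', 'SCT', 'SCT', 'BKN', 'BKN', 'BKN', 'OVC')[val]


print([okta2code(v) for v in range(10)])
```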

ampycloud.wmo.okta2symb(val: int, use_metsymb: bool = False) → str

Convert an okta value to a LaTeX string, possibly using the metsymb LaTeX package.

Parameters:

val (int) – okta value between 0 and 9 (included).
use_metsymb (bool, optional) – if True, will use the metsymb LaTeX package to draw proper okta symbols. If False, returns a digit. Defaults to False.

Returns:

str – LaTeX command.

Note

The metsymb LaTeX package is available under: https://github.com/MeteoSwiss/metsymb

ampycloud.wmo.height2code(val: int | float) → str

Converts a given height into hundreds of ft, as a 3-digit string, e.g. 5000 ft -> 050, 500 ft -> 005.

Parameters:: val (int|float) – the height to convert, in feet.
Returns:: str – the corresponding METAR code chunk

Below 10’000 ft, the value is floored to the nearest 100 ft. Above 10’000 ft, the value is floored to the nearest 1000 ft.

Reference:: Aerodrome Reports and Forecasts, A Users’ Handbook to the Codes, WMO-No.782, 2020 edition. https://library.wmo.int/?lvl=notice_display&id=716

Warning

Currently, this function does not implement EASA’s rule AMC1 MET.TR.205(e)(3) (i.e. setting a resolution of 50 ft up to 300 ft for aerodromes with established low-visibility approach and landing procedures). https://www.easa.europa.eu/downloads/22100/en
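The flooring rule and the 3-digit formatting can be sketched as follows, assuming a non-negative height in ft. Values are floored to 100 ft below 10'000 ft and to 1000 ft from 10'000 ft upward, then expressed in hundreds of ft. Exactly 10'000 ft gives '100' under either rule. Like the function described above, this standalone sketch does not implement the EASA 50 ft resolution, and it is not ampycloud's code.

```python
import math


def height2code(val):
    """Height in ft -> 3-digit METAR height group, in hundreds of ft."""
    if val < 0:
        raise ValueError('height must be non-negative')  # assumption of this sketch
    step = 100 if val < 10000 else 1000
    floored = math.floor(val / step) * step
    return f'{floored // 100:03d}'


print([height2code(h) for h in (99, 500, 549.9, 5000, 12345)])  # ['000', '005', '005', '050', '120']
```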