ampycloud package
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: highest-level init magic
Subpackages
- ampycloud.plots package
- Submodules
- ampycloud.plots.core module
- ampycloud.plots.diagnostics module
DiagnosticPlot
DiagnosticPlot.__init__()
DiagnosticPlot.setup_fig()
DiagnosticPlot.new_fig()
DiagnosticPlot.show_hits_only()
DiagnosticPlot.show_slices()
DiagnosticPlot.show_groups()
DiagnosticPlot.show_layers()
DiagnosticPlot.add_vv_legend()
DiagnosticPlot.add_ceilo_count()
DiagnosticPlot.add_max_hits()
DiagnosticPlot.add_geoloc_and_ref_dt()
DiagnosticPlot.add_ref_metar()
DiagnosticPlot.add_metar()
DiagnosticPlot.format_primary_axes()
DiagnosticPlot.format_slice_axes()
DiagnosticPlot.format_group_axes()
DiagnosticPlot.save()
DiagnosticPlot.show()
DiagnosticPlot.close_fig()
- ampycloud.plots.hardcoded module
- ampycloud.plots.secondary module
- ampycloud.plots.tools module
- ampycloud.utils package
Submodules
ampycloud.cluster module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: clustering tools
- ampycloud.cluster.agglomerative_cluster(data: ndarray, n_clusters: int | None = None, metric: str = 'euclidean', linkage: str = 'single', distance_threshold: int | float = 1) tuple
Function that wraps around sklearn.cluster.AgglomerativeClustering.
- Parameters:
data (ndarray) – array of [x, y] pairs to run the clustering on.
n_clusters (int, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to None.
metric (str, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to ‘euclidean’.
linkage (str, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to ‘single’.
distance_threshold (int|float, optional) – see sklearn.cluster.AgglomerativeClustering for details. Defaults to 1.
- Returns:
int, ndarray – number of clusters found, and corresponding clustering labels for each data point.
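To make the default combination of linkage and distance_threshold more concrete, here is a minimal pure-Python sketch. It calls neither ampycloud nor scikit-learn, and single_linkage_labels is a hypothetical helper. It assumes the behavior documented by scikit-learn, namely that clusters are not merged at or above the distance threshold, which for single linkage amounts to joining any two points that lie closer than the threshold.

```python
import math


def single_linkage_labels(points, distance_threshold=1.0):
    """Label points so that any two points closer than the threshold share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow the parent links up to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points closer than the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < distance_threshold:
                parent[find(i)] = find(j)

    # Relabel the cluster roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
    return len(roots), labels


pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.0)]
n_clusters, labels = single_linkage_labels(pts, distance_threshold=1.0)
print(n_clusters, labels)
```

The first three points end up chained together even though the outer two are exactly one threshold apart, which is the characteristic behavior of single linkage.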
- ampycloud.cluster.clusterize(data: ndarray, algo: str | None = None, **kwargs: dict) tuple | None
Umbrella clustering routine, that provides a single access point to the different clustering algorithms.
- Parameters:
data (ndarray) – array of [x, y] arrays to clusterize.
algo (str, optional) – clustering algorithm, that must be one of [None, ‘agglomerative’]. Defaults to None.
kwargs (dict, optional) – keyword arguments to be fed to the underlying clustering function.
- Returns:
int, ndarray – the number of clusters identified, and the associated labels for each data point.
ampycloud.core module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the BSD-3-Clause license.
SPDX-License-Identifier: BSD-3-Clause
Module contains: core ampycloud routines. All functions meant to be used directly by users are here.
- ampycloud.core.copy_prm_file(save_loc: str = './', which: str = 'defaults') None
Create a local copy of a specific ampycloud parameter file.
- Parameters:
save_loc (str, optional) – location to save the YML file to. Defaults to ‘./’.
which (str, optional) – name of the parameter file to copy. Defaults to ‘defaults’.
Example
import ampycloud
ampycloud.copy_prm_file(save_loc='.', which='default')
Note
There is also a high-level entry point that allows users to get a local copy of the ampycloud parameter files directly from the command line:
ampycloud_copy_prm_file -which=default
- ampycloud.core.set_prms(pth: str | Path) None
Sets the dynamic (i.e. scientific) ampycloud parameters from a suitable YAML file.
- Parameters:
pth (str|Path) – path+filename to a YAML parameter file for ampycloud.
Note
It is recommended to first get a copy of the default ampycloud parameter file using copy_prm_file(), and edit its content as required. Doing so should ensure full compliance with the default structure of dynamic.AMPYCLOUD_PRMS.
Warning
This is NOT a thread-safe way of setting parameters. If you plan on running concurrent ampycloud evaluations, parameters should be fed directly to run().
Example
import ampycloud
ampycloud.copy_prm_file(save_loc='.', which='default')
ampycloud.set_prms('./ampycloud_default_prms.yml')
- ampycloud.core.reset_prms(which: str | list | None = None) None
Reset the ampycloud dynamic (i.e. scientific) parameters to their default values.
- Parameters:
which (str|list, optional) – (list of) names of parameters to reset specifically. If not set (by default), all parameters will be reset.
Example
import ampycloud
from ampycloud import dynamic
# Change a parameter
dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
# Reset them
ampycloud.reset_prms()
print('Back to the default value:', dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'])
- ampycloud.core.run(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | datetime | None = None) CeiloChunk
Runs the ampycloud algorithm on a given dataset.
- Parameters:
data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
prms (dict, optional) – a (nested) dict of parameters to adjust for this specific run. This is meant as a thread-safe way of adjusting parameters for different runs. Any unspecified parameter will be taken from dynamic.AMPYCLOUD_PRMS at init time.
geoloc (str, optional) – the name of the geographic location where the data was taken. Defaults to None.
ref_dt (str|datetime.datetime, optional) – reference date and time of the observations, corresponding to Delta t = 0. Defaults to None. Note that if a datetime instance is specified, it will be turned almost immediately to str via str(ref_dt).
- Returns:
data.CeiloChunk – the data chunk with all the processing outcome bundled cleanly.
All that is required to run the ampycloud algorithm is a properly formatted dataset. At the moment, specifying geoloc and ref_dt serves no purpose other than to enhance plots (should they be created). There are no special requirements for geoloc and ref_dt: as long as they are strings, you can set them to whatever you please.
Important
ampycloud treats Vertical Visibility hits no differently than any other hit. Hence, it is up to the user to adjust the Vertical Visibility hit height (and/or ignore some of them, for example) prior to feeding them to ampycloud, so that it can be used as a cloud hit.
Important
ampycloud uses the dt and ceilo values to decide if two hits are simultaneous, or not. It is thus important that the values of dt be sufficiently precise to distinguish between different measurements. Essentially, each measurement (which may be comprised of several hits) should be associated to a unique (ceilo; dt) set of values. Failure to do so may result in incorrect estimations of the cloud layer densities. See data.CeiloChunk.max_hits_per_layer for more details.
All the scientific parameters of the algorithm are set dynamically in the dynamic module. From within a Python session all these parameters can be changed directly. For example, to change the Minimum Sector Altitude (to be specified in ft aal), one would do:
from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MSA'] = 5000
Alternatively, the scientific parameters can also be defined and fed to ampycloud via a YAML file. See set_prms() for details.
Caution
By default, the function run() will use the parameter values set in dynamic.AMPYCLOUD_PRMS, which is not thread safe. Users interested in running multiple concurrent ampycloud calculations with distinct sets of parameters within the same Python session are thus urged to feed the required parameters directly to run() via the prms keyword argument, which expects a (nested) dictionary with keys compatible with dynamic.AMPYCLOUD_PRMS.
Examples:
# Define only the parameters that are non-default. To adjust the MSA, use:
prms = {'MSA': 10000}
# Or to adjust some other algorithm parameters:
prms = {'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'BIC', 'min_prob': 1.0}}}
The data.CeiloChunk instance returned by this function contains all the information associated to the ampycloud algorithm, including the raw data and slicing/grouping/layering info. Its method data.CeiloChunk.metar_msg() provides direct access to the resulting METAR-like message. Users that require the height, okta amount, and/or exact sky coverage fraction of layers can get them via the data.CeiloChunk.layers class property.
Example
In the following example, we create the canonical mock dataset of ampycloud, run the algorithm on it, and fetch the resulting METAR-like message:
from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()
# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal.
chunk = ampycloud.run(mock_data, prms={'MSA':10000}, geoloc='Mock data', ref_dt=datetime.now())
# Get the resulting METAR message
print(chunk.metar_msg())
# Display the full information available for the layers found
print(chunk.layers)
- ampycloud.core.metar(data: DataFrame) str
Run the ampycloud algorithm on a dataset and extract a METAR report of the cloud layers.
- Parameters:
data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
- Returns:
str – the METAR-like message.
Example:
import ampycloud
from ampycloud.utils import mocker
# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()
# Compute the METAR message
msg = ampycloud.metar(mock_data)
print(msg)
- ampycloud.core.demo() tuple
Run the ampycloud algorithm on a demonstration dataset.
- Returns:
pandas.DataFrame, data.CeiloChunk – the mock dataset used for the demonstration, and the data.CeiloChunk instance.
ampycloud.data module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: data classes
- class ampycloud.data.AbstractChunk(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None)
Bases:
ABC
Abstract parent class for data chunk classes.
- DATA_COLS = {'ceilo': string[python], 'dt': <class 'float'>, 'height': <class 'float'>, 'type': <class 'int'>}
required data columns
- Type:
dict
- abstract __init__(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None) None
Init routine for abstract class.
- property msa: float
The Minimum Sector Altitude set when initializing this specific instance, in ft aal.
- property msa_hit_buffer: float
The Minimum Sector Altitude hit buffer set when initializing this specific instance, in ft.
- property data: DataFrame
The data of the chunk, as a pandas DataFrame.
- property geoloc: str | None
The name of the geographic location of the observations.
- property ref_dt: str | None
The reference date and time for the data, i.e. Delta t = 0.
- property prms: dict
The dictionary of ampycloud parameters set at the init of this class instance.
- class ampycloud.data.CeiloChunk(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None)
Bases:
AbstractChunk
Child class for time series of ceilometer hits, referred to as data ‘chunks’.
This class essentially gathers all the data and processing methods under one roof.
Warning
Some of these methods are actually intended to be used in order … Some safety mechanisms have been put in place to ensure this actually happens, but still …
You’ve been warned.
- __init__(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | None = None) None
CeiloChunk init method.
- Parameters:
data (pd.DataFrame) – the input data. See above for details.
prms (dict, optional) – dictionary of ampycloud algorithm parameters.
geoloc (str, optional) – name of the geolocation of the observations.
ref_dt (str, optional) – reference date and time of the observations, corresponding to Delta t = 0. Defaults to None.
The input data is required to be a pandas DataFrame with 4 columns described in CeiloChunk.DATA_COLS, i.e.:
['ceilo', 'dt', 'height', 'type']
Specifically:
ceilo: contains names/ids of the ceilometer associated to the measurements, as str. This is important to derive correct sky coverage percentage when combining data from more than 1 ceilometer, in case ceilometers report multiple hits at the same time.
dt: time delta between the (planned) METAR issuance time and the hit time in s, as float. This should typically be a negative number (because METARs are assembled using existing=past ceilometer observations).
height: cloud base hit height in ft above aerodrome level (aal), as float. The cloud base height computed by the ceilometer.
type: cloud hit type, as int. A value n>0 indicates that the hit is the n-th (from the ground) that was reported by this specific ceilometer for this specific timestep. A value of n=-1 indicates that the cloud hit corresponds to a Vertical Visibility hit.
Note
For now, geoloc and ref_dt serve no purpose other than improving the diagnostic plots. This is also why ref_dt is a str, such that users can specify it however they please.
- data_rescaled(dt_mode: str | None = None, height_mode: str | None = None, dt_kwargs: dict | None = None, height_kwargs: dict | None = None) DataFrame
Returns a copy of the data, rescaled according to the provided parameters.
- Parameters:
dt_mode (str, optional) – scaling rule for the time deltas. Defaults to None.
height_mode (str, optional) – scaling rule for the heights. Defaults to None.
dt_kwargs (dict, optional) – dict of arguments to be fed to the chosen dt scaling routine. Defaults to None.
height_kwargs (dict, optional) – dict of arguments to be fed to the chosen height scaling routine. Defaults to None.
- Returns:
pd.DataFrame – a copy of the data, rescaled.
Note
The kwargs approach was inspired by the reply from Jonathan Eunice on SO.
- property ceilos: list
The list of all ceilometers included in the data chunk.
- Returns:
list of str – the list of ceilo names.
- property max_hits_per_layer: int
The maximum number of ceilometer hits possible for a given layer, given the chunk data.
- Returns:
int – the max number of ceilometer hits for a layer. Divide by len(self.ceilos) to get the average max number of hits per ceilometer per layer (remember: not all ceilometers may have the same number of timestamps over the chunk time period!).
This is the total number of unique timesteps from all ceilometers considered.
Note
This value assumes that a layer can contain only 1 hit per ceilometer per timestep, i.e. 2 simultaneous hits from a given ceilometer can never belong to the same cloud layer.
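As an illustration of the counting rule described above, the following pure-Python sketch counts the unique (ceilo, dt) pairs in a handful of hits. It is not ampycloud's implementation: the helper name count_max_hits and the list-of-dicts layout are assumptions made for this example, with keys named after the required data columns.

```python
def count_max_hits(hits):
    """Count the unique (ceilo, dt) pairs, i.e. the number of distinct measurements."""
    return len({(hit["ceilo"], hit["dt"]) for hit in hits})


hits = [
    {"ceilo": "A", "dt": -60.0, "height": 1500.0, "type": 1},
    {"ceilo": "A", "dt": -60.0, "height": 3200.0, "type": 2},  # same measurement as above
    {"ceilo": "A", "dt": -45.0, "height": 1550.0, "type": 1},
    {"ceilo": "B", "dt": -60.0, "height": 1480.0, "type": 1},
]

max_hits = count_max_hits(hits)
print(max_hits)
```

The two simultaneous hits from ceilometer A count as a single measurement, so four hits yield three possible hits per layer.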
- metarize(which: str = 'slices') None
Assembles a pandas.DataFrame of slice/group/layer METAR properties of interest.
- Parameters:
which (str, optional) – whether to process ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘slices’.
The pandas.DataFrame generated by this method is subsequently available via the appropriate class property CeiloChunk.slices, CeiloChunk.groups, or CeiloChunk.layers, depending on the value of the argument which.
The slice/group/layer parameters computed/derived by this method include:
n_hits (int): duplicate-corrected number of hits
perc (float): sky coverage percentage (between 0-100)
okta (int): okta count
height_base (float): base height
height_mean (float): mean height
height_std (float): height standard deviation
height_min (float): minimum height
height_max (float): maximum height
thickness (float): thickness
fluffiness (float): fluffiness (expressed in height units, i.e. ft)
code (str): METAR-like code
significant (bool): whether the layer is significant according to the ICAO rules. See icao.significant_cloud() for details.
cluster_id (int): an ampycloud-internal identification number
isolated (bool): isolation status (for slices only)
ncomp (int): the number of subcomponents (for groups only)
Important
The value of n_hits is corrected for duplicate hits, to ensure a correct estimation of the sky coverage fraction. Essentially, two (or more) simultaneous hits from the same ceilometer are counted as one only. In other words, if Type 1 and Type 2 hits from the same ceilometer, at the same observation time, are included in a given slice/group/layer, they are counted as one hit only. This is a direct consequence of the fact that clouds have a single base height at any given time [citation needed].
Note
The metarize function is modularized in private submethods defined above.
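The duplicate correction applied to n_hits can be sketched in a few lines of pure Python. This is an illustration only, not the code used by metarize(): the helper n_hits_per_layer and the "layer" key attached to each hit are hypothetical.

```python
from collections import defaultdict


def n_hits_per_layer(hits):
    """Count hits per layer, with simultaneous hits from one ceilometer counted once."""
    seen = defaultdict(set)
    for hit in hits:
        # A set keeps a single entry per (ceilo, dt) pair within each layer.
        seen[hit["layer"]].add((hit["ceilo"], hit["dt"]))
    return {layer: len(pairs) for layer, pairs in seen.items()}


hits = [
    {"ceilo": "A", "dt": -60.0, "type": 1, "layer": 0},
    {"ceilo": "A", "dt": -60.0, "type": 2, "layer": 0},  # simultaneous with the hit above
    {"ceilo": "A", "dt": -45.0, "type": 1, "layer": 0},
    {"ceilo": "B", "dt": -60.0, "type": 1, "layer": 1},
]

counts = n_hits_per_layer(hits)
print(counts)
```

Layer 0 contains three raw hits but only two distinct measurements, so its corrected count is 2.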
- find_slices() None
Identify general height slices in the chunk data. Intended as the first stage towards the identification of cloud layers.
Important
The “parameters” of this function are all set in self.prms[‘SLICING_PRMS’].
- find_groups() None
Identifies groups of coherent hits across overlapping slices. Intended as the second stage towards the identification of cloud layers.
Important
The “parameters” of this function are all set in self.prms[‘GROUPING_PRMS’].
- find_layers() None
Identifies individual layers from a list of groups, splitting these in 2 or 3 (if warranted) significant cloud sub-layers. Intended as the third stage towards the identification of cloud layers.
Important
The “parameters” of this function are set in self.prms[‘LAYERING_PRMS’].
- property n_slices: None | int
Returns the number of slices identified in the data.
- Returns:
int – the number of slices
- property slices: DataFrame
Returns a
pandas.DataFrame
with information regarding the different slices identified by the slicing step.
- property n_groups: None | int
Returns the number of groups identified in the data.
- Returns:
int – the number of groups
- property groups: DataFrame
Returns a
pandas.DataFrame
with information regarding the different groups identified by the grouping algorithm.
- property n_layers: None | int
Returns the number of layers identified in the data.
- Returns:
int – the number of layers.
- property layers: DataFrame
Returns a
pandas.DataFrame
with information regarding the different layers identified by the layering algorithm.
- property clouds_above_msa_buffer: bool
Returns whether a number of hits exceeding the threshold for 1 okta is detected above MSA + MSA_HIT_BUFFER.
- Returns:
bool – whether high clouds were detected.
- metar_msg(which: str = 'layers') str
Construct a METAR-like message for the identified cloud slices, groups, or layers.
- Parameters:
which (str, optional) – whether to look at ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘layers’.
- Returns:
str – the METAR-like message.
Important
The ICAO’s cloud layer selection rules applicable to METARs will be applied to create the resulting str! See icao.significant_cloud() for details.
Caution
The Minimum Sector Altitude values set when the CeiloChunk instance was initialized will be applied! If in doubt, the values used by this method are those set in the (parent) class attribute AbstractChunk.msa.
ampycloud.dynamic module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: dynamic (scientific) parameters, which can be altered during execution.
- ampycloud.dynamic.get_default_prms() dict
Extract the default ampycloud parameters from the YAML configuration file.
- ampycloud.dynamic.AMPYCLOUD_PRMS = {'BASE_LVL_HEIGHT_PERC': 5, 'BASE_LVL_LOOKBACK_PERC': 100, 'EXCLUDE_FOR_BASE_HEIGHT_CALC': [], 'GROUPING_PRMS': {'dt_scale': 180, 'height_pad_perc': 10, 'height_scale_range': [100, 500]}, 'LAYERING_PRMS': {'gmm_kwargs': {'delta_mul_gain': 0.95, 'min_prob': 1.0, 'mode': 'delta', 'rescale_0_to_x': 100, 'scores': 'BIC'}, 'min_okta_to_split': 2}, 'LOWESS': {'frac': 0.35, 'it': 3}, 'MAX_HITS_OKTA0': 3, 'MAX_HOLES_OKTA8': 1, 'MIN_SEP_LIMS': [10000], 'MIN_SEP_VALS': [250, 1000], 'MPL_STYLE': 'base', 'MSA': None, 'MSA_HIT_BUFFER': 1500, 'SLICING_PRMS': {'distance_threshold': 0.2, 'dt_scale': 100000, 'height_scale_kwargs': {'min_range': 1000}, 'height_scale_mode': 'minmax-scale'}}
The ampycloud parameters, first set from a config file via get_default_prms().
- Type:
dict
ampycloud.errors module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: custom error and warning classes
- exception ampycloud.errors.AmpycloudError
Bases:
Exception
The default error class for ampycloud, which is a child of the Exception class.
- exception ampycloud.errors.AmpycloudWarning
Bases:
Warning
The default warning class for ampycloud, which is a child of the Warning class.
ampycloud.fluffer module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: fluffiness-related tools
- ampycloud.fluffer.get_fluffiness(pts, **kwargs)
Utility function to compute the fluffiness of a set of ceilometer hits.
- Parameters:
pts (ndarray) – 2D array of [dt, height] ceilometer hits. No hit may have a NaN height.
**kwargs (optional) – additional arguments to be fed to statsmodels.nonparametric.lowess().
- Returns:
float, ndarray – the fluffiness (in height units) and LOWESS-smoothed (dt, height) values (sorted).
The fluffiness is computed as 2 * mean(abs(y - lowess)), where lowess is the smooth LOWESS fit to the ceilometer hits.
The factor 2 stems from the fact that abs(y-lowess) corresponds to half(-ish) the slice thickness, that needs to be doubled in order to use the fluffiness to rescale the slice onto the 0 - 1 range.
To avoid LOWESS warnings, hits with identical x coordinates (= time steps) are offset by a small factor (1e-5).
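The formula 2 * mean(abs(y - lowess)) can be checked by hand with a small pure-Python sketch. The LOWESS fit itself is not reproduced here: the smoothed values are assumed to be already available, and a flat line is used as a stand-in. The helper name fluffiness_from_fit is hypothetical.

```python
def fluffiness_from_fit(heights, smoothed):
    """Return 2 * mean(|height - smoothed|) for matching sequences of values."""
    if not heights or len(heights) != len(smoothed):
        raise ValueError("heights and smoothed must be non-empty and of equal length")
    mean_abs_offset = sum(abs(y - s) for y, s in zip(heights, smoothed)) / len(heights)
    return 2 * mean_abs_offset


heights = [1000.0, 1100.0, 900.0, 1050.0]
smoothed = [1000.0, 1000.0, 1000.0, 1000.0]  # stand-in for the LOWESS-smoothed heights

fluff = fluffiness_from_fit(heights, smoothed)
print(fluff)
```

Here the mean absolute offset is 62.5 ft, so the fluffiness is 125 ft.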
ampycloud.hardcoded module
Copyright (c) 2022-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: hardcoded data
- ampycloud.hardcoded.REQ_DATA_COLS = {'ceilo': string[python], 'dt': <class 'float'>, 'height': <class 'float'>, 'type': <class 'int'>}
the columns & associated types required for the pandas DataFrame fed to ampycloud.
- Type:
dict
ampycloud.icao module
Copyright (c) 2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: ICAO-related utilities
- ampycloud.icao.significant_cloud(oktas: list) list
Assesses which cloud layers in a list are significant, according to the ICAO rules.
- Parameters:
oktas (list) – the okta count of different cloud layers. These are assumed to be sorted from the lowest to the highest cloud layer !
- Returns:
list of bool – whether a given layer is significant, or not.
The ICAO rules applied are as follows:
first layer is always reported
second layer must be SCT or more (i.e. 3 oktas or more)
third layer must be BKN or more (i.e. 5 oktas or more)
no more than 3 layers reported (since ampycloud does not deal with CB/TCU)
- Reference:
Sec. 4.5.4.3 e) & footnote #14 in Table A3-1, Meteorological Service for International Air Navigation, Annex 3 to the Convention on International Civil Aviation, ICAO, 20th edition, July 2018.
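The four rules listed above can be sketched as follows. This is a hedged, standalone illustration and not the code of significant_cloud(): the helper name significant_layers is hypothetical, and the sketch takes "first layer is always reported" literally by using a minimum of 0 oktas for the first layer.

```python
def significant_layers(oktas):
    """Flag significant layers, given okta counts sorted from lowest to highest layer."""
    min_oktas = (0, 3, 5)  # minimum oktas for the 1st, 2nd and 3rd reported layers
    flags = []
    n_reported = 0
    for okta in oktas:
        # No more than 3 layers are reported; each one must reach its own threshold.
        is_significant = n_reported < 3 and okta >= min_oktas[n_reported]
        flags.append(is_significant)
        n_reported += is_significant
    return flags


flags = significant_layers([1, 1, 3, 5, 7])
print(flags)
```

In this example the second layer is skipped because it is below SCT, and the fifth is skipped because three layers have already been reported.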
ampycloud.layer module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: layering tools
- ampycloud.layer.scores2nrl(abics: ndarray) ndarray
Converts AIC or BIC scores into probabilities = normalized relative likelihood.
- Parameters:
abics (ndarray) – scores.
- Returns:
ndarray – probabilities of the different models.
Specifically, this function computes:
\[p_i = \frac{e^{-0.5(\textrm{abics}_i-\min(\textrm{abics}))}}{\sum_{j}e^{-0.5(\textrm{abics}_j-\min(\textrm{abics}))}}\]
Note
The smaller the BIC/AIC scores, the better, but the higher the probabilities = normalized relative likelihood, the better !
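The normalized relative likelihood can be evaluated with the standard library alone. The sketch below is a direct transcription of the formula, not the library function itself, and the helper name normalized_relative_likelihood is hypothetical.

```python
import math


def normalized_relative_likelihood(scores):
    """Convert AIC/BIC scores into probabilities that sum to 1."""
    best = min(scores)
    # Subtracting the best score keeps the exponentials well-behaved for large scores.
    weights = [math.exp(-0.5 * (score - best)) for score in scores]
    total = sum(weights)
    return [weight / total for weight in weights]


probs = normalized_relative_likelihood([100.0, 102.0, 110.0])
print(probs)
```

The model with the smallest score receives the largest probability, in line with the note above.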
- ampycloud.layer.best_gmm(abics: ndarray, mode: str = 'delta', min_prob: float = 1.0, delta_mul_gain: float = 1.0) int
Identify which Gaussian Mixture Model is most appropriate given AIC or BIC scores.
- Parameters:
abics (ndarray) – the AICs or BICs scores, ordered from simplest to most complex model.
mode (str, optional) – one of [‘delta’, ‘prob’]. Defaults to ‘delta’.
min_prob (float, optional) – minimum model probability computed from the scores’ relative likelihood, below which the other models will be considered. Set it to 1 to select the model with the lowest score, irrespective of its probability. Defaults to 1. This has no effect unless mode=’prob’.
delta_mul_gain (float, optional) – a smaller score will only be considered “valid” if it is smaller than delta_mul_gain*current_best_score. Defaults to 1. This has no effect unless mode=’delta’.
- Returns:
int – index of the “most appropriate” model.
Model selection can be based on:
1. the normalized relative likelihood values (see scores2nrl()) of the AIC and/or BIC scores, or
2. the normalized absolute offsets between the AIC or BIC scores.
The mode defaults to ‘delta’, i.e. the normalized absolute offsets between the scores.
Note
The order of the scores does matter for this routine.
Starting with the first model as the “current best model”, the model n will become the “current best model” if:
mode=’prob’:
prob(abics[current_best_model]) < min_prob AND prob(abics[n]) > prob(abics[current_best_model])
mode=’delta’:
abics[n] < delta_mul_gain * abics[current_best_model]
The default arguments of this function lead to selecting the number of components with the smallest score.
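The ‘delta’ selection rule can be written as a short loop. The following sketch only covers mode=’delta’ and is an illustration under that assumption, not the library code; the helper name pick_model_delta is hypothetical.

```python
def pick_model_delta(scores, delta_mul_gain=1.0):
    """Return the index of the preferred model, scanning from simplest to most complex."""
    best = 0
    for n in range(1, len(scores)):
        # Model n only takes over if its score beats the scaled current best score.
        if scores[n] < delta_mul_gain * scores[best]:
            best = n
    return best


scores = [100.0, 90.0, 89.0]
print(pick_model_delta(scores))
print(pick_model_delta(scores, delta_mul_gain=0.95))
```

With the default gain of 1, the third model wins because it has the smallest score. With a gain of 0.95, the marginal improvement from 90 to 89 is not enough, and the second model is retained.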
- ampycloud.layer.ncomp_from_gmm(vals: ndarray, ncomp_max: int = 3, min_sep: int | float = 0, layer_base_params: dict[str, int] | None = None, scores: str = 'BIC', rescale_0_to_x: float | None = None, random_seed: int = 42, **kwargs: dict) tuple
Runs a Gaussian Mixture Model on 1-D data, to determine if it contains 1, 2, or 3 components.
- Parameters:
vals (ndarray) – the data to process. If ndarray is 1-D, it will be reshaped to 2-D via .reshape(-1, 1).
ncomp_max (int, optional) – maximum number of Gaussian components to assess. Defaults to 3.
min_sep (int|float, optional) – minimum separation, in data unit, required between the mean location of two Gaussian components to consider them distinct. Defaults to 0. This is used in complement to any parameters fed to best_gmm(), that will first decide how many components look “best”, at which point these may get merged depending on min_sep. I.e. min_sep does not lead to re-running the GMM, it only merges the identified layers if required.
layer_base_params – Defined ampycloud parameters.
scores (str, optional) – either ‘BIC’ or ‘AIC’, to use Bayesian Information Criterion or Akaike Information Criterion scores.
rescale_0_to_x (float, optional) – if set, vals will be rescaled between 0 and this value before running the Gaussian Mixture Modelling. Defaults to None = no rescaling.
random_seed (int, optional) – used to reset temporarily the value of numpy.random.seed() to ensure repeatable results. Defaults to 42, because it is the Answer to the Ultimate Question of Life, the Universe, and Everything.
**kwargs (dict, optional) – these will be fed to best_gmm().
- Returns:
int, ndarray, ndarray – number of (likely) components, array of component ids to which each hit most likely belongs, array of AIC/BIC scores.
The default values lead to selecting the number of components with the smallest BIC values.
Note
This function was inspired from the “1-D Gaussian Mixture Model” example from astroML: https://www.astroml.org/book_figures/chapter4/fig_GMM_1D.html
ampycloud.logger module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: logging utilities
- ampycloud.logger.log_func_call(logger: Logger) Callable
Intended as a decorator to log function calls.
- Parameters:
logger (logging.Logger) – a logger to feed info to.
The first part of the message containing the function name is at the ‘INFO’ level. The second part of the message containing the argument values is at the ‘DEBUG’ level.
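A decorator with this two-level logging behavior can be sketched with the standard library. This is a generic illustration of the pattern, not the code of log_func_call(): the names log_call, demo_logger and add are hypothetical, and the exact message wording is an assumption.

```python
import functools
import logging


def log_call(logger):
    """Build a decorator that logs each call of the decorated function to `logger`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Function name at the INFO level, argument values at the DEBUG level.
            logger.info("Calling %s()", func.__name__)
            logger.debug("... with args=%r, kwargs=%r", args, kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


demo_logger = logging.getLogger("log_call_demo")


@log_call(demo_logger)
def add(a, b=0):
    """Add two numbers."""
    return a + b


result = add(1, b=2)
print(result)
```

Splitting the message across two levels lets users follow the call sequence at the INFO level without flooding the log with (possibly large) argument values.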
ampycloud.scaler module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: data scaling tools
- ampycloud.scaler.shift_and_scale(vals: ndarray, shift: int | float | None = None, scale: int | float = 1, mode: str = 'do') ndarray
Shift (by a constant) and scale (by a constant) the data.
- Parameters:
vals (ndarray) – values to (un-)shift-and-scale.
shift (int|float, optional) – amount to shift the data by. If not specified, it will be set to max(vals).
scale (int|float, optional) – the scaling value. Defaults to 1.
mode (str, optional) – whether to ‘do’ or ‘undo’ the shift-and-scale.
- Returns:
np.ndarray – the (un-)shifted-and-scaled array.
This function converts x to (x-shift)/scale if mode='do', and to x * scale + shift if mode='undo'.
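The two conversions are inverses of each other, which the following pure-Python sketch demonstrates on plain lists. It mirrors the formulas stated above but is not the library function; the name shift_and_scale_sketch is hypothetical.

```python
def shift_and_scale_sketch(vals, shift=None, scale=1, mode="do"):
    """Apply (x - shift) / scale for mode='do', or x * scale + shift for mode='undo'."""
    if shift is None:
        shift = max(vals)
    if mode == "do":
        return [(val - shift) / scale for val in vals]
    if mode == "undo":
        return [val * scale + shift for val in vals]
    raise ValueError(f"mode unknown: {mode}")


heights = [1000.0, 1500.0, 2000.0]
scaled = shift_and_scale_sketch(heights, shift=2000.0, scale=500.0)
restored = shift_and_scale_sketch(scaled, shift=2000.0, scale=500.0, mode="undo")
print(scaled, restored)
```

Note that the shift is passed explicitly in both directions: if it were left unset when undoing, it would be derived from the scaled values and the round trip would not recover the original data.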
- ampycloud.scaler.minmax_scale(vals: ndarray, min_val: float | int | None = None, max_val: float | int | None = None, mode: str = 'do') ndarray
Rescale the data onto a [0, 1] interval, possibly forcing a specific and/or minimum interval range.
- Parameters:
vals (ndarray) – values to (un-)minmax-scale.
mode (str, optional) – whether to ‘do’ or ‘undo’ the scaling. Defaults to ‘do’.
min_val (int|float, optional) – value to be mapped to 0. If not set, will be min(vals). Defaults to None.
max_val (int|float, optional) – value to be mapped to 1. If not set, will be max(vals). Defaults to None.
- Returns:
ndarray – The (un-)minmax-scaled array.
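A minimal pure-Python sketch of the forward rescaling is given below. It assumes a linear mapping in which min_val goes to 0 and max_val goes to 1, and it is not the library function; the name minmax_scale_sketch is hypothetical.

```python
def minmax_scale_sketch(vals, min_val=None, max_val=None):
    """Linearly map min_val to 0 and max_val to 1 (requires max_val != min_val)."""
    low = min(vals) if min_val is None else min_val
    high = max(vals) if max_val is None else max_val
    return [(val - low) / (high - low) for val in vals]


heights = [1000.0, 1250.0, 2000.0]
print(minmax_scale_sketch(heights))
print(minmax_scale_sketch(heights, min_val=0.0, max_val=4000.0))
```

Forcing min_val and max_val makes the scaling independent of the data, so the same height always maps to the same scaled value.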
- ampycloud.scaler.minrange2minmax(vals: ndarray, min_range: int | float = 0) tuple
Transform a minimum range into a pair of min/max values.
- Parameters:
vals (np.ndarray) – values to assess.
min_range (int|float, optional) – minimum range to meet. Defaults to 0.
- Returns:
tuple – the min and max values of the data range of at least min_range in size.
Essentially, if max(vals)-min(vals) >= min_range, this function returns [min(vals), max(vals)]. Else, it returns [val_mid-min_range/2, val_mid+min_range/2], with val_mid=(max(vals)+min(vals))/2.
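The rule above translates into a few lines of pure Python. This sketch follows the stated formula but is not the library function; the name minrange_to_minmax is hypothetical.

```python
def minrange_to_minmax(vals, min_range=0):
    """Return (min, max) limits spanning at least min_range, centered on the data."""
    low, high = min(vals), max(vals)
    if high - low >= min_range:
        return low, high
    # The data range is too narrow: widen it symmetrically around its midpoint.
    mid = (high + low) / 2
    return mid - min_range / 2, mid + min_range / 2


print(minrange_to_minmax([1000.0, 1200.0], min_range=1000))
print(minrange_to_minmax([1000.0, 3000.0], min_range=1000))
```

In the first call the 200 ft spread is widened to 1000 ft around the 1100 ft midpoint, whereas in the second call the data limits are returned unchanged.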
- ampycloud.scaler.step_scale(vals: ndarray, steps: list, scales: list, mode: str = 'do') ndarray
Scales values step-wise, with different constants between specific steps.
- Parameters:
vals (ndarray) – values to scale.
steps (list, optional) – the step edges. E.g. [8000, 14000].
scales (list, optional) – the scaling values (=dividers) for each step. E.g. [100, 500, 1000]. Must have len(scales) = len(steps)+1.
mode (str, optional) – whether to ‘do’ or ‘undo’ the scaling.
- Returns:
ndarray – (un-)step-scaled values
Values are divided by scales[i] between steps[i-1:i]. Anything outside the range of steps is divided by scales[0] or scales[-1].
Note that this function ensures that each step is properly offset, so that the scaled data is continuous (no gaps and no overlapping steps)!
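The continuity requirement is easiest to see on a single value. The sketch below handles only the forward direction and assumes that the first segment passes through the origin (i.e. 0 maps to 0), which the text above does not state explicitly. It is not the library function, and the name step_scale_value is hypothetical.

```python
def step_scale_value(val, steps, scales):
    """Scale one value step-wise, offsetting each segment so the result is continuous."""
    if len(scales) != len(steps) + 1:
        raise ValueError("len(scales) must equal len(steps) + 1")
    if val <= steps[0]:
        return val / scales[0]
    # Scaled value reached at the first step edge.
    offset = steps[0] / scales[0]
    for i in range(1, len(steps)):
        if val <= steps[i]:
            return offset + (val - steps[i - 1]) / scales[i]
        # Accumulate the scaled width of the segment that was fully crossed.
        offset += (steps[i] - steps[i - 1]) / scales[i]
    return offset + (val - steps[-1]) / scales[-1]


steps, scales = [8000, 14000], [100, 500, 1000]
for height in (4000, 8000, 11000, 14000, 15000):
    print(height, step_scale_value(height, steps, scales))
```

Each segment starts where the previous one ended, so the scaled values increase monotonically without jumps at 8000 ft and 14000 ft.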
- ampycloud.scaler.convert_kwargs(vals: ndarray, fct: str, **kwargs) dict
Converts the user-input keywords such that they can be fed to the underlying scaling functions.
- Parameters:
vals (np.ndarray) – the values to be processed.
fct (str) – the scaling mode, e.g. ‘shift-and-scale’, etc.
**kwargs – dict of keyword arguments to be converted, if warranted.
- Returns:
dict – the data-adjusted set of kwargs.
Note
This function was first introduced to accommodate the creation of a secondary axis on the ampycloud diagnostic plots. It is a buffer that separates “user” scaling keywords from the “deterministic” scaling keywords required to get a specific scaling, no matter the underlying dataset (as is required for plotting a secondary axis).
Essentially, this function allows feeding either “user” or “deterministic” keywords to apply_scaling(), such that the former will be turned into the latter, and the latter will remain untouched.
- ampycloud.scaler.apply_scaling(vals: ndarray, fct: str | None = None, **kwargs) ndarray
Umbrella scaling routine, that gathers all the individual ones under a single entry point.
- Parameters:
vals (ndarray) – values to scale.
fct (str, optional) – name of the scaling function to use. Can be one of [‘shift-and-scale’, ‘minmax-scale’, or ‘step-scale’]. Defaults to None = do nothing.
**kwargs – keyword arguments that will be fed to the underlying scaling function.
- Returns:
ndarray – the scaled values.
ampycloud.version module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: ampycloud version
- ampycloud.version.VERSION = '2.1.1'
the one-and-only place where the ampycloud version is set.
- Type:
str
ampycloud.wmo module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: WMO-related utilities
- ampycloud.wmo.perc2okta(val: int | float | ndarray) ndarray
Converts a sky coverage percentage into oktas.
- Parameters:
val (int|float|ndarray) – the sky coverage percentage to convert, in percent.
- Returns:
ndarray of int – the okta value(s).
One okta corresponds to 1/8 of the sky covered by clouds. The cases of 0 and 8 oktas are special, in that these indicate that the sky is covered at exactly 0%, respectively 100%. This implies that the 1 okta and 7 okta bins are larger than others.
Specifically:
0 okta == val=0
1 okta == 0 < val <= 1.5*100/8
2 oktas == 1.5*100/8 < val <= 2.5*100/8
…
7 oktas == 6.5*100/8 < val < 100
8 oktas == val=100
- Reference:
Boers, R., de Haij, M. J., Wauben, W. M. F., Baltink, H. K., van Ulft, L. H., Savenije, M., and Long, C. N. (2010), Optimized fractional cloudiness determination from five ground-based remote sensing techniques, J. Geophys. Res., 115, D24116, doi:10.1029/2010JD014661.
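The bin edges listed above can be reproduced with a short scalar sketch. It is a standalone illustration of the stated rule, not the library function (which also accepts arrays); the name perc_to_okta is hypothetical.

```python
import math


def perc_to_okta(val):
    """Convert a sky coverage percentage (0-100) into an okta count (0-8)."""
    if not 0 <= val <= 100:
        raise ValueError("val must lie between 0 and 100")
    if val == 0:
        return 0
    if val == 100:
        return 8
    # Bin edges sit at 1.5, 2.5, ..., 6.5 eighths; anything in between
    # 0 and 100 percent is clamped to the 1-7 okta range.
    return min(7, max(1, math.ceil(val * 8 / 100 - 0.5)))


for perc in (0, 5, 18.75, 50, 81.25, 99.9, 100):
    print(perc, perc_to_okta(perc))
```

The clamping is what makes the 1 okta and 7 okta bins wider than the others: a coverage of 5% is already 1 okta, and 99.9% is still 7 oktas.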
- ampycloud.wmo.okta2code(val: int) str | None
Convert an okta value to a METAR code.
- Parameters:
val (int) – okta value between 0 and 9 (included).
- Returns:
str – METAR code
Conversion is as follows:
0 okta => NCD
1-2 oktas => FEW
3-4 oktas => SCT
5-7 oktas => BKN
8 oktas => OVC
9 oktas => None
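The conversion table above maps directly onto a small lookup. The sketch below reproduces the table as listed and is not the library function; the name okta_to_code is hypothetical.

```python
def okta_to_code(okta):
    """Map an okta count between 0 and 9 (included) to a METAR code, or None for 9."""
    if okta == 0:
        return "NCD"
    if okta in (1, 2):
        return "FEW"
    if okta in (3, 4):
        return "SCT"
    if okta in (5, 6, 7):
        return "BKN"
    if okta == 8:
        return "OVC"
    if okta == 9:
        return None
    raise ValueError(f"okta value out of range: {okta}")


codes = [okta_to_code(okta) for okta in range(10)]
print(codes)
```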
- ampycloud.wmo.okta2symb(val: int, use_metsymb: bool = False) str
Convert an okta value to a LaTeX string, possibly using the metsymb LaTeX package.
- Parameters:
val (int) – okta value between 0 and 9 (included).
use_metsymb (bool, optional) – if True, will use the metsymb LaTeX package to draw proper okta symbols. If False, returns a digit. Defaults to False.
- Returns:
str – LaTeX command.
Note
The metsymb LaTeX package is available under: https://github.com/MeteoSwiss/metsymb
- ampycloud.wmo.height2code(val: int | float) str
Function that converts a given height into hundreds of ft (3-digit number), e.g. 5000 ft -> 050, 500 ft -> 005.
- Parameters:
val (int, float) – the height to convert, in feet.
- Returns:
str – the corresponding METAR code chunk
Below 10’000 ft, the value is floored to the nearest 100 ft. Above 10’000 ft, the value is floored to the nearest 1000 ft.
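The flooring rule can be sketched as follows. This is a standalone illustration for non-negative heights, not the library function; the name height_to_code is hypothetical.

```python
def height_to_code(val):
    """Convert a height in ft into a 3-digit string expressed in hundreds of ft."""
    if val < 10000:
        # Floor to the nearest 100 ft.
        hundreds = int(val // 100)
    else:
        # Floor to the nearest 1000 ft, still expressed in hundreds of ft.
        hundreds = int(val // 1000) * 10
    return f"{hundreds:03d}"


for height in (500, 4999.9, 5000, 12345):
    print(height, height_to_code(height))
```

A height of 12345 ft is thus reported as 120, i.e. 12000 ft, while 4999.9 ft is reported as 049.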
- Reference:
Aerodrome Reports and Forecasts, A Users’ Handbook to the Codes, WMO-No.782, 2020 edition. https://library.wmo.int/?lvl=notice_display&id=716
Warning
Currently, this function does not allow to implement EASA’s rule AMC1 MET.TR.205(e)(3) (i.e. setting a resolution of 50 ft up to 300 ft for aerodromes with established low-visibility approach and landing procedures). https://www.easa.europa.eu/downloads/22100/en