ampycloud.utils package
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: ampycloud utilities
Submodules
ampycloud.utils.mocker module
Copyright (c) 2021-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: tools to create mock datasets
- ampycloud.utils.mocker.flat_layer(dts: ndarray, height: float, height_std: float, sky_cov_frac: float) -> DataFrame
Generates a mock, flat, Gaussian cloud layer around a given height.
- Parameters:
dts (np.array of float) – time deltas, in s, for the simulated ceilometer hits.
height (float) – layer mean height, in ft above aerodrome level (aal).
height_std (float) – layer height standard deviation, in ft.
sky_cov_frac (float) – Sky coverage fraction. Random hits will be set to NaN to reach this value. Must be 0 <= x <= 1.
- Returns:
pandas.DataFrame – the simulated layer with columns ['dt', 'height'].
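The steps described above (draw Gaussian heights, then blank out random hits to reach the requested sky coverage) can be sketched with the standard library alone. This is not the ampycloud implementation: the name flat_layer_sketch, the seed argument, the rounding rule, and the dict-of-lists return value (standing in for the DataFrame) are all assumptions made for illustration.

```python
import math
import random


def flat_layer_sketch(dts, height, height_std, sky_cov_frac, seed=None):
    """Stdlib-only illustration of a flat Gaussian layer (hypothetical helper)."""
    if not 0 <= sky_cov_frac <= 1:
        raise ValueError("sky_cov_frac must satisfy 0 <= x <= 1")
    rng = random.Random(seed)
    # One Gaussian-distributed hit height per time delta
    heights = [rng.gauss(height, height_std) for _ in dts]
    # Assumption: the number of hits to blank out is rounded to the nearest integer
    n_nan = round(len(heights) * (1 - sky_cov_frac))
    # Set randomly chosen hits to NaN to reach the requested sky coverage
    for ind in rng.sample(range(len(heights)), n_nan):
        heights[ind] = math.nan
    return {"dt": list(dts), "height": heights}


layer = flat_layer_sketch([-30.0, -20.0, -10.0, 0.0], 1000.0, 50.0, 0.5, seed=0)
n_valid = sum(not math.isnan(h) for h in layer["height"])
```

With four time steps and a sky coverage fraction of 0.5, two of the four heights remain valid.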
- ampycloud.utils.mocker.sin_layer(dts: ndarray, height: float, height_std: float, sky_cov_frac: float, period: int | float, amplitude: int | float) -> DataFrame
Generates a sinusoidal cloud layer.
- Parameters:
dts (np.array of float) – time deltas, in s, for the simulated ceilometer hits.
height (float) – layer mean height, in ft above aerodrome level (aal).
height_std (float) – layer height standard deviation, in ft.
sky_cov_frac (float, optional) – Sky coverage fraction. Random hits will be set to NaN to reach this value. Must be 0 <= x <= 1.
period (int|float) – period of the sine-wave, in s.
amplitude (int|float) – amplitude of the sine-wave, in ft.
- Returns:
pandas.DataFrame – the simulated layer with columns ['height', 'dt'].
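As a rough illustration of how period and amplitude could shape the layer, the sketch below adds a sine wave to Gaussian scatter. It assumes a zero phase offset and leaves out the sky coverage step for brevity; sin_layer_sketch and its seed argument are hypothetical and not part of ampycloud.

```python
import math
import random


def sin_layer_sketch(dts, height, height_std, period, amplitude, seed=None):
    """Stdlib-only illustration of a sinusoidal layer (hypothetical helper)."""
    rng = random.Random(seed)
    heights = [
        # Gaussian scatter around the mean height, plus the sine-wave offset
        rng.gauss(height, height_std) + amplitude * math.sin(2 * math.pi * dt / period)
        for dt in dts
    ]
    return {"dt": list(dts), "height": heights}


# With height_std=0, only the sine wave remains visible
wave = sin_layer_sketch([0.0, 25.0, 50.0, 75.0], 1000.0, 0.0, period=100, amplitude=200)
```

Setting amplitude to 0 reduces this sketch to a flat Gaussian layer.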
- ampycloud.utils.mocker.mock_layers(n_ceilos: int, lookback_time: float, hit_gap: float, layer_prms: list) -> DataFrame
Generate a mock set of cloud layers for a specified number of ceilometers.
- Parameters:
n_ceilos (int) – number of ceilometers to simulate.
lookback_time (float) – length of the time interval, in s.
hit_gap (float) – number of seconds between ceilometer measurements.
layer_prms (list of dict) – list of layer parameters, provided as a dict for each layer. Each dict should specify all the parameters required to generate a sin_layer() (with the exception of dts, which will be computed directly from lookback_time and hit_gap):
    {'height': 1000, 'height_std': 100, 'sky_cov_frac': 1, 'period': 100, 'amplitude': 0}
- Returns:
pandas.DataFrame – a pandas DataFrame with the mock data, ready to be fed to ampycloud. Columns ['ceilo', 'dt', 'height', 'type'] correspond to 1) ceilo names, 2) time deltas in s, 3) hit heights in ft aal, and 4) hit type.
Todo
add the possibility to set some VV hits in the mix
all of this could be done much more professionally with classes …
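The sketch below shows what a two-layer layer_prms list might look like, and one possible way to derive time deltas from lookback_time and hit_gap. The second layer's values are made up, and the regular spacing of dts is an assumption: the package may compute them differently.

```python
import math

lookback_time = 900.0  # length of the time interval, in s
hit_gap = 30.0  # seconds between ceilometer measurements

# One dict per layer, holding every sin_layer() parameter except dts
layer_prms = [
    {'height': 1000, 'height_std': 100, 'sky_cov_frac': 1, 'period': 100, 'amplitude': 0},
    # Illustrative values only
    {'height': 5000, 'height_std': 200, 'sky_cov_frac': 0.5, 'period': 300, 'amplitude': 500},
]

# Check that each layer dict is complete before using it
required = {'height', 'height_std', 'sky_cov_frac', 'period', 'amplitude'}
for prms in layer_prms:
    missing = required - set(prms)
    if missing:
        raise KeyError(f"Missing layer parameters: {sorted(missing)}")

# Assumption: hits are regularly spaced from -lookback_time up to 0
n_hits = math.floor(lookback_time / hit_gap) + 1
dts = [-lookback_time + ind * hit_gap for ind in range(n_hits)]

# With ampycloud installed, the list would be passed on as:
# mock_layers(n_ceilos, lookback_time, hit_gap, layer_prms)
```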
- ampycloud.utils.mocker.canonical_demo_data() -> DataFrame
This function creates the canonical ampycloud demonstration dataset, which can be used to illustrate the full behavior of the algorithm.
- Returns:
pandas.DataFrame – the canonical mock dataset with properly-formatted columns.
ampycloud.utils.performance module
Copyright (c) 2021-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: tools to assess the performance of ampycloud
- ampycloud.utils.performance.get_speed_benchmark(niter: int = 10) -> tuple
This function will run and time ampycloud.core.demo() to assess the code’s performance on a given machine.
For now, this is a rather dumb and uninspired way to do it. If the need ever arises, this could certainly be done better, and (for example) also with a finer step resolution to see which step (slicing, grouping, layering) is the slowest, and also separate the generation of the mock dataset from its processing.
- Returns:
int, float, float, float, float, float – niter, followed by the mean, std, median, min, and max run times, all in s.
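A generic version of this run-and-time approach can be sketched as follows. It times an arbitrary callable rather than ampycloud.core.demo(), and the choice of time.perf_counter() and of the population standard deviation are assumptions; speed_benchmark_sketch is a hypothetical name.

```python
import statistics
import time


def speed_benchmark_sketch(func, niter=10):
    """Run func niter times and summarize the run times, in s (hypothetical helper)."""
    durations = []
    for _ in range(niter):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return (
        niter,
        statistics.mean(durations),
        statistics.pstdev(durations),  # assumption: population standard deviation
        statistics.median(durations),
        min(durations),
        max(durations),
    )


# Time a cheap stand-in workload five times
result = speed_benchmark_sketch(lambda: sum(range(10_000)), niter=5)
niter, mean, std, median, fastest, slowest = result
```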
ampycloud.utils.utils module
Copyright (c) 2022-2024 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the 3-Clause BSD License.
SPDX-License-Identifier: BSD-3-Clause
Module contains: generic utilities
- ampycloud.utils.utils.check_data_consistency(pdf: DataFrame, req_cols: dict | None = None) -> DataFrame
Assesses whether a given pandas.DataFrame is compatible with the requirements of ampycloud.
- Parameters:
pdf (pd.DataFrame) – the data to check.
req_cols (dict) – A dictionary in which the keys correspond to the required columns, and the values are the column types. Defaults to None = the ampycloud requirements.
- Returns:
pd.DataFrame – the data, possibly cleaned-up of superfluous columns, and with corrected dtypes.
This function will raise an ampycloud.errors.AmpycloudError and/or an ampycloud.errors.AmpycloudWarning if it identifies very bad and/or very weird things in pdf.
Specifically, the input pdf must be a pandas.DataFrame with the following column names/types (formally defined in ampycloud.hardcoded.REQ_DATA_COLS):
    'ceilo'/pd.StringDtype(), 'dt'/float, 'height'/float, 'type'/int
The ceilo column contains the names/ids of the ceilometers as pd.StringDtype(). See the pandas documentation for more info about this type.
The dt column contains time deltas, in seconds, between a given ceilometer observation and ref_dt (i.e. obs_time-ref_dt). Ideally, ref_dt would be the issuing time of the METAR message, such that dt values are negative, with the smallest one corresponding to the oldest measurement.
The height column contains the cloud base hit heights reported by the ceilometers, in ft above aerodrome level.
The type column contains integers that correspond to the hit sequence id. If a given ceilometer is reporting multiple hits for a given timestep (corresponding to a cloud level 1, cloud level 2, cloud level 3, etc …), the type of these measurements would be 1, 2, 3, etc … Any data point with a type of -1 will be flagged in the ampycloud plots as a vertical Visibility (VV) hit, but it will not be treated any differently than any other regular hit. Type 0 corresponds to no (cloud) detection, in which case the corresponding hit height should be a NaN.
Important
A non-detection corresponds to a valid measurement with a dt value, a type 0, and NaN as the height. It should not be confused with a non-observation, when no data was acquired at all!
If it all sounds confusing, it is possible to obtain an example of the required data format from the utils.mocker.canonical_demo_data() routine of the package, like so:
    from ampycloud.utils import mocker
    mock_data = mocker.canonical_demo_data()
As mentioned above, it is also possible to verify whether a given pandas.DataFrame meets the ampycloud requirements ahead of time via ampycloud.utils.utils.check_data_consistency():
    from ampycloud.utils.utils import check_data_consistency
    checked_pdf = check_data_consistency(pdf)
This will raise an ampycloud.errors.AmpycloudError if:
  - pdf is not a pandas.DataFrame.
  - pdf is missing a required column.
  - pdf has a length of 0.
  - pdf has duplicated rows.
  - any time step for any ceilometer corresponds to both a type 0 (no hit) and not 0 (some hit).
  - any time step for any ceilometer corresponds to both a type -1 (VV hit) and not -1 (some hit/no hit).
The latter check implies that ampycloud cannot be fed a VV hit in parallel to a cloud base hit. Should a specific ceilometer return VV hits in parallel to cloud base hits, it is up to the user to decide whether to feed one or the other.
In addition, this will raise an ampycloud.errors.AmpycloudWarning if:
  - any pdf column type is not as expected. Note that in this case, the code will try to correct the type on the fly.
  - pdf has any superfluous columns. In this case, the code will drop them automatically.
  - any hit height is negative.
  - any type 0 hit has a non-NaN height.
  - any type 1 hit has a NaN height.
  - any type 2 hit does not have a coincident type 1 hit.
  - any type 3 hit does not have a coincident type 2 hit.
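The two per-time-step error conditions (a type 0 or a type -1 hit coexisting with any other type for the same ceilometer and time step) can be illustrated without pandas. The sketch below works on plain (ceilo, dt, height, type) tuples; find_type_conflicts is a hypothetical helper, not the check that ampycloud runs.

```python
import math
from collections import defaultdict


def find_type_conflicts(hits):
    """Return the (ceilo, dt) steps that mix type 0 or type -1 with other types."""
    types_per_step = defaultdict(set)
    for ceilo, dt, _height, hit_type in hits:
        types_per_step[(ceilo, dt)].add(hit_type)
    conflicts = []
    for step, types in types_per_step.items():
        if 0 in types and len(types) > 1:
            conflicts.append((step, "type 0 mixed with other types"))
        if -1 in types and len(types) > 1:
            conflicts.append((step, "type -1 mixed with other types"))
    return conflicts


hits = [
    ("ceilo_1", -60.0, 1200.0, 1),  # two cloud levels at one time step: fine
    ("ceilo_1", -60.0, 3400.0, 2),
    ("ceilo_1", -30.0, math.nan, 0),  # non-detection on its own: fine
    ("ceilo_2", -30.0, math.nan, 0),  # non-detection ...
    ("ceilo_2", -30.0, 1500.0, 1),  # ... together with a hit: conflict
]
conflicts = find_type_conflicts(hits)
```

Only the ceilo_2 time step at dt = -30 s is reported, since it combines a type 0 entry with a type 1 hit.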
- ampycloud.utils.utils.tmp_seed(seed: int)
Temporarily reset the numpy.random.seed() value.
Adapted from the reply of Paul Panzer on SO.
Example:
    with tmp_seed(42):
        np.random.random(1)
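The save, reseed, and restore pattern behind such a context manager can be sketched with the standard-library random module. This is an analogue only: the package function acts on numpy.random, and tmp_seed_sketch is a hypothetical name.

```python
import contextlib
import random


@contextlib.contextmanager
def tmp_seed_sketch(seed):
    """Seed the stdlib generator inside the block, then restore its prior state."""
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Restore the generator state even if the block raised
        random.setstate(state)


# Reference: the first draw that follows seed(0)
random.seed(0)
expected_next = random.random()

random.seed(0)
with tmp_seed_sketch(42):
    first = random.random()
with tmp_seed_sketch(42):
    second = random.random()
# The outer sequence continues as if the seeded blocks had never run
actual_next = random.random()
```

Both seeded draws are identical, and the outer random sequence is left undisturbed.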
- ampycloud.utils.utils.adjust_nested_dict(ref_dict: dict, new_dict: dict, lvls: list | None = None) -> dict
Update a given (nested) dictionary using a second (possibly incomplete) one.
- Parameters:
ref_dict (dict) – reference dict of dict (of dict of dict …).
new_dict (dict) – values to update, as a dict (of dict of dict …).
lvls (list of str, optional) – names of the keys of the parent nested dict layers, used for reporting useful errors. This is used by the function itself when it calls itself. There is no need for the user to set this to anything at first. Defaults to None.
- Returns:
dict – the updated dict (of dict of dict of dict …)
Note
Inspired by the replies of Alex Martelli and Alex Telon on SO.
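A recursive update of this kind might look like the sketch below. It assumes that keys absent from the reference dict are rejected, with the lvls path used in the error message, and that the reference dict is left unmodified. Neither behavior is confirmed by the text above, and adjust_nested_dict_sketch is a hypothetical name.

```python
import copy


def adjust_nested_dict_sketch(ref_dict, new_dict, lvls=None):
    """Return a copy of ref_dict updated with the (possibly partial) new_dict."""
    lvls = [] if lvls is None else lvls
    out = copy.deepcopy(ref_dict)  # assumption: the reference is not modified
    for key, val in new_dict.items():
        if key not in out:
            # Assumption: unknown keys are errors, reported with their full path
            raise KeyError("Unknown key: " + "/".join(str(k) for k in lvls + [key]))
        if isinstance(out[key], dict) and isinstance(val, dict):
            # Recurse into the nested level, keeping track of the parent keys
            out[key] = adjust_nested_dict_sketch(out[key], val, lvls + [key])
        else:
            out[key] = val
    return out


ref = {'a': {'b': 1, 'c': 2}, 'd': 3}
updated = adjust_nested_dict_sketch(ref, {'a': {'c': 20}})

error_path = None
try:
    adjust_nested_dict_sketch(ref, {'a': {'x': 0}})
except KeyError as err:
    error_path = err.args[0]
```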
- ampycloud.utils.utils.calc_base_height(vals: ndarray, lookback_perc: int, height_perc: int) -> float
Calculate the layer base height.
- Parameters:
vals (npt.ArrayLike) – Ceilometer hits of a given layer. Must be a flat array/Series of scalars, ordered in time, with the most recent entries last.
lookback_perc (int) – Percentage of points to take into account. 100% would correspond to all points, 50% to the recent half, etc.
height_perc (int) – Percentage of points that should be neglected when calculating the base height. Base height will be the minimum of the remaining points.
- Returns:
float – The layer base height.
- Raises:
AmpycloudError – Raised if the array passed to the n_largest percentile calculation is empty.
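One reading of the two percentages is sketched below: keep the most recent lookback_perc % of the points, neglect the lowest height_perc % of those, and return the minimum of what remains. That the neglected points are the lowest ones is an assumption, as is the truncation of fractional counts. The sketch raises ValueError in place of AmpycloudError, assumes NaN-free input, and calc_base_height_sketch is a hypothetical name.

```python
def calc_base_height_sketch(vals, lookback_perc, height_perc):
    """Stdlib-only illustration of a layer base height (hypothetical helper)."""
    vals = list(vals)
    # Keep only the most recent points (the most recent entries come last)
    n_recent = int(len(vals) * lookback_perc / 100)
    recent = vals[len(vals) - n_recent:]
    if not recent:
        raise ValueError("No points left to compute the base height from.")
    # Assumption: the neglected points are the lowest ones; always keep at least one
    n_drop = min(int(len(recent) * height_perc / 100), len(recent) - 1)
    # The base height is the minimum of the remaining points
    return sorted(recent)[n_drop]


vals = [900, 950, 1000, 1100, 1050, 1200, 980, 1020, 1010, 990]
base_recent = calc_base_height_sketch(vals, lookback_perc=50, height_perc=20)
base_all = calc_base_height_sketch(vals, lookback_perc=100, height_perc=0)
```

Here the most recent half is [1200, 980, 1020, 1010, 990]. Neglecting its lowest 20% (one point, 980) leaves 990 as the base height.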