ampycloud.utils package

Distributed under the terms of the 3-Clause BSD License.

SPDX-License-Identifier: BSD-3-Clause

Module contains: ampycloud utilities

Submodules

ampycloud.utils.mocker module

Module contains: tools to create mock datasets

ampycloud.utils.mocker.flat_layer(dts: ndarray, height: float, height_std: float, sky_cov_frac: float) → DataFrame

Generates a mock, flat, Gaussian cloud layer around a given height.

Parameters:

dts (np.array of float) – time deltas, in s, for the simulated ceilometer hits.
height (float) – layer mean height, in ft above aerodrome level (aal).
height_std (float) – layer height standard deviation, in ft.
sky_cov_frac (float) – Sky coverage fraction. Random hits will be set to NaN to reach this value. Must be 0 <= x <= 1.

Returns:

pandas.DataFrame – the simulated layer with columns [‘dt’, ‘height’].
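The parameters above can be illustrated with a minimal, dependency-free sketch. It is not the ampycloud implementation: the name flat_layer_sketch and its seed argument are hypothetical, the rounding used to pick the number of NaN hits is an assumption, and a dict of lists stands in for the pandas.DataFrame.

```python
import math
import random


def flat_layer_sketch(dts, height, height_std, sky_cov_frac, seed=None):
    """Hypothetical stand-in for flat_layer(), returning a dict of lists."""
    if not 0 <= sky_cov_frac <= 1:
        raise ValueError("sky_cov_frac must be 0 <= x <= 1")
    rng = random.Random(seed)
    # Draw one Gaussian height per time delta.
    heights = [rng.gauss(height, height_std) for _ in dts]
    # Assumption: blank out a rounded share of the hits, chosen at random.
    n_nan = round((1 - sky_cov_frac) * len(dts))
    for idx in rng.sample(range(len(dts)), n_nan):
        heights[idx] = math.nan
    return {"dt": list(dts), "height": heights}


# 40 hits over 10 minutes, of which a quarter is turned into NaN.
layer = flat_layer_sketch(range(-600, 0, 15), 1000.0, 50.0, 0.75, seed=1)
n_hits = sum(not math.isnan(h) for h in layer["height"])
```

Here 30 of the 40 simulated hits keep a valid height, which matches a sky coverage fraction of 0.75.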

ampycloud.utils.mocker.sin_layer(dts: ndarray, height: float, height_std: float, sky_cov_frac: float, period: int | float, amplitude: int | float) → DataFrame

Generates a sinusoidal cloud layer.

Parameters:

dts (np.array of float) – time deltas, in s, for the simulated ceilometer hits.
height (float) – layer mean height, in ft above aerodrome level (aal).
height_std (float) – layer height standard deviation, in ft.
sky_cov_frac (float) – Sky coverage fraction. Random hits will be set to NaN to reach this value. Must be 0 <= x <= 1.
period (int|float) – period of the sine-wave, in s.
amplitude (int|float) – amplitude of the sine-wave, in ft.

Returns:

pandas.DataFrame – the simulated layer with columns [‘height’, ‘dt’].
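The role of period and amplitude can be sketched as follows, assuming the sine-wave modulates the layer mean height and has zero phase at dt = 0 (both are assumptions, as is the hypothetical name sin_layer_sketch). The sky coverage step is left out for brevity.

```python
import math
import random


def sin_layer_sketch(dts, height, height_std, period, amplitude, seed=None):
    """Hypothetical stand-in for the height model of sin_layer()."""
    rng = random.Random(seed)
    return [
        # Gaussian scatter around a mean height that oscillates with dt.
        rng.gauss(height + amplitude * math.sin(2 * math.pi * dt / period), height_std)
        for dt in dts
    ]


# Without scatter, the heights trace one full 400 s period of the sine-wave.
heights = sin_layer_sketch([0, 100, 200, 300], 1000.0, 0.0, 400, 50, seed=0)
```

Setting amplitude to 0 reduces this model to a flat Gaussian layer.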

ampycloud.utils.mocker.mock_layers(n_ceilos: int, lookback_time: float, hit_gap: float, layer_prms: list) → DataFrame

Generate a mock set of cloud layers for a specified number of ceilometers.

Parameters:

n_ceilos (int) – number of ceilometers to simulate.
lookback_time (float) – length of the time interval, in s.
hit_gap (float) – number of seconds between ceilometer measurements.
layer_prms (list of dict) – list of layer parameters, provided as a dict for each layer. Each dict should specify all the parameters required to generate a sin_layer() (with the exception of dts that will be computed directly from lookback_time and hit_gap):
```
{'height':1000, 'height_std': 100, 'sky_cov_frac': 1,
'period': 100, 'amplitude': 0}
```

Returns:

pandas.DataFrame – a pandas DataFrame with the mock data, ready to be fed to ampycloud. Columns [‘ceilo’, ‘dt’, ‘height’, ‘type’] correspond to 1) ceilo names, 2) time deltas in s, 3) hit heights in ft aal, and 4) hit type.
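As a rough illustration of the inputs, the sketch below builds a two-layer layer_prms list and derives regularly spaced time deltas from lookback_time and hit_gap. The helper regular_dts is hypothetical, and the regular spacing ending just before dt = 0 is an assumption about how the time deltas are laid out.

```python
def regular_dts(lookback_time, hit_gap):
    """Time deltas from -lookback_time up to (but excluding) 0, in s."""
    n_hits = int(lookback_time // hit_gap)
    return [-lookback_time + i * hit_gap for i in range(n_hits)]


# Two layers, each described with the keys listed above.
layer_prms = [
    {"height": 1000, "height_std": 100, "sky_cov_frac": 1,
     "period": 100, "amplitude": 0},
    {"height": 5000, "height_std": 200, "sky_cov_frac": 0.5,
     "period": 600, "amplitude": 300},
]

# 15 minutes of data with one hit per minute.
dts = regular_dts(lookback_time=900, hit_gap=60)
```

A list of this shape is what the layer_prms argument of mock_layers() expects, according to the signature above.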

Todo

add the possibility to set some VV hits in the mix
all of this could be done much more professionally with classes …

ampycloud.utils.mocker.canonical_demo_data() → DataFrame

This function creates the canonical ampycloud demonstration dataset, which can be used to illustrate the full behavior of the algorithm.

Returns:

pandas.DataFrame – the canonical mock dataset with properly-formatted columns.

ampycloud.utils.performance module

Module contains: tools to assess the performance of ampycloud

ampycloud.utils.performance.get_speed_benchmark(niter: int = 10) → tuple

This function will run and time ampycloud.core.demo() to assess the code’s performance on a given machine.

For now, this is a rather dumb and uninspired way to do it. If the need ever arises, this could certainly be done better, and (for example) also with a finer step resolution to see which step (slicing, grouping, layering) is the slowest, and also separate the generation of the mock dataset from its processing.

Returns:

int, float, float, float, float, float – niter, mean, std, median, min, max, all in s.
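The kind of timing loop described above can be sketched with the standard library alone. This is not the ampycloud code: speed_benchmark_sketch is a hypothetical name, the timed workload is a stand-in, and the use of the population standard deviation is an assumption.

```python
import statistics
import time


def speed_benchmark_sketch(func, niter=10):
    """Time niter calls of func; returns (niter, mean, std, median, min, max)."""
    durations = []
    for _ in range(niter):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return (
        niter,
        statistics.mean(durations),
        statistics.pstdev(durations),  # assumption: population standard deviation
        statistics.median(durations),
        min(durations),
        max(durations),
    )


# Stand-in workload; in ampycloud the timed call would be ampycloud.core.demo().
result = speed_benchmark_sketch(lambda: sum(range(10_000)), niter=5)
```

Timing each step separately, as suggested above, would amount to wrapping each stage in its own perf_counter() pair.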

ampycloud.utils.utils module

Module contains: generic utilities

ampycloud.utils.utils.check_data_consistency(pdf: DataFrame, req_cols: dict | None = None) → DataFrame

Assesses whether a given pandas.DataFrame is compatible with the requirements of ampycloud.

Parameters:

pdf (pd.DataFrame) – the data to check.
req_cols (dict) – A dictionary in which the keys correspond to the required columns, and their values are the column types. Defaults to None = the ampycloud requirements.

Returns:

pd.DataFrame – the data, possibly cleaned-up of superfluous columns, and with corrected dtypes.

This function will raise an ampycloud.errors.AmpycloudError and/or an ampycloud.errors.AmpycloudWarning if it identifies very bad and/or very weird things in pdf.

Specifically, the input pdf must be a pandas.DataFrame with the following column names/types (formally defined in ampycloud.hardcoded.REQ_DATA_COLS):

'ceilo'/pd.StringDtype(), 'dt'/float, 'height'/float, 'type'/int

The ceilo column contains the names/ids of the ceilometers as pd.StringDtype(). See the pandas documentation for more info about this type.

The dt column contains time deltas, in seconds, between a given ceilometer observation and ref_dt (i.e. obs_time-ref_dt). Ideally, ref_dt would be the issuing time of the METAR message, such that dt values are negative, with the smallest one corresponding to the oldest measurement.
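As a small worked example of this convention, the snippet below derives dt values from timestamps. The reference time and the observation times are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical METAR issuing time, used as the reference ref_dt.
ref_dt = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical ceilometer observation times, oldest first.
obs_times = [
    datetime(2024, 1, 1, 11, 45, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 52, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 59, 45, tzinfo=timezone.utc),
]

# dt = obs_time - ref_dt, in seconds: negative for observations before ref_dt.
dts = [(obs_time - ref_dt).total_seconds() for obs_time in obs_times]
# dts is [-900.0, -450.0, -15.0]
```

The oldest observation ends up with the smallest (most negative) dt, as described above.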

The height column contains the cloud base hit heights reported by the ceilometers, in ft above aerodrome level.

The type column contains integers that correspond to the hit sequence id. If a given ceilometer reports multiple hits for a given time step (corresponding to cloud level 1, cloud level 2, cloud level 3, etc.), the types of these measurements would be 1, 2, 3, etc. Any data point with a type of -1 will be flagged in the ampycloud plots as a Vertical Visibility (VV) hit, but it will not be treated any differently from any other regular hit. Type 0 corresponds to no (cloud) detection, in which case the corresponding hit height should be NaN.

Important

A non-detection corresponds to a valid measurement with a dt value, a type 0, and NaN as the height. It should not be confused with a non-observation, when no data was acquired at all!

If it all sounds confusing, it is possible to obtain an example of the required data format from the utils.mocker.canonical_demo_data() routine of the package, like so:

from ampycloud.utils import mocker
mock_data = mocker.canonical_demo_data()

As mentioned above, it is also possible to verify whether a given pandas.DataFrame meets the ampycloud requirements ahead of time via ampycloud.utils.utils.check_data_consistency():

from ampycloud.utils.utils import check_data_consistency
checked_pdf = check_data_consistency(pdf)

This will raise an ampycloud.errors.AmpycloudError if:

pdf is not a pandas.DataFrame.

pdf is missing a required column.

pdf has a length of 0.

pdf has duplicated rows.

any time step for any ceilometer corresponds to both a type 0 (no hit) and not 0 (some hit)

any time step for any ceilometer corresponds to both a type -1 (VV hit) and not -1 (some hit/no hit)

The latter check implies that ampycloud cannot be fed a VV hit in parallel to a cloud base hit. Should a specific ceilometer return VV hits in parallel to cloud base hits, it is up to the user to decide whether to feed one or the other.
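The last two checks can be illustrated in plain Python. This is only a sketch of the rule, not the ampycloud implementation: find_conflicting_steps is a hypothetical helper, and tuples stand in for the rows of pdf.

```python
from collections import defaultdict


def find_conflicting_steps(rows):
    """Return the (ceilo, dt) pairs that mix hit types that cannot coexist.

    rows: iterable of (ceilo, dt, height, type) tuples.
    """
    types_per_step = defaultdict(set)
    for ceilo, dt, _height, hit_type in rows:
        types_per_step[(ceilo, dt)].add(hit_type)

    conflicts = []
    for step, types in sorted(types_per_step.items()):
        # A type 0 (no hit) or a type -1 (VV hit) must be alone at its time step.
        if len(types) > 1 and (0 in types or -1 in types):
            conflicts.append(step)
    return conflicts


rows = [
    ("ceilo_1", -60.0, 1200.0, 1),
    ("ceilo_1", -60.0, 2500.0, 2),        # fine: cloud levels 1 and 2
    ("ceilo_1", -30.0, float("nan"), 0),
    ("ceilo_1", -30.0, 1300.0, 1),        # conflict: no hit and a hit
    ("ceilo_2", -30.0, 400.0, -1),
    ("ceilo_2", -30.0, 900.0, 1),         # conflict: VV hit and cloud base hit
]
conflicts = find_conflicting_steps(rows)
# conflicts is [("ceilo_1", -30.0), ("ceilo_2", -30.0)]
```

Dropping either the VV hit or the cloud base hit at the conflicting time step would resolve the second case.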

In addition, this will raise an ampycloud.errors.AmpycloudWarning if:

any pdf column type is not as expected. Note that in this case, the code will try to correct the type on the fly.

pdf has any superfluous columns. In this case, the code will drop them automatically.

Any hit height is negative.

Any type 0 hit has a non-NaN height.

Any type 1 hit has a NaN height.

Any type 2 hit does not have a coincident type 1 hit.

Any type 3 hit does not have a coincident type 2 hit.

ampycloud.utils.utils.tmp_seed(seed: int)

Temporarily reset the numpy.random.seed() value.

Adapted from the reply of Paul Panzer on SO.

Example:

with tmp_seed(42):
    np.random.random(1)
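The save, seed and restore pattern behind such a context manager can be sketched as follows. To stay dependency-free, the sketch uses the standard-library random module as a stand-in for numpy.random. The name tmp_random_seed is hypothetical, and it is an assumption that tmp_seed() follows the same pattern.

```python
import contextlib
import random


@contextlib.contextmanager
def tmp_random_seed(seed):
    """Seed the stdlib random module inside a with block, then restore its state."""
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Restore the generator, even if the block raised.
        random.setstate(state)


random.seed(0)
expected_next = random.random()

random.seed(0)
with tmp_random_seed(42):
    inside = random.random()   # drawn from the temporarily seeded generator
after = random.random()        # the outer sequence resumes untouched
```

After the with block, after is equal to expected_next: the temporary seed leaves no trace in the outer random sequence.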

ampycloud.utils.utils.adjust_nested_dict(ref_dict: dict, new_dict: dict, lvls: list | None = None) → dict

Update a given (nested) dictionary using a second (possibly incomplete) one.

Parameters:

ref_dict (dict) – reference dict of dict (of dict of dict …).
new_dict (dict) – values to update as a dict (of dict of dict …).
lvls (list of str, optional) – names of the keys of the parent nested dict layers, used for reporting useful errors. This is used by the function itself when it calls itself. There is no need for the user to set this to anything at first. Defaults to None.

Returns:

dict – the updated dict (of dict of dict of dict …)

Note

Inspired by the replies of Alex Martelli and Alex Telon on SO.
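A recursive update of this kind can be sketched as follows. The sketch is not the ampycloud code: the name adjust_nested_dict_sketch is hypothetical, and both the rejection of unknown keys and the in-place modification of ref_dict are assumptions made for the example.

```python
def adjust_nested_dict_sketch(ref_dict, new_dict, lvls=None):
    """Recursively copy the values of new_dict into ref_dict, and return it."""
    lvls = [] if lvls is None else lvls
    for key, value in new_dict.items():
        if key not in ref_dict:
            # Assumption: unknown keys are rejected rather than silently added.
            raise KeyError("unknown key: " + ".".join(lvls + [str(key)]))
        if isinstance(ref_dict[key], dict) and isinstance(value, dict):
            # Descend one level, remembering the path for error messages.
            adjust_nested_dict_sketch(ref_dict[key], value, lvls + [str(key)])
        else:
            ref_dict[key] = value
    return ref_dict


ref = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
updated = adjust_nested_dict_sketch(ref, {"a": {"c": {"d": 20}}})
# updated is {"a": {"b": 1, "c": {"d": 20}}, "e": 3}
```

The lvls list only tracks the path of parent keys, so that an error can name the offending entry (for instance a.x) instead of just the innermost key.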

ampycloud.utils.utils.calc_base_height(vals: ndarray, lookback_perc: int, height_perc: int) → float

Calculate the layer base height.

Parameters:

vals (npt.ArrayLike) – Ceilometer hits of a given layer. Must be a flat array/Series of scalars and ordered in time, most recent entries last.
lookback_perc (int) – Percentage of points to take into account. 100% would correspond to all points, 50% to the recent half, etc.
height_perc (int) – Percentage of points that should be neglected when calculating the base height. Base height will be the minimum of the remaining points.

Returns:

float – The layer base height.

Raises:

AmpycloudError – Raised if the array passed to the n_largest percentile calculation is empty.
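One possible reading of these two percentages is sketched below in plain Python. It rests on several assumptions: the neglected points are taken to be the lowest ones, the counts are rounded as shown in the comments, and a ValueError stands in for AmpycloudError. The name calc_base_height_sketch is hypothetical, and the actual implementation may differ.

```python
import math


def calc_base_height_sketch(vals, lookback_perc, height_perc):
    """Hypothetical reading of the parameters described above."""
    vals = list(vals)
    # Keep only the most recent lookback_perc % of the hits (time-ordered input).
    n_recent = math.ceil(len(vals) * lookback_perc / 100)
    recent = vals[len(vals) - n_recent:]
    if not recent:
        raise ValueError("no hits left to compute a base height from")
    # Assumption: the neglected points are the lowest height_perc % of those hits.
    n_skip = min(int(len(recent) * height_perc / 100), len(recent) - 1)
    # The base height is the minimum of the remaining points.
    return float(sorted(recent)[n_skip])


hits = [900, 1500, 1400, 1000, 1100, 1300, 1200, 1250, 1050, 1150]
base = calc_base_height_sketch(hits, lookback_perc=50, height_perc=20)
# base is 1150.0
```

With lookback_perc=50, only the last five hits are considered; with height_perc=20, the lowest of those five (1050) is neglected, which leaves 1150 as the base height.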