Using ampycloud

A no-words example for those who want to get started quickly

from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
from ampycloud.plots import diagnostic

# Generate the canonical demo dataset for ampycloud
# Your data should have *exactly* this structure
mock_data = mocker.canonical_demo_data()

# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal
chunk = ampycloud.run(mock_data, prms={'MSA': 10000},
                      geoloc='Mock data', ref_dt=datetime.now())

# Get the resulting METAR message
print(chunk.metar_msg())

# Display the full information available for the layers found
print(chunk.layers)

# And for the most motivated, plot the diagnostic diagram
diagnostic(chunk, upto='layers', show=True, save_stem='ampycloud_demo')

The input data

The ampycloud algorithm is meant to process cloud base hits derived from ceilometer observations. A given set of hits to be processed by the ampycloud package must be stored inside a pandas.DataFrame with a specific set of characteristics outlined below. Users can use the following utility function to check whether a given pandas.DataFrame meets all the requirements of ampycloud.

ampycloud.utils.utils.check_data_consistency(pdf: DataFrame, req_cols: dict | None = None) → DataFrame

Assesses whether a given pandas.DataFrame is compatible with the requirements of ampycloud.

Parameters:
  • pdf (pd.DataFrame) – the data to check.

  • req_cols (dict) – a dictionary in which the keys correspond to the required columns, and the values are the column types. Defaults to None = the ampycloud requirements.

Returns:

pd.DataFrame – the data, possibly cleaned-up of superfluous columns, and with corrected dtypes.

This function will raise an ampycloud.errors.AmpycloudError and/or an ampycloud.errors.AmpycloudWarning if it identifies very bad and/or very weird things in pdf.

Specifically, the input pdf must be a pandas.DataFrame with the following column names/types (formally defined in ampycloud.hardcoded.REQ_DATA_COLS):

'ceilo'/pd.StringDtype(), 'dt'/float, 'height'/float, 'type'/int

The ceilo column contains the names/ids of the ceilometers as pd.StringDtype(). See the pandas documentation for more info about this type.

The dt column contains time deltas, in seconds, between a given ceilometer observation and ref_dt (i.e. obs_time-ref_dt). Ideally, ref_dt would be the issuing time of the METAR message, such that dt values are negative, with the smallest one corresponding to the oldest measurement.

The height column contains the cloud base hit heights reported by the ceilometers, in ft above aerodrome level.

The type column contains integers that correspond to the hit sequence id. If a given ceilometer reports multiple hits for a given time step (corresponding to cloud level 1, cloud level 2, cloud level 3, etc.), the types of these measurements would be 1, 2, 3, etc. Any data point with a type of -1 will be flagged in the ampycloud plots as a Vertical Visibility (VV) hit, but it will not be treated any differently from a regular hit. Type 0 corresponds to no (cloud) detection, in which case the corresponding hit height should be NaN.

Important

A non-detection corresponds to a valid measurement with a dt value, a type 0, and NaN as the height. It should not be confused with a non-observation, when no data was acquired at all!
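As a minimal sketch of the layout described above, the following builds a small table by hand with pandas. The ceilometer names, the reference time, and all the hit values are hypothetical; only the column names and types come from the requirements listed above.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Hypothetical reference time, e.g. the issuing time of the METAR message
ref_dt = datetime(2024, 1, 1, 12, 0, 0)

# Hypothetical observations: (ceilometer id, observation time, height in ft aal, type)
raw_obs = [
    ('ceilo_A', ref_dt - timedelta(seconds=30), 1200.0, 1),  # first cloud level
    ('ceilo_A', ref_dt - timedelta(seconds=30), 3400.0, 2),  # second level, same time step
    ('ceilo_A', ref_dt - timedelta(seconds=15), np.nan, 0),  # non-detection: type 0, NaN height
    ('ceilo_B', ref_dt - timedelta(seconds=20), 1250.0, 1),
]

pdf = pd.DataFrame({
    'ceilo': pd.array([obs[0] for obs in raw_obs], dtype=pd.StringDtype()),
    # dt = obs_time - ref_dt, in seconds (negative for observations before ref_dt)
    'dt': [(obs[1] - ref_dt).total_seconds() for obs in raw_obs],
    'height': [obs[2] for obs in raw_obs],
    'type': [obs[3] for obs in raw_obs],
})

# Make the numeric column types explicit
pdf = pdf.astype({'dt': float, 'height': float, 'type': int})
print(pdf.dtypes)
```

A non-observation, by contrast, would simply have no row in this table.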

If it all sounds confusing, it is possible to obtain an example of the required data format from the utils.mocker.canonical_demo_data() routine of the package, like so:

from ampycloud.utils import mocker
mock_data = mocker.canonical_demo_data()

As mentioned above, it is also possible to verify whether a given pandas.DataFrame meets the ampycloud requirements ahead of time via ampycloud.utils.utils.check_data_consistency():

from ampycloud.utils.utils import check_data_consistency
checked_pdf = check_data_consistency(pdf)

This will raise an ampycloud.errors.AmpycloudError if:

  • pdf is not a pandas.DataFrame.

  • pdf is missing a required column.

  • pdf has a length of 0.

  • pdf has duplicated rows.

  • any time step for any ceilometer corresponds to both a type 0 (no hit) and not 0 (some hit).

  • any time step for any ceilometer corresponds to both a type -1 (VV hit) and not -1 (some hit/no hit).

The latter check implies that ampycloud cannot be fed a VV hit in parallel to a cloud base hit. Should a specific ceilometer return VV hits in parallel to cloud base hits, it is up to the user to decide whether to feed one or the other.
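One possible way to resolve such conflicts before calling ampycloud is sketched below with plain pandas. The policy shown (dropping a VV hit whenever the same ceilometer reports anything else for the same time step) is only an assumption made for illustration, as are the data values; keeping the VV hit instead would be an equally valid choice.

```python
import pandas as pd

# Hypothetical hits: ceilo_A reports a VV hit (type -1) and a cloud base hit
# (type 1) for the same time step dt = -30.0
pdf = pd.DataFrame({
    'ceilo': pd.array(['ceilo_A', 'ceilo_A', 'ceilo_A', 'ceilo_B'],
                      dtype=pd.StringDtype()),
    'dt': [-30.0, -30.0, -15.0, -30.0],
    'height': [800.0, 1200.0, 900.0, 1000.0],
    'type': [-1, 1, -1, 1],
})

# For each (ceilo, dt) time step, flag whether any non-VV entry is present
is_vv = pdf['type'] == -1
has_non_vv = (~is_vv).groupby([pdf['ceilo'], pdf['dt']]).transform('any').astype(bool)

# Drop only the VV hits that coincide with a non-VV entry
cleaned = pdf[~(is_vv & has_non_vv)].reset_index(drop=True)
print(cleaned)
```

The isolated VV hit at dt = -15.0 survives, since nothing else was reported by ceilo_A for that time step.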

In addition, this will raise an ampycloud.errors.AmpycloudWarning if:

  • any pdf column type is not as expected. Note that in this case, the code will try to correct the type on the fly.

  • pdf has any superfluous columns. In this case, the code will drop them automatically.

  • Any hit height is negative.

  • Any type 0 hit has a non-NaN height.

  • Any type 1 hit has a NaN height.

  • Any type 2 hit does not have a coincident type 1 hit.

  • Any type 3 hit does not have a coincident type 2 hit.

Running the algorithm

The ampycloud.core.run() function

Applying the ampycloud algorithm to a given set of ceilometer cloud base hits is done via the following function, which is also directly accessible as ampycloud.run().

ampycloud.core.run(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | datetime | None = None) → CeiloChunk

Runs the ampycloud algorithm on a given dataset.

Parameters:
  • data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.

  • prms (dict, optional) – a (nested) dict of parameters to adjust for this specific run. This is meant as a thread-safe way of adjusting parameters for different runs. Any unspecified parameter will be taken from dynamic.AMPYCLOUD_PRMS at init time.

  • geoloc (str, optional) – the name of the geographic location where the data was taken. Defaults to None.

  • ref_dt (str|datetime.datetime, optional) – reference date and time of the observations, corresponding to Delta t = 0. Defaults to None. Note that if a datetime instance is specified, it will be turned almost immediately to str via str(ref_dt).

Returns:

data.CeiloChunk – the data chunk with all the processing outcome bundled cleanly.

All that is required to run the ampycloud algorithm is a properly formatted dataset. At the moment, specifying geoloc and ref_dt serves no purpose other than to enhance plots (should they be created). There are no special requirements for geoloc and ref_dt: as long as they are strings, you can set them to whatever you please.

Important

ampycloud treats Vertical Visibility hits no differently than any other hit. Hence, it is up to the user to adjust the Vertical Visibility hit height (and/or ignore some of them, for example) prior to feeding them to ampycloud, so that it can be used as a cloud hit.

Important

ampycloud uses the dt and ceilo values to decide whether two hits are simultaneous or not. It is thus important that the values of dt be sufficiently precise to distinguish between different measurements. Essentially, each measurement (which may comprise several hits) should be associated with a unique (ceilo; dt) set of values. Failure to do so may result in incorrect estimations of the cloud layer densities. See data.CeiloChunk.max_hits_per_layer for more details.
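A rough way to spot dt values that are too coarse is sketched below. It relies on a heuristic that is an assumption of this example rather than an ampycloud check: within a single measurement, a given hit sequence id (type) should appear only once, so a repeated (ceilo, dt, type) triplet suggests that two distinct measurements ended up with the same dt. The data values are hypothetical.

```python
import pandas as pd

# Hypothetical hits in which dt was rounded to the minute: two separate
# ceilo_A measurements now share dt = -60.0
pdf = pd.DataFrame({
    'ceilo': pd.array(['ceilo_A', 'ceilo_A', 'ceilo_A', 'ceilo_B'],
                      dtype=pd.StringDtype()),
    'dt': [-60.0, -60.0, -120.0, -60.0],
    'height': [1200.0, 1250.0, 1300.0, 1100.0],
    'type': [1, 1, 1, 1],
})

# Rows sharing the same (ceilo, dt, type) triplet are suspicious
clashes = pdf[pdf.duplicated(subset=['ceilo', 'dt', 'type'], keep=False)]
print(clashes)
```

Here the first two rows are flagged, which would be a hint to recompute dt with a finer time resolution.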

All the scientific parameters of the algorithm are set dynamically in the dynamic module. From within a Python session all these parameters can be changed directly. For example, to change the Minimum Sector Altitude (to be specified in ft aal), one would do:

from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MSA'] = 5000

Alternatively, the scientific parameters can also be defined and fed to ampycloud via a YAML file. See set_prms() for details.

Caution

By default, the function run() will use the parameter values set in dynamic.AMPYCLOUD_PRMS, which is not thread-safe. Users interested in running multiple concurrent ampycloud calculations with distinct sets of parameters within the same Python session are thus urged to feed the required parameters directly to run() via the prms keyword argument, which expects a (nested) dictionary with keys compatible with dynamic.AMPYCLOUD_PRMS.

Examples:

# Define only the parameters that are non-default. To adjust the MSA, use:
prms = {'MSA': 10000}

# Or to adjust some other algorithm parameters:
prms = {'LAYERING_PRMS':{'gmm_kwargs':{'scores': 'BIC'}, 'min_prob': 1.0}}
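To illustrate what "only the non-default parameters" means for a nested dictionary, here is a minimal standalone sketch of a recursive merge. It is not the ampycloud implementation: merge_prms() is a hypothetical helper, and the default values (including the other_kwarg key) are made up for the example.

```python
from copy import deepcopy


def merge_prms(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with the nested `overrides` applied."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested dictionaries instead of replacing them whole
            merged[key] = merge_prms(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, for illustration only (not the actual ampycloud values)
defaults = {
    'MSA': None,
    'LAYERING_PRMS': {'min_prob': 1.0,
                      'gmm_kwargs': {'scores': 'BIC', 'other_kwarg': 42}},
}

# Only the parameter that differs from the defaults is specified
overrides = {'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'AIC'}}}

merged = merge_prms(defaults, overrides)
print(merged['LAYERING_PRMS'])
```

The sibling entries (min_prob, other_kwarg) are preserved, and the defaults dictionary itself is left untouched.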

The data.CeiloChunk instance returned by this function contains all the information associated with the ampycloud algorithm, including the raw data and slicing/grouping/layering info. Its method data.CeiloChunk.metar_msg() provides direct access to the resulting METAR-like message. Users who require the height, okta amount, and/or exact sky coverage fraction of layers can get them via the data.CeiloChunk.layers class property.

Example

In the following example, we create the canonical mock dataset of ampycloud, run the algorithm on it, and fetch the resulting METAR-like message:

from datetime import datetime
import ampycloud
from ampycloud.utils import mocker

# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()

# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal.
chunk = ampycloud.run(mock_data, prms={'MSA':10000},
                      geoloc='Mock data', ref_dt=datetime.now())

# Get the resulting METAR message
print(chunk.metar_msg())

# Display the full information available for the layers found
print(chunk.layers)

The ampycloud.data.CeiloChunk class

The function ampycloud.core.run() returns an ampycloud.data.CeiloChunk class instance, which is at the core of ampycloud. This class is used to load and format the user-supplied data, execute the different ampycloud algorithm steps, and format their outcomes.

The properties of the slices/groups/layers identified by the different steps of the ampycloud algorithm are accessible, as pandas.DataFrame instances, via the class properties ampycloud.data.CeiloChunk.slices, ampycloud.data.CeiloChunk.groups, and ampycloud.data.CeiloChunk.layers.

Note

ampycloud.data.CeiloChunk.metar_msg() relies on ampycloud.data.CeiloChunk.layers to derive the corresponding METAR-like message.

All these slice/group/layer parameters are compiled/computed by ampycloud.data.CeiloChunk.metarize(), which contains all the info about the different parameters.

ampycloud.data.CeiloChunk.metarize(self, which: str = 'slices') → None

Assembles a pandas.DataFrame of slice/group/layer METAR properties of interest.

Parameters:

which (str, optional) – whether to process ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘slices’.

The pandas.DataFrame generated by this method is subsequently available via the appropriate class property CeiloChunk.slices, CeiloChunk.groups, or CeiloChunk.layers, depending on the value of the argument which.

The slice/group/layer parameters computed/derived by this method include:

  • n_hits (int): duplicate-corrected number of hits

  • perc (float): sky coverage percentage (between 0-100)

  • okta (int): okta count

  • height_base (float): base height

  • height_mean (float): mean height

  • height_std (float): height standard deviation

  • height_min (float): minimum height

  • height_max (float): maximum height

  • thickness (float): thickness

  • fluffiness (float): fluffiness (expressed in height units, i.e. ft)

  • code (str): METAR-like code

  • significant (bool): whether the layer is significant according to the ICAO rules. See icao.significant_cloud() for details.

  • cluster_id (int): an ampycloud-internal identification number

  • isolated (bool): isolation status (for slices only)

  • ncomp (int): the number of subcomponents (for groups only)

Important

The value of n_hits is corrected for duplicate hits, to ensure a correct estimation of the sky coverage fraction. Essentially, two (or more) simultaneous hits from the same ceilometer are counted as one only. In other words, if Type 1 and Type 2 hits from the same ceilometer at the same observation time are included in a given slice/group/layer, they are counted as one hit only. This is a direct consequence of the fact that clouds have a single base height at any given time [citation needed].
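The idea behind the duplicate correction can be sketched with plain pandas as follows. This is an illustration of the counting rule only, not the code used by ampycloud, and the hits are hypothetical.

```python
import pandas as pd

# Hypothetical hits assigned to a single layer: ceilo_A contributes a Type 1
# and a Type 2 hit for the same time step dt = -30.0
layer_hits = pd.DataFrame({
    'ceilo': pd.array(['ceilo_A', 'ceilo_A', 'ceilo_A', 'ceilo_B'],
                      dtype=pd.StringDtype()),
    'dt': [-30.0, -30.0, -15.0, -30.0],
    'height': [1200.0, 1350.0, 1250.0, 1220.0],
    'type': [1, 2, 1, 1],
})

# Raw number of rows versus number of distinct (ceilo, dt) measurements
raw_count = len(layer_hits)
n_hits = len(layer_hits.drop_duplicates(subset=['ceilo', 'dt']))
print(raw_count, n_hits)  # 4 3
```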

Note

The metarize function is modularized in private submethods defined above.

The no-plots-required shortcut

The following function, also accessible as ampycloud.metar(), will directly provide interested users with the ampycloud METAR-like message for a given dataset. It is a convenience function intended for users that do not want to generate diagnostic plots, but only seek the outcome of the ampycloud algorithm formatted as a METAR-like str.

ampycloud.core.metar(data: DataFrame) → str

Run the ampycloud algorithm on a dataset and extract a METAR report of the cloud layers.

Parameters:

data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.

Returns:

str – the METAR-like message.

Example:

import ampycloud
from ampycloud.utils import mocker

# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()

# Compute the METAR message
msg = ampycloud.metar(mock_data)
print(msg)

Adjusting the default algorithm parameters

The ampycloud parameters with a scientific impact on the outcome of the algorithm (see here for the complete list) are accessible via ampycloud.dynamic.AMPYCLOUD_PRMS as a nested dictionary. When a new ampycloud.data.CeiloChunk instance is initialized, a copy of this nested dictionary is stored as an instance variable. It is then possible to adjust specific parameters via the prms keyword argument when initializing an ampycloud.data.CeiloChunk instance.

There are thus 2+1 ways to adjust the ampycloud scientific parameters:

  • 1.a: Adjust them globally in ampycloud.dynamic.AMPYCLOUD_PRMS, like so:

    from ampycloud import dynamic
    
    dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
    

    Important

    Always import the entire ampycloud.dynamic module and stick to the above example structure, if the updated parameters are to be seen by all the ampycloud modules.

  • 1.b: Adjust them globally via a YAML file, and ampycloud.core.set_prms(). With this approach, ampycloud.core.copy_prm_file() can be used to obtain a local copy of the default ampycloud parameters.

  • 2: Adjust them locally for a given execution of ampycloud by feeding a suitable nested dictionary to ampycloud.core.run() (that will create a new ampycloud.data.CeiloChunk instance behind the scene). The dictionary, the keys and levels of which should be consistent with ampycloud.dynamic.AMPYCLOUD_PRMS, only needs to contain the specific parameters that one requires to be different from the default values.

    # Define only the parameters that are non-default. To adjust the MSA, use:
    my_prms = {'MSA': 10000}
    
    # Or to adjust both the MSA and some other algorithm parameter:
    my_prms = {'MSA': 10000, 'GROUPING_PRMS':{'dt_scale_kwargs':{'scale': 300}}}
    
    # Then feed them directly to the run call
    chunk = ampycloud.run(some_data_tbd, prms=my_prms)
    

Warning

Options 1.a and 1.b are not thread-safe. Users planning to launch multiple ampycloud processes simultaneously are urged to use option 2 if they need to set distinct parameters for each. In case of doubt, the parameters used by a given ampycloud.data.CeiloChunk instance are accessible via the (parent) ampycloud.data.AbstractChunk.prms() property.
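The pattern behind option 2 can be sketched without ampycloud itself. In the standalone example below, run_stub() is a hypothetical stand-in for a call such as ampycloud.run(), and SHARED_PRMS stands in for a module-level parameter dictionary; each concurrent job receives its own overrides instead of modifying the shared state.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a module-level parameter dictionary (hypothetical value)
SHARED_PRMS = {'MSA': 5000}


def run_stub(job_name, prms=None):
    """Hypothetical stand-in for a run call: copy the shared parameters,
    then apply the per-call overrides to the local copy only."""
    local_prms = dict(SHARED_PRMS)
    local_prms.update(prms or {})
    return job_name, local_prms['MSA']


# Each job carries its own parameters; the shared dictionary is never modified
jobs = [('site_a', {'MSA': 10000}), ('site_b', {'MSA': 8000}), ('site_c', None)]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(lambda job: run_stub(*job), jobs))

print(results)
```

Because no thread ever writes to SHARED_PRMS, the outcome does not depend on the order in which the jobs are executed.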

If all hope is lost and you wish to revert to the original (default) values of all the ampycloud scientific parameters, you can use ampycloud.core.reset_prms().

ampycloud.core.reset_prms(which: str | list | None = None) → None

Reset the ampycloud dynamic=scientific parameters to their default values.

Parameters:

which (str|list, optional) – (list of) names of parameters to reset specifically. If not set (by default), all parameters will be reset.

Example

import ampycloud
from ampycloud import dynamic

# Change a parameter
dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
# Reset them
ampycloud.reset_prms()
print('Back to the default value:', dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'])

Logging

ampycloud creates a logging.NullHandler instance, such that no logging will be apparent to users unless they explicitly set it up themselves (see here for more details).

As an example, to enable ampycloud log messages all the way down to the DEBUG level, users can make the following call before running ampycloud functions:

import logging

logging.basicConfig()
logging.getLogger('ampycloud').setLevel('DEBUG')

Each ampycloud module has a dedicated logger based on the module __name__. Hence, users can adjust the logging level of each ampycloud module however they desire, e.g.:

logging.getLogger('ampycloud.wmo').setLevel('WARNING')
logging.getLogger('ampycloud.scaler').setLevel('DEBUG')
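Since the logger names follow the module hierarchy, the standard logging rules for dotted names apply: a module logger without an explicit level inherits the effective level of its closest configured parent. The short sketch below uses only the standard library (it does not import ampycloud) and reuses the logger names from the examples above.

```python
import logging

# Set a level on the package-wide logger, and a stricter one for one module
logging.getLogger('ampycloud').setLevel('DEBUG')
logging.getLogger('ampycloud.wmo').setLevel('WARNING')

# 'ampycloud.scaler' gets no explicit level here, so it inherits from 'ampycloud'
for name in ('ampycloud', 'ampycloud.wmo', 'ampycloud.scaler'):
    level = logging.getLogger(name).getEffectiveLevel()
    print(name, logging.getLevelName(level))
# ampycloud DEBUG
# ampycloud.wmo WARNING
# ampycloud.scaler DEBUG
```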

Plotting the diagnostic diagram

Users interested in plotting the ampycloud diagnostic diagram can do so using the following function, which is also accessible as ampycloud.plots.diagnostic():

ampycloud.plots.core.diagnostic(chunk: CeiloChunk, upto: str = 'layers', show_ceilos: bool = False, ref_metar: str | None = None, ref_metar_origin: str | None = None, show: bool = True, save_stem: str | None = None, save_fmts: list | str | None = None) → None

A function to create the ampycloud diagnostic plot all the way to the layering step (included). This is the ultimate ampycloud plot that shows it all (or not - you choose!).

Parameters:
  • chunk (CeiloChunk) – the CeiloChunk to look at.

  • upto (str, optional) – up to which algorithm steps to plot. Can be one of [‘raw_data’, ‘slices’, ‘groups’, ‘layers’]. Defaults to ‘layers’.

  • show_ceilos (bool, optional) – if True, hits will be colored as a function of the responsible ceilometer. Defaults to False. No effect unless upto='raw_data'.

  • ref_metar (str, optional) – reference METAR message. Defaults to None.

  • ref_metar_origin (str, optional) – name of the source of the reference METAR set with ref_metar. Defaults to None.

  • show (bool, optional) – will show the plot on the screen if True. Defaults to True.

  • save_stem (str, optional) – if set, will save the plot with this stem (which can include a path as well). Defaults to None.

  • save_fmts (list|str, optional) – a list of file formats to export the plot to. Defaults to None = [‘pdf’].

Example:

from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
from ampycloud.plots import diagnostic

# First create some mock data for the example
mock_data = mocker.canonical_demo_data()

# Then run the ampycloud algorithm on it
chunk = ampycloud.run(mock_data, geoloc='Mock data', ref_dt=datetime.now())

# Create the full ampycloud diagnostic plot
diagnostic(chunk, upto='layers', show=True)

Adjusting the plotting style

ampycloud ships with its own set of matplotlib style files, which are used in context and thus will not affect any user-specified setups.

Whereas the default parameters should ensure a decent-enough look for the diagnostic diagrams of ampycloud, a better look can be achieved by using a system-wide LaTeX installation. Provided this is available, users interested in creating nicer-looking diagnostic diagrams can do so by setting the appropriate ampycloud parameter:

from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MPL_STYLE'] = 'latex'

And for the most demanding users that want nothing but the best, they can create plots with actual okta symbols if they install the metsymb LaTeX package system-wide, and set:

from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MPL_STYLE'] = 'metsymb'

Important

Using a system-wide LaTeX installation to create matplotlib figures is not officially supported by matplotlib, and thus not officially supported by ampycloud either.
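Before switching to one of the LaTeX-based styles, it may be worth checking that a LaTeX executable can be found at all. The following standard-library sketch does just that; latex_available() is a hypothetical helper, and finding the executable does not guarantee that matplotlib will be able to use it.

```python
import shutil


def latex_available(which=shutil.which) -> bool:
    """Return True if a `latex` executable can be found on the PATH."""
    return which('latex') is not None


if latex_available():
    print("LaTeX executable found: the 'latex' style can be tried.")
else:
    print('No LaTeX executable found: better to keep the default style.')
```

The which argument only exists to make the helper easy to test; in normal use it can be left at its default.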