Using ampycloud
A no-words example for those that want to get started quickly
from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
from ampycloud.plots import diagnostic
# Generate the canonical demo dataset for ampycloud
# Your data should have *exactly* this structure
mock_data = mocker.canonical_demo_data()
# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal
chunk = ampycloud.run(mock_data, prms={'MSA': 10000},
                      geoloc='Mock data', ref_dt=datetime.now())
# Get the resulting METAR message
print(chunk.metar_msg())
# Display the full information available for the layers found
print(chunk.layers)
# And for the most motivated, plot the diagnostic diagram
diagnostic(chunk, upto='layers', show=True, save_stem='ampycloud_demo')
The input data
The ampycloud algorithm is meant to process cloud base hits derived from ceilometer observations.
A given set of hits to be processed by the ampycloud package must be stored inside a pandas.DataFrame with a specific set of characteristics outlined below. Users can use the following utility function to check whether a given pandas.DataFrame meets all the requirements of ampycloud.
- ampycloud.utils.utils.check_data_consistency(pdf: DataFrame, req_cols: dict | None = None) → DataFrame
Assesses whether a given pandas.DataFrame is compatible with the requirements of ampycloud.
- Parameters:
pdf (pd.DataFrame) – the data to check.
req_cols (dict) – a dictionary in which keys correspond to the required columns, and values to the corresponding column types. Defaults to None = the ampycloud requirements.
- Returns:
pd.DataFrame – the data, possibly cleaned up of superfluous columns, and with corrected dtypes.
This function will raise an ampycloud.errors.AmpycloudError and/or an ampycloud.errors.AmpycloudWarning if it identifies very bad and/or very weird things in pdf.
Specifically, the input pdf must be a pandas.DataFrame with the following column names/types (formally defined in ampycloud.hardcoded.REQ_DATA_COLS):
'ceilo'/pd.StringDtype(), 'dt'/float, 'height'/float, 'type'/int
The ceilo column contains the names/ids of the ceilometers as pd.StringDtype(). See the pandas documentation for more info about this type.
The dt column contains time deltas, in seconds, between a given ceilometer observation and ref_dt (i.e. obs_time - ref_dt). Ideally, ref_dt would be the issuing time of the METAR message, such that dt values are negative, with the smallest one corresponding to the oldest measurement.
The height column contains the cloud base hit heights reported by the ceilometers, in ft above aerodrome level.
The type column contains integers that correspond to the hit sequence id. If a given ceilometer is reporting multiple hits for a given timestep (corresponding to cloud level 1, cloud level 2, cloud level 3, etc.), the type of these measurements would be 1, 2, 3, etc. Any data point with a type of -1 will be flagged in the ampycloud plots as a Vertical Visibility (VV) hit, but it will not be treated any differently than any other regular hit. Type 0 corresponds to no (cloud) detection, in which case the corresponding hit height should be a NaN.
Important
A non-detection corresponds to a valid measurement with a dt value, a type 0, and NaN as the height. It should not be confused with a non-observation, when no data was acquired at all!
If it all sounds confusing, it is possible to obtain an example of the required data format from the utils.mocker.canonical_demo_data() routine of the package, like so:
from ampycloud.utils import mocker
mock_data = mocker.canonical_demo_data()
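For data that does not come from the mocker, the conversion of raw ceilometer reports into this row format can be sketched with the standard library alone. The helper to_ampycloud_rows() and the raw_obs structure below are hypothetical (they are not part of ampycloud); the sketch only assumes that each report provides a ceilometer id, an observation time, and zero or more hit heights in ft aal. VV hits (type -1) are left out for brevity.

```python
import math
from datetime import datetime, timedelta

# Hypothetical reference time, e.g. the METAR issuing time
ref_dt = datetime(2024, 1, 1, 12, 0, 0)

# Hypothetical raw reports: (ceilometer id, observation time, hit heights in ft aal).
# An empty tuple means the ceilometer observed the sky but detected no cloud.
raw_obs = [
    ('ceilo_1', ref_dt - timedelta(minutes=10), (1500.0, 4200.0)),
    ('ceilo_1', ref_dt - timedelta(minutes=5), ()),
    ('ceilo_2', ref_dt - timedelta(minutes=5), (1550.0,)),
]


def to_ampycloud_rows(obs, ref):
    """Flatten raw reports into rows with the ceilo/dt/height/type columns."""
    rows = []
    for ceilo, obs_time, heights in obs:
        # dt = obs_time - ref_dt, in seconds: negative for observations before ref
        dt = (obs_time - ref).total_seconds()
        if not heights:
            # Non-detection: a valid measurement with type 0 and a NaN height
            rows.append({'ceilo': ceilo, 'dt': dt, 'height': math.nan, 'type': 0})
        for hit_id, height in enumerate(heights, start=1):
            # type 1, 2, 3, ... = first, second, third cloud level reported
            rows.append({'ceilo': ceilo, 'dt': dt, 'height': height, 'type': hit_id})
    return rows


rows = to_ampycloud_rows(raw_obs, ref_dt)
for row in rows:
    print(row)
```

The resulting list of dicts could then be passed to pandas.DataFrame(rows) and verified with check_data_consistency().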
As mentioned above, it is also possible to verify whether a given pandas.DataFrame meets the ampycloud requirements ahead of time via ampycloud.utils.utils.check_data_consistency():
from ampycloud.utils.utils import check_data_consistency
checked_pdf = check_data_consistency(pdf)
This will raise an ampycloud.errors.AmpycloudError if:
- pdf is not a pandas.DataFrame.
- pdf is missing a required column.
- pdf has a length of 0.
- pdf has duplicated rows.
- any time step for any ceilometer corresponds to both a type 0 (no hit) and not 0 (some hit).
- any time step for any ceilometer corresponds to both a type -1 (VV hit) and not -1 (some hit/no hit).
The latter check implies that ampycloud cannot be fed a VV hit in parallel to a cloud base hit. Should a specific ceilometer return VV hits in parallel to cloud base hits, it is up to the user to decide whether to feed one or the other.
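The last two error conditions can be modeled by grouping the hit types of each measurement, i.e. of each (ceilo, dt) pair. The function find_mixed_type_steps() below is a hypothetical, stdlib-only illustration of the rule, not ampycloud's implementation; in practice, check_data_consistency() performs these checks.

```python
from collections import defaultdict


def find_mixed_type_steps(rows):
    """Return the (ceilo, dt) time steps that break the type 0 / type -1 exclusivity rules."""
    # Group the hit types by measurement, i.e. by (ceilometer, time step)
    types_per_step = defaultdict(set)
    for row in rows:
        types_per_step[(row['ceilo'], row['dt'])].add(row['type'])
    problems = []
    for step, types in types_per_step.items():
        if 0 in types and len(types) > 1:
            problems.append((step, 'no hit (type 0) mixed with other types'))
        elif -1 in types and len(types) > 1:
            problems.append((step, 'VV hit (type -1) mixed with other types'))
    return problems


nan = float('nan')
rows = [
    {'ceilo': 'c1', 'dt': -60.0, 'height': 1000.0, 'type': 1},
    {'ceilo': 'c1', 'dt': -60.0, 'height': nan, 'type': 0},      # invalid: type 0 and type 1
    {'ceilo': 'c2', 'dt': -60.0, 'height': 300.0, 'type': -1},
    {'ceilo': 'c2', 'dt': -60.0, 'height': 900.0, 'type': 1},    # invalid: type -1 and type 1
    {'ceilo': 'c3', 'dt': -60.0, 'height': 1200.0, 'type': 1},
    {'ceilo': 'c3', 'dt': -60.0, 'height': 2500.0, 'type': 2},   # valid: two cloud levels
]
for step, reason in find_mixed_type_steps(rows):
    print(step, reason)
```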
In addition, this will raise an ampycloud.errors.AmpycloudWarning if:
- any of the pdf column types is not as expected. Note that in this case, the code will try to correct the type on the fly.
- pdf has any superfluous columns. In this case, the code will drop them automatically.
- any hit height is negative.
- any type 0 hit has a non-NaN height.
- any type 1 hit has a NaN height.
- any type 2 hit does not have a coincident type 1 hit.
- any type 3 hit does not have a coincident type 2 hit.
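The hit-level warnings listed above can be modeled with the standard library as follows. warn_on_suspicious_hits() is a hypothetical helper, not ampycloud's implementation: it emits a plain UserWarning where ampycloud would use an AmpycloudWarning, and it leaves out the column-level checks (dtypes, superfluous columns). "Coincident" is taken to mean same ceilometer and same dt.

```python
import math
import warnings


def warn_on_suspicious_hits(rows):
    """Emit one warning per suspicious hit, and return the corresponding messages."""
    # All (ceilo, dt, type) triplets present, to look up coincident hits
    present = {(r['ceilo'], r['dt'], r['type']) for r in rows}
    messages = []
    for r in rows:
        key = (r['ceilo'], r['dt'])
        height, hit_type = r['height'], r['type']
        if height < 0:  # NaN < 0 is False, so non-detections are not flagged here
            messages.append(f'{key}: negative hit height')
        if hit_type == 0 and not math.isnan(height):
            messages.append(f'{key}: type 0 hit with non-NaN height')
        if hit_type == 1 and math.isnan(height):
            messages.append(f'{key}: type 1 hit with NaN height')
        if hit_type in (2, 3) and (*key, hit_type - 1) not in present:
            messages.append(f'{key}: type {hit_type} hit without coincident type {hit_type - 1} hit')
    for msg in messages:
        warnings.warn(msg, UserWarning)
    return messages


rows = [
    {'ceilo': 'c1', 'dt': -30.0, 'height': -5.0, 'type': 1},     # negative height
    {'ceilo': 'c2', 'dt': -30.0, 'height': 800.0, 'type': 0},    # type 0 with a height
    {'ceilo': 'c3', 'dt': -30.0, 'height': 1800.0, 'type': 2},   # type 2 without type 1
    {'ceilo': 'c4', 'dt': -30.0, 'height': 1000.0, 'type': 1},   # valid pair of hits
    {'ceilo': 'c4', 'dt': -30.0, 'height': 2000.0, 'type': 2},
]
messages = warn_on_suspicious_hits(rows)
```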
Running the algorithm
The ampycloud.core.run() function
Applying the ampycloud algorithm to a given set of ceilometer cloud base hits is done via the following function, which is also directly accessible as ampycloud.run().
- ampycloud.core.run(data: DataFrame, prms: dict | None = None, geoloc: str | None = None, ref_dt: str | datetime | None = None) → CeiloChunk
Runs the ampycloud algorithm on a given dataset.
- Parameters:
data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
prms (dict, optional) – a (nested) dict of parameters to adjust for this specific run. This is meant as a thread-safe way of adjusting parameters for different runs. Any unspecified parameter will be taken from dynamic.AMPYCLOUD_PRMS at init time.
geoloc (str, optional) – the name of the geographic location where the data was taken. Defaults to None.
ref_dt (str|datetime.datetime, optional) – reference date and time of the observations, corresponding to Δt = 0. Defaults to None. Note that if a datetime instance is specified, it will be converted almost immediately to a str via str(ref_dt).
- Returns:
data.CeiloChunk – the data chunk with all the processing outcome bundled cleanly.
All that is required to run the ampycloud algorithm is a properly formatted dataset. At the moment, specifying geoloc and ref_dt serves no purpose other than to enhance plots (should they be created). There are no special requirements for geoloc and ref_dt: as long as they are strings, you can set them to whatever you please.
Important
ampycloud treats Vertical Visibility hits no differently than any other hit. Hence, it is up to the user to adjust the Vertical Visibility hit heights (and/or ignore some of them, for example) prior to feeding them to ampycloud, so that they can be used as cloud hits.
Important
ampycloud uses the dt and ceilo values to decide if two hits are simultaneous, or not. It is thus important that the values of dt be sufficiently precise to distinguish between different measurements. Essentially, each measurement (which may consist of several hits) should be associated with a unique (ceilo; dt) set of values. Failure to do so may result in incorrect estimations of the cloud layer densities. See data.CeiloChunk.max_hits_per_layer for more details.
All the scientific parameters of the algorithm are set dynamically in the dynamic module. From within a Python session, all these parameters can be changed directly. For example, to change the Minimum Sector Altitude (to be specified in ft aal), one would do:
from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MSA'] = 5000
Alternatively, the scientific parameters can also be defined and fed to ampycloud via a YAML file. See set_prms() for details.
Caution
By default, the function run() will use the parameter values set in dynamic.AMPYCLOUD_PRMS, which is not thread-safe. Users interested in running multiple concurrent ampycloud calculations with distinct sets of parameters within the same Python session are thus urged to feed the required parameters directly to run() via the prms keyword argument, which expects a (nested) dictionary with keys compatible with dynamic.AMPYCLOUD_PRMS.
Examples:
# Define only the parameters that are non-default. To adjust the MSA, use:
prms = {'MSA': 10000}
# Or to adjust some other algorithm parameters:
prms = {'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'BIC'}, 'min_prob': 1.0}}
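How a partial, nested prms dictionary can combine with the full set of defaults is sketched below. This is a hypothetical model, not ampycloud's implementation: merge_prms() and DEFAULT_PRMS are made up for illustration, and the default values are placeholders rather than ampycloud's actual defaults. The design point it shows is that the merge works on a deep copy, so the shared defaults are never modified; this is the property that makes a per-run prms argument safe to use from concurrent calculations.

```python
import copy

# Hypothetical stand-in for dynamic.AMPYCLOUD_PRMS: the structure mimics the examples above,
# but the values are placeholders, NOT ampycloud's actual defaults.
DEFAULT_PRMS = {
    'MSA': None,
    'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'AIC'}, 'min_prob': 0.5},
}


def merge_prms(defaults, overrides):
    """Return a deep copy of `defaults`, recursively updated with `overrides`."""
    merged = copy.deepcopy(defaults)  # never touch the shared defaults
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f'Unknown parameter: {key}')
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_prms(merged[key], value)  # descend into nested levels
        else:
            merged[key] = copy.deepcopy(value)
    return merged


run_prms = merge_prms(DEFAULT_PRMS, {'LAYERING_PRMS': {'gmm_kwargs': {'scores': 'BIC'}}})
print(run_prms['LAYERING_PRMS'])  # {'gmm_kwargs': {'scores': 'BIC'}, 'min_prob': 0.5}
print(DEFAULT_PRMS['LAYERING_PRMS']['gmm_kwargs'])  # {'scores': 'AIC'}
```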
The data.CeiloChunk instance returned by this function contains all the information associated with the ampycloud algorithm, including the raw data and slicing/grouping/layering info. Its method data.CeiloChunk.metar_msg() provides direct access to the resulting METAR-like message. Users that require the height, okta amount, and/or exact sky coverage fraction of layers can get them via the data.CeiloChunk.layers class property.
Example
In the following example, we create the canonical mock dataset of ampycloud, run the algorithm on it, and fetch the resulting METAR-like message:
from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()
# Run the ampycloud algorithm on it, setting the MSA to 10'000 ft aal.
chunk = ampycloud.run(mock_data, prms={'MSA': 10000},
                      geoloc='Mock data', ref_dt=datetime.now())
# Get the resulting METAR message
print(chunk.metar_msg())
# Display the full information available for the layers found
print(chunk.layers)
The ampycloud.data.CeiloChunk class
The function ampycloud.core.run() returns an ampycloud.data.CeiloChunk class instance, which is at the core of ampycloud. This class is used to load and format the user-supplied data, execute the different ampycloud algorithm steps, and format their outcomes.
The properties of the slices/groups/layers identified by the different steps of the ampycloud algorithm are accessible, as pandas.DataFrame instances, via the class properties ampycloud.data.CeiloChunk.slices, ampycloud.data.CeiloChunk.groups, and ampycloud.data.CeiloChunk.layers.
Note
ampycloud.data.CeiloChunk.metar_msg() relies on ampycloud.data.CeiloChunk.layers to derive the corresponding METAR-like message.
All these slice/group/layer parameters are compiled/computed by ampycloud.data.CeiloChunk.metarize(), the documentation of which lists all the different parameters.
- ampycloud.data.CeiloChunk.metarize(self, which: str = 'slices') → None
Assembles a pandas.DataFrame of slice/group/layer METAR properties of interest.
- Parameters:
which (str, optional) – whether to process ‘slices’, ‘groups’, or ‘layers’. Defaults to ‘slices’.
The pandas.DataFrame generated by this method is subsequently available via the appropriate class property CeiloChunk.slices, CeiloChunk.groups, or CeiloChunk.layers, depending on the value of the argument which.
The slice/group/layer parameters computed/derived by this method include:
- n_hits (int): duplicate-corrected number of hits
- perc (float): sky coverage percentage (between 0 and 100)
- okta (int): okta count
- height_base (float): base height
- height_mean (float): mean height
- height_std (float): height standard deviation
- height_min (float): minimum height
- height_max (float): maximum height
- thickness (float): thickness
- fluffiness (float): fluffiness (expressed in height units, i.e. ft)
- code (str): METAR-like code
- significant (bool): whether the layer is significant according to the ICAO rules. See icao.significant_cloud() for details.
- cluster_id (int): an ampycloud-internal identification number
- isolated (bool): isolation status (for slices only)
- ncomp (int): the number of subcomponents (for groups only)
Important
The value of n_hits is corrected for duplicate hits, to ensure a correct estimation of the sky coverage fraction. Essentially, two (or more) simultaneous hits from the same ceilometer are counted as one only. In other words, if a type 1 and a type 2 hit from the same ceilometer, at the same observation time, are included in a given slice/group/layer, they are counted as one hit only. This is a direct consequence of the fact that clouds have a single base height at any given time [citation needed].
Note
The metarize function is modularized into private submethods.
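The duplicate correction of n_hits described in the Important note above amounts to counting distinct (ceilo, dt) pairs among the hits assigned to a slice/group/layer. The function below is a hypothetical illustration of that rule, not ampycloud's own code, and it assumes the input list only holds the hits of a single layer.

```python
def duplicate_corrected_n_hits(hits):
    """Count hits, with simultaneous hits from the same ceilometer counted once."""
    # One measurement = one (ceilometer, time step) pair, whatever its number of hits
    return len({(hit['ceilo'], hit['dt']) for hit in hits})


# Hypothetical hits assigned to a single layer
layer_hits = [
    {'ceilo': 'c1', 'dt': -120.0, 'height': 1500.0, 'type': 1},
    {'ceilo': 'c1', 'dt': -120.0, 'height': 1650.0, 'type': 2},  # same ceilo and dt as above
    {'ceilo': 'c1', 'dt': -60.0, 'height': 1520.0, 'type': 1},
    {'ceilo': 'c2', 'dt': -120.0, 'height': 1480.0, 'type': 1},
]
print(len(layer_hits), duplicate_corrected_n_hits(layer_hits))  # 4 3
```

This also shows why each measurement needs a unique (ceilo; dt) set of values: rounding dt too coarsely would merge distinct measurements and lower n_hits.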
The no-plots-required shortcut
The following function, also accessible as ampycloud.metar(), will directly provide interested users with the ampycloud METAR-like message for a given dataset. It is a convenience function intended for users who do not want to generate diagnostic plots, but only seek the outcome of the ampycloud algorithm formatted as a METAR-like str.
- ampycloud.core.metar(data: DataFrame) → str
Run the ampycloud algorithm on a dataset and extract a METAR report of the cloud layers.
- Parameters:
data (pd.DataFrame) – the data to be processed, as a pandas.DataFrame.
- Returns:
str – the METAR-like message.
Example:
import ampycloud
from ampycloud.utils import mocker
# Generate the canonical demo dataset for ampycloud
mock_data = mocker.canonical_demo_data()
# Compute the METAR message
msg = ampycloud.metar(mock_data)
print(msg)
Adjusting the default algorithm parameters
The ampycloud parameters with a scientific impact on the outcome of the algorithm (see here for the complete list) are accessible via ampycloud.dynamic.AMPYCLOUD_PRMS as a nested dictionary. When a new ampycloud.data.CeiloChunk instance is initialized, a copy of this nested dictionary is stored as an instance variable. It is then possible to adjust specific parameters via the prms keyword argument when initializing an ampycloud.data.CeiloChunk instance.
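The snapshot-at-init behavior described above can be modeled with a toy class. ToyChunk and GLOBAL_PRMS are hypothetical stand-ins for ampycloud.data.CeiloChunk and ampycloud.dynamic.AMPYCLOUD_PRMS (the parameter values are placeholders, and overrides are kept flat for brevity); the sketch only illustrates which changes a given instance sees, not how ampycloud implements it.

```python
import copy

# Hypothetical stand-in for ampycloud.dynamic.AMPYCLOUD_PRMS (placeholder values)
GLOBAL_PRMS = {'MSA': None, 'MAX_HOLES_OKTA8': 1}


class ToyChunk:
    """Hypothetical stand-in for CeiloChunk: snapshots the global parameters at init."""

    def __init__(self, prms=None):
        self._prms = copy.deepcopy(GLOBAL_PRMS)  # private copy taken at init time
        # Per-instance overrides (flat keys only, to keep the sketch short)
        for key, value in (prms or {}).items():
            self._prms[key] = value

    @property
    def prms(self):
        return self._prms


a = ToyChunk()
GLOBAL_PRMS['MAX_HOLES_OKTA8'] = 0   # global change: only seen by instances created later
b = ToyChunk()
c = ToyChunk(prms={'MSA': 10000})    # local change: only seen by this instance
print(a.prms, b.prms, c.prms)
```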
There are thus 2+1 ways to adjust the ampycloud scientific parameters:
1.a: Adjust them globally in ampycloud.dynamic.AMPYCLOUD_PRMS, like so:
from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
Important
Always import the entire ampycloud.dynamic module and stick to the above example structure if the updated parameters are to be seen by all the ampycloud modules.
1.b: Adjust them globally via a YAML file and ampycloud.core.set_prms(). With this approach, ampycloud.core.copy_prm_file() can be used to obtain a local copy of the default ampycloud parameters.
2: Adjust them locally for a given execution of ampycloud by feeding a suitable nested dictionary to ampycloud.core.run() (which will create a new ampycloud.data.CeiloChunk instance behind the scenes). The dictionary, the keys and levels of which should be consistent with ampycloud.dynamic.AMPYCLOUD_PRMS, only needs to contain the specific parameters that one requires to be different from the default values.
import ampycloud
# Define only the parameters that are non-default. To adjust the MSA, use:
my_prms = {'MSA': 10000}
# Or to adjust both the MSA and some other algorithm parameter:
my_prms = {'MSA': 10000, 'GROUPING_PRMS': {'dt_scale_kwargs': {'scale': 300}}}
# Then feed them directly to the run call (some_data_tbd being your own pandas.DataFrame)
chunk = ampycloud.run(some_data_tbd, prms=my_prms)
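The Important note under option 1.a does not spell out its rationale. A likely one (our assumption, not stated by ampycloud) is Python's name-binding semantics: a statement such as "from ampycloud.dynamic import AMPYCLOUD_PRMS" creates a second name for the dict at import time, which only stays in sync as long as the dict is modified in place. The stdlib-only sketch below uses a hypothetical throwaway module, toy_dynamic, to show both cases.

```python
import types

# Hypothetical stand-in for a module holding a global parameter dict (not ampycloud itself)
toy_dynamic = types.ModuleType('toy_dynamic')
toy_dynamic.AMPYCLOUD_PRMS = {'MAX_HOLES_OKTA8': 1}

# What "from toy_dynamic import AMPYCLOUD_PRMS" amounts to: a second name for the same dict
local_ref = toy_dynamic.AMPYCLOUD_PRMS

# Item assignment through the module mutates the shared dict, so both names see it
toy_dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
print(local_ref['MAX_HOLES_OKTA8'])  # 0

# Rebinding the module attribute to a new dict leaves the earlier reference stale
toy_dynamic.AMPYCLOUD_PRMS = {'MAX_HOLES_OKTA8': 2}
print(local_ref['MAX_HOLES_OKTA8'])  # still 0
```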
Warning
Options 1.a and 1.b are not thread-safe. Users planning to launch multiple ampycloud calculations simultaneously within the same Python session are urged to use option 2 if they need to set distinct parameters for each of them. In case of doubt, the parameters used by a given ampycloud.data.CeiloChunk instance are accessible via the (parent) ampycloud.data.AbstractChunk.prms() property.
If all hope is lost and you wish to revert to the original (default) values of all the ampycloud scientific parameters, you can use ampycloud.core.reset_prms().
- ampycloud.core.reset_prms(which: str | list | None = None) → None
Reset the ampycloud dynamic (i.e. scientific) parameters to their default values.
- Parameters:
which (str|list, optional) – (list of) names of parameters to reset specifically. If not set (by default), all parameters will be reset.
Example
import ampycloud
from ampycloud import dynamic
# Change a parameter
dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'] = 0
# Reset all the parameters to their default values
ampycloud.reset_prms()
print('Back to the default value:', dynamic.AMPYCLOUD_PRMS['MAX_HOLES_OKTA8'])
Logging
A logging.NullHandler instance is created by ampycloud, such that no logging will be apparent to users unless they explicitly set it up themselves (see here for more details).
As an example, to enable ampycloud log messages all the way down to the DEBUG level, users can make the following call before running ampycloud functions:
import logging
logging.basicConfig()
logging.getLogger('ampycloud').setLevel('DEBUG')
Each ampycloud module has a dedicated logger based on the module __name__. Hence, users can adjust the logging level of each ampycloud module however they desire, e.g.:
logging.getLogger('ampycloud.wmo').setLevel('WARNING')
logging.getLogger('ampycloud.scaler').setLevel('DEBUG')
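The NullHandler-plus-per-module-loggers pattern described above is the one recommended for libraries by the Python logging documentation, and its effect on the logger hierarchy can be checked with the standard library alone. The package name toylib and the ListHandler class below are hypothetical stand-ins; with ampycloud, the same setLevel() calls would target 'ampycloud', 'ampycloud.wmo', etc.

```python
import logging

# Library side (hypothetical package 'toylib', mimicking the pattern described above)
logging.getLogger('toylib').addHandler(logging.NullHandler())  # silent unless configured
wmo_logger = logging.getLogger('toylib.wmo')        # one logger per module, named after __name__
scaler_logger = logging.getLogger('toylib.scaler')


class ListHandler(logging.Handler):
    """User side: collect (logger name, level) pairs instead of printing records."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.name, record.levelname))


handler = ListHandler()
logging.getLogger('toylib').addHandler(handler)
logging.getLogger('toylib').setLevel('DEBUG')        # everything down to DEBUG...
logging.getLogger('toylib.wmo').setLevel('WARNING')  # ...except for one module

wmo_logger.debug('dropped: below the WARNING level set for toylib.wmo')
wmo_logger.warning('kept')
scaler_logger.debug('kept: toylib.scaler inherits DEBUG from toylib')
print(handler.records)
```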
Plotting the diagnostic diagram
Users interested in plotting the ampycloud diagnostic diagram can do so using the following function, which is also accessible as ampycloud.plots.diagnostic():
- ampycloud.plots.core.diagnostic(chunk: CeiloChunk, upto: str = 'layers', show_ceilos: bool = False, ref_metar: str | None = None, ref_metar_origin: str | None = None, show: bool = True, save_stem: str | None = None, save_fmts: list | str | None = None) → None
A function to create the ampycloud diagnostic plot all the way to the layering step (included). This is the ultimate ampycloud plot that shows it all (or not - you choose!).
- Parameters:
chunk (CeiloChunk) – the CeiloChunk to look at.
upto (str, optional) – up to which algorithm steps to plot. Can be one of [‘raw_data’, ‘slices’, ‘groups’, ‘layers’]. Defaults to ‘layers’.
show_ceilos (bool, optional) – if True, hits will be colored as a function of the responsible ceilometer. Defaults to False. No effect unless upto='raw_data'.
ref_metar (str, optional) – reference METAR message. Defaults to None.
ref_metar_origin (str, optional) – name of the source of the reference METAR set with ref_metar. Defaults to None.
show (bool, optional) – will show the plot on the screen if True. Defaults to True.
save_stem (str, optional) – if set, will save the plot with this stem (which can include a path as well). Defaults to None.
save_fmts (list|str, optional) – a list of file formats to export the plot to. Defaults to None = [‘pdf’].
Example:
from datetime import datetime
import ampycloud
from ampycloud.utils import mocker
from ampycloud.plots import diagnostic
# First create some mock data for the example
mock_data = mocker.canonical_demo_data()
# Then run the ampycloud algorithm on it
chunk = ampycloud.run(mock_data, geoloc='Mock data', ref_dt=datetime.now())
# Create the full ampycloud diagnostic plot
diagnostic(chunk, upto='layers', show=True)
Adjusting the plotting style
ampycloud ships with its own set of matplotlib style files, which are only applied in context, and thus will not impact any user-specified setup.
Whereas the default parameters should ensure a decent-enough look for the diagnostic diagrams of ampycloud, a better look can be achieved by using a system-wide LaTeX installation. Provided this is available, users interested in creating nicer-looking diagnostic diagrams can do so by setting the appropriate ampycloud parameter:
from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MPL_STYLE'] = 'latex'
The most demanding users, who want nothing but the best, can create plots with actual okta symbols if they install the metsymb LaTeX package system-wide, and set:
from ampycloud import dynamic
dynamic.AMPYCLOUD_PRMS['MPL_STYLE'] = 'metsymb'
Important
Using a system-wide LaTeX installation to create matplotlib figures is not officially supported by matplotlib, and thus not officially supported by ampycloud either.