dvas.tools package
Copyright (c) 2020-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
This sub-package includes all the tools required for the dvas data processing.
Subpackages
Submodules
dvas.tools.chunks module
Copyright (c) 2020-2023 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
This module contains GRUAN-related utilities.
- dvas.tools.chunks.merge_bin(wx_ps, binning)
Small utility function to sum individual profiles into bins.
- Parameters:
wx_ps (pd.DataFrame) – the DataFrame to bin, with columns representing distinct profiles.
binning (int) – the vertical binning.
- Returns:
pd.DataFrame – the binned profile.
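For illustration, a minimal and purely hypothetical call could look as follows (all values are made up, and the exact layout of the returned DataFrame is not spelled out here):

    import numpy as np
    import pandas as pd
    from dvas.tools.chunks import merge_bin

    # Two hypothetical profiles (one per column), 6 points each.
    wx_ps = pd.DataFrame({0: np.ones(6), 1: 2. * np.ones(6)})

    # Sum the profiles together, using a vertical binning of 2 rows.
    binned = merge_bin(wx_ps, 2)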
- dvas.tools.chunks.weighted_mean(df_chunk, binning=1, mode='arithmetic')
Compute the (respective) weighted mean of the ‘tdt’, ‘val’, and ‘alt’ columns of a pd.DataFrame, with weights defined in the ‘w_ps’ column. Also returns the Jacobian matrix for val to enable accurate error propagation.
- Parameters:
df_chunk (pandas.DataFrame) – data containing the Profiles to merge.
binning (int, optional) – binning size. Defaults to 1 (=no binning).
mode (str, optional) – whether to compute an arithmetic or circular mean for ‘val’. An arithmetic mean is always computed for ‘tdt’ and ‘alt’.
- Returns:
(pandas.DataFrame, np.ma.masked_array) – weighted mean profile, and associated Jacobian matrix. The matrix has a size of m * n, with m = len(df_chunk)/binning, and n = len(df_chunk) * n_profile.
Note
The input format for df_chunk is a pandas.DataFrame with a very specific structure. It requires a single index called _idx, with 5 columns per profile with labels 'tdt', 'alt', 'val', 'flg', and 'w_ps'. All these must be grouped together using pd.MultiIndex, where the level 0 corresponds to the profile number (e.g. 0, 1, 2, …), and the level 1 is the original column name, i.e.:
             0                                                 1   ...
           alt              tdt    val  flg  w_ps            alt   ...  w_ps
    _idx
    0    486.7  0 days 00:00:00  284.7    0  55.8          485.9   ...  22.4
    1    492.4  0 days 00:00:01  284.6    1  67.5          493.4   ...  26.3
    ...
Note
The function will ignore NaNs in a given bin, unless all the values in the bin are NaNs. See fancy_nansum() for details.
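For illustration, a small, hand-built chunk with the required structure could be assembled and fed to the routine as follows (a hypothetical sketch, with made-up values):

    import pandas as pd
    from dvas.tools.chunks import weighted_mean

    # Hypothetical 2-profile chunk with the 5 required columns per profile.
    df_chunk = pd.DataFrame({
        (0, 'tdt'): pd.to_timedelta([0, 1, 2, 3], unit='s'),
        (0, 'alt'): [486.7, 492.4, 498.1, 503.8],
        (0, 'val'): [284.7, 284.6, 284.5, 284.4],
        (0, 'flg'): [0, 1, 0, 0],
        (0, 'w_ps'): [55.8, 67.5, 60.1, 58.3],
        (1, 'tdt'): pd.to_timedelta([0, 1, 2, 3], unit='s'),
        (1, 'alt'): [485.9, 493.4, 499.0, 504.6],
        (1, 'val'): [284.8, 284.5, 284.4, 284.3],
        (1, 'flg'): [0, 0, 0, 0],
        (1, 'w_ps'): [22.4, 26.3, 25.0, 24.1],
    })
    df_chunk.index.name = '_idx'

    # Weighted mean of both profiles, with a vertical binning of 2.
    wm, jac = weighted_mean(df_chunk, binning=2, mode='arithmetic')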
- dvas.tools.chunks.delta(df_chunk, binning=1, mode='arithmetic')
Compute the delta of the ‘tdt’, ‘val’, and ‘alt’ columns of a pd.DataFrame containing exactly 2 Profiles. Also returns the Jacobian matrix for val to enable accurate error propagation.
- Parameters:
df_chunk (pandas.DataFrame) – data containing the Profiles to merge.
binning (int, optional) – binning size. Defaults to 1 (=no binning).
mode (str, optional) – whether to compute an arithmetic or circular delta for ‘val’. An arithmetic delta is always computed for ‘tdt’ and ‘alt’. A circular delta will wrap results between [-180;180[.
- Returns:
(pandas.DataFrame, np.ma.masked_array) – delta profile (1 - 0), and associated Jacobian matrix. The matrix has a size of m * n, with m = len(df_chunk)/binning, and n = len(df_chunk) * 2.
Note
The delta is computed as the second profile minus the first one (i.e. index 1 minus index 0), via a diff() function.
Note
The function will ignore NaNs in a given bin, unless all the values in the bin are NaNs. See fancy_nansum() for details.
Note
The input format for df_chunk is a pandas.DataFrame with a very specific structure. It requires a single index called _idx, with 4 columns per profile with labels 'tdt', 'alt', 'val', and 'flg'. All these must be grouped together using pd.MultiIndex, where the level 0 corresponds to the profile number (i.e. 0 or 1), and the level 1 is the original column name, i.e.:
             0                                         1   ...
           alt              tdt    val  flg          alt   ...  flg
    _idx
    0    486.7  0 days 00:00:00  284.7    0        486.5   ...    0
    1    492.4  0 days 00:00:01  284.6    0        491.9   ...    1
    ...
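A similarly hand-built, hypothetical chunk for this routine, with exactly 2 profiles and made-up values:

    import pandas as pd
    from dvas.tools.chunks import delta

    # Hypothetical chunk with 2 profiles and the 4 required columns each.
    df_chunk = pd.DataFrame({
        (0, 'tdt'): pd.to_timedelta([0, 1], unit='s'),
        (0, 'alt'): [486.7, 492.4],
        (0, 'val'): [284.7, 284.6],
        (0, 'flg'): [0, 0],
        (1, 'tdt'): pd.to_timedelta([0, 1], unit='s'),
        (1, 'alt'): [486.5, 491.9],
        (1, 'val'): [284.9, 284.4],
        (1, 'flg'): [0, 1],
    })
    df_chunk.index.name = '_idx'

    # Delta profile, i.e. profile 1 minus profile 0, without any binning.
    dta, jac = delta(df_chunk, binning=1, mode='arithmetic')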
- dvas.tools.chunks.biglambda(df_chunk)
Compute the Lambda value (which is the RMS) of the ‘val’ column of a pd.DataFrame containing a series of measurements, possibly from distinct profiles and out-of-order. Also returns the Jacobian matrix for val to enable accurate error propagation.
- Parameters:
df_chunk (pandas.DataFrame) – data to process.
- Returns:
(pandas.DataFrame, np.ma.masked_array) – lambda value, and associated Jacobian matrix. The matrix has a size of 1 * n, with n = len(df_chunk).
Note
The input format for df_chunk is a pandas.DataFrame with a very specific structure. It requires a single index called _idx, with 4 columns with labels 'tdt', 'alt', 'val', and 'flg'. All these must be grouped together using pd.MultiIndex, where the level 0 corresponds to the profile number (which must be 0), and the level 1 is the original column name, i.e.:
             0
           alt    val  flg
    _idx
    0    486.7  284.7    0
    1    492.4  284.6    0
    ...
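For illustration, a hand-built, hypothetical single-profile chunk could be processed as follows (all values are made up):

    import pandas as pd
    from dvas.tools.chunks import biglambda

    # Hypothetical single-profile chunk (the profile number must be 0).
    df_chunk = pd.DataFrame({
        (0, 'tdt'): pd.to_timedelta([0, 1, 2], unit='s'),
        (0, 'alt'): [486.7, 492.4, 498.1],
        (0, 'val'): [284.7, 284.6, 284.5],
        (0, 'flg'): [0, 0, 0],
    })
    df_chunk.index.name = '_idx'

    # Lambda value (the RMS of the 'val' column) and its 1 x n Jacobian.
    lam, jac = biglambda(df_chunk)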
- dvas.tools.chunks.process_chunk(df_chunk, binning=1, method='weighted arithmetic mean', return_V_mats=True, cov_mat_max_side=10000)
Process a DataFrame chunk and propagate the errors.
- Parameters:
df_chunk (pandas.DataFrame) – data containing the Profiles to merge.
binning (int, optional) – binning size. Defaults to 1 (=no binning). No effect if method=’biglambda’.
method (str, optional) – the processing method. Can be one of [‘arithmetic mean’, ‘weighted arithmetic mean’, ‘circular mean’, ‘weighted circular mean’, ‘arithmetic delta’, ‘circular delta’, ‘biglambda’]. Defaults to ‘weighted arithmetic mean’.
return_V_mats (bool, optional) – if set to False, will not return the correlation matrices. Doing so saves a lot of memory. Defaults to True.
cov_mat_max_side (int, optional) – maximum size of the covariance matrix, above which it gets split and iterated over. Reduce this value in case of memory issues. Defaults to 10000, i.e. the matrix will never contain more than 10000 * 10000 elements.
- Returns:
pandas.DataFrame, dict – the processing outcome, including all the errors, and the full correlation matrices (one per uncertainty type) as a dict.
Note
The input format for df_chunk is a pandas.DataFrame with a very specific structure. It requires a single index called _idx, with 13 columns per profile with labels 'tdt', 'alt', 'val', 'flg', 'ucs', 'uct', 'ucu', 'uc_tot', 'oid', 'mid', 'eid', and 'rid'. All these must be grouped together using pd.MultiIndex, where the level 0 corresponds to the profile number (e.g. 0, 1, 2, …), and the level 1 is the original column name, i.e.:
             0                                                     1   ...
           alt              tdt    val  ucs   ...  rid           alt   ...
    _idx
    0    486.7  0 days 00:00:00  284.7  NaN   ...    1         485.8
    1    492.4  0 days 00:00:01  284.6  0.0   ...    1         493.4
    ...
Note
The function will ignore NaNs in a given bin, unless all the values in the bin are NaNs. See fancy_nansum() for details.
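In practice, such chunks are assembled by higher-level dvas routines. Purely to illustrate the documented layout, a hand-built chunk could look as follows; all the data and metadata values (oid, mid, eid, rid) are made-up assumptions, and the exact dtypes may differ from what dvas produces internally:

    import numpy as np
    import pandas as pd
    from dvas.tools.chunks import process_chunk

    n_prf, n_pts = 2, 6
    labels = ['tdt', 'alt', 'val', 'flg', 'ucs', 'uct', 'ucu', 'uc_tot',
              'oid', 'mid', 'eid', 'rid']
    df_chunk = pd.DataFrame(
        index=pd.RangeIndex(n_pts, name='_idx'),
        columns=pd.MultiIndex.from_product([range(n_prf), labels]))
    for prf in range(n_prf):
        df_chunk[(prf, 'tdt')] = pd.to_timedelta(np.arange(n_pts), unit='s')
        df_chunk[(prf, 'alt')] = 480. + 6. * np.arange(n_pts)
        df_chunk[(prf, 'val')] = 284. + 0.1 * np.arange(n_pts) + 0.2 * prf
        df_chunk[(prf, 'flg')] = 0
        for uc_name in ['ucs', 'uct', 'ucu']:
            df_chunk[(prf, uc_name)] = 0.1
        df_chunk[(prf, 'uc_tot')] = np.sqrt(3 * 0.1**2)
        df_chunk[(prf, 'oid')] = prf + 1   # hypothetical object id
        df_chunk[(prf, 'mid')] = 'mid-1'   # hypothetical model id
        df_chunk[(prf, 'eid')] = 'e:1'     # hypothetical event id
        df_chunk[(prf, 'rid')] = 'r:1'     # hypothetical rig id

    # (Unweighted) arithmetic mean of both profiles, binned 2-by-2, with
    # full error propagation and correlation matrices.
    out, cor_mats = process_chunk(df_chunk, binning=2,
                                  method='arithmetic mean')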
dvas.tools.math module
Copyright (c) 2020-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
Module contents: Specialized mathematical operation on data
- dvas.tools.math.crosscorr(datax, datay, lag=0, wrap=False, method='kendall')
Lag-N cross correlation. If wrap is False, shifted data are filled with NaNs.
- Parameters:
datax, datay (pandas.Series) – must be of equal length.
lag (int, optional) – Defaults to 0.
wrap (bool, optional) – Defaults to False.
method (str, optional) – one of 'kendall', 'pearson', 'spearman'. Defaults to 'kendall'.
- Returns:
float
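For illustration, a hypothetical usage with two synthetic series:

    import numpy as np
    import pandas as pd
    from dvas.tools.math import crosscorr

    # Two made-up series, the second one lagging the first by 3 steps.
    datax = pd.Series(np.sin(np.linspace(0, 10, 200)))
    datay = datax.shift(3)

    # Kendall rank cross-correlation at a lag of 3 steps. (Whether +3 or
    # -3 re-aligns the two series depends on the routine's shift direction.)
    rho = crosscorr(datax, datay, lag=3, method='kendall')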
dvas.tools.sync module
Copyright (c) 2020-2023 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
This module contains tools to synchronize profiles.
- dvas.tools.sync.get_sync_shifts_from_time(prfs)
A routine that estimates the necessary synchronization shifts between profiles based on the measurement times.
- Parameters:
prfs (dvas.data.data.MultiRSProfiles) – list of Profiles to sync.
- Returns:
list of int – list of shifts required to sync the profiles with each other.
- dvas.tools.sync.get_sync_shifts_from_alt(prfs, ref_alt=5000.0)
A routine that estimates the shifts required to synchronize profiles, based on the altitude index.
This is a very crude function that syncs on a single altitude, and thus happily ignores any drift/stretch of any kind.
- Parameters:
prfs (dvas.data.data.MultiRSProfiles) – list of Profiles to compare
ref_alt (float) – the altitude at which to sync the profiles.
- Returns:
list of int – list of shifts required to synchronize profiles in order to match ref_alt. Sign convention: profiles are synchronized when row n goes to row n+shift.
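The sign convention can be illustrated with plain pandas (this is not a dvas call): with a shift of +2, row n of the original profile ends up at row n+2 of the synchronized one, i.e. the series gets padded at the top:

    import pandas as pd

    prf = pd.Series([10., 11., 12., 13.])
    shift = 2

    # Row n goes to row n+shift: two NaNs get padded at the start.
    synced = prf.shift(shift)   # -> [NaN, NaN, 10., 11.]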
- dvas.tools.sync.get_sync_shifts_from_val(prfs, max_shift=100, first_guess=None, valid_value_range=None, sync_wrt_mid=None)
Estimates the shifts required to synchronize profiles, such that <abs(val_A-val_B)> is minimized.
- Parameters:
prfs (dvas.data.data.MultiRSProfiles) – list of Profiles to compare
max_shift (int, optional) – maximum (absolute) shift to consider. Must be positive. Defaults to 100.
first_guess (int|list of int, optional) – starting guess around which to center the search. Defaults to None.
valid_value_range (list, optional) – if set, values outside the range set by this list of len(2) will be ignored.
sync_wrt_mid (str, optional) – if set, all profiles will be synced with respect to this one. Defaults to None (= the first profile in the list).
- Returns:
list of int – list of shifts required to synchronize profiles. Sign convention: profiles are synchronized when row n goes to row n+shift.
dvas.tools.tools module
Copyright (c) 2020-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
This module contains low-level, stand-alone, dvas tools.
- dvas.tools.tools.fancy_nansum(vals, axis=None)
A custom nansum routine that treats NaNs as zeros, unless the data contains only NaNs, in which case it returns a NaN.
- Parameters:
vals (pandas.DataFrame) – the data to sum.
axis (int, optional) – on which axis to run the fancy nansum. Defaults to None (=sum everything).
- Returns:
float – the nansum(), or nan if the data contains only nans.
Example:
    In: vals = pd.DataFrame(np.ones((4,3)))
    In: vals.iloc[0] = np.nan
    In: vals[0][1] = np.nan
    In: vals[1][1] = np.nan
    In: vals[0][2] = np.nan
    In: vals
         0    1    2
    0  NaN  NaN  NaN
    1  NaN  NaN  1.0
    2  NaN  1.0  1.0
    3  1.0  1.0  1.0

    In: fancy_nansum(vals)
    6.0

    In: fancy_nansum(vals, axis=1)
    0    NaN
    1    1.0
    2    2.0
    3    3.0
    dtype: float64

    In: vals.sum(skipna=True)
    0    0.0
    1    1.0
    2    2.0
    3    3.0
    dtype: float64
- dvas.tools.tools.fancy_bitwise_or(vals, axis=None)
A custom bitwise_or routine to combine flags.
- Parameters:
vals (pandas.DataFrame) – the data to sum.
axis (int, optional) – on which axis to run the fancy nansum. Defaults to None (=sum everything).
- Returns:
int|pd.array – the result as a scalar if axis=None, and a pandas array if not.
This function got drastically simpler after #253 and the decision to drop NaN flags.
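A sketch of how one might combine hypothetical, plain-integer flag values with it (assuming axis follows the usual pandas convention, i.e. axis=1 combines across columns):

    import numpy as np
    import pandas as pd
    from dvas.tools.tools import fancy_bitwise_or

    # Hypothetical flags for 2 profiles (columns) over 3 levels (rows).
    flgs = pd.DataFrame(np.array([[1, 4], [0, 2], [8, 8]]))

    # Combine all the flags into a single scalar: 1|4|0|2|8 = 15 ...
    all_flgs = fancy_bitwise_or(flgs)
    # ... or combine them level-by-level across the profiles: [5, 2, 8].
    per_level = fancy_bitwise_or(flgs, axis=1)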
- dvas.tools.tools.df_to_chunks(df, chunk_size)
A utility function that breaks a Pandas dataframe into chunks of a specified size.
- Parameters:
df (pandas.DataFrame) – the pandas DataFrame to break-up.
chunk_size (int) – the length of each chunk. If len(df) % chunk_size != 0, the last chunk will be smaller than the other ones.
- Returns:
list of pandas.DataFrame – the ordered pieces of the original DataFrame
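For example:

    import numpy as np
    import pandas as pd
    from dvas.tools.tools import df_to_chunks

    df = pd.DataFrame({'val': np.arange(10.)})

    # Break the 10-row DataFrame into chunks of 4 rows: this yields two
    # chunks of 4 rows, plus a final, smaller chunk of 2 rows.
    chunks = df_to_chunks(df, 4)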
- dvas.tools.tools.wrap_angle(val)
Given an array of angles (in degrees), wrap them up in the range [-180;180[.
- Parameters:
val (int, float) – the array of values to wrap.
- Returns:
float – the wrapped values.
Note
Adapted from the reply of President James K. Polk on https://stackoverflow.com/questions/2320986 .
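For example, given the documented [-180; 180[ range:

    import numpy as np
    from dvas.tools.tools import wrap_angle

    # 190 -> -170, -270 -> 90, and 45 stays put.
    wrapped = wrap_angle(np.array([190., -270., 45.]))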
dvas.tools.wmo module
Copyright (c) 2020-2022 MeteoSwiss, contributors listed in AUTHORS.
Distributed under the terms of the GNU General Public License v3.0 or later.
SPDX-License-Identifier: GPL-3.0-or-later
This module contains WMO-related tools.
- dvas.tools.wmo.geom2geopot(vals, lat)
Convert geometric altitudes to geopotential heights.
- Parameters:
vals (ndarray or pd.Series or pd.DataFrame) – geometric altitudes
lat (float) – geodetic latitude, in radians
Uses the Mahoney equations from the CIMO guide.
Reference:
WMO GUIDE TO METEOROLOGICAL INSTRUMENTS AND METHODS OF OBSERVATION (the CIMO Guide), WMO-No. 8 (2014 edition, updated in 2017), Part I - MEASUREMENT OF METEOROLOGICAL VARIABLES, Ch. 12 - Measurement of upper-air pressure, temperature, humidity, Sec. 12.3.6 - Use of geometric height observations instead of pressure sensor observations, p. 364.
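For illustration, a hypothetical call (with made-up altitudes and latitude) could look like:

    import numpy as np
    from dvas.tools.wmo import geom2geopot

    # Convert geometric altitudes (in m) to geopotential heights, for a
    # geodetic latitude of 46.8 deg, fed in radians as required.
    z_geom = np.array([1000., 5000., 10000.])
    z_geopot = geom2geopot(z_geom, np.deg2rad(46.8))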