pyrad.proc.process_centroids

pyrad.proc.process_centroids(procstatus, dscfg, radar_list=None)

Computes centroids for the semi-supervised hydrometeor classification.

Parameters:

procstatus (int) – Processing status: 0 initializing, 1 processing volume, 2 post-processing

dscfg (dictionary of dictionaries) –

Data set configuration. Accepted configuration keywords:

datatype : list of string. Dataset keyword
    The input data types, must contain
    "dBZ" or "dBZc", and,
    "ZDR" or "ZDRc", and,
    "RhoHV", or "uRhoHV", or "RhoHVc", and,
    "KDP", or "KDPc", and,
    "TEMP" or "H_ISO0" (optional)
samples_per_vol : int. Dataset keyword
    Maximum number of samples per volume kept for further analysis.
    Default 20000
nbins : int.
    Number of bins of the histogram used to make the data platykurtic.
    Default 110
pdf_zh_max : int
    Multiplicative factor applied to the Gaussian function that
    determines the number of samples kept in each bin when making the
    reflectivity distribution platykurtic. Default 10000
pdf_relh_max : int
    Multiplicative factor applied to the Gaussian function that
    determines the number of samples kept in each bin when making the
    distribution of the height relative to the iso-0 platykurtic.
    Default 20000
sigma_zh, sigma_relh : float
    Sigma of the respective Gaussian functions. Defaults 0.75 and 1.5,
    respectively
randomize : bool
    If True the data is randomized to avoid the effects of the
    quantization. Default True
platykurtic_dBZ : bool
    If True makes the reflectivity distribution platykurtic. Default
    True
platykurtic_H_ISO0 : bool
    If True makes the distribution of the height relative to the
    iso-0 platykurtic. Default True
relh_slope : float. Dataset keyword
    The slope used to transform the height relative to the iso-0 into
    a sigmoid function. Default 0.001
external_iterations : int. Dataset keyword
    Number of iterations of the external loop. This number will
    determine how many medoids are computed for each hydrometeor
    class. Default 30
internal_iterations : int. Dataset keyword
    Maximum number of iterations of the internal loop. Default 10
sample_data : bool.
    If True the data is sampled prior to each external iteration.
    Default False
nsamples_iter : int.
    Number of samples per iteration. Default 20000
alpha : float
    Minimum value of p required to accept the cluster. Default 0.01
cv_approach : bool
    If True a critical value approach is used to reject or accept
    similarity between observations and reference. If False a p-value
    approach is used. Default True
n_samples_syn : int
    Number of samples drawn from reference to compare it with
    observations in the KS test. Default 50
num_samples_arr : array of int
    Number of observation samples used in the KS test to choose from.
    Default (30, 35, 40)
acceptance_threshold : float. Dataset keyword
    Threshold on the inter-quantile coefficient of dispersion of the
    medoids above which the medoid of the class is not acceptable.
    Default 0.5
nmedoids_min : int
    Minimum number of intermediate medoids to compute the final
    result. Default 1
var_names : tuple
    The names of the features. Default ('dBZ', 'ZDR', 'KDP', 'RhoHV',
    'H_ISO0')
hydro_names : tuple
    The names of the hydrometeor types. Default ('AG', 'CR', 'LR',
    'RP', 'RN', 'VI', 'WS', 'MH', 'IH/HDG')
weight : tuple
    The weight given to each feature when comparing to the reference.
    It is in the same order as var_names. Default (1., 1., 1., 1.,
    0.75)
parallelized : bool
    If True the centroid search is parallelized. Default False
kmax_iter : int
    Maximum number of iterations of the k-medoids algorithm. Default
    100
nsamples_small : int
    Maximum number of samples handled without the k-medoids CLARA
    algorithm. If this number is exceeded the CLARA algorithm is used.
    Default 40000
sampling_size_clara : int
    Number of samples used in each iteration of the k-medoids CLARA
    algorithm. Default 10000
niter_clara : int
    Number of iterations performed by the k-medoids CLARA algorithm.
    Default 5
keep_labeled_data : bool
    If True the labeled data is kept for storage. Default True
use_median : bool
    If True the intermediate centroids are computed as the median
    of the observation variables and the final centroids are computed
    as the median of the intermediate centroids. If False they are
    computed using the k-medoids algorithm. Default False
allow_label_duplicates : bool
    If True multiple clusters may be given the same label. Default
    True
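
As a rough illustration of the keyword list above, the following sketch collects the documented defaults in a plain dictionary and checks that a datatype list covers every required group. The names CENTROID_DEFAULTS, REQUIRED_GROUPS, missing_groups and build_dscfg are hypothetical helpers, not part of pyrad. The use of bare field names in datatype is also an assumption; real configuration entries may carry additional descriptors.

```python
# Defaults copied from the keyword list above. Storing them in a flat dict
# is an assumption made for illustration only.
CENTROID_DEFAULTS = {
    "samples_per_vol": 20000, "nbins": 110,
    "pdf_zh_max": 10000, "pdf_relh_max": 20000,
    "sigma_zh": 0.75, "sigma_relh": 1.5,
    "randomize": True, "platykurtic_dBZ": True, "platykurtic_H_ISO0": True,
    "relh_slope": 0.001,
    "external_iterations": 30, "internal_iterations": 10,
    "sample_data": False, "nsamples_iter": 20000,
    "alpha": 0.01, "cv_approach": True,
    "n_samples_syn": 50, "num_samples_arr": (30, 35, 40),
    "acceptance_threshold": 0.5, "nmedoids_min": 1,
    "var_names": ("dBZ", "ZDR", "KDP", "RhoHV", "H_ISO0"),
    "hydro_names": ("AG", "CR", "LR", "RP", "RN", "VI", "WS", "MH", "IH/HDG"),
    "weight": (1.0, 1.0, 1.0, 1.0, 0.75),
    "parallelized": False, "kmax_iter": 100,
    "nsamples_small": 40000, "sampling_size_clara": 10000, "niter_clara": 5,
    "keep_labeled_data": True, "use_median": False,
    "allow_label_duplicates": True,
}

# Each tuple lists interchangeable field names; one of them must be present.
# "TEMP" / "H_ISO0" is documented as optional, so it is not checked here.
REQUIRED_GROUPS = (
    ("dBZ", "dBZc"),
    ("ZDR", "ZDRc"),
    ("RhoHV", "uRhoHV", "RhoHVc"),
    ("KDP", "KDPc"),
)


def missing_groups(datatypes):
    """Return the required groups that have no member in `datatypes`."""
    present = set(datatypes)
    return [group for group in REQUIRED_GROUPS if not present.intersection(group)]


def build_dscfg(datatypes, **overrides):
    """Merge keyword overrides into the documented defaults (hypothetical)."""
    unknown = sorted(set(overrides) - set(CENTROID_DEFAULTS))
    if unknown:
        raise KeyError(f"unknown keywords: {unknown}")
    missing = missing_groups(datatypes)
    if missing:
        raise ValueError(f"datatype lacks one of each group: {missing}")
    cfg = dict(CENTROID_DEFAULTS)
    cfg.update(overrides)
    cfg["datatype"] = list(datatypes)
    return cfg


cfg = build_dscfg(["dBZ", "ZDR", "RhoHV", "KDP", "H_ISO0"], external_iterations=10)
```

Keeping the defaults in one place makes it easy to see which keywords a given data set overrides.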

radar_list (list of Radar objects) – Optional. List of radar objects
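
The alpha and cv_approach keywords select between two ways of deciding whether observations resemble the reference in the KS test. The sketch below is a generic two-sample Kolmogorov-Smirnov check in plain Python, not the routine pyrad actually calls. The names ks_statistic, ks_critical_value, ks_pvalue and is_similar are hypothetical, and both the critical value and the p-value use the standard large-sample approximations.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past x (this also handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n, m, alpha):
    """Large-sample critical value of the KS statistic at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution series."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        # The alternating series converges poorly here; p is 1 to ~5 digits.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def is_similar(obs, ref, alpha=0.01, cv_approach=True):
    """Accept similarity via a critical value or a p-value rule."""
    d = ks_statistic(obs, ref)
    n, m = len(obs), len(ref)
    if cv_approach:
        return d <= ks_critical_value(n, m, alpha)
    return ks_pvalue(d, n, m) >= alpha


# Sample sizes mirror the defaults n_samples_syn=50 and num_samples_arr[0]=30.
reference = [i / 50 for i in range(50)]
observations = [i / 30 for i in range(30)]
print(is_similar(observations, reference, alpha=0.01, cv_approach=True))
```

Under these approximations both rules compare the same statistic against the same significance level, so they normally agree. They can differ for small samples when exact distributions are used.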

Returns:

new_dataset (dict) – dictionary containing the output centroids
ind_rad (int) – radar index
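
The use_median keyword describes a two-stage median for the output centroids: a per-iteration median of the observation variables, followed by a median of those intermediate centroids. The sketch below illustrates that description only. The observation values are invented, median_centroid is a hypothetical helper, and the features are shown in physical units rather than whatever scaling the library applies internally.

```python
from statistics import median


def median_centroid(rows):
    """Per-feature median of a list of equal-length feature vectors."""
    return tuple(median(column) for column in zip(*rows))


# Invented observations already labelled as a single hydrometeor class,
# one list per external iteration, ordered as (dBZ, ZDR, KDP, RhoHV, H_ISO0).
iterations = [
    [(20.0, 0.5, 0.1, 0.98, -1.0), (22.0, 0.7, 0.2, 0.99, -0.5), (24.0, 0.6, 0.3, 0.97, -0.8)],
    [(21.0, 0.4, 0.1, 0.98, -1.2), (23.0, 0.8, 0.2, 0.99, -0.6), (25.0, 0.6, 0.3, 0.97, -0.9)],
    [(19.0, 0.5, 0.0, 0.96, -1.0), (21.0, 0.6, 0.1, 0.98, -0.7), (26.0, 0.9, 0.4, 0.99, -0.4)],
]

# Stage 1: one intermediate centroid per external iteration.
intermediate = [median_centroid(obs) for obs in iterations]

# Stage 2: the final centroid is the median of the intermediate centroids.
final = median_centroid(intermediate)
print(final)
```

With use_median set to False the documentation states that the k-medoids algorithm is used instead, which this sketch does not cover.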
