pyrad.proc.process_centroids

pyrad.proc.process_centroids(procstatus, dscfg, radar_list=None)

Computes the centroids for the semi-supervised hydrometeor classification.

Parameters:
  • procstatus (int) – Processing status: 0 initializing, 1 processing volume, 2 post-processing

  • dscfg (dictionary of dictionaries) –

    Data set configuration. Accepted configuration keywords:

    datatype : list of string. Dataset keyword
        The input data types. Must contain
        "dBZ" or "dBZc", and
        "ZDR" or "ZDRc", and
        "RhoHV", "uRhoHV" or "RhoHVc", and
        "KDP" or "KDPc", and
        "TEMP" or "H_ISO0" (optional)
    samples_per_vol : int. Dataset keyword
        Maximum number of samples per volume kept for further analysis.
        Default 20000
    nbins : int.
        Number of bins of the histogram used to make the data platykurtic.
        Default 110
    pdf_zh_max : int
        Multiplicative factor of the Gaussian function that determines the
        number of samples in each bin when making the reflectivity
        distribution platykurtic. Default 10000
    pdf_relh_max : int
        Multiplicative factor of the Gaussian function that determines the
        number of samples in each bin when making the distribution of the
        height relative to the iso-0 platykurtic. Default 20000
    sigma_zh, sigma_relh : float
        Sigma of the respective Gaussian functions. Defaults 0.75 and 1.5,
        respectively
    randomize : bool
        If True the data is randomized to avoid the effects of the
        quantization. Default True
    platykurtic_dBZ : bool
        If True makes the reflectivity distribution platykurtic. Default
        True
    platykurtic_H_ISO0 : bool
        If True makes the distribution of the height relative to the iso-0
        platykurtic. Default True
    relh_slope : float. Dataset keyword
        The slope used to transform the height relative to the iso-0 into
        a sigmoid function. Default 0.001
    external_iterations : int. Dataset keyword
        Number of iterations of the external loop. This number will
        determine how many medoids are computed for each hydrometeor
        class. Default 30
    internal_iterations : int. Dataset keyword
        Maximum number of iterations of the internal loop. Default 10
    sample_data : bool
        If True the data is sampled prior to each external iteration.
        Default False
    nsamples_iter : int.
        Number of samples per iteration. Default 20000
    alpha : float
        Minimum p-value for a cluster to be accepted. Default 0.01
    cv_approach : bool
        If True, a critical value approach is used to accept or reject the
        similarity between observations and reference. If False, a p-value
        approach is used. Default True
    n_samples_syn : int
        Number of samples drawn from the reference to compare with the
        observations in the KS test. Default 50
    num_samples_arr : array of int
        Candidate numbers of observation samples to choose from for the KS
        test. Default (30, 35, 40)
    acceptance_threshold : float. Dataset keyword
        Threshold on the inter-quantile coefficient of dispersion of the
        medoids above which the medoid of the class is not acceptable.
        Default 0.5
    nmedoids_min : int
        Minimum number of intermediate medoids to compute the final
        result. Default 1
    var_names : tuple
        The names of the features. Default ('dBZ', 'ZDR', 'KDP', 'RhoHV',
        'H_ISO0')
    hydro_names : tuple
        The names of the hydrometeor types. Default ('AG', 'CR', 'LR',
        'RP', 'RN', 'VI', 'WS', 'MH', 'IH/HDG')
    weight : tuple
        The weight given to each feature when comparing to the reference.
        It is in the same order as var_names. Default (1., 1., 1., 1.,
        0.75)
    parallelized : bool
        If True the centroid search is parallelized. Default False
    kmax_iter : int
        Maximum number of iterations of the k-medoids algorithm. Default
        100
    nsamples_small : int
        Maximum number of samples before using the k-medoids CLARA
        algorithm. If this number is exceeded the CLARA algorithm will be
        used. Default 40000
    sampling_size_clara : int
        Number of samples used in each iteration of the k-medoids CLARA
        algorithm. Default 10000
    niter_clara : int
        Number of iterations performed by the k-medoids CLARA algorithm.
        Default 5
    keep_labeled_data : bool
        If True the labeled data is kept for storage. Default True
    use_median : bool
        If True the intermediate centroids are computed as the median
        of the observation variables and the final centroids are computed
        as the median of the intermediate centroids. If False they are
        computed using the k-medoids algorithm. Default False
    allow_label_duplicates : bool
        If True, multiple clusters may be assigned the same label.
        Default True
    
  • radar_list (list of Radar objects) – Optional. List of radar objects

Returns:

  • new_dataset (dict) – dictionary containing the output centroids

  • ind_rad (int) – radar index
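
Example

The following is a minimal sketch of how a dscfg entry for this function might be assembled, using only keywords and defaults listed above. The bare data type names, the REQUIRED_INPUTS table and the missing_inputs helper are hypothetical and not part of pyrad. In a real pyrad configuration the data type strings may carry additional descriptor fields, so treat this as an illustration of the keyword set, not as a working pyrad setup.

```python
# Hypothetical sketch: keys and default values are copied from the keyword
# list above. The bare data type names are an assumption.
dscfg = {
    "datatype": ["dBZ", "ZDR", "RhoHV", "KDP", "H_ISO0"],
    "samples_per_vol": 20000,
    "nbins": 110,
    "relh_slope": 0.001,
    "external_iterations": 30,
    "internal_iterations": 10,
    "alpha": 0.01,
    "cv_approach": True,
    "acceptance_threshold": 0.5,
    "var_names": ("dBZ", "ZDR", "KDP", "RhoHV", "H_ISO0"),
    "weight": (1.0, 1.0, 1.0, 1.0, 0.75),  # same order as var_names
    "use_median": False,
}

# Each inner tuple lists interchangeable inputs; one of each is required.
# "TEMP" / "H_ISO0" is documented as optional and therefore not listed.
REQUIRED_INPUTS = (
    ("dBZ", "dBZc"),
    ("ZDR", "ZDRc"),
    ("RhoHV", "uRhoHV", "RhoHVc"),
    ("KDP", "KDPc"),
)


def missing_inputs(datatypes):
    """Return the required input groups that have no match in datatypes."""
    present = set(datatypes)
    return [group for group in REQUIRED_INPUTS if not present.intersection(group)]


print(missing_inputs(dscfg["datatype"]))  # prints []
```

A check of this kind can catch a missing polarimetric variable before any processing starts. Note that weight must have one entry per name in var_names.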