pyrad.proc.process_centroids
- pyrad.proc.process_centroids(procstatus, dscfg, radar_list=None)
Computes centroids for the semi-supervised hydrometeor classification
- Parameters:
procstatus (int) – Processing status: 0 initializing, 1 processing volume, 2 post-processing
dscfg (dictionary of dictionaries) –
data set configuration. Accepted Configuration Keywords:
datatype : list of string. Dataset keyword. The input data types. Must contain "dBZ" or "dBZc"; "ZDR" or "ZDRc"; "RhoHV", "uRhoHV" or "RhoHVc"; "KDP" or "KDPc"; and, optionally, "TEMP" or "H_ISO0".
samples_per_vol : int. Dataset keyword. Maximum number of samples per volume kept for further analysis. Default 20000
nbins : int. Number of bins of the histogram used to make the data platykurtic. Default 110
pdf_zh_max : int. Multiplicative factor applied to the Gaussian function used to make the reflectivity distribution platykurtic. It determines the number of samples for each bin. Default 10000
pdf_relh_max : int. Multiplicative factor applied to the Gaussian function used to make the distribution of the height relative to the iso-0 platykurtic. It determines the number of samples for each bin. Default 20000
sigma_zh, sigma_relh : float. Sigma of the respective Gaussian functions. Defaults 0.75 and 1.5
randomize : bool. If True the data is randomized to avoid the effects of quantization. Default True
platykurtic_dBZ : bool. If True the reflectivity distribution is made platykurtic. Default True
platykurtic_H_ISO0 : bool. If True the distribution of the height relative to the iso-0 is made platykurtic. Default True
relh_slope : float. Dataset keyword. The slope used to transform the height relative to the iso-0 into a sigmoid function. Default 0.001
external_iterations : int. Dataset keyword. Number of iterations of the external loop. This number determines how many medoids are computed for each hydrometeor class. Default 30
internal_iterations : int. Dataset keyword. Maximum number of iterations of the internal loop. Default 10
sample_data : bool. If True the data is sampled prior to each external iteration. Default False
nsamples_iter : int. Number of samples per iteration. Default 20000
alpha : float. Minimum value to accept the cluster according to p. Default 0.01
cv_approach : bool. If True a critical value approach is used to accept or reject similarity between observations and reference. If False a p-value approach is used. Default True
n_samples_syn : int. Number of samples drawn from the reference to compare it with the observations in the KS test. Default 50
num_samples_arr : array of int. Numbers of observation samples to choose from for the KS test. Default (30, 35, 40)
acceptance_threshold : float. Dataset keyword. Threshold on the inter-quantile coefficient of dispersion of the medoids above which the medoid of the class is not acceptable. Default 0.5
nmedoids_min : int. Minimum number of intermediate medoids needed to compute the final result. Default 1
var_names : tuple. The names of the features. Default ('dBZ', 'ZDR', 'KDP', 'RhoHV', 'H_ISO0')
hydro_names : tuple. The names of the hydrometeor types. Default ('AG', 'CR', 'LR', 'RP', 'RN', 'VI', 'WS', 'MH', 'IH/HDG')
weight : tuple. The weight given to each feature when comparing to the reference, in the same order as var_names. Default (1., 1., 1., 1., 0.75)
parallelized : bool. If True the centroids search is parallelized. Default False
kmax_iter : int. Maximum number of iterations of the k-medoids algorithm. Default 100
nsamples_small : int. Maximum number of samples before using the k-medoids CLARA algorithm. If this number is exceeded the CLARA algorithm is used. Default 40000
sampling_size_clara : int. Number of samples used in each iteration of the k-medoids CLARA algorithm. Default 10000
niter_clara : int. Number of iterations performed by the k-medoids CLARA algorithm. Default 5
keep_labeled_data : bool. If True the labeled data is kept for storage. Default True
use_median : bool. If True the intermediate centroids are computed as the median of the observation variables and the final centroids are computed as the median of the intermediate centroids. If False they are computed using the k-medoids algorithm. Default False
allow_label_duplicates : bool. If True multiple clusters may be given the same label. Default True
radar_list (list of Radar objects) – Optional. list of radar objects
- Returns:
new_dataset (dict) – dictionary containing the output centroids
ind_rad (int) – radar index
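Example

The following is a minimal sketch of how the keywords documented above could be collected into a plain dictionary before being handed to the processing chain. It is not part of pyrad: the helper names `missing_field_groups` and `build_centroids_cfg` are hypothetical, only a subset of the keywords is shown, and the data types are written as the bare field names listed under `datatype`. The descriptor format that pyrad configuration files actually expect is not covered by this page, so treat that as an assumption.

```python
# Hypothetical helpers, not part of pyrad. Keyword names and default values
# are copied from the documentation above; everything else is illustrative.

# One member of each group must be present in `datatype`.
REQUIRED_GROUPS = (
    ("dBZ", "dBZc"),
    ("ZDR", "ZDRc"),
    ("RhoHV", "uRhoHV", "RhoHVc"),
    ("KDP", "KDPc"),
)


def missing_field_groups(datatypes):
    """Return the required groups that have no member in `datatypes`."""
    present = set(datatypes)
    return [group for group in REQUIRED_GROUPS if not present.intersection(group)]


def build_centroids_cfg(datatypes, **overrides):
    """Build a keyword dictionary with the documented defaults.

    Raises ValueError if a required input field is missing and KeyError if
    an override is not one of the keywords included in this sketch.
    """
    missing = missing_field_groups(datatypes)
    if missing:
        raise ValueError(f"datatype needs one field from each of: {missing}")

    cfg = {
        "datatype": list(datatypes),
        "samples_per_vol": 20000,
        "nbins": 110,
        "pdf_zh_max": 10000,
        "pdf_relh_max": 20000,
        "sigma_zh": 0.75,
        "sigma_relh": 1.5,
        "randomize": True,
        "platykurtic_dBZ": True,
        "platykurtic_H_ISO0": True,
        "relh_slope": 0.001,
        "external_iterations": 30,
        "internal_iterations": 10,
        "alpha": 0.01,
        "cv_approach": True,
        "acceptance_threshold": 0.5,
        "nmedoids_min": 1,
        "weight": (1.0, 1.0, 1.0, 1.0, 0.75),
        "use_median": False,
        "allow_label_duplicates": True,
    }

    unknown = set(overrides) - set(cfg)
    if unknown:
        raise KeyError(f"unknown keywords: {sorted(unknown)}")
    cfg.update(overrides)
    return cfg


# Fewer external iterations than the default, e.g. for a quick trial run.
cfg = build_centroids_cfg(
    ["dBZ", "ZDR", "RhoHV", "KDP", "H_ISO0"],
    external_iterations=10,
)
```

Checking the input fields up front surfaces a missing polarimetric variable before any volume is processed. Keywords left out of the dictionary would presumably fall back to the defaults listed above.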