pyart.retrieve.compute_centroids

pyart.retrieve.compute_centroids(features_matrix, weight=(1.0, 1.0, 1.0, 1.0, 0.75), var_names=('dBZ', 'ZDR', 'KDP', 'RhoHV', 'H_ISO0'), hydro_names=('AG', 'CR', 'LR', 'RP', 'RN', 'VI', 'WS', 'MH', 'IH/HDG'), nsamples_iter=20000, external_iterations=30, internal_iterations=10, alpha=0.01, cv_approach=True, num_samples_arr=(30, 35, 40), n_samples_syn=50, nmedoids_min=1, acceptance_threshold=0.5, band='C', relh_slope=0.001, parallelized=False, sample_data=True, kmax_iter=100, nsamples_small=40000, sampling_size_clara=10000, niter_clara=5, keep_labeled_data=True, use_median=False, allow_label_duplicates=False)

Given a features matrix, computes the centroids (medoids) of each hydrometeor class.

Parameters:
  • features_matrix (2D-array) – matrix of size (nsamples, nvariables)

  • weight (tuple) – Weight given to each feature in the Kolmogorov-Smirnov (KS) test

  • var_names (tuple) – Names of the variables (features)

  • hydro_names (tuple) – Names of the hydrometeor types

  • nsamples_iter (int) – Number of samples of the features matrix in each external iteration

  • external_iterations (int) – Number of iterations of the external loop. This number will determine how many medoids are computed for each hydrometeor class.

  • internal_iterations (int) – Maximum number of iterations of the internal loop

  • acceptance_threshold (float) – Threshold on the inter-quantile coefficient of dispersion of the medoids above which the medoid of the class is not acceptable.

  • alpha (float) – Minimum p-value required to accept the cluster (significance level of the test)

  • cv_approach (bool) – If True, a critical-value approach is used to accept or reject the similarity between the observations and the reference. If False, a p-value approach is used.

  • num_samples_arr (1D-array) – Possible numbers of observation samples to use when comparing with the reference

  • n_samples_syn (int) – Number of samples from the reference used in the comparison

  • nmedoids_min (int) – Minimum number of valid intermediate medoids to compute a final medoid

  • band (str) – Frequency band of the radar data. Can be 'C', 'S' or 'X'.

  • relh_slope (float) – Slope of the sigmoid function used to transform the height relative to the iso-0 (freezing) level.

  • parallelized (bool) – If True, the processing is parallelized.

  • sample_data (bool) – If True, the data are sampled at each iteration of the external loop.

  • kmax_iter (int) – Maximum number of iterations of the k-medoids algorithm

  • nsamples_small (int) – Maximum number of samples processed with the standard k-medoids algorithm. If this number is exceeded, the k-medoids CLARA algorithm is used instead.

  • sampling_size_clara (int) – Number of samples used in each iteration of the k-medoids CLARA algorithm.

  • niter_clara (int) – Number of iterations performed by the k-medoids CLARA algorithm

  • keep_labeled_data (bool) – If True, the labeled data are kept.

  • use_median (bool) – If True, the intermediate medoids are computed as the median of each variable, and the final medoids are computed as the median of each variable over the intermediate medoids. Otherwise, they are computed with the k-medoids algorithm.

  • allow_label_duplicates (bool) – If True, multiple clusters may be assigned the same label.
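The parameters `alpha`, `cv_approach`, `weight`, `num_samples_arr` and `n_samples_syn` govern a two-sample Kolmogorov-Smirnov (KS) comparison between observation samples and reference samples. The sketch below is a minimal NumPy illustration of the critical-value approach only, assuming the standard asymptotic two-sample critical value. The helper names and the way the per-feature statistics are combined with `weight` (a weighted mean here) are hypothetical and are not taken from Py-ART.

```python
import numpy as np


def ks_statistic(obs, ref):
    """Two-sample KS statistic: max |F_obs(x) - F_ref(x)| over the pooled samples."""
    obs = np.sort(np.asarray(obs, dtype=float))
    ref = np.sort(np.asarray(ref, dtype=float))
    grid = np.concatenate([obs, ref])
    # Empirical CDFs evaluated at every pooled sample value
    cdf_obs = np.searchsorted(obs, grid, side="right") / obs.size
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    return float(np.max(np.abs(cdf_obs - cdf_ref)))


def ks_critical_value(alpha, n, m):
    """Asymptotic critical value of the two-sample KS statistic at level alpha."""
    c_alpha = np.sqrt(-np.log(alpha / 2.0) / 2.0)
    return float(c_alpha * np.sqrt((n + m) / (n * m)))


def similar_to_reference(obs, ref, weight, alpha=0.01):
    """Hypothetical acceptance rule (an assumption, not Py-ART's exact rule):
    the weighted mean of the per-feature KS statistics must not exceed
    the critical value.

    obs has shape (n_obs, nvariables); ref has shape (n_ref, nvariables).
    """
    weight = np.asarray(weight, dtype=float)
    stats = np.array(
        [ks_statistic(obs[:, k], ref[:, k]) for k in range(obs.shape[1])]
    )
    weighted_stat = float(np.sum(weight * stats) / np.sum(weight))
    return weighted_stat <= ks_critical_value(alpha, obs.shape[0], ref.shape[0])
```

With the defaults `alpha=0.01`, 30 observation samples and `n_samples_syn=50`, the critical value is about 0.376. A p-value approach (`cv_approach=False`) would instead compare the p-value of the statistic with `alpha`; `scipy.stats.ks_2samp` returns such a p-value.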
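`relh_slope` sets how sharply the height relative to the iso-0 level (the `H_ISO0` feature) is squashed into a bounded range. The exact transform used by Py-ART is not specified here. The sketch below assumes a standard logistic sigmoid, with `relh_sigmoid` a hypothetical name and the sign convention (positive heights above the iso-0 level map toward 1) also assumed.

```python
import numpy as np


def relh_sigmoid(h_iso0, relh_slope=0.001):
    """Hypothetical logistic transform of the height relative to the iso-0 level.

    Returns values in (0, 1): 0.5 at the iso-0 level, approaching 1 well
    above it and 0 well below it (assumed convention). A smaller slope
    gives a smoother transition across the iso-0 level.
    """
    h = np.asarray(h_iso0, dtype=float)
    return 1.0 / (1.0 + np.exp(-relh_slope * h))
```

Under these assumptions, with the default slope of 0.001 a value of 2000 (in the units of `H_ISO0`) maps to about 0.88, and -2000 maps to about 0.12.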
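`nsamples_small`, `sampling_size_clara`, `niter_clara` and `kmax_iter` control how the medoids are searched for: plain k-medoids on small inputs, CLARA on large ones. The sketch below is a simplified NumPy illustration of that switch. It assumes a basic alternating (Voronoi-iteration) k-medoids rather than whatever k-medoids variant Py-ART actually uses, and all function names are hypothetical. CLARA draws `niter_clara` random subsets of `sampling_size_clara` rows, runs k-medoids on each subset, and keeps the medoids with the lowest total distance over the full dataset.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def total_cost(x, centers):
    """Sum of distances from each row of x to its nearest center."""
    return float(pairwise_dist(x, centers).min(axis=1).sum())


def kmedoids(x, k, max_iter, rng):
    """Alternating k-medoids; returns row indices of x.

    Builds a dense n x n distance matrix, so it only suits small inputs.
    """
    dist = pairwise_dist(x, x)
    medoids = rng.choice(x.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # Member minimising the summed distance to its own cluster
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids


def clara(x, k, sampling_size, n_iter, max_iter, seed=0):
    """CLARA: k-medoids on random subsets, keep the best on the full data."""
    rng = np.random.default_rng(seed)
    size = min(sampling_size, x.shape[0])
    best_centers, best_cost = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(x.shape[0], size=size, replace=False)
        centers = x[idx[kmedoids(x[idx], k, max_iter, rng)]]
        cost = total_cost(x, centers)  # evaluated on the full dataset
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost


def find_medoids(x, k, nsamples_small=40000, sampling_size_clara=10000,
                 niter_clara=5, kmax_iter=100, seed=0):
    """Plain k-medoids up to nsamples_small rows, CLARA above that."""
    if x.shape[0] > nsamples_small:
        return clara(x, k, sampling_size_clara, niter_clara, kmax_iter, seed)
    rng = np.random.default_rng(seed)
    centers = x[kmedoids(x, k, kmax_iter, rng)]
    return centers, total_cost(x, centers)
```

The point of CLARA is that the quadratic distance matrix is only ever built for `sampling_size_clara` rows, while the final choice is still judged against every observation.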

Returns:

  • labeled_data (2D-array) – matrix of size (nsamples, nvariables) containing the observations

  • labels (1D-array) – Array with the label indices

  • medoids_dict (dict) – Dictionary containing the intermediate medoids for each hydrometeor type

  • final_medoids_dict (dict) – Dictionary containing the final medoids for each hydrometeor type