pyart.retrieve.compute_centroids
- pyart.retrieve.compute_centroids(features_matrix, weight=(1.0, 1.0, 1.0, 1.0, 0.75), var_names=('dBZ', 'ZDR', 'KDP', 'RhoHV', 'H_ISO0'), hydro_names=('AG', 'CR', 'LR', 'RP', 'RN', 'VI', 'WS', 'MH', 'IH/HDG'), nsamples_iter=20000, external_iterations=30, internal_iterations=10, alpha=0.01, cv_approach=True, num_samples_arr=(30, 35, 40), n_samples_syn=50, nmedoids_min=1, acceptance_threshold=0.5, band='C', relh_slope=0.001, parallelized=False, sample_data=True, kmax_iter=100, nsamples_small=40000, sampling_size_clara=10000, niter_clara=5, keep_labeled_data=True, use_median=False, allow_label_duplicates=False)
Given a features matrix, computes the centroids.
- Parameters:
features_matrix (2D-array) – matrix of size (nsamples, nvariables)
weight (tuple) – Weight given to each feature in the KS test
var_names (tuple) – Names of the variables
hydro_names (tuple) – Names of the hydrometeor types
nsamples_iter (int) – Number of samples of the features matrix in each external iteration
external_iterations (int) – Number of iterations of the external loop. This number will determine how many medoids are computed for each hydrometeor class.
internal_iterations (int) – Maximum number of iterations of the internal loop
acceptance_threshold (float) – Threshold on the inter-quantile coefficient of dispersion of the medoids above which the medoid of the class is not acceptable.
alpha (float) – Minimum value of p required to accept the cluster
cv_approach (bool) – If True, a critical-value approach is used to accept or reject similarity between the observations and the reference. If False, a p-value approach is used.
num_samples_arr (1D-array) – Array containing the possible numbers of observation samples to use when comparing with the reference
n_samples_syn (int) – Number of reference samples used in the comparison
nmedoids_min (int) – Minimum number of valid intermediate medoids to compute a final medoid
band (str) – Frequency band of the radar data. Can be C, S or X
relh_slope (float) – Slope of the sigmoid function used to transform the height relative to the iso0.
parallelized (bool) – If True, the processing is parallelized
sample_data (bool) – If True, the data is sampled at each external loop
kmax_iter (int) – Maximum number of iterations of the k-medoids algorithm
nsamples_small (int) – Maximum number of samples before the k-medoids CLARA algorithm is used. If this number is exceeded, the CLARA algorithm is used.
sampling_size_clara (int) – Number of samples used in each iteration of the k-medoids CLARA algorithm.
niter_clara (int) – Number of iterations performed by the k-medoids CLARA algorithm
keep_labeled_data (bool) – If True, the labeled data is kept.
use_median (bool) – If True, the intermediate medoids are computed as the median of each variable and the final medoids are computed as the median of each. Otherwise, they are computed using the k-medoids algorithm.
allow_label_duplicates (bool) – If True, multiple clusters may be assigned the same label
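The alpha, cv_approach, num_samples_arr and n_samples_syn parameters all relate to a two-sample KS comparison between observations and a reference. The sketch below illustrates the general difference between a critical-value decision and a p-value decision for one variable. It is not taken from the Py-ART source: the helper names (ks_statistic, ks_critical_value, ks_p_value, is_similar) are hypothetical, the formulas are the standard large-sample approximations, and the per-variable weighting given by weight is not modelled.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d_max = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / n - j / m))
    return d_max


def ks_critical_value(n, m, alpha):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_p_value(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov series (an approximation)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here and the p-value is effectively 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def is_similar(obs, ref, alpha=0.01, cv_approach=True):
    """Accept similarity by critical value (True) or by p-value (False)."""
    d = ks_statistic(obs, ref)
    n, m = len(obs), len(ref)
    if cv_approach:
        return d <= ks_critical_value(n, m, alpha)
    return ks_p_value(d, n, m) >= alpha


# 30 observation samples against 50 reference samples, mirroring the defaults
# num_samples_arr=(30, 35, 40) and n_samples_syn=50. The values are synthetic.
obs = [k / 30 for k in range(30)]
ref = [k / 50 for k in range(50)]
shifted = [x + 10.0 for x in obs]
```

With these synthetic inputs, is_similar(obs, ref) accepts the match under both approaches, while is_similar(shifted, ref) rejects it, because the two empirical distributions do not overlap at all.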
- Returns:
labeled_data (2D-array) – matrix of size (nsamples, nvariables) containing the observations
labels (1D-array) – array with the label indices
medoids_dict (dict) – Dictionary containing the intermediate medoids for each hydrometeor type
final_medoids_dict (dict) – Dictionary containing the final medoids for each hydrometeor type
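As a rough illustration of how intermediate medoids might be reduced to a final medoid when use_median is True, and of what a dispersion check against acceptance_threshold could look like, here is a minimal sketch. Several things in it are assumptions rather than documented behaviour: the dictionary layout (hydrometeor name mapped to a list of per-variable medoid vectors), the use of quartiles for the "inter-quantile" coefficient of dispersion, and the helper names. The numbers are made up for illustration only.

```python
import statistics


def final_medoid_by_median(intermediate_medoids):
    """Per-variable median over a list of intermediate medoid vectors."""
    return [statistics.median(column) for column in zip(*intermediate_medoids)]


def quartile_dispersion(values):
    """Quartile coefficient of dispersion, (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    denom = q3 + q1
    if denom == 0:
        return float("inf")
    return abs(q3 - q1) / abs(denom)


def is_acceptable(values, acceptance_threshold=0.5):
    """A medoid is rejected when the dispersion exceeds the threshold."""
    return quartile_dispersion(values) <= acceptance_threshold


# Hypothetical layout and made-up values: three intermediate medoids for the
# 'RN' class, ordered as ('dBZ', 'ZDR', 'KDP', 'RhoHV', 'H_ISO0').
medoids_dict = {
    "RN": [
        [30.0, 1.0, 0.20, 0.98, -500.0],
        [32.0, 1.2, 0.30, 0.99, -450.0],
        [34.0, 1.1, 0.25, 0.97, -600.0],
    ],
}

final_medoids = {
    name: final_medoid_by_median(vectors)
    for name, vectors in medoids_dict.items()
}
```

Note that a ratio of the form (Q3 - Q1) / (Q3 + Q1) behaves poorly for variables whose values straddle zero, so a real implementation may standardize the variables first.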