ClusterizeInitializer
- class ClusterizeInitializer(is_accurate, is_soft, clusterizer)
Bases: Initializer
Cluster-based initializer for mixture model parameters.
This initializer partitions the data with a clustering algorithm and then estimates initial parameters for each mixture component from the clustering result. Both hard clustering (crisp assignments) and soft clustering (fuzzy assignments) are supported. For homogeneous mixtures, where all components belong to the same distribution family, fast initialization (is_accurate=False) is recommended: every cluster is fitted with the same family, so the cluster-model matching step has little to gain.
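To illustrate the step "estimate initial parameters from the clustering result", here is a minimal sketch for 1-D Gaussian components. It assumes the clustering output has already been turned into an n x k membership matrix W (one-hot rows for hard clustering, rows summing to 1 for soft clustering) and computes each component's weight, mean and standard deviation as weighted moments. The function name estimate_gaussian_params is hypothetical. The actual class delegates this step to its estimation_strategies, which may use different estimators.

```python
import math


def estimate_gaussian_params(X, W):
    """Hypothetical helper: weighted-moment initial parameters for 1-D Gaussians.

    X: list of n floats (the samples).
    W: n x k membership matrix as a list of rows; one-hot rows for hard
       clustering, rows summing to 1 for soft clustering.
    Returns a list of (mixture_weight, mean, std) tuples, one per cluster.
    """
    n, k = len(X), len(W[0])
    params = []
    for j in range(k):
        col = [row[j] for row in W]          # membership of every sample in cluster j
        total = sum(col)
        if total == 0:
            raise ValueError(f"cluster {j} received no weight")
        mean = sum(w * x for w, x in zip(col, X)) / total
        var = sum(w * (x - mean) ** 2 for w, x in zip(col, X)) / total
        params.append((total / n, mean, math.sqrt(var)))
    return params
```

With two well-separated groups and one-hot memberships, each component gets mixture weight 0.5 and the mean of its own group.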
- Variables:
MIN_SAMPLES (int) – Minimum number of samples required for a cluster to be considered valid.
Mapping of cluster matching strategies to their implementation functions.
n_components (Optional[int]) – Number of mixture components to initialize.
cluster_match_strategy (ClusterMatchStrategy) – Strategy for matching clusters to distribution models.
estimation_strategies (list[EstimationStrategy]) – List of estimation strategies for each distribution model.
models (list[ContinuousDistribution]) – List of distribution models to initialize.
- Parameters:
is_accurate (bool) – If True, uses accurate initialization with optimal cluster-model matching. If False, uses fast initialization with direct cluster assignments.
is_soft (bool) – If True, uses soft clustering (fuzzy assignments). If False, uses hard clustering (crisp assignments).
clusterizer (Any) – The clustering algorithm instance. It must provide a fit_transform method for soft clustering or a fit_predict method for hard clustering.
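Any object exposing the required method can serve as clusterizer. The two classes below are hypothetical toy clusterizers for 1-D data, written only to show the expected duck-typed interface: fit_predict returns one integer label per sample, and fit_transform returns one row of membership weights per sample. With scikit-learn, KMeans(...).fit_predict matches the hard interface. However, KMeans.fit_transform returns distances to the cluster centers rather than memberships, so it is not a valid soft clusterizer in the sense described here.

```python
import math


class ThresholdClusterizer:
    """Hypothetical hard clusterizer: label 1 if x > threshold, else 0."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit_predict(self, X):
        # Nothing to learn: the threshold is fixed. One integer label per sample.
        return [1 if x > self.threshold else 0 for x in X]


class SigmoidSoftClusterizer:
    """Hypothetical soft clusterizer: membership in cluster 1 rises smoothly around a threshold."""

    def __init__(self, threshold, scale=1.0):
        self.threshold = threshold
        self.scale = scale

    def fit_transform(self, X):
        rows = []
        for x in X:
            p1 = 1.0 / (1.0 + math.exp(-(x - self.threshold) / self.scale))
            rows.append([1.0 - p1, p1])  # each row sums to 1
        return rows
```

Such an object would then be passed to the constructor, e.g. ClusterizeInitializer(is_accurate=False, is_soft=False, clusterizer=ThresholdClusterizer(5.0)).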
Methods
- perform(X, dists, cluster_match_strategy, estimation_strategies, optimizer)
Performs cluster-based initialization of mixture model parameters.
Notes
Supported Clustering Types
Soft clustering: requires a clusterizer with a fit_transform method that returns a weight matrix in which each element is the probability (membership degree) of a data point belonging to a cluster.
Hard clustering: requires a clusterizer with a fit_predict method that returns a cluster label for each data point.
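Both kinds of output can be reduced to a single n x k weight matrix before parameters are estimated. The sketch below uses a hypothetical helper, to_weight_matrix. It one-hot encodes hard labels and rescales each soft row to sum to 1. It assumes every hard label lies in 0..k-1 and rejects anything else rather than guessing how outliers should be treated.

```python
def to_weight_matrix(output, is_soft, n_clusters=None):
    """Hypothetical helper: convert clusterizer output to an n x k weight matrix.

    Soft output: each row is rescaled so that it sums to 1.
    Hard output: each integer label becomes a one-hot row.
    """
    if is_soft:
        weights = []
        for row in output:
            total = sum(row)
            if total <= 0:
                raise ValueError("soft assignment rows must have positive total weight")
            weights.append([v / total for v in row])
        return weights

    labels = list(output)
    k = n_clusters if n_clusters is not None else max(labels) + 1
    weights = []
    for label in labels:
        if not 0 <= label < k:
            # Assumption: out-of-range labels (e.g. outliers) are not handled in this sketch.
            raise ValueError(f"label {label} is outside 0..{k - 1}")
        weights.append([1.0 if j == label else 0.0 for j in range(k)])
    return weights
```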
Initialization Modes
Accurate mode: matches clusters to models optimally according to the chosen cluster_match_strategy (e.g., likelihood or AIC), evaluating candidate cluster-to-model assignments and keeping the best-fitting one.
Fast mode: Directly assigns each cluster to a model in order, providing faster but potentially suboptimal initialization.
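The difference between the two modes can be sketched with two hypothetical 1-D families, each scored by the log-likelihood of its maximum-likelihood fit to a cluster. Fast mode pairs cluster i with model i. Accurate mode, in this sketch, tries every one-to-one pairing and keeps the one with the highest total log-likelihood, falling back to the fast pairing if no pairing has a finite score. The exhaustive search, the scoring functions and all names are assumptions for illustration, not the library's method. The search costs k! evaluations, so for more than a handful of components an assignment solver such as scipy.optimize.linear_sum_assignment is the usual alternative. An AIC-based strategy would additionally penalize each model's parameter count.

```python
import itertools
import math


def gaussian_loglik(xs):
    """Hypothetical scorer: log-likelihood of xs under its MLE normal fit."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-12)  # guard against zero variance
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in xs)


def exponential_loglik(xs):
    """Hypothetical scorer: log-likelihood under the MLE exponential fit; -inf if any x < 0."""
    if min(xs) < 0:
        return -math.inf
    rate = len(xs) / max(sum(xs), 1e-12)
    return sum(math.log(rate) - rate * x for x in xs)


def fast_match(n_clusters):
    """Fast mode: cluster i -> model i."""
    return list(range(n_clusters))


def accurate_match(clusters, models):
    """Exhaustive matching: result[c] is the index of the model assigned to cluster c."""
    best, best_score = None, -math.inf
    for perm in itertools.permutations(range(len(models))):
        score = sum(models[m](clusters[c]) for c, m in enumerate(perm))
        if score > best_score:
            best, best_score = list(perm), score
    # No pairing had a finite score: keep the in-order assignment.
    return best if best is not None else fast_match(len(clusters))
```

Here a symmetric cluster containing negative values can only be scored by the Gaussian, so accurate mode swaps the in-order pairing when the exponential model is listed first.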
Error Handling
Validates that the clusterizer provides the method required by the chosen clustering type.
Handles outliers in hard clustering (points not assigned to any cluster) by distributing their weight evenly across components.
Falls back to fast initialization if accurate initialization fails.
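The three error-handling rules can be sketched as small helpers. All names are hypothetical, and two details are assumptions. First, outliers are taken to be labels outside 0..k-1; scikit-learn's DBSCAN, for instance, labels noise points -1. Second, the fallback catches any exception, because the behavior described above does not say which failures trigger it.

```python
def is_compatible(clusterizer, is_soft):
    """True if clusterizer has the method required by the chosen clustering type."""
    method = "fit_transform" if is_soft else "fit_predict"
    return callable(getattr(clusterizer, method, None))


def hard_labels_to_weights(labels, n_components):
    """One-hot rows for regular labels; outliers get their weight spread evenly."""
    uniform = [1.0 / n_components] * n_components
    weights = []
    for label in labels:
        if 0 <= label < n_components:
            weights.append([1.0 if j == label else 0.0 for j in range(n_components)])
        else:
            # Assumed outlier convention: any label outside 0..n_components-1.
            weights.append(list(uniform))
    return weights


def initialize_with_fallback(accurate_init, fast_init):
    """Run accurate initialization; on failure, use fast initialization instead."""
    try:
        return accurate_init()
    except Exception:  # assumption: every failure triggers the fallback
        return fast_init()
```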