Quickstart Guide ================ This guide will walk you through the basics of the different classes of estimators and how to use them. Discrete estimators ------------------- The basic form of the discrete estimators is a Python dictionary, which represents the joint probability distribution over some set of elements. The keys are tuples (which can be intergers, strings, tuples, anything), and the values are the probabilities. .. code-block:: python # An example AND distribution joint_distribution = { (0,0,0) : 1/4, (0,1,0) : 1/4, (1,0,0) : 1/4, (1,1,1) : 1/4, } The discrete estimators generally take in indices, represented as tuples of integers that give in the indices of the subsets of elements being considered, and return two values: a dictionary of the pointwise (local) measures, and the expected values. .. code-block:: python from syntropy.discrete import ( conditional_entropy, mutual_information ) lce, ce = conditional_entropy( idxs_x = (0,), idxs_y = (2,), joint_distribution = joint_distribution ) lmi, mi = mutual_information( idxs_x = (0,1), idxs_y = (2,), joint_distribution = joint_distribution ) The pointwise dictionary keys are a tuple of tuples, indicating the states of all elements in `idxs_x` and `idxs_y` respectively. .. code-block:: python lce = { ((1,),(0,)) : 1.585, ((1,),(1,)) : 0, ((0,),(0,)) : 0.585 } lmi = { ((1, 0), (0,)): 0.415, ((0, 0), (0,)): 0.415, ((1, 1), (1,)): 2.0), ((0, 1), (0,)): 0.415 } Gaussian estimators ------------------- The Gaussian estimators are computed from continuous, multidimensional numpy arrays. Unlike the discrete and KNN estimators, there are separate functions for expected and local measures - this was done because, in my cases, expected values can be computed from the covariance matrix directly, which is much more efficient than vectorized local computation .. code-block:: python import numpy as np from syntropy.gaussian import ( mutual_information, local_mutual_information ) num_samples = 100_000 rand = np.random.randn(num_samples) data = np.zeros((3, num_samples)) data[0,:] = rand data[1,:] = 0.5 * data[0,:] + np.sqrt(1 - 0.5**2) + np.random.randn(num_samples) data[2,:] = -0.3 * data[0,:] + np.sqrt(1 - -0.3**2) + np.random.randn(num_samples) cov = np.cov(data, ddof=0.0) mi = mutual_information( idxs_x = (0,1), idxs_y = (2,), cov = cov ) lmi = local_mutual_information( idxs_x = (0,1), idxs_y = (2,), data = data ) The average mutual information is a floating point value, while the local mutual information is a 1-dimensional Numpy array, with one cell for each sample KNN estimators -------------- K-nearest neighbors estimators behave similarly to the local Gaussian estimators: they take in the full, multidimensional dataset, and given a `k` value, returns both the local and average values. .. code-block:: python from syntropy.knn import mutual_information # Using the same data as above. num_samples = 10_000 rand = np.random.randn(num_samples) data = np.zeros((3, num_samples)) data[0,:] = rand data[1,:] = 0.5 * data[0,:] + np.sqrt(1 - 0.5**2) + np.random.randn(num_samples) data[2,:] = -0.3 * data[0,:] + np.sqrt(1 - -0.3**2) + np.random.randn(num_samples) lmi, mi = mutual_information( idxs_x = (0,1), idxs_y = (2,), data = data, k=4, algorithm = 1 ) Once again, the `lmi` object is a Numpy array, while mi is a floating point value Neural estimators ----------------- The neural estimators use normalizing flows to estimate information-theoretic quantities for continuous random variables. They require PyTorch tensors as input and use the `nflows` library under the hood. .. code-block:: python import torch from syntropy.neural import ( differential_entropy, mutual_information, total_correlation, higher_order_information ) # Generate sample data (samples x features format) num_samples = 10_000 rand = torch.randn(num_samples) data = torch.zeros((num_samples, 3)) data[:, 0] = rand data[:, 1] = 0.5 * data[:, 0] + torch.sqrt(torch.tensor(1 - 0.5**2)) * torch.randn(num_samples) data[:, 2] = -0.3 * data[:, 0] + torch.sqrt(torch.tensor(1 - 0.3**2)) * torch.randn(num_samples) # Compute mutual information ptw_mi, mi = mutual_information( idxs_x=(0, 1), idxs_y=(2,), data=data, verbose=True # Print training progress ) The neural estimators return both pointwise (local) values as a tensor and the average value as a float. You can also pass separate test data using the ``data_test`` parameter for out-of-sample evaluation. For multivariate measures like total correlation and O-information: .. code-block:: python # Compute total correlation tc = total_correlation( idxs=(0, 1, 2), data=data, verbose=True ) # Compute O-information, S-information, total correlation, and dual total correlation together results = higher_order_information( idxs=(0, 1, 2), data=data, verbose=True ) print(results["o_information"]) print(results["s_information"]) print(results["total_correlation"]) print(results["dual_total_correlation"]) Customizing the normalizing flow ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can customize the normalizing flow architecture and training process using ``flow_kwargs`` and ``train_kwargs``: .. code-block:: python # Custom flow architecture flow_kwargs = { "num_layers": 8, # Number of flow layers (default: 5) "hidden_features": 128, # Neurons per hidden layer (default: 64) "dropout_probability": 0.2 # Dropout rate (default: 0.1) } # Custom training parameters train_kwargs = { "batch_size": 512, # Batch size (default: 256) "lr": 1e-3, # Learning rate (default: 1e-4) "num_epochs": 200, # Training epochs (default: 100) "weight_decay": 1e-4, # L2 regularization (default: 1e-5) "convergence_threshold": 0.01 # Early stopping threshold (default: 0.0) } ptw_mi, mi = mutual_information( idxs_x=(0,), idxs_y=(1,), data=data, flow_kwargs=flow_kwargs, train_kwargs=train_kwargs ) The default hyperparameters work reasonably well for most cases, but may need tuning for specific applications. Sample sizes of 10,000 or more typically produce reliable convergence. Mixed estimators ---------------- Mixed estimators estimate the entropy or mutual information between a discrete and continuous random variable (or set of random variables). Currently, a Gaussian estimator is used for the continuous entropies, although a KNN-based alternative will also be introduced. .. code-block:: python from syntropy.mixed import mutual_information # Using the same data as above. num_samples = 10_000 rand = np.random.randn(num_samples) data = np.zeros((2, num_samples)) data[0,:] = rand data[1,:] = 0.5 * data[0,:] + np.sqrt(1 - 0.5**2) + np.random.randn(num_samples) continuous = data[[0],:] discrete = (data[[1],:] > 0).astype(int) ptw, mi = mutual_information(discrete_vars = discrete, continuous_vars = continuous).