graphslim.condensation package

graphslim.condensation.gcond_base module

class graphslim.condensation.gcond_base.GCondBase(setting, data, args, **kwargs)[source]

Bases: object

A base class for graph condensation methods, covering synthetic graph generation and training.

Parameters:
  • setting (str) – The setting for the graph condensation process.

  • data (object) – The data object containing the dataset.

  • args (Namespace) – Arguments and hyperparameters for the model and training process.

  • **kwargs (keyword arguments) – Additional arguments for initialization.

check_bn(model)[source]

Checks if the model contains BatchNorm layers and fixes their mean and variance after training.

Parameters:

model (torch.nn.Module) – The model object.

Returns:

The model with BatchNorm layers fixed.

Return type:

torch.nn.Module

generate_labels_syn(data)[source]

Generates synthetic labels to match the target number of samples.

Parameters:

data (object) – The graph data object, which includes features, adjacency matrix, labels, etc.

Returns:

A numpy array of synthetic labels.

Return type:

np.ndarray
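
One way to read "matching the target number of samples" is a class-proportional allocation, sketched below in plain Python. The helper name `allocate_labels_syn`, the `reduction_rate` parameter, and the rule of keeping at least one node per class are assumptions made for illustration; they are not taken from the library's source.

```python
from collections import Counter

def allocate_labels_syn(train_labels, reduction_rate):
    """Hypothetical helper: build a synthetic label list whose class
    proportions roughly follow those of train_labels."""
    counts = Counter(train_labels)
    labels_syn = []
    for cls in sorted(counts):
        # Assumption: every class keeps at least one synthetic node.
        n_cls = max(int(counts[cls] * reduction_rate), 1)
        labels_syn.extend([cls] * n_cls)
    return labels_syn

labels = [0] * 60 + [1] * 30 + [2] * 10
labels_syn = allocate_labels_syn(labels, reduction_rate=0.1)
print(Counter(labels_syn))
```

With 100 training labels and a rate of 0.1, the sketch yields 10 synthetic labels split 6/3/1 across the three classes.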

get_loops(args)[source]

Retrieves the outer-loop and inner-loop hyperparameters.

Parameters:

args (Namespace) – Arguments object containing hyperparameters for training and model.

Returns:

Outer-loop and inner-loop hyperparameters.

Return type:

tuple

init(with_adj=False, reuse_init=False)[source]

Initializes the synthetic features and, optionally, the adjacency matrix.

Parameters:

with_adj (bool, optional) – Whether to initialize the adjacency matrix (default is False).

Returns:

A tuple containing the synthetic features and (optionally) the adjacency matrix.

Return type:

tuple

intermediate_evaluation(best_val, loss_avg=None, save=True, save_valid_acc=False)[source]

Performs intermediate evaluation and saves the best model.

Parameters:
  • best_val (float) – The best validation accuracy observed so far.

  • loss_avg (float) – The average loss.

  • save (bool, optional) – Whether to save the model (default is True).

Returns:

The updated best validation accuracy.

Return type:

float

reset_parameters()[source]

Resets the parameters of the model.

test_with_val(verbose=False, setting='trans', iters=200, best_val=None)[source]

Conducts validation testing and returns results.

Parameters:
  • verbose (bool, optional) – Whether to print verbose output (default is False).

  • setting (str, optional) – The setting type (default is ‘trans’).

  • iters (int, optional) – Number of iterations for validation testing (default is 200).

Returns:

A list containing validation results.

Return type:

list

train_class(model, adj, features, labels, labels_syn, args, soft=False)[source]

Trains the model and computes the loss.

Parameters:
  • model (torch.nn.Module) – The model object.

  • adj (torch.Tensor) – The adjacency matrix.

  • features (torch.Tensor) – The feature matrix.

  • labels (torch.Tensor) – The actual labels.

  • labels_syn (torch.Tensor) – The synthetic labels.

  • args (Namespace) – Arguments object containing hyperparameters for training and model.

Returns:

The computed loss value.

Return type:

torch.Tensor

graphslim.condensation.gcond module

class graphslim.condensation.gcond.GCond(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Graph Condensation for Graph Neural Networks” https://cse.msu.edu/~jinwei2/files/GCond.pdf

reduce(data, verbose=True)[source]

graphslim.condensation.gcondx module

class graphslim.condensation.gcondx.GCondX(setting, data, args, **kwargs)[source]

Bases: GCondBase

A structure-free variant of GCond. “Graph Condensation for Graph Neural Networks” https://cse.msu.edu/~jinwei2/files/GCond.pdf

reduce(data, verbose=True)[source]

graphslim.condensation.doscond module

class graphslim.condensation.doscond.DosCond(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Condensing Graphs via One-Step Gradient Matching” https://arxiv.org/abs/2206.07746

reduce(data, verbose=True)[source]

graphslim.condensation.doscondx module

class graphslim.condensation.doscondx.DosCondX(setting, data, args, **kwargs)[source]

Bases: GCondBase

A structure-free variant of DosCond. “Condensing Graphs via One-Step Gradient Matching” https://arxiv.org/abs/2206.07746

reduce(data, verbose=True)[source]

graphslim.condensation.gcsntk module

class graphslim.condensation.gcsntk.GCSNTK(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Fast Graph Condensation with Structure-based Neural Tangent Kernel” https://arxiv.org/pdf/2310.11046

reduce(data, verbose=True)[source]
test(KRR, G_t, G_s, y_t, y_s, E_t, E_s, loss_fn)[source]
train(KRR, G_t, G_s, y_t, y_s, E_t, E_s, loss_fn, optimizer, accumulate_steps=None, i=None, TRAIN_K=None)[source]

graphslim.condensation.geom module

class graphslim.condensation.geom.GEOM(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching.” https://arxiv.org/pdf/2402.05011.pdf

buffer_cl(data)[source]
expert_load()[source]
get_coreset_init(features, adj, labels)[source]
init_coreset_select(data)[source]
reduce(data, verbose=True)[source]
synset_save()[source]

graphslim.condensation.msgc module

class graphslim.condensation.msgc.FixLenList(lenth)[source]

Bases: object

append(element)[source]

class graphslim.condensation.msgc.MSGC(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Multiple sparse graphs condensation” https://www.sciencedirect.com/science/article/pii/S0950705123006548

generate_labels_syn(data)[source]

Generates synthetic labels to match the target number of samples.

Parameters:

data (object) – The graph data object, which includes features, adjacency matrix, labels, etc.

Returns:

A numpy array of synthetic labels.

Return type:

np.ndarray

get_adj_t_syn()[source]
reduce(data, verbose=True)[source]
reset_adj_batch()[source]

graphslim.condensation.sfgc module

class graphslim.condensation.sfgc.SFGC(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data.” https://arxiv.org/pdf/2306.02664.pdf

expert_load(expert_dir)[source]

Randomly selects one expert from the expert files.

reduce(data, verbose=True)[source]

graphslim.condensation.sgdd module

class graphslim.condensation.sgdd.SGDD(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Does Graph Distillation See Like Vision Dataset Counterpart?” https://openreview.net/pdf?id=VqIWgUVsXc

reduce(data, verbose=True)[source]

graphslim.condensation.utils module

graphslim.condensation.utils.GCF(adj, x, k=1)[source]

Apply Graph Convolution Filter (GCF) to features using the adjacency matrix.

Parameters:
  • adj (torch.Tensor) – Adjacency matrix of the graph. It must include self-loops. Shape: (N, N), where N is the number of nodes.

  • x (torch.Tensor) – Node features. Shape: (N, F), where F is the number of features for each node.

  • k (int, optional) – Number of hops (or layers) to apply the filter. Default is 1.

Returns:

Filtered features after applying the graph convolution. Shape: (N, F).

Return type:

torch.Tensor
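
The following NumPy sketch shows what a k-hop graph convolution filter typically computes. It assumes the symmetric normalization D^{-1/2} A D^{-1/2} familiar from GCN and applies it k times; the library operates on torch tensors and may normalize or count hops differently. The name `gcf_sketch` is hypothetical.

```python
import numpy as np

def gcf_sketch(adj, x, k=1):
    """Hypothetical NumPy analogue: k rounds of symmetric-normalized propagation."""
    deg = adj.sum(axis=1)
    # Self-loops guarantee deg > 0, so the inverse square root is well defined.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = x
    for _ in range(k):
        out = norm_adj @ out
    return out

# Path graph on three nodes, with self-loops already added.
adj = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 1.0]])
x = np.eye(3)
filtered = gcf_sketch(adj, x, k=2)
print(filtered.shape)
```

The output keeps the (N, F) shape of the input features, as stated in the return description above.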

graphslim.condensation.utils.difficulty_measurer(data, adj, label)[source]

Measure the difficulty of nodes in the graph based on their neighborhood label distribution.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.

  • label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)

Returns:

Difficulty scores for each node. Higher scores indicate more difficult nodes.

Return type:

torch.Tensor

graphslim.condensation.utils.difficulty_measurer_in(data, adj, label)[source]

Measure the difficulty of each node in a graph based on local entropy of neighbor labels.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.

  • label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Tensor of local difficulty scores for each node.

Return type:

torch.Tensor

graphslim.condensation.utils.distance_wb(gwr, gws)[source]

Computes the distance between two tensors representing gradients using cosine similarity.

Parameters:
  • gwr (torch.Tensor) – The real gradient tensor.

  • gws (torch.Tensor) – The synthetic gradient tensor.

Returns:

The computed distance between the real and synthetic gradients.

Return type:

torch.Tensor
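
A cosine-based gradient distance can be sketched as follows. This NumPy version assumes the distance is the sum over rows of one minus the cosine similarity, as in common gradient-matching formulations; the function name `cosine_gradient_distance` and the `eps` stabilizer are illustrative assumptions, and the library version works on torch tensors.

```python
import numpy as np

def cosine_gradient_distance(gwr, gws, eps=1e-6):
    """Hypothetical NumPy analogue: sum over rows of (1 - cosine similarity)."""
    gwr = gwr.reshape(gwr.shape[0], -1)
    gws = gws.reshape(gws.shape[0], -1)
    dot = (gwr * gws).sum(axis=-1)
    norms = np.linalg.norm(gwr, axis=-1) * np.linalg.norm(gws, axis=-1)
    return float(np.sum(1.0 - dot / (norms + eps)))

g_real = np.array([[1.0, 0.0], [0.0, 2.0]])
g_same = 3.0 * g_real                      # same direction, different scale
g_orth = np.array([[0.0, 1.0], [1.0, 0.0]])  # orthogonal row by row

d_same = cosine_gradient_distance(g_real, g_same)
d_orth = cosine_gradient_distance(g_real, g_orth)
print(d_same, d_orth)
```

Because only directions are compared, rescaling a gradient leaves the distance near zero, while two orthogonal rows each contribute a distance of one.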

graphslim.condensation.utils.get_eigh(laplacian_matrix, data_name, save=True)[source]
graphslim.condensation.utils.get_embed_mean(embed_sum, label)[source]
graphslim.condensation.utils.get_embed_sum(eigenvals, eigenvecs, x)[source]
graphslim.condensation.utils.get_largest_cc(adj, num_nodes, data_name)[source]
graphslim.condensation.utils.get_subspace_covariance_matrix(eigenvecs, x)[source]
graphslim.condensation.utils.get_subspace_embed(eigenvecs, x)[source]
graphslim.condensation.utils.get_syn_eigen(real_eigenvals, real_eigenvecs, eigen_k, ratio, step=1)[source]
graphslim.condensation.utils.get_train_lcc(idx_lcc, idx_train, y_full, num_nodes, num_classes)[source]
graphslim.condensation.utils.load_eigen(dataset, load_path)[source]
graphslim.condensation.utils.match_loss(gw_syn, gw_real, args, device)[source]

Computes the loss between synthetic and real gradients based on the specified distance metric.

Parameters:
  • gw_syn (list of torch.Tensor) – List of synthetic gradients for different model parameters.

  • gw_real (list of torch.Tensor) – List of real gradients for different model parameters.

  • args (Namespace) – Arguments object containing hyperparameters for training and model.

  • device (torch.device) – Device (CPU or GPU) on which computations are performed.

Returns:

The computed distance (loss) between synthetic and real gradients.

Return type:

torch.Tensor

graphslim.condensation.utils.neighborhood_difficulty_measurer(data, adj, label)[source]

Measure the difficulty of neighborhoods in the graph based on the label distribution.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.

  • label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)

Returns:

Difficulty scores for each node. Higher scores indicate more difficult neighborhoods.

Return type:

torch.Tensor

graphslim.condensation.utils.neighborhood_difficulty_measurer_in(data, adj, label)[source]

Measure the difficulty of each node in a graph based on the entropy of neighbor labels.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.

  • label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Tensor of local difficulty scores for each node.

Return type:

torch.Tensor
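
To make the entropy-based score concrete, here is a minimal plain-Python sketch. It assumes the score is the Shannon entropy (natural logarithm) of the label distribution over each node's neighborhood, self-loop included; the helper name `neighbor_label_entropy` and the adjacency-list input are illustrative choices, whereas the library takes a sparse adjacency tensor.

```python
import math
from collections import Counter

def neighbor_label_entropy(neighbors, labels):
    """Hypothetical helper: entropy of the labels in each node's neighborhood.

    neighbors[i] lists the neighbors of node i, including i itself.
    """
    scores = []
    for nbrs in neighbors:
        counts = Counter(labels[j] for j in nbrs)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(entropy)
    return scores

labels = [0, 0, 1, 1]
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
scores = neighbor_label_entropy(neighbors, labels)
```

Nodes 0 and 3 see a single label and score zero; nodes 1 and 2 sit on the class boundary and receive the same positive score.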

graphslim.condensation.utils.normalize_data(data)[source]

Normalize the input data using mean and standard deviation.

Parameters:

data (torch.Tensor) – The data to be normalized. Each column represents a feature, and normalization is applied to each feature independently.

Returns:

The normalized data where each feature has zero mean and unit variance.

Return type:

torch.Tensor
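
Per-feature standardization can be sketched in NumPy as below. The sketch assumes the population standard deviation and adds a small `eps` to guard against constant columns; the library may use a different estimator (for example the unbiased one) and does not necessarily add such a term. The name `zscore_columns` is hypothetical.

```python
import numpy as np

def zscore_columns(data, eps=1e-12):
    """Hypothetical NumPy analogue: standardize each column independently."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation (an assumption)
    return (data - mean) / (std + eps)

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
normalized = zscore_columns(data)
print(normalized.mean(axis=0), normalized.std(axis=0))
```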

graphslim.condensation.utils.sort_training_nodes(data, adj, label)[source]

Sort training nodes based on their difficulty measured by neighborhood label distribution.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.

  • label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Indices of the training nodes sorted by their difficulty, from easiest to hardest.

Return type:

numpy.ndarray
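
The sorting step itself is simple once difficulty scores exist. The sketch below orders a set of training indices from easiest to hardest; `sort_by_difficulty` and its list-based inputs are hypothetical, and the library computes the scores internally from `data`, `adj`, and `label`.

```python
def sort_by_difficulty(train_idx, difficulty):
    """Hypothetical helper: order training node indices from easiest to hardest."""
    return sorted(train_idx, key=lambda i: difficulty[i])

difficulty = [0.9, 0.1, 0.5, 0.3]  # one score per node in the graph
train_idx = [0, 2, 3]              # node 1 is not a training node
order = sort_by_difficulty(train_idx, difficulty)
print(order)  # [3, 2, 0]
```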

graphslim.condensation.utils.sort_training_nodes_in(data, adj, label)[source]

Sort training nodes based on their difficulty scores in ascending order.

Parameters:
  • data (Data) – PyG Data object containing node features and labels.

  • adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.

  • label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Indices of training nodes sorted by difficulty scores.

Return type:

numpy.ndarray

graphslim.condensation.utils.sub_E(idx, A)[source]

Generates a sparse adjacency matrix of the subgraph defined by the given indices.

Parameters:
  • idx (torch.Tensor) – A tensor containing the indices of the nodes that define the subgraph.

  • A (torch.Tensor) – The original adjacency matrix of the graph.

Returns:

The sparse adjacency matrix of the subgraph.

Return type:

torch.sparse_coo_tensor
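
Extracting the subgraph induced by a set of node indices amounts to selecting the matching rows and columns of the adjacency matrix. The sketch below uses a dense NumPy array for readability, whereas the function documented above returns a sparse COO tensor; `induced_subgraph` is a hypothetical name.

```python
import numpy as np

def induced_subgraph(idx, A):
    """Hypothetical dense analogue: keep only rows and columns listed in idx."""
    idx = np.asarray(idx)
    return A[np.ix_(idx, idx)]

# 4-cycle: 0-1, 0-2, 1-3, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
sub = induced_subgraph([0, 1, 3], A)
print(sub)
```

Dropping node 2 leaves the path 0-1-3, so the result is a 3 x 3 matrix with edges only between consecutive kept nodes.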

graphslim.condensation.utils.training_scheduler(lam, t, T, scheduler='geom')[source]

Adjust the value of a parameter based on the chosen scheduling strategy.

Parameters:
  • lam (float) – The initial value or a baseline value for the parameter (0 <= lam <= 1).

  • t (int) – The current training iteration or epoch.

  • T (int) – The total number of training iterations or epochs.

  • scheduler (str, optional) – The type of scheduling strategy to use. Options are ‘linear’, ‘root’, or ‘geom’. Default is ‘geom’.

Returns:

The adjusted value of the parameter at iteration t based on the scheduling strategy.

Return type:

float
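
The three scheduler names match the pacing functions commonly used in curriculum learning, where `lam` is the starting fraction and the value grows to 1 by iteration `T`. The sketch below assumes those standard formulas; it is not copied from the library, and the name `pacing` is hypothetical.

```python
import math

def pacing(lam, t, T, scheduler="geom"):
    """Hypothetical pacing function: value grows from lam at t=0 to 1 at t=T."""
    if scheduler == "linear":
        return min(1.0, lam + (1 - lam) * t / T)
    if scheduler == "root":
        return min(1.0, math.sqrt(lam ** 2 + (1 - lam ** 2) * t / T))
    if scheduler == "geom":
        # Requires lam > 0 because of the logarithm.
        return min(1.0, 2 ** (math.log2(lam) * (1 - t / T)))
    raise ValueError(f"unknown scheduler: {scheduler}")

for name in ("linear", "root", "geom"):
    print(name, [round(pacing(0.5, t, 100, name), 3) for t in (0, 50, 100)])
```

Under these formulas all three strategies agree at the endpoints and differ only in how quickly they approach 1: the root schedule rises fastest early on, the geometric schedule slowest.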

graphslim.condensation.utils.update_E(x_s, neig)[source]

Update the adjacency matrix based on the features of the nodes and the average number of neighbors.

Parameters:
  • x_s (torch.Tensor) – A tensor containing the feature vectors of the nodes.

  • neig (float) – The average number of neighbors each node should have.

Returns:

The sparse adjacency matrix based on the updated similarities.

Return type:

torch.sparse_coo_tensor