graphslim.condensation package

graphslim.condensation.gcond_base module

class graphslim.condensation.gcond_base.GCondBase(setting, data, args, **kwargs)[source]

Bases: object

Base class for graph condensation methods, handling synthetic data generation and training.

Parameters:

setting (str) – The setting for the graph condensation process.
data (object) – The data object containing the dataset.
args (Namespace) – Arguments and hyperparameters for the model and training process.
**kwargs (keyword arguments) – Additional arguments for initialization.

check_bn(model)[source]

Checks if the model contains BatchNorm layers and fixes their mean and variance after training.

Parameters: model (torch.nn.Module) – The model object.
Returns: The model with BatchNorm layers fixed.
Return type: torch.nn.Module

generate_labels_syn(data)[source]

Generates synthetic labels to match the target number of samples.

Parameters: data (object) – The graph data object, which includes features, adjacency matrix, labels, etc.
Returns: A numpy array of synthetic labels.
Return type: np.ndarray
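
Example (illustrative sketch): one plausible way to "match the target number of samples" is to give each class a share of the synthetic budget proportional to its frequency in the training labels. The function name `sketch_labels_syn` and the `reduction_rate` parameter are hypothetical and not taken from the library, and the sketch returns a plain list rather than an np.ndarray; the actual method may allocate the budget differently.

```python
from collections import Counter


def sketch_labels_syn(train_labels, reduction_rate):
    """Return synthetic labels that roughly preserve the class proportions."""
    counts = Counter(train_labels)
    labels_syn = []
    for cls in sorted(counts):
        # Keep at least one synthetic node per class so no class disappears.
        n_syn = max(int(counts[cls] * reduction_rate), 1)
        labels_syn.extend([cls] * n_syn)
    return labels_syn


labels = [0] * 10 + [1] * 5 + [2] * 1
labels_syn = sketch_labels_syn(labels, reduction_rate=0.5)
```

Here the rare class 2 still receives one synthetic label even though its proportional share rounds down to zero.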

get_loops(args)[source]

Retrieves the outer-loop and inner-loop hyperparameters.

Parameters: args (Namespace) – Arguments object containing hyperparameters for training and model.
Returns: Outer-loop and inner-loop hyperparameters.
Return type: tuple

init(with_adj=False, reuse_init=False)[source]

Initializes synthetic features and (optionally) adjacency matrix.

Parameters: with_adj (bool, optional) – Whether to initialize the adjacency matrix (default is False).
Returns: A tuple containing the synthetic features and (optionally) the adjacency matrix.
Return type: tuple

intermediate_evaluation(best_val, loss_avg=None, save=True, save_valid_acc=False)[source]

Performs intermediate evaluation and saves the best model.

Parameters:

best_val (float) – The best validation accuracy observed so far.
loss_avg (float) – The average loss.
save (bool, optional) – Whether to save the model (default is True).

Returns:

The updated best validation accuracy.

Return type:

float

reset_parameters()[source]

Resets the parameters of the model.

test_with_val(verbose=False, setting='trans', iters=200, best_val=None)[source]

Conducts validation testing and returns results.

Parameters:

verbose (bool, optional) – Whether to print verbose output (default is False).
setting (str, optional) – The setting type (default is ‘trans’).
iters (int, optional) – Number of iterations for validation testing (default is 200).

Returns:

A list containing validation results.

Return type:

list

train_class(model, adj, features, labels, labels_syn, args, soft=False)[source]

Trains the model and computes the loss.

Parameters:

model (torch.nn.Module) – The model object.
adj (torch.Tensor) – The adjacency matrix.
features (torch.Tensor) – The feature matrix.
labels (torch.Tensor) – The actual labels.
labels_syn (torch.Tensor) – The synthetic labels.
args (Namespace) – Arguments object containing hyperparameters for training and model.

Returns:

The computed loss value.

Return type:

torch.Tensor

graphslim.condensation.gcond module

class graphslim.condensation.gcond.GCond(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Graph Condensation for Graph Neural Networks” https://cse.msu.edu/~jinwei2/files/GCond.pdf

reduce(data, verbose=True)[source]

graphslim.condensation.gcondx module

class graphslim.condensation.gcondx.GCondX(setting, data, args, **kwargs)[source]

Bases: GCondBase

A structure-free variant of GCond. “Graph Condensation for Graph Neural Networks” https://cse.msu.edu/~jinwei2/files/GCond.pdf

reduce(data, verbose=True)[source]

graphslim.condensation.doscond module

class graphslim.condensation.doscond.DosCond(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Condensing Graphs via One-Step Gradient Matching” https://arxiv.org/abs/2206.07746

reduce(data, verbose=True)[source]

graphslim.condensation.doscondx module

class graphslim.condensation.doscondx.DosCondX(setting, data, args, **kwargs)[source]

Bases: GCondBase

A structure-free variant of DosCond. “Condensing Graphs via One-Step Gradient Matching” https://arxiv.org/abs/2206.07746

reduce(data, verbose=True)[source]

graphslim.condensation.gcsntk module

class graphslim.condensation.gcsntk.GCSNTK(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Fast Graph Condensation with Structure-based Neural Tangent Kernel” https://arxiv.org/pdf/2310.11046

reduce(data, verbose=True)[source]

test(KRR, G_t, G_s, y_t, y_s, E_t, E_s, loss_fn)[source]

train(KRR, G_t, G_s, y_t, y_s, E_t, E_s, loss_fn, optimizer, accumulate_steps=None, i=None, TRAIN_K=None)[source]

graphslim.condensation.geom module

class graphslim.condensation.geom.GEOM(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching.” https://arxiv.org/pdf/2402.05011.pdf

buffer_cl(data)[source]

expert_load()[source]

get_coreset_init(features, adj, labels)[source]

init_coreset_select(data)[source]

reduce(data, verbose=True)[source]

synset_save()[source]

graphslim.condensation.msgc module

class graphslim.condensation.msgc.FixLenList(lenth)[source]

Bases: object

append(element)[source]
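
Example (illustrative sketch): the class is undocumented, but its name and single `append` method suggest a bounded list that discards the oldest element once it is full. The stand-in below, `FixedLengthBuffer`, is hypothetical and only demonstrates that behavior under this assumption; it is not the library's implementation.

```python
from collections import deque


class FixedLengthBuffer:
    """Hypothetical stand-in that keeps only the most recent `length` elements."""

    def __init__(self, length):
        # A deque with maxlen drops the oldest item automatically when full.
        self.data = deque(maxlen=length)

    def append(self, element):
        self.data.append(element)


buf = FixedLengthBuffer(3)
for value in range(5):
    buf.append(value)
recent = list(buf.data)
```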

class graphslim.condensation.msgc.MSGC(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Multiple sparse graphs condensation” https://www.sciencedirect.com/science/article/pii/S0950705123006548

generate_labels_syn(data)[source]

Generates synthetic labels to match the target number of samples.

Parameters: data (object) – The graph data object, which includes features, adjacency matrix, labels, etc.
Returns: A numpy array of synthetic labels.
Return type: np.ndarray

get_adj_t_syn()[source]

reduce(data, verbose=True)[source]

reset_adj_batch()[source]

graphslim.condensation.sfgc module

class graphslim.condensation.sfgc.SFGC(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data.” https://arxiv.org/pdf/2306.02664.pdf

expert_load(expert_dir)[source]

Randomly selects one expert from the expert files.

reduce(data, verbose=True)[source]

graphslim.condensation.sgdd module

class graphslim.condensation.sgdd.SGDD(setting, data, args, **kwargs)[source]

Bases: GCondBase

“Does Graph Distillation See Like Vision Dataset Counterpart?” https://openreview.net/pdf?id=VqIWgUVsXc

reduce(data, verbose=True)[source]

graphslim.condensation.utils module

graphslim.condensation.utils.GCF(adj, x, k=1)[source]

Apply Graph Convolution Filter (GCF) to features using the adjacency matrix.

Parameters:

adj (torch.Tensor) – Adjacency matrix of the graph. It must include self-loops. Shape: (N, N), where N is the number of nodes.
x (torch.Tensor) – Node features. Shape: (N, F), where F is the number of features for each node.
k (int, optional) – Number of hops (or layers) to apply the filter. Default is 1.

Returns:

Filtered features after applying the graph convolution. Shape: (N, F).

Return type:

torch.Tensor
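
Example (illustrative sketch): a k-hop graph convolution filter is commonly written as X' = (D^{-1/2} A D^{-1/2})^k X, where D is the degree matrix of A. The sketch below assumes this symmetric normalization and uses nested Python lists instead of torch tensors; `gcf_sketch` and `matmul` are hypothetical helper names, and the library function may normalize differently.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcf_sketch(adj, x, k=1):
    """Apply k rounds of symmetrically normalized propagation to x."""
    n = len(adj)
    # adj is assumed to contain self-loops, so every degree is at least 1.
    inv_sqrt_deg = [sum(row) ** -0.5 for row in adj]
    norm_adj = [
        [inv_sqrt_deg[i] * adj[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    for _ in range(k):
        x = matmul(norm_adj, x)
    return x


adj = [[1, 1], [1, 1]]        # two connected nodes, self-loops included
features = [[1.0], [3.0]]     # shape (N, F) = (2, 1)
filtered = gcf_sketch(adj, features, k=1)
```

With this fully connected two-node graph, one hop averages the two feature values, so both rows end up close to 2.0.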

graphslim.condensation.utils.difficulty_measurer(data, adj, label)[source]

Measure the difficulty of nodes in the graph based on their neighborhood label distribution.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.
label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)

Returns:

Difficulty scores for each node. Higher scores indicate more difficult nodes.

Return type:

torch.Tensor

graphslim.condensation.utils.difficulty_measurer_in(data, adj, label)[source]

Measure the difficulty of each node in a graph based on local entropy of neighbor labels.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Tensor of local difficulty scores for each node.

Return type:

torch.Tensor
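
Example (illustrative sketch): a local entropy score can be computed by taking, for each node, the distribution of labels among its neighbors (self-loop included) and evaluating the Shannon entropy of that distribution. The helper `neighbor_label_entropy` and its adjacency-list input are hypothetical simplifications; the library function operates on a sparse adjacency tensor and may differ in detail.

```python
import math
from collections import Counter


def neighbor_label_entropy(neighbors, labels):
    """Return one entropy score per node from its neighbors' label distribution."""
    scores = []
    for nbrs in neighbors:
        counts = Counter(labels[j] for j in nbrs)
        total = sum(counts.values())
        # Shannon entropy: 0 when all neighbors agree, larger when labels are mixed.
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(entropy)
    return scores


# Path graph 0 - 1 - 2; each neighbor list includes the node itself.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
labels = [0, 0, 1]
scores = neighbor_label_entropy(neighbors, labels)
```

Node 0 sees only label 0 and gets a score of zero, while node 2 sees an even split between two labels and gets the highest score.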

graphslim.condensation.utils.distance_wb(gwr, gws)[source]

Computes the distance between two tensors representing gradients using cosine similarity.

Parameters:

gwr (torch.Tensor) – The real gradient tensor.
gws (torch.Tensor) – The synthetic gradient tensor.

Returns:

The computed distance between the real and synthetic gradients.

Return type:

torch.Tensor
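
Example (illustrative sketch): a cosine-based gradient distance can be formed by comparing matching rows of the two gradient matrices and summing 1 - cos(row_real, row_syn). The helper `cosine_distance_rows`, the row-wise grouping, and the `eps` stabilizer are assumptions made for illustration; the library function works on torch tensors and may reshape them differently.

```python
import math


def cosine_distance_rows(gwr, gws, eps=1e-6):
    """Sum of (1 - cosine similarity) over matching rows of two matrices."""
    total = 0.0
    for row_real, row_syn in zip(gwr, gws):
        dot = sum(a * b for a, b in zip(row_real, row_syn))
        norm_real = math.sqrt(sum(a * a for a in row_real))
        norm_syn = math.sqrt(sum(b * b for b in row_syn))
        # eps avoids division by zero when a gradient row is all zeros.
        total += 1.0 - dot / (norm_real * norm_syn + eps)
    return total


grad_real = [[1.0, 0.0], [0.0, 2.0]]
grad_same = [[1.0, 0.0], [0.0, 2.0]]
grad_flipped = [[-1.0, 0.0], [0.0, -2.0]]

d_same = cosine_distance_rows(grad_real, grad_same)
d_flipped = cosine_distance_rows(grad_real, grad_flipped)
```

Identical gradients give a distance near 0, and gradients pointing in opposite directions give a distance near 2 per row.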

graphslim.condensation.utils.get_eigh(laplacian_matrix, data_name, save=True)[source]

graphslim.condensation.utils.get_embed_mean(embed_sum, label)[source]

graphslim.condensation.utils.get_embed_sum(eigenvals, eigenvecs, x)[source]

graphslim.condensation.utils.get_largest_cc(adj, num_nodes, data_name)[source]

graphslim.condensation.utils.get_subspace_covariance_matrix(eigenvecs, x)[source]

graphslim.condensation.utils.get_subspace_embed(eigenvecs, x)[source]

graphslim.condensation.utils.get_syn_eigen(real_eigenvals, real_eigenvecs, eigen_k, ratio, step=1)[source]

graphslim.condensation.utils.get_train_lcc(idx_lcc, idx_train, y_full, num_nodes, num_classes)[source]

graphslim.condensation.utils.load_eigen(dataset, load_path)[source]

graphslim.condensation.utils.match_loss(gw_syn, gw_real, args, device)[source]

Computes the loss between synthetic and real gradients based on the specified distance metric.

Parameters:

gw_syn (list of torch.Tensor) – List of synthetic gradients for different model parameters.
gw_real (list of torch.Tensor) – List of real gradients for different model parameters.
args (Namespace) – Arguments object containing hyperparameters for training and model.
device (torch.device) – Device (CPU or GPU) on which computations are performed.

Returns:

The computed distance (loss) between synthetic and real gradients.

Return type:

torch.Tensor
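
Example (illustrative sketch): gradient matching typically aggregates a per-parameter distance over all model parameters. The sketch below sums a pluggable distance over paired gradient vectors; `match_loss_sketch`, `squared_error`, and the `distance` argument are hypothetical, whereas the documented function chooses its metric from `args` and operates on torch tensors.

```python
def squared_error(g_real, g_syn):
    """Sum of squared differences between two flat gradient vectors."""
    return sum((a - b) ** 2 for a, b in zip(g_real, g_syn))


def match_loss_sketch(gw_syn, gw_real, distance=squared_error):
    """Sum a per-parameter distance over paired synthetic and real gradients."""
    if len(gw_syn) != len(gw_real):
        raise ValueError("gradient lists must have one entry per parameter")
    return sum(distance(g_real, g_syn) for g_syn, g_real in zip(gw_syn, gw_real))


gw_syn = [[1.0, 2.0], [0.0]]    # gradients for two parameters
gw_real = [[1.0, 0.0], [3.0]]
loss = match_loss_sketch(gw_syn, gw_real)
```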

graphslim.condensation.utils.neighborhood_difficulty_measurer(data, adj, label)[source]

Measure the difficulty of neighborhoods in the graph based on the label distribution.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.
label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)

Returns:

Difficulty scores for each node. Higher scores indicate more difficult neighborhoods.

Return type:

torch.Tensor

graphslim.condensation.utils.neighborhood_difficulty_measurer_in(data, adj, label)[source]

Measure the difficulty of each node in a graph based on the entropy of neighbor labels.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Tensor of local difficulty scores for each node.

Return type:

torch.Tensor

graphslim.condensation.utils.normalize_data(data)[source]

Normalize the input data using mean and standard deviation.

Parameters: data (torch.Tensor) – The data to be normalized. Each column represents a feature, and normalization is applied to each feature independently.
Returns: The normalized data where each feature has zero mean and unit variance.
Return type: torch.Tensor
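
Example (illustrative sketch): per-feature standardization subtracts each column's mean and divides by its standard deviation. The helper `normalize_columns` is hypothetical and works on nested lists; it uses the population standard deviation, which is an assumption, and it does not guard against constant columns.

```python
import math


def normalize_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_features)]
    # Population standard deviation; a constant column would divide by zero.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / n)
        for j in range(n_features)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]


data = [[1.0, 10.0], [3.0, 30.0]]
normalized = normalize_columns(data)
```

Both columns map to the same standardized values even though their original scales differ by a factor of ten.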

graphslim.condensation.utils.sort_training_nodes(data, adj, label)[source]

Sort training nodes based on their difficulty measured by neighborhood label distribution.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Indices of the training nodes sorted by their difficulty, from easiest to hardest.

Return type:

numpy.ndarray
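
Example (illustrative sketch): once every node has a difficulty score, the easiest-to-hardest ordering of the training nodes is an ascending sort of their indices by score. The helper `sort_by_difficulty` and the sample scores are hypothetical; the library function computes the scores itself and returns a numpy.ndarray.

```python
def sort_by_difficulty(train_idx, difficulty):
    """Return training node indices ordered from easiest to hardest."""
    return sorted(train_idx, key=lambda i: difficulty[i])


difficulty = [0.9, 0.1, 0.5, 0.3]   # one score per node in the graph
train_idx = [0, 2, 3]               # node 1 is not a training node
order = sort_by_difficulty(train_idx, difficulty)
```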

graphslim.condensation.utils.sort_training_nodes_in(data, adj, label)[source]

Sort training nodes based on their difficulty scores in ascending order.

Parameters:

data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).

Returns:

Indices of training nodes sorted by difficulty scores.

Return type:

numpy.ndarray

graphslim.condensation.utils.sub_E(idx, A)[source]

Generates a sparse adjacency matrix of the subgraph defined by the given indices.

Parameters:

idx (torch.Tensor) – A tensor containing the indices of the nodes that define the subgraph.
A (torch.Tensor) – The original adjacency matrix of the graph.

Returns:

The sparse adjacency matrix of the subgraph.

Return type:

torch.sparse_coo_tensor
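
Example (illustrative sketch): the subgraph induced by a set of node indices keeps only the edges whose endpoints are both in that set. The sketch below returns the result as COO-style lists (rows, cols, values) with nodes renumbered from 0; `induced_subgraph_coo`, the dense nested-list input, and the renumbering are assumptions, whereas the library function returns a torch sparse COO tensor.

```python
def induced_subgraph_coo(idx, adj):
    """Return (rows, cols, vals) for the subgraph induced by the nodes in idx."""
    rows, cols, vals = [], [], []
    for new_i, old_i in enumerate(idx):
        for new_j, old_j in enumerate(idx):
            weight = adj[old_i][old_j]
            # Keep only nonzero entries, as a sparse format would.
            if weight != 0:
                rows.append(new_i)
                cols.append(new_j)
                vals.append(weight)
    return rows, cols, vals


# Path graph 0 - 1 - 2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
rows, cols, vals = induced_subgraph_coo([0, 1], adj)
```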

graphslim.condensation.utils.training_scheduler(lam, t, T, scheduler='geom')[source]

Adjust the value of a parameter based on the chosen scheduling strategy.

Parameters:

lam (float) – The initial value or a baseline value for the parameter (0 <= lam <= 1).
t (int) – The current training iteration or epoch.
T (int) – The total number of training iterations or epochs.
scheduler (str, optional) – The type of scheduling strategy to use. Options are ‘linear’, ‘root’, or ‘geom’. Default is ‘geom’.

Returns:

The adjusted value of the parameter at iteration t based on the scheduling strategy.

Return type:

float
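
Example (illustrative sketch): the three strategy names match pacing functions that are common in curriculum learning, where the value grows from `lam` at t = 0 to 1 at t = T. The formulas below are those common forms and are an assumption about this function, not a confirmed copy of it; `pacing` is a hypothetical name, and the 'geom' form requires lam > 0.

```python
import math


def pacing(lam, t, T, scheduler="geom"):
    """Grow a value from lam (at t = 0) to 1 (at t = T), capped at 1."""
    progress = t / T
    if scheduler == "linear":
        value = lam + (1 - lam) * progress
    elif scheduler == "root":
        value = math.sqrt(lam ** 2 + (1 - lam ** 2) * progress)
    elif scheduler == "geom":
        # Interpolates log2(lam) linearly toward 0; needs lam > 0.
        value = 2 ** (math.log2(lam) * (1 - progress))
    else:
        raise ValueError(f"unknown scheduler: {scheduler}")
    return min(1.0, value)


start = pacing(0.5, 0, 10, "linear")
halfway = pacing(0.5, 5, 10, "linear")
end_root = pacing(0.5, 10, 10, "root")
end_geom = pacing(0.5, 10, 10, "geom")
```

All three schedules start at `lam` and reach 1 at the final step; they differ only in how quickly they rise in between.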

graphslim.condensation.utils.update_E(x_s, neig)[source]

Update the adjacency matrix based on the features of the nodes and the average number of neighbors.

Parameters:

x_s (torch.Tensor) – A tensor containing the feature vectors of the nodes.
neig (float) – The average number of neighbors each node should have.

Returns:

The sparse adjacency matrix based on the updated similarities.

Return type:

torch.sparse_coo_tensor