graphslim.condensation package
graphslim.condensation.gcond_base module
- class graphslim.condensation.gcond_base.GCondBase(setting, data, args, **kwargs)[source]
Bases:
object
A base class for graph condensation generation and training.
- Parameters:
setting (str) – The setting for the graph condensation process.
data (object) – The data object containing the dataset.
args (Namespace) – Arguments and hyperparameters for the model and training process.
**kwargs (keyword arguments) – Additional arguments for initialization.
- check_bn(model)[source]
Checks whether the model contains BatchNorm layers and, if so, freezes their mean and variance after training.
- Parameters:
model (torch.nn.Module) – The model object.
- Returns:
The model with BatchNorm layers fixed.
- Return type:
torch.nn.Module
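The behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `freeze_batchnorm` is hypothetical, and it assumes that "fixing" the statistics means switching each BatchNorm layer to evaluation mode so that its running mean and variance stop updating.

```python
import torch
from torch import nn

BATCHNORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_batchnorm(model: nn.Module) -> nn.Module:
    """Hypothetical helper: put every BatchNorm layer in eval mode.

    In eval mode a BatchNorm layer normalizes with its stored running
    statistics and no longer updates them on forward passes.
    """
    for module in model.modules():
        if isinstance(module, BATCHNORM_TYPES):
            module.eval()
    return model

# Toy model: only the BatchNorm layer should leave training mode.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())
model.train()
model = freeze_batchnorm(model)
```

After the call, the linear layer is still in training mode, while further forward passes leave the BatchNorm running statistics untouched.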
- generate_labels_syn(data)[source]
Generates synthetic labels to match the target number of samples.
- Parameters:
data (object) – The graph data object, which includes features, adjacency matrix, labels, etc.
- Returns:
A numpy array of synthetic labels.
- Return type:
np.ndarray
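One plausible way to produce such labels is sketched below. It is an assumption, not a confirmed description of this method: the function name, the `reduction_rate` parameter, and the rule itself (keep class proportions, give every class at least one node, let the largest class absorb the rounding remainder) are all illustrative.

```python
from collections import Counter

def proportional_synthetic_labels(train_labels, reduction_rate):
    """Hypothetical sketch: build a label list of about len(train_labels) * reduction_rate."""
    counts = Counter(train_labels)
    target = int(len(train_labels) * reduction_rate)
    # Visit classes from rarest to most frequent; the most frequent comes last.
    ordered = sorted(counts.items(), key=lambda item: item[1])
    per_class, assigned = {}, 0
    for position, (cls, num) in enumerate(ordered):
        if position == len(ordered) - 1:
            # The largest class takes whatever is left (at least one node).
            per_class[cls] = max(target - assigned, 1)
        else:
            per_class[cls] = max(int(num * reduction_rate), 1)
            assigned += per_class[cls]
    labels_syn = []
    for cls in sorted(per_class):
        labels_syn.extend([cls] * per_class[cls])
    return labels_syn

train_labels = [0] * 60 + [1] * 30 + [2] * 10
labels_syn = proportional_synthetic_labels(train_labels, reduction_rate=0.1)
```

With 100 training labels and a rate of 0.1, the sketch yields 10 synthetic labels split 6/3/1 across the three classes.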
- get_loops(args)[source]
Retrieves the outer-loop and inner-loop hyperparameters.
- Parameters:
args (Namespace) – Arguments object containing hyperparameters for training and model.
- Returns:
Outer-loop and inner-loop hyperparameters.
- Return type:
tuple
- init(with_adj=False, reuse_init=False)[source]
Initializes synthetic features and (optionally) adjacency matrix.
- Parameters:
with_adj (bool, optional) – Whether to initialize the adjacency matrix (default is False).
- Returns:
A tuple containing the synthetic features and (optionally) the adjacency matrix.
- Return type:
tuple
- intermediate_evaluation(best_val, loss_avg=None, save=True, save_valid_acc=False)[source]
Performs intermediate evaluation and saves the best model.
- Parameters:
best_val (float) – The best validation accuracy observed so far.
loss_avg (float) – The average loss.
save (bool, optional) – Whether to save the model (default is True).
- Returns:
The updated best validation accuracy.
- Return type:
float
- test_with_val(verbose=False, setting='trans', iters=200, best_val=None)[source]
Conducts validation testing and returns results.
- Parameters:
verbose (bool, optional) – Whether to print verbose output (default is False).
setting (str, optional) – The setting type (default is ‘trans’).
iters (int, optional) – Number of iterations for validation testing (default is 200).
- Returns:
A list containing validation results.
- Return type:
list
- train_class(model, adj, features, labels, labels_syn, args, soft=False)[source]
Trains the model and computes the loss.
- Parameters:
model (torch.nn.Module) – The model object.
adj (torch.Tensor) – The adjacency matrix.
features (torch.Tensor) – The feature matrix.
labels (torch.Tensor) – The actual labels.
labels_syn (torch.Tensor) – The synthetic labels.
args (Namespace) – Arguments object containing hyperparameters for training and model.
- Returns:
The computed loss value.
- Return type:
torch.Tensor
graphslim.condensation.gcond module
graphslim.condensation.gcondx module
graphslim.condensation.doscond module
graphslim.condensation.doscondx module
graphslim.condensation.gcsntk module
- class graphslim.condensation.gcsntk.GCSNTK(setting, data, args, **kwargs)[source]
Bases:
GCondBase
“Fast Graph Condensation with Structure-based Neural Tangent Kernel” https://arxiv.org/pdf/2310.11046
graphslim.condensation.geom module
- class graphslim.condensation.geom.GEOM(setting, data, args, **kwargs)[source]
Bases:
GCondBase
“Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching.” https://arxiv.org/pdf/2402.05011.pdf
graphslim.condensation.msgc module
- class graphslim.condensation.msgc.MSGC(setting, data, args, **kwargs)[source]
Bases:
GCondBase
“Multiple sparse graphs condensation” https://www.sciencedirect.com/science/article/pii/S0950705123006548
graphslim.condensation.sfgc module
- class graphslim.condensation.sfgc.SFGC(setting, data, args, **kwargs)[source]
Bases:
GCondBase
“Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data.” https://arxiv.org/pdf/2306.02664.pdf
graphslim.condensation.sgdd module
graphslim.condensation.utils module
- graphslim.condensation.utils.GCF(adj, x, k=1)[source]
Apply Graph Convolution Filter (GCF) to features using the adjacency matrix.
- Parameters:
adj (torch.Tensor) – Adjacency matrix of the graph. It must include self-loops. Shape: (N, N), where N is the number of nodes.
x (torch.Tensor) – Node features. Shape: (N, F), where F is the number of features for each node.
k (int, optional) – Number of hops (or layers) to apply the filter. Default is 1.
- Returns:
Filtered features after applying the graph convolution. Shape: (N, F).
- Return type:
torch.Tensor
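A k-hop graph convolution filter of this kind is commonly written as repeated multiplication by a normalized adjacency matrix. The sketch below assumes symmetric normalization, D^{-1/2} A D^{-1/2}, and dense tensors. Both are assumptions made for illustration, and `graph_conv_filter` is a hypothetical name, not the library function.

```python
import torch

def graph_conv_filter(adj: torch.Tensor, x: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Hypothetical dense sketch: apply (D^-1/2 A D^-1/2) to x, k times.

    adj is expected to already contain self-loops, as the parameter
    description above requires.
    """
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0  # guard against isolated nodes
    norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
    for _ in range(k):
        x = norm_adj @ x
    return x

# Two connected nodes with self-loops: every normalized entry equals 0.5.
adj = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
x = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
filtered = graph_conv_filter(adj, x, k=1)
```

The output keeps the (N, F) shape of the input features. In this toy graph each node ends up with the average of the two feature rows.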
- graphslim.condensation.utils.difficulty_measurer(data, adj, label)[source]
Measure the difficulty of nodes in the graph based on their neighborhood label distribution.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.
label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)
- Returns:
Difficulty scores for each node. Higher scores indicate more difficult nodes.
- Return type:
torch.Tensor
- graphslim.condensation.utils.difficulty_measurer_in(data, adj, label)[source]
Measure the difficulty of each node in a graph based on local entropy of neighbor labels.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).
- Returns:
Tensor of local difficulty scores for each node.
- Return type:
torch.Tensor
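To make "local entropy of neighbor labels" concrete, here is a small pure-Python sketch. It uses adjacency lists instead of a sparse tensor and a hypothetical function name, so it illustrates the idea rather than the library's code: a node whose neighborhood (including itself) holds a mix of labels gets a higher score.

```python
import math
from collections import Counter

def neighbor_label_entropy(neighbors, labels):
    """Hypothetical sketch: Shannon entropy of the labels in each node's neighborhood.

    neighbors[i] lists the neighbors of node i and includes i itself,
    mirroring the self-loop requirement on adj.
    """
    scores = []
    for nbrs in neighbors:
        counts = Counter(labels[j] for j in nbrs)
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log(p)
        scores.append(entropy)
    return scores

# Path graph 0 - 1 - 2 with self-loops; node 2 has a different label.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
labels = [0, 0, 1]
scores = neighbor_label_entropy(neighbors, labels)
```

Node 0 sees only label 0 and scores zero, while node 2 sees an even split of two labels and reaches the two-class maximum of ln 2.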
- graphslim.condensation.utils.distance_wb(gwr, gws)[source]
Computes the distance between two tensors representing gradients using cosine similarity.
- Parameters:
gwr (torch.Tensor) – The real gradient tensor.
gws (torch.Tensor) – The synthetic gradient tensor.
- Returns:
The computed distance between the real and synthetic gradients.
- Return type:
torch.Tensor
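A cosine-based gradient distance can be sketched as below. The row-wise form (one cosine similarity per output row, summed as 1 - cos) is an assumption borrowed from common gradient-matching code, and `cosine_gradient_distance` is a hypothetical name. The library's handling of bias vectors and higher-dimensional weights may differ.

```python
import torch
import torch.nn.functional as F

def cosine_gradient_distance(gwr: torch.Tensor, gws: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: sum over rows of (1 - cosine similarity)."""
    if gwr.dim() == 1:
        # Assumption: treat a 1-D gradient (e.g. a bias) as a single row.
        gwr, gws = gwr.unsqueeze(0), gws.unsqueeze(0)
    gwr = gwr.reshape(gwr.shape[0], -1)
    gws = gws.reshape(gws.shape[0], -1)
    cos = F.cosine_similarity(gwr, gws, dim=1, eps=1e-6)
    return torch.sum(1.0 - cos)

g = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
same = cosine_gradient_distance(g, g)       # identical gradients
opposite = cosine_gradient_distance(g, -g)  # gradients pointing the other way
```

Identical gradients give a distance near zero, and exactly opposite gradients give 2 per row, so the value is bounded by twice the number of rows.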
- graphslim.condensation.utils.get_syn_eigen(real_eigenvals, real_eigenvecs, eigen_k, ratio, step=1)[source]
- graphslim.condensation.utils.get_train_lcc(idx_lcc, idx_train, y_full, num_nodes, num_classes)[source]
- graphslim.condensation.utils.match_loss(gw_syn, gw_real, args, device)[source]
Computes the loss between synthetic and real gradients based on the specified distance metric.
- Parameters:
gw_syn (list of torch.Tensor) – List of synthetic gradients for different model parameters.
gw_real (list of torch.Tensor) – List of real gradients for different model parameters.
args (Namespace) – Arguments object containing hyperparameters for training and model.
device (torch.device) – Device (CPU or GPU) on which computations are performed.
- Returns:
The computed distance (loss) between synthetic and real gradients.
- Return type:
torch.Tensor
- graphslim.condensation.utils.neighborhood_difficulty_measurer(data, adj, label)[source]
Measure the difficulty of neighborhoods in the graph based on the label distribution.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph. The shape is (N, N) where N is the number of nodes.
label (torch.Tensor) – Tensor containing the label of each node. Shape: (N,)
- Returns:
Difficulty scores for each node. Higher scores indicate more difficult neighborhoods.
- Return type:
torch.Tensor
- graphslim.condensation.utils.neighborhood_difficulty_measurer_in(data, adj, label)[source]
Measure the difficulty of each node in a graph based on the entropy of neighbor labels.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).
- Returns:
Tensor of local difficulty scores for each node.
- Return type:
torch.Tensor
- graphslim.condensation.utils.normalize_data(data)[source]
Normalize the input data using mean and standard deviation.
- Parameters:
data (torch.Tensor) – The data to be normalized. Each column represents a feature, and normalization is applied to each feature independently.
- Returns:
The normalized data where each feature has zero mean and unit variance.
- Return type:
torch.Tensor
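Per-feature standardization of this kind can be sketched in a few lines. The helper name and the small `eps` guard against constant columns are assumptions added for illustration, and the sketch uses PyTorch's default (unbiased) standard deviation.

```python
import torch

def zscore_columns(data: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Hypothetical sketch: subtract each column's mean and divide by its std."""
    mean = data.mean(dim=0, keepdim=True)
    std = data.std(dim=0, keepdim=True)
    return (data - mean) / (std + eps)

data = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
normalized = zscore_columns(data)
```

Both columns map to the same standardized values here, because they differ only by a scale factor.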
- graphslim.condensation.utils.sort_training_nodes(data, adj, label)[source]
Sort training nodes based on their difficulty measured by neighborhood label distribution.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).
- Returns:
Indices of the training nodes sorted by their difficulty, from easiest to hardest.
- Return type:
numpy.ndarray
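The ordering step can be illustrated on its own. The sketch below assumes the difficulty scores are already available as one value per node. It uses plain Python lists and a hypothetical function name, whereas the documented function returns a numpy.ndarray.

```python
def sort_by_difficulty(train_idx, difficulty):
    """Hypothetical sketch: order training node indices from easiest to hardest."""
    return sorted(train_idx, key=lambda node: difficulty[node])

difficulty = [0.9, 0.1, 0.5, 0.3]  # one score per node in the graph
train_idx = [0, 2, 3]              # node 1 is not a training node
order = sort_by_difficulty(train_idx, difficulty)
```

Only training nodes appear in the result, so node 1 is ignored even though it has the lowest score.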
- graphslim.condensation.utils.sort_training_nodes_in(data, adj, label)[source]
Sort training nodes based on their difficulty scores in ascending order.
- Parameters:
data (Data) – PyG Data object containing node features and labels.
adj (torch.Tensor) – Sparse adjacency matrix of the graph (shape: N x N) with self-loops.
label (torch.Tensor) – Tensor containing the label of each node (shape: N,).
- Returns:
Indices of training nodes sorted by difficulty scores.
- Return type:
numpy.ndarray
- graphslim.condensation.utils.sub_E(idx, A)[source]
Generates a sparse adjacency matrix of the subgraph defined by the given indices.
- Parameters:
idx (torch.Tensor) – A tensor containing the indices of the nodes that define the subgraph.
A (torch.Tensor) – The original adjacency matrix of the graph.
- Returns:
The sparse adjacency matrix of the subgraph.
- Return type:
torch.sparse_coo_tensor
- graphslim.condensation.utils.training_scheduler(lam, t, T, scheduler='geom')[source]
Adjust the value of a parameter based on the chosen scheduling strategy.
- Parameters:
lam (float) – The initial value or a baseline value for the parameter (0 <= lam <= 1).
t (int) – The current training iteration or epoch.
T (int) – The total number of training iterations or epochs.
scheduler (str, optional) – The type of scheduling strategy to use. Options are ‘linear’, ‘root’, or ‘geom’. Default is ‘geom’.
- Returns:
The adjusted value of the parameter at iteration t based on the scheduling strategy.
- Return type:
float
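The three strategy names match the pacing functions often used in curriculum learning. The sketch below shows one common form of each. The formulas are assumptions, not taken from this package, and `pacing` is a hypothetical name. Each schedule starts at lam when t = 0 and reaches 1 when t = T.

```python
import math

def pacing(lam, t, T, scheduler="geom"):
    """Hypothetical sketch of linear, root and geometric pacing, capped at 1."""
    progress = t / T
    if scheduler == "linear":
        value = lam + (1 - lam) * progress
    elif scheduler == "root":
        value = math.sqrt(lam ** 2 + (1 - lam ** 2) * progress)
    elif scheduler == "geom":
        # Needs lam > 0 because of the logarithm.
        value = 2 ** (math.log2(lam) * (1 - progress))
    else:
        raise ValueError(f"unknown scheduler: {scheduler}")
    return min(1.0, value)

# Value at the midpoint of training for each strategy, starting from lam = 0.25.
midpoints = {name: pacing(0.25, 5, 10, name) for name in ("linear", "root", "geom")}
```

At the halfway point the root schedule is the furthest along and the geometric schedule the least, which is the usual reason for choosing between them.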
- graphslim.condensation.utils.update_E(x_s, neig)[source]
Update the adjacency matrix based on the features of the nodes and the average number of neighbors.
- Parameters:
x_s (torch.Tensor) – A tensor containing the feature vectors of the nodes.
neig (float) – The average number of neighbors each node should have.
- Returns:
The sparse adjacency matrix based on the updated similarities.
- Return type:
torch.sparse_coo_tensor