
Constructs a Graph Attention Network (GAT) model by stacking multiple GAT layers with multi-head attention.

Usage

model_gat(
  in_features,
  hidden_dims,
  out_features,
  heads = 8,
  out_heads = 1,
  activation = nnf_elu,
  out_activation = NULL,
  dropout = 0.6,
  att_dropout = 0.6,
  negative_slope = 0.2
)

Arguments

in_features

Integer. Number of input features per node

hidden_dims

Integer vector. Dimensions of hidden layers (length = L)

out_features

Integer. Number of output features (typically 1 for regression)

heads

Integer. Number of attention heads for hidden layers. Default: 8

out_heads

Integer. Number of attention heads for output layer. Default: 1

activation

Function. Activation for hidden layers. Default: torch::nnf_elu

out_activation

Function or NULL. Activation for output layer. Default: NULL

dropout

Numeric. Dropout rate (0-1) applied to node features between layers. Default: 0.6

att_dropout

Numeric. Dropout rate for attention coefficients. Default: 0.6

negative_slope

Numeric. Negative slope for LeakyReLU in attention. Default: 0.2

x

Tensor of shape n_nodes x in_features, passed to the model's forward method. Node feature matrix (dense or sparse)

adj

Sparse torch tensor of shape n_nodes x n_nodes, passed to the model's forward method. Adjacency matrix defining the graph structure. Must be a sparse COO tensor.

Value

Tensor of shape n_nodes x out_features. Final node-level predictions

Details

Architecture:

  • L hidden GAT layers with configurable activation

  • 1 output GAT layer with optional output activation

  • Total layers = length(hidden_dims) + 1

Each layer uses multi-head attention to learn importance weights for neighbor aggregation. Hidden layers typically concatenate attention heads, while the output layer averages them.
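To make the head arithmetic concrete, here is a hypothetical dimension trace for model_gat(14, c(8, 8), 1) with the default heads = 8, assuming the concatenation/averaging behavior described above: each hidden layer's output width is its hidden dimension times heads, while the output layer keeps out_features columns.

# input:          n_nodes x 14
# hidden layer 1: n_nodes x (8 * 8) = n_nodes x 64   (8 heads concatenated)
# hidden layer 2: n_nodes x (8 * 8) = n_nodes x 64
# output layer:   n_nodes x 1                        (out_heads = 1, averaged)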

Forward pass

The model is called as model(x, adj), where x is the node feature matrix and adj is the sparse COO adjacency tensor described under Arguments. The call returns a tensor of shape n_nodes x out_features.
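A minimal end-to-end sketch, assuming the torch R package is attached and using torch for R's 1-based COO index convention (the toy graph and argument values here are illustrative, not part of the package):

library(torch)

# A toy graph with 4 nodes
n_nodes <- 4

# Node features: n_nodes x 14 (matching in_features = 14)
x <- torch_randn(n_nodes, 14)

# Sparse COO adjacency for the directed ring 1 -> 2 -> 3 -> 4 -> 1
indices <- torch_tensor(
  matrix(c(1, 2, 3, 4,    # source nodes
           2, 3, 4, 1),   # target nodes
         nrow = 2, byrow = TRUE),
  dtype = torch_long()
)
adj <- torch_sparse_coo_tensor(
  indices, torch_ones(4), size = c(n_nodes, n_nodes)
)

model <- model_gat(in_features = 14, hidden_dims = c(8, 8), out_features = 1)
preds <- model(x, adj)  # tensor of shape n_nodes x 1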

References

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph Attention Networks. International Conference on Learning Representations. doi:10.48550/arXiv.1710.10903

Examples

if (FALSE) { # \dontrun{
# Binary classification with 8-head attention
model <- model_gat(14, c(8, 8), 1, out_activation = nnf_sigmoid)

# Multi-class with 4 heads
model <- model_gat(
  14,
  c(16, 16),
  3,
  heads = 4,
  out_activation = function(x) nnf_softmax(x, dim = -1)
)

# Regression with custom dropout
model <- model_gat(14, c(32, 32), 1, dropout = 0.5, att_dropout = 0.5)
} # }