Improving Article Classification with Edge-Heterogeneous Graph Neural Networks
| Authors | |
|---|---|
| Publication date | 20-09-2023 |
| Number of pages | 10 |
| Publisher | ArXiv |
| Organisations | |
| Abstract | Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with edge-heterogeneous graph representations. SciBERT is used for node feature generation to capture higher-order semantics within the articles’ textual metadata. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark (OGB) ogbn-arxiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph (MAG) and PubMed Central, respectively. The results demonstrate that edge-heterogeneous graphs consistently improve the performance of all GNN models compared to edge-homogeneous graphs. The transformed data enable simple and shallow GNN pipelines to achieve results on par with more complex architectures. On ogbn-arxiv, we achieve a top-15 result in the OGB competition with a 2-layer GCN (accuracy 74.61%), the highest-scoring solution with fewer than 1 million parameters. On PubMed, we closely trail state-of-the-art (SOTA) GNN architectures using a 2-layer GraphSAGE by including additional co-authorship edges in the graph (accuracy 89.88%). The implementation is available at: https://github.com/lyvykhang/edgehetero-nodeproppred. |
| Document type | Preprint |
| Language | English |
| Published at | https://doi.org/10.48550/arXiv.2309.11341 |
| Downloads | Improving Article Classification with Edge-Heterogeneous Graph Neural Networks (Submitted manuscript) |
