Research Projects
[Jan. 2018-Present] Hardware Architecture for Graph Processing.
- To fully alleviate the irregularities at their origin, namely the data-dependent program behavior, we propose GraphDynS, a hardware/software co-design with a decoupled datapath and data-aware dynamic scheduling. Aware of the data dependencies extracted from the decoupled datapath, GraphDynS schedules execution on the fly to maximize parallelism (a minimal sketch follows the publication list below).
- Alleviating Irregularity in Graph Analytics Acceleration: A Hardware/Software Co-Design Approach (MICRO’19)
- Alleviating Datapath Conflicts and Design Centralization in Graph Analytics Acceleration (DAC’22)
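To make the data-aware scheduling idea concrete, here is a minimal Python sketch (the greedy least-loaded-lane policy, function names, and example graph are illustrative assumptions, not the actual GraphDynS microarchitecture): the front-end extracts data-dependent work estimates, such as out-degrees, ahead of execution, and the scheduler uses them to balance parallel lanes on the fly.

```python
# Minimal sketch (not the GraphDynS hardware): data-aware dynamic scheduling.
# A decoupled front-end extracts, ahead of execution, how much work each
# active vertex will generate (its out-degree); the scheduler then assigns
# vertices to parallel lanes on the fly to balance load.
import heapq

def schedule(active_vertices, out_degree, num_lanes=4):
    """Greedily assign each vertex to the currently least-loaded lane."""
    lanes = [(0, lane_id) for lane_id in range(num_lanes)]  # (load, lane id)
    heapq.heapify(lanes)
    assignment = {lane_id: [] for lane_id in range(num_lanes)}
    # Longest-processing-time-first: place heavy vertices before light ones.
    for v in sorted(active_vertices, key=lambda v: -out_degree[v]):
        load, lane_id = heapq.heappop(lanes)
        assignment[lane_id].append(v)
        heapq.heappush(lanes, (load + out_degree[v], lane_id))
    return assignment

out_degree = {0: 7, 1: 1, 2: 5, 3: 2, 4: 2, 5: 6}
print(schedule(list(out_degree), out_degree, num_lanes=2))
```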
[Oct. 2018-Present] Hardware Architecture for Graph Neural Network.
- We first characterize the hybrid execution patterns of GCNs on an Intel Xeon CPU. Guided by this characterization, we design a GCN accelerator, HyGCN, that uses a hybrid architecture to execute GCNs efficiently (a minimal sketch follows the publication list below). In addition, we identify the communication patterns and challenges of multi-node GCN acceleration on large-scale graphs, and, guided by these observations, propose MultiGCN, an efficient multi-node accelerator system (MultiAccSys) for large-scale GCNs that trades network latency for network bandwidth.
- HyGCN: A GCN Accelerator with Hybrid Architecture (HPCA’20)
- Multi-Node Acceleration for Large-Scale GCNs (IEEE TC 2022)
- Characterizing and Understanding GCNs on GPU (IEEE CAL 2020)
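The hybrid execution pattern that motivates HyGCN can be sketched in a few lines of Python (illustrative only; shapes and names are assumptions): the aggregation phase performs irregular, data-dependent gathers and is memory-bound, while the combination phase is a regular dense matrix multiply and is compute-bound, which is why HyGCN pairs two specialized engines.

```python
# Minimal sketch of the hybrid execution pattern behind HyGCN (illustrative
# only; HyGCN itself is a hardware accelerator, not this code).
import numpy as np

def gcn_layer(features, neighbors, weight):
    # Aggregation phase: irregular, data-dependent gathers -> memory-bound.
    agg = np.stack([features[nbrs].sum(axis=0) for nbrs in neighbors])
    # Combination phase: regular dense matrix multiply -> compute-bound.
    return np.maximum(agg @ weight, 0.0)  # ReLU activation

features = np.random.rand(5, 8)                  # 5 vertices, 8-dim features
neighbors = [[1, 2], [0], [0, 3, 4], [2], [2]]   # adjacency lists
weight = np.random.rand(8, 4)
print(gcn_layer(features, neighbors, weight).shape)  # (5, 4)
```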
[Oct. 2019-Present] Algorithm/Software Optimization for Graph Neural Network.
- We propose the Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN), which reduces complexity by removing overused neighbor attention and by avoiding repeated neighbor aggregation in every training epoch. SeHGNN combines a simple network structure with high prediction accuracy and fast training speed (a minimal sketch follows the publication list below).
- Simple and Efficient Heterogeneous Graph Neural Network (AAAI’23)
- A Comprehensive Survey on Distributed Training of Graph Neural Networks (PIEEE 2023)
- GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware (ECML-PKDD’22)
- Survey on Graph Neural Network Acceleration: An Algorithmic Perspective (IJCAI’22)
- fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU (ICCAD’20)
- Sampling Methods for Efficient Training of Graph Convolutional Networks: A Survey (IEEE/CAA JAS 2022)
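A minimal Python sketch of the key SeHGNN simplification (tensor shapes, mean aggregation, and the training loop are assumptions for illustration): once neighbor attention is removed, neighbor aggregation contains no learnable parameters, so it can be precomputed a single time rather than recomputed in every epoch.

```python
# Minimal sketch of the SeHGNN idea (illustrative shapes and network): with
# attention removed, per-semantic neighbor aggregation is parameter-free and
# can be hoisted out of the training loop.
import numpy as np

def precompute_aggregation(features, semantic_adjs):
    # One-time, parameter-free mean aggregation per semantic (metapath) graph.
    return [adj @ features / np.maximum(adj.sum(1, keepdims=True), 1)
            for adj in semantic_adjs]

rng = np.random.default_rng(0)
features = rng.random((6, 16))                     # 6 vertices, 16-dim
semantic_adjs = [rng.integers(0, 2, (6, 6)).astype(float) for _ in range(3)]

semantic_feats = precompute_aggregation(features, semantic_adjs)  # done once
for epoch in range(10):
    x = np.concatenate(semantic_feats, axis=1)  # only cheap fusion per epoch
    # ... feed x into a simple downstream network and update its weights ...
```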
[Oct. 2021-Present] Hardware Architecture for Heterogeneous Graph Neural Network.
- We first quantitatively characterize a set of representative HGNN models on GPUs to reveal the execution bound of each stage as well as the inter-semantic-graph parallelism and inter-semantic-graph data reusability in HGNNs. Guided by these findings, we propose HiHGNN, a high-performance HGNN accelerator that alleviates the execution bounds and exploits the newfound parallelism and data reusability (a minimal sketch follows the publication list below).
- HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
- GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator with Graph Decoupling and Recouping (DAC’24)
- Characterizing and Understanding HGNNs on GPUs (IEEE CAL 2022)
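A minimal Python sketch of the two properties HiHGNN exploits (thread-level parallelism stands in for hardware lanes; all names and shapes are illustrative): semantic graphs derived from the same heterogeneous graph are mutually independent, so they can be processed in parallel, and they share projected vertex features, which can be computed once and reused.

```python
# Minimal sketch of inter-semantic-graph parallelism and data reusability
# (illustrative; the real HiHGNN design is a hardware accelerator).
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def project(features, weight):
    return features @ weight  # computed once, reused by every semantic graph

def aggregate(adj, projected):
    return adj @ projected    # per-semantic-graph work, mutually independent

rng = np.random.default_rng(0)
feats, w = rng.random((8, 16)), rng.random((16, 8))
semantic_adjs = [rng.integers(0, 2, (8, 8)).astype(float) for _ in range(4)]

projected = project(feats, w)        # data reuse across semantic graphs
with ThreadPoolExecutor() as pool:   # inter-semantic-graph parallelism
    semantic_outs = list(pool.map(lambda a: aggregate(a, projected),
                                  semantic_adjs))
print(len(semantic_outs), semantic_outs[0].shape)  # 4 (8, 8)
```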
[Oct. 2022-Present] Design Space Exploration Framework for CPU and Domain-specific Architecture.
- To accelerate the time-consuming multi-objective design space exploration of CPU microarchitectures, we evaluate various prediction models and identify the most accurate base model, which we then enhance with ensemble learning and Pareto-rank-based sample weights to improve prediction accuracy (a minimal sketch follows the publication list below). We also propose a hypervolume-improvement-based optimization method to trade off among multiple objectives, together with a uniformity-aware selection algorithm that escapes local optima. A proposed Pareto-aware filter further reduces exploration time.
- A High-accurate Multi-objective Exploration Framework for Design Space of CPU (DAC’23)
- A Transfer Learning Framework for High-Accurate Cross-Workload Design Space Exploration of CPU (ICCAD’23)
- MoDSE: A High-Accurate Multi-Objective Design Space Exploration Framework for CPU Microarchitectures (IEEE TCAD 2024)
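A minimal Python sketch of Pareto-rank-based sample weighting (the 1/rank weighting is an illustrative assumption, not necessarily the exact MoDSE formula): samples are sorted into non-dominated fronts, and samples on earlier fronts receive larger training weights so the predictor is most accurate near the Pareto frontier.

```python
# Minimal sketch of Pareto-rank-based sample weights (illustrative 1/rank
# scheme): rank 1 = non-dominated front, rank 2 = next front, and so on.
import numpy as np

def pareto_ranks(objs):
    """Non-dominated sorting for minimization objectives, objs shape (n, m)."""
    n = len(objs)
    dominates = lambda a, b: (np.all(objs[a] <= objs[b])
                              and np.any(objs[a] < objs[b]))
    remaining, ranks, rank = set(range(n)), np.zeros(n, dtype=int), 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(j, i) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

objs = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
ranks = pareto_ranks(objs)    # [1 1 2 1]: three Pareto-optimal samples
weights = 1.0 / ranks         # emphasize samples near the Pareto frontier
print(ranks, weights)
```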