RelBench Leaderboard

Classification

Metric: AUROC on the official test split (higher is better).
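
As a reference for how this metric is computed, here is a minimal sketch of AUROC using its pairwise-ranking definition. The function name `auroc` and the tie handling (ties count as half a win) are assumptions for illustration, not the benchmark's own evaluation code. The table values appear to be this quantity scaled by 100.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of predicted scores, same length as labels.
    Ties between a positive and a negative count as 0.5 (assumption).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    # O(|pos| * |neg|) pairwise comparison: simple, not optimized.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For large test splits a sort-based implementation (or an established library routine) would be preferable; the quadratic loop is kept here only because it mirrors the definition directly.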

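| # | Method | Regime | Mean | rel-amazon / user-churn | rel-amazon / item-churn | rel-avito / user-visits | rel-avito / user-clicks | rel-event / user-repeat | rel-event / user-ignore | rel-f1 / driver-dnf | rel-f1 / driver-top3 | rel-hm / user-churn | rel-stack / user-engagement | rel-stack / user-badge | rel-trial / study-outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | KumoRFM (fine-tuned) | task-specific | 81.1 | 70.5 | 82.8 | 78.3 | 66.8 | 80.6 | 89.4 | 82.6 | 99.6 | 71.2 | 90.7 | 89.9 | 71.2 |
| 2 | PluRel (pretrained + fine-tuned) | task-specific | 79.7 | 63.2 | 82.8 | 60.1 | 58.6 | 83.0 | 91.2 | 80.1 | 89.3 | 63.8 | 95.6 | 94.3 | 94.6 |
| 3 | KumoRFM-2 (in-context) | zero-shot | 79.6 | 69.1 | 82.2 | 69.4 | 67.4 | 81.7 | 90.8 | 84.6 | 92.2 | 69.3 | 89.4 | 87.2 | 72.0 |
| 4 | RT (pretrained + fine-tuned) | task-specific | 78.9 | 70.8 | 83.4 | 66.6 | 65.8 | 77.4 | 87.1 | 84.2 | 92.1 | 70.5 | 90.2 | 88.7 | 70.2 |
| 5 | GelGT | task-specific | 78.7 | 70.5 | 83.0 | 67.0 | 68.4 | 83.6 | 87.8 | 76.1 | 84.1 | 70.0 | 90.9 | 90.4 | 72.5 |
| 6 | RelAgent (GPT-5.2 agent) | task-specific | 78.4 | 70.8 | 82.8 | 67.8 | 68.4 | 78.2 | 87.2 | 78.3 | 85.2 | 71.1 | 90.4 | 88.4 | 71.9 |
| 7 | RGP | task-specific | 78.2 | 70.9 | 82.6 | 66.6 | 69.4 | 78.9 | 84.4 | 78.4 | 87.9 | 70.2 | 90.5 | 88.7 | 70.3 |
| 8 | RelGNN | task-specific | 78.1 | 71.0 | 82.6 | 66.2 | 68.2 | 79.6 | 86.2 | 75.3 | 85.7 | 70.9 | 90.8 | 89.0 | 71.2 |
| 9 | Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned) | task-specific | 77.8 | 71.9 | 83.4 | 67.0 | 66.7 | 79.3 | 83.7 | 77.1 | 82.2 | 70.5 | 91.2 | 89.6 | 71.0 |
| 10 | RT (from scratch) | task-specific | 77.1 | 70.5 | 83.2 | 65.0 | 63.6 | 79.7 | 85.1 | 78.7 | 82.7 | 69.9 | 90.0 | 88.5 | 68.6 |
| 11 | KumoRFM (in-context) | zero-shot | 76.7 | 67.3 | 79.9 | 64.8 | 64.1 | 76.1 | 89.2 | 82.4 | 91.1 | 67.7 | 87.1 | 80.0 | 70.8 |
| 12 | RelGT | task-specific | 76.6 | 70.4 | 82.5 | 66.8 | 68.3 | 76.1 | 81.6 | 75.9 | 83.5 | 69.3 | 90.5 | 86.3 | 68.6 |
| 13 | RDL (GraphSAGE) | task-specific | 75.8 | 70.4 | 82.8 | 66.2 | 65.9 | 76.9 | 81.6 | 72.6 | 75.5 | 69.9 | 90.6 | 88.9 | 68.6 |
| 14 | GIN | task-specific | 75.2 | 70.5 | 82.7 | 66.0 | 66.0 | 74.4 | 79.5 | 71.8 | 73.6 | 69.9 | 90.5 | 88.7 | 68.4 |
| 15 | RDB-PFN (fine-tuned) | task-specific | 73.7 | 65.8 | 80.5 | 66.0 | 64.6 | 74.6 | 82.8 | 72.3 | 73.6 | 67.4 | 88.3 | 84.5 | 64.3 |
| 16 | RDB-PFN (ICL, 1,024-example context) | zero-shot | 73.2 | 64.8 | 78.2 | 65.5 | 62.7 | 75.3 | 82.7 | 71.9 | 81.2 | 66.5 | 86.6 | 81.3 | 61.6 |
| 17 | TabPFN-2.5 + DFS (ICL, 1,024-example context) | zero-shot | 72.9 | 64.5 | 79.6 | 61.7 | 63.3 | 73.1 | 83.2 | 71.7 | 80.4 | 66.8 | 85.3 | 82.1 | 62.6 |
| 18 | RELATE (RelGNN backbone) | task-specific | 72.8 | 68.9 | 81.2 | 66.2 | 66.1 | 67.1 | 81.1 | 68.9 | 69.0 | 69.4 | 90.1 | 86.6 | 58.4 |
| 19 | TabICL v1.1 + DFS (ICL, 1,024-example context) | zero-shot | 72.4 | 64.8 | 78.9 | 64.4 | 61.8 | 70.0 | 80.8 | 71.7 | 80.6 | 66.6 | 85.4 | 83.0 | 60.8 |
| 20 | HGT+PE (Laplacian positional encodings) | task-specific | 72.2 | 66.2 | 78.0 | 65.0 | 64.6 | 65.4 | 81.6 | 71.2 | 76.3 | 65.7 | 88.2 | 85.7 | 59.2 |
| 21 | HGT | task-specific | 71.8 | 66.4 | 78.0 | 64.3 | 63.8 | 65.0 | 82.5 | 70.8 | 70.8 | 67.0 | 88.5 | 86.1 | 58.4 |
| 22 | PluRel (synthetic + real) | zero-shot | 71.8 | 65.0 | 72.5 | 63.4 | 47.9 | 76.0 | 81.0 | 81.0 | 88.4 | 66.0 | 86.2 | 82.0 | 51.8 |
| 23 | RT (zero-shot, leave-one-DB-out) | zero-shot | 71.1 | 64.0 | 70.9 | 61.8 | 59.5 | 72.6 | 83.6 | 81.2 | 89.3 | 62.8 | 75.7 | 80.1 | 51.8 |
| 24 | GAT | task-specific | 70.8 | 63.2 | 70.0 | 64.8 | 65.8 | 68.2 | 82.0 | 70.3 | 60.0 | 64.7 | 89.6 | 84.5 | 66.2 |
| 25 | RELATE (HGT+PE backbone) | task-specific | 69.6 | 65.5 | 75.1 | 62.6 | 64.3 | 72.3 | 85.1 | 66.5 | 47.8 | 65.2 | 88.0 | 82.3 | 59.8 |
| 26 | PluRel (synthetic only) | zero-shot | 68.2 | 64.4 | 71.0 | 63.5 | 45.9 | 53.1 | 80.1 | 76.7 | 82.6 | 63.7 | 82.4 | 81.4 | 53.8 |
| 27 | LightGBM (raw entity features) | task-specific | 63.7 | 52.2 | 62.5 | 53.0 | 53.6 | 68.0 | 79.9 | 68.6 | 73.9 | 55.2 | 63.4 | 63.4 | 70.1 |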

Regression

Metric: NMAE = MAE / train-split std, on the official test split (lower is better).
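
The normalization above can be sketched in a few lines. The function name `nmae` is hypothetical, and the use of the population standard deviation (ddof = 0) is an assumption: the page says only "train-split std" and does not state which estimator is used.

```python
from statistics import pstdev


def nmae(y_test_true, y_test_pred, y_train):
    """MAE on the test split divided by the std of the training targets."""
    if len(y_test_true) != len(y_test_pred):
        raise ValueError("test targets and predictions must have equal length")
    if not y_test_true:
        raise ValueError("test split is empty")

    # Mean absolute error on the test split.
    mae = sum(abs(t - p) for t, p in zip(y_test_true, y_test_pred)) / len(y_test_true)

    # Assumption: population std (ddof=0) of the training targets.
    scale = pstdev(y_train)
    if scale == 0:
        raise ValueError("training targets are constant; NMAE is undefined")
    return mae / scale


if __name__ == "__main__":
    # MAE = (1 + 0 + 2) / 3 = 1.0, train std = 2.0
    print(nmae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [0.0, 4.0]))
```

Dividing by a training-split statistic keeps the scale factor independent of the test labels, which makes errors comparable across tasks whose targets have very different ranges.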

| # | Method | Regime | Mean | rel-amazon / user-ltv | rel-amazon / item-ltv | rel-avito / ad-ctr | rel-event / user-attendance | rel-f1 / driver-position | rel-hm / item-sales | rel-stack / post-votes | rel-trial / study-adverse | rel-trial / site-success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RT (pretrained + fine-tuned) | task-specific | 0.2328 | 0.2569 | 0.0804 | 0.4319 | 0.0303 | 0.3757 | 0.0948 | 0.1455 | 0.1275 | 0.5519 |
| 2 | PluRel (pretrained + fine-tuned) | task-specific | 0.2370 | 0.2672 | 0.0840 | 0.3923 | 0.0708 | 0.3745 | 0.0966 | 0.1472 | 0.1240 | 0.5766 |
| 3 | KumoRFM (fine-tuned) | task-specific | 0.2604 | 0.2474 | 0.0824 | 0.3554 | 0.3110 | 0.3887 | 0.0686 | 0.1273 | 0.1304 | 0.6325 |
| 4 | RelGNN | task-specific | 0.2854 | 0.2475 | 0.0825 | 0.3867 | 0.3110 | 0.5406 | 0.1090 | 0.1273 | 0.1311 | 0.6325 |
| 5 | PluRel (synthetic + real) | zero-shot | 0.2898 | 0.2852 | 0.1041 | 0.4182 | 0.0878 | 0.4835 | 0.1555 | 0.1654 | 0.1731 | 0.7350 |
| 6 | KumoRFM-2 (in-context) | zero-shot | 0.2913 | 0.2421 | 0.0795 | 0.3554 | 0.3071 | 0.4062 | 0.0686 | 0.1254 | 0.1277 | 0.9099 |
| 7 | RelGT | task-specific | 0.2920 | 0.2481 | 0.0828 | 0.3606 | 0.3270 | 0.5575 | 0.1082 | 0.1281 | 0.1297 | 0.6857 |
| 8 | GelGT | task-specific | 0.2951 | 0.2479 | 0.0833 | 0.3784 | 0.3167 | 0.5315 | 0.1131 | 0.1270 | 0.1255 | 0.7324 |
| 9 | RelAgent (GPT-5.2 agent) | task-specific | 0.2958 | 0.2426 | 0.0707 | 0.3449 | 0.3150 | 0.5720 | 0.0707 | 0.1254 | 0.1097 | 0.8112 |
| 10 | KumoRFM (in-context) | zero-shot | 0.3036 | 0.2810 | 0.0935 | 0.3658 | 0.3450 | 0.3910 | 0.0808 | 0.1273 | 0.1717 | 0.8763 |
| 11 | Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned) | task-specific | 0.3106 | 0.2450 | 0.0816 | 0.3867 | 0.3280 | 0.5646 | 0.1050 | 0.1215 | 0.1288 | 0.8343 |
| 12 | PluRel (synthetic only) | zero-shot | 0.3110 | 0.3388 | 0.1154 | 0.4252 | 0.0878 | 0.5426 | 0.1749 | 0.1800 | 0.1889 | 0.7457 |
| 13 | RT (from scratch) | task-specific | 0.3159 | 0.2590 | 0.0845 | 0.4064 | 0.5040 | 0.4775 | 0.1001 | 0.1471 | 0.1306 | 0.7341 |
| 14 | Data Scientist + LightGBM | task-specific | 0.3202 | 0.2422 | 0.0696 | 0.4599 | 0.3712 | 0.5641 | 0.0727 | 0.1273 | 0.1197 | 0.8553 |
| 15 | RDL (GraphSAGE) | task-specific | 0.3204 | 0.2489 | 0.0847 | 0.4285 | 0.3372 | 0.5725 | 0.1131 | 0.1273 | 0.1311 | 0.8406 |
| 16 | GIN | task-specific | 0.3214 | 0.2490 | 0.0848 | 0.4285 | 0.3450 | 0.5796 | 0.1110 | 0.1273 | 0.1309 | 0.8364 |
| 17 | Data Scientist + AutoGluon | task-specific | 0.3325 | 0.2504 | 0.0768 | 0.4703 | 0.3346 | 0.6051 | 0.0868 | 0.1332 | 0.1318 | 0.9036 |
| 18 | GAT | task-specific | 0.3382 | 0.2891 | 0.0997 | 0.4494 | 0.3437 | 0.6075 | 0.1595 | 0.1332 | 0.1357 | 0.8259 |
| 19 | LightGBM (raw entity features) | task-specific | 0.3412 | 0.2919 | 0.1025 | 0.4285 | 0.3450 | 0.5935 | 0.1534 | 0.1332 | 0.1298 | 0.8931 |
| 20 | RT (zero-shot, leave-one-DB-out) | zero-shot | 0.3461 | 0.3277 | 0.1029 | 0.6235 | 0.0662 | 0.4310 | 0.1719 | 0.2128 | 0.2233 | 0.9552 |
| 21 | HGT | task-specific | 0.3464 | 0.2680 | 0.0945 | 0.4829 | 0.3444 | 0.6015 | 0.1294 | 0.1330 | 0.1332 | 0.9305 |
| 22 | HGT+PE (Laplacian positional encodings) | task-specific | 0.3504 | 0.2759 | 0.0945 | 0.5048 | 0.3412 | 0.6251 | 0.1290 | 0.1332 | 0.1258 | 0.9238 |
| 23 | Griffin (fine-tuned) | task-specific | 0.3686 | 0.3409 | 0.1130 | 0.4526 | 0.4846 | 0.5596 | 0.1205 | 0.2733 | 0.1743 | 0.7988 |
| 24 | Entity Median | task-specific | 0.4278 | 0.3030 | 0.1124 | 0.4808 | 0.3516 | 1.2125 | 0.1575 | 0.1352 | 0.1708 | 0.9267 |
| 25 | Entity Mean | task-specific | 0.4551 | 0.3314 | 0.1327 | 0.4808 | 0.3973 | 1.2100 | 0.2241 | 0.2077 | 0.1708 | 0.9414 |

Recommendation

Metric: MAP on the official test split (higher is better).
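
For readers unfamiliar with the metric, the following is a minimal sketch of mean average precision over per-entity ranked lists. The function names, the cutoff parameter `k`, and the normalization by `min(k, number of relevant items)` are assumptions for illustration; the page does not state the cutoff or the exact variant used, and several normalizations are in common use. The table values appear to be scaled by 100.

```python
def average_precision_at_k(ranked, relevant, k):
    """Average precision of one ranked list against a set of relevant items.

    ranked: predicted item ids, best first (assumed free of duplicates).
    relevant: iterable of ground-truth item ids.
    k: cutoff; only the first k predictions are scored (assumption).
    """
    relevant = set(relevant)
    if not relevant or k <= 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this hit
    # Assumed normalization: best achievable number of hits within k.
    return precision_sum / min(k, len(relevant))


def mean_average_precision(rankings, ground_truth, k):
    """Mean of per-entity average precision; both dicts are keyed by entity id."""
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    scores = [
        average_precision_at_k(rankings.get(entity, []), items, k)
        for entity, items in ground_truth.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
    truth = {"u1": ["a", "c"], "u2": ["y"]}
    print(mean_average_precision(preds, truth, k=3))
```

Entities with no predictions contribute a score of zero in this sketch, so missing rankings lower the mean rather than being silently skipped.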

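| # | Method | Regime | Mean | rel-amazon / user-item-purchase | rel-amazon / user-item-rate | rel-amazon / user-item-review | rel-avito / user-ad-visit | rel-f1 / driver-circuit-compete | rel-hm / user-item-purchase | rel-stack / user-post-comment | rel-stack / post-post-related | rel-trial / condition-sponsor-run | rel-trial / site-sponsor-run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ID-GNN (4 layers) | task-specific | 14.0 | 0.1 | 0.1 | 0.1 | 3.9 | 76.2 | 2.9 | 13.8 | 12.5 | 11.3 | 19.0 |
| 2 | ID-GNN (2 layers) | task-specific | 12.3 | 0.1 | 0.1 | 0.1 | 3.6 | 62.3 | 2.8 | 12.7 | 10.7 | 11.4 | 19.0 |
| 3 | LightGBM (entity features + heuristic ranks) | task-specific | 7.3 | 0.1 | 0.2 | 0.1 | 0.1 | 57.8 | 0.4 | 0.0 | 1.9 | 4.5 | 8.2 |
| 4 | Global Popularity | task-specific | 5.9 | 0.2 | 0.1 | 0.1 | 0.0 | 50.1 | 0.3 | 0.0 | 1.5 | 2.5 | 3.8 |
| 5 | Past Visit | task-specific | 5.3 | 0.1 | 0.1 | 0.0 | 1.9 | 20.8 | 0.9 | 1.4 | 1.7 | 8.4 | 17.3 |
| 6 | GraphSAGE (two-tower) (4 layers) | task-specific | 3.4 | 0.9 | 1.0 | 0.6 | 0.1 | 16.6 | 0.7 | 0.2 | 0.1 | 2.7 | 11.1 |
| 7 | GraphSAGE (two-tower) (2 layers) | task-specific | 2.6 | 0.7 | 0.8 | 0.5 | 0.0 | 9.7 | 0.8 | 0.2 | 0.0 | 3.1 | 10.4 |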