RelBench Leaderboard

We are redesigning the RelBench leaderboard to better reflect progress on Relational Foundation Models (RFMs) and Relational Deep Learning (RDL). The new version is expected to go live soon. Stay tuned!

Classification

#	Method	Regime	Mean	rel-amazon user-churn	rel-amazon item-churn	rel-avito user-visits	rel-avito user-clicks	rel-event user-repeat	rel-event user-ignore	rel-f1 driver-dnf	rel-f1 driver-top3	rel-hm user-churn	rel-stack user-engagement	rel-stack user-badge	rel-trial study-outcome
1	KumoRFM (fine-tuned)	task-specific	81.1	70.5	82.8	78.3	66.8	80.6	89.4	82.6	99.6	71.2	90.7	89.9	71.2
2	PluRel (pretrained + fine-tuned)	task-specific	79.7	63.2	82.8	60.1	58.6	83.0	91.2	80.1	89.3	63.8	95.6	94.3	94.6
3	KumoRFM-2 (in-context)	zero-shot	79.6	69.1	82.2	69.4	67.4	81.7	90.8	84.6	92.2	69.3	89.4	87.2	72.0
4	RT (pretrained + fine-tuned)	task-specific	78.9	70.8	83.4	66.6	65.8	77.4	87.1	84.2	92.1	70.5	90.2	88.7	70.2
5	GelGT	task-specific	78.7	70.5	83.0	67.0	68.4	83.6	87.8	76.1	84.1	70.0	90.9	90.4	72.5
6	RelAgent (GPT-5.2 agent)	task-specific	78.4	70.8	82.8	67.8	68.4	78.2	87.2	78.3	85.2	71.1	90.4	88.4	71.9
7	RGP	task-specific	78.2	70.9	82.6	66.6	69.4	78.9	84.4	78.4	87.9	70.2	90.5	88.7	70.3
8	RelGNN	task-specific	78.1	71.0	82.6	66.2	68.2	79.6	86.2	75.3	85.7	70.9	90.8	89.0	71.2
9	Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned)	task-specific	77.8	71.9	83.4	67.0	66.7	79.3	83.7	77.1	82.2	70.5	91.2	89.6	71.0
10	RT (from scratch)	task-specific	77.1	70.5	83.2	65.0	63.6	79.7	85.1	78.7	82.7	69.9	90.0	88.5	68.6
11	KumoRFM (in-context)	zero-shot	76.7	67.3	79.9	64.8	64.1	76.1	89.2	82.4	91.1	67.7	87.1	80.0	70.8
12	RelGT	task-specific	76.6	70.4	82.5	66.8	68.3	76.1	81.6	75.9	83.5	69.3	90.5	86.3	68.6
13	RDL (GraphSAGE)	task-specific	75.8	70.4	82.8	66.2	65.9	76.9	81.6	72.6	75.5	69.9	90.6	88.9	68.6
14	GIN	task-specific	75.2	70.5	82.7	66.0	66.0	74.4	79.5	71.8	73.6	69.9	90.5	88.7	68.4
15	RDB-PFN (fine-tuned)	task-specific	73.7	65.8	80.5	66.0	64.6	74.6	82.8	72.3	73.6	67.4	88.3	84.5	64.3
16	RDB-PFN (ICL, 1,024-example context)	zero-shot	73.2	64.8	78.2	65.5	62.7	75.3	82.7	71.9	81.2	66.5	86.6	81.3	61.6
17	TabPFN-2.5 + DFS (ICL, 1,024-example context)	zero-shot	72.9	64.5	79.6	61.7	63.3	73.1	83.2	71.7	80.4	66.8	85.3	82.1	62.6
18	RELATE (RelGNN backbone)	task-specific	72.8	68.9	81.2	66.2	66.1	67.1	81.1	68.9	69.0	69.4	90.1	86.6	58.4
19	TabICL v1.1 + DFS (ICL, 1,024-example context)	zero-shot	72.4	64.8	78.9	64.4	61.8	70.0	80.8	71.7	80.6	66.6	85.4	83.0	60.8
20	HGT+PE (Laplacian positional encodings)	task-specific	72.2	66.2	78.0	65.0	64.6	65.4	81.6	71.2	76.3	65.7	88.2	85.7	59.2
21	HGT	task-specific	71.8	66.4	78.0	64.3	63.8	65.0	82.5	70.8	70.8	67.0	88.5	86.1	58.4
22	PluRel (synthetic + real)	zero-shot	71.8	65.0	72.5	63.4	47.9	76.0	81.0	81.0	88.4	66.0	86.2	82.0	51.8
23	RT (zero-shot, leave-one-DB-out)	zero-shot	71.1	64.0	70.9	61.8	59.5	72.6	83.6	81.2	89.3	62.8	75.7	80.1	51.8
24	GAT	task-specific	70.8	63.2	70.0	64.8	65.8	68.2	82.0	70.3	60.0	64.7	89.6	84.5	66.2
25	RELATE (HGT+PE backbone)	task-specific	69.6	65.5	75.1	62.6	64.3	72.3	85.1	66.5	47.8	65.2	88.0	82.3	59.8
26	PluRel (synthetic only)	zero-shot	68.2	64.4	71.0	63.5	45.9	53.1	80.1	76.7	82.6	63.7	82.4	81.4	53.8
27	LightGBM (raw entity features)	task-specific	63.7	52.2	62.5	53.0	53.6	68.0	79.9	68.6	73.9	55.2	63.4	63.4	70.1

Regression

Metric: NMAE = MAE / standard deviation of the training-split targets, evaluated on the official test split (lower is better).
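
As a rough illustration of the metric definition above, the following minimal sketch computes NMAE for a single task. The function name `nmae` and its argument names are hypothetical and not part of any RelBench API. It also assumes the population standard deviation of the training targets; the leaderboard does not state whether the population or sample variant is used.

```python
from statistics import pstdev


def nmae(y_true_test, y_pred_test, y_train):
    """Test-split MAE divided by the std of the training-split targets.

    Hypothetical helper for illustration; the use of the population
    standard deviation (pstdev) is an assumption.
    """
    if len(y_true_test) != len(y_pred_test):
        raise ValueError("targets and predictions must have the same length")
    if not y_true_test:
        raise ValueError("test split must not be empty")

    # Mean absolute error on the test split.
    mae = sum(abs(t - p) for t, p in zip(y_true_test, y_pred_test)) / len(y_true_test)

    # Normalizer: spread of the training targets.
    scale = pstdev(y_train)
    if scale == 0:
        raise ValueError("training targets have zero variance")

    return mae / scale


# Toy example: training targets [1, 3] have std 1, test MAE is (1 + 2) / 2 = 1.5.
score = nmae([2.0, 4.0], [1.0, 6.0], [1.0, 3.0])
```

Because the error is divided by the spread of the training targets, scores are comparable across tasks with different target scales, which is what makes a per-row mean over tasks meaningful.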

#	Method	Regime	Mean	rel-amazon user-ltv	rel-amazon item-ltv	rel-avito ad-ctr	rel-event user-attendance	rel-f1 driver-position	rel-hm item-sales	rel-stack post-votes	rel-trial study-adverse	rel-trial site-success
1	RT (pretrained + fine-tuned)	task-specific	0.2328	0.2569	0.0804	0.4319	0.0303	0.3757	0.0948	0.1455	0.1275	0.5519
2	PluRel (pretrained + fine-tuned)	task-specific	0.2370	0.2672	0.0840	0.3923	0.0708	0.3745	0.0966	0.1472	0.1240	0.5766
3	KumoRFM (fine-tuned)	task-specific	0.2604	0.2474	0.0824	0.3554	0.3110	0.3887	0.0686	0.1273	0.1304	0.6325
4	RelGNN	task-specific	0.2854	0.2475	0.0825	0.3867	0.3110	0.5406	0.1090	0.1273	0.1311	0.6325
5	PluRel (synthetic + real)	zero-shot	0.2898	0.2852	0.1041	0.4182	0.0878	0.4835	0.1555	0.1654	0.1731	0.7350
6	KumoRFM-2 (in-context)	zero-shot	0.2913	0.2421	0.0795	0.3554	0.3071	0.4062	0.0686	0.1254	0.1277	0.9099
7	RelGT	task-specific	0.2920	0.2481	0.0828	0.3606	0.3270	0.5575	0.1082	0.1281	0.1297	0.6857
8	GelGT	task-specific	0.2951	0.2479	0.0833	0.3784	0.3167	0.5315	0.1131	0.1270	0.1255	0.7324
9	RelAgent (GPT-5.2 agent)	task-specific	0.2958	0.2426	0.0707	0.3449	0.3150	0.5720	0.0707	0.1254	0.1097	0.8112
10	KumoRFM (in-context)	zero-shot	0.3036	0.2810	0.0935	0.3658	0.3450	0.3910	0.0808	0.1273	0.1717	0.8763
11	Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned)	task-specific	0.3106	0.2450	0.0816	0.3867	0.3280	0.5646	0.1050	0.1215	0.1288	0.8343
12	PluRel (synthetic only)	zero-shot	0.3110	0.3388	0.1154	0.4252	0.0878	0.5426	0.1749	0.1800	0.1889	0.7457
13	RT (from scratch)	task-specific	0.3159	0.2590	0.0845	0.4064	0.5040	0.4775	0.1001	0.1471	0.1306	0.7341
14	Data Scientist + LightGBM	task-specific	0.3202	0.2422	0.0696	0.4599	0.3712	0.5641	0.0727	0.1273	0.1197	0.8553
15	RDL (GraphSAGE)	task-specific	0.3204	0.2489	0.0847	0.4285	0.3372	0.5725	0.1131	0.1273	0.1311	0.8406
16	GIN	task-specific	0.3214	0.2490	0.0848	0.4285	0.3450	0.5796	0.1110	0.1273	0.1309	0.8364
17	Data Scientist + AutoGluon	task-specific	0.3325	0.2504	0.0768	0.4703	0.3346	0.6051	0.0868	0.1332	0.1318	0.9036
18	GAT	task-specific	0.3382	0.2891	0.0997	0.4494	0.3437	0.6075	0.1595	0.1332	0.1357	0.8259
19	LightGBM (raw entity features)	task-specific	0.3412	0.2919	0.1025	0.4285	0.3450	0.5935	0.1534	0.1332	0.1298	0.8931
20	RT (zero-shot, leave-one-DB-out)	zero-shot	0.3461	0.3277	0.1029	0.6235	0.0662	0.4310	0.1719	0.2128	0.2233	0.9552
21	HGT	task-specific	0.3464	0.2680	0.0945	0.4829	0.3444	0.6015	0.1294	0.1330	0.1332	0.9305
22	HGT+PE (Laplacian positional encodings)	task-specific	0.3504	0.2759	0.0945	0.5048	0.3412	0.6251	0.1290	0.1332	0.1258	0.9238
23	Griffin (fine-tuned)	task-specific	0.3686	0.3409	0.1130	0.4526	0.4846	0.5596	0.1205	0.2733	0.1743	0.7988
24	Entity Median	task-specific	0.4278	0.3030	0.1124	0.4808	0.3516	1.2125	0.1575	0.1352	0.1708	0.9267
25	Entity Mean	task-specific	0.4551	0.3314	0.1327	0.4808	0.3973	1.2100	0.2241	0.2077	0.1708	0.9414

Recommendation

#	Method	Regime	Mean	rel-amazon user-item-purchase	rel-amazon user-item-rate	rel-amazon user-item-review	rel-avito user-ad-visit	rel-f1 driver-circuit-compete	rel-hm user-item-purchase	rel-stack user-post-comment	rel-stack post-post-related	rel-trial condition-sponsor-run	rel-trial site-sponsor-run
1	ID-GNN (4 layers)	task-specific	14.0	0.1	0.1	0.1	3.9	76.2	2.9	13.8	12.5	11.3	19.0
2	ID-GNN (2 layers)	task-specific	12.3	0.1	0.1	0.1	3.6	62.3	2.8	12.7	10.7	11.4	19.0
3	LightGBM (entity features + heuristic ranks)	task-specific	7.3	0.1	0.2	0.1	0.1	57.8	0.4	0.0	1.9	4.5	8.2
4	Global Popularity	task-specific	5.9	0.2	0.1	0.1	0.0	50.1	0.3	0.0	1.5	2.5	3.8
5	Past Visit	task-specific	5.3	0.1	0.1	0.0	1.9	20.8	0.9	1.4	1.7	8.4	17.3
6	GraphSAGE (two-tower) (4 layers)	task-specific	3.4	0.9	1.0	0.6	0.1	16.6	0.7	0.2	0.1	2.7	11.1
7	GraphSAGE (two-tower) (2 layers)	task-specific	2.6	0.7	0.8	0.5	0.0	9.7	0.8	0.2	0.0	3.1	10.4