Metric: AUROC (%) on the official test split (higher is better).

| # | Method | Regime | Mean | rel-amazon user-churn | rel-amazon item-churn | rel-avito user-visits | rel-avito user-clicks | rel-event user-repeat | rel-event user-ignore | rel-f1 driver-dnf | rel-f1 driver-top3 | rel-hm user-churn | rel-stack user-engagement | rel-stack user-badge | rel-trial study-outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | KumoRFM (fine-tuned) | task-specific | 81.1 | 70.5 | 82.8 | 78.3 | 66.8 | 80.6 | 89.4 | 82.6 | 99.6 | 71.2 | 90.7 | 89.9 | 71.2 |
| 2 | PluRel (pretrained + fine-tuned) | task-specific | 79.7 | 63.2 | 82.8 | 60.1 | 58.6 | 83.0 | 91.2 | 80.1 | 89.3 | 63.8 | 95.6 | 94.3 | 94.6 |
| 3 | KumoRFM-2 (in-context) | zero-shot | 79.6 | 69.1 | 82.2 | 69.4 | 67.4 | 81.7 | 90.8 | 84.6 | 92.2 | 69.3 | 89.4 | 87.2 | 72.0 |
| 4 | RT (pretrained + fine-tuned) | task-specific | 78.9 | 70.8 | 83.4 | 66.6 | 65.8 | 77.4 | 87.1 | 84.2 | 92.1 | 70.5 | 90.2 | 88.7 | 70.2 |
| 5 | GelGT | task-specific | 78.7 | 70.5 | 83.0 | 67.0 | 68.4 | 83.6 | 87.8 | 76.1 | 84.1 | 70.0 | 90.9 | 90.4 | 72.5 |
| 6 | RelAgent (GPT-5.2 agent) | task-specific | 78.4 | 70.8 | 82.8 | 67.8 | 68.4 | 78.2 | 87.2 | 78.3 | 85.2 | 71.1 | 90.4 | 88.4 | 71.9 |
| 7 | RGP | task-specific | 78.2 | 70.9 | 82.6 | 66.6 | 69.4 | 78.9 | 84.4 | 78.4 | 87.9 | 70.2 | 90.5 | 88.7 | 70.3 |
| 8 | RelGNN | task-specific | 78.1 | 71.0 | 82.6 | 66.2 | 68.2 | 79.6 | 86.2 | 75.3 | 85.7 | 70.9 | 90.8 | 89.0 | 71.2 |
| 9 | Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned) | task-specific | 77.8 | 71.9 | 83.4 | 67.0 | 66.7 | 79.3 | 83.7 | 77.1 | 82.2 | 70.5 | 91.2 | 89.6 | 71.0 |
| 10 | RT (from scratch) | task-specific | 77.1 | 70.5 | 83.2 | 65.0 | 63.6 | 79.7 | 85.1 | 78.7 | 82.7 | 69.9 | 90.0 | 88.5 | 68.6 |
| 11 | KumoRFM (in-context) | zero-shot | 76.7 | 67.3 | 79.9 | 64.8 | 64.1 | 76.1 | 89.2 | 82.4 | 91.1 | 67.7 | 87.1 | 80.0 | 70.8 |
| 12 | RelGT | task-specific | 76.6 | 70.4 | 82.5 | 66.8 | 68.3 | 76.1 | 81.6 | 75.9 | 83.5 | 69.3 | 90.5 | 86.3 | 68.6 |
| 13 | RDL (GraphSAGE) | task-specific | 75.8 | 70.4 | 82.8 | 66.2 | 65.9 | 76.9 | 81.6 | 72.6 | 75.5 | 69.9 | 90.6 | 88.9 | 68.6 |
| 14 | GIN | task-specific | 75.2 | 70.5 | 82.7 | 66.0 | 66.0 | 74.4 | 79.5 | 71.8 | 73.6 | 69.9 | 90.5 | 88.7 | 68.4 |
| 15 | RDB-PFN (fine-tuned) | task-specific | 73.7 | 65.8 | 80.5 | 66.0 | 64.6 | 74.6 | 82.8 | 72.3 | 73.6 | 67.4 | 88.3 | 84.5 | 64.3 |
| 16 | RDB-PFN (ICL, 1,024-example context) | zero-shot | 73.2 | 64.8 | 78.2 | 65.5 | 62.7 | 75.3 | 82.7 | 71.9 | 81.2 | 66.5 | 86.6 | 81.3 | 61.6 |
| 17 | TabPFN-2.5 + DFS (ICL, 1,024-example context) | zero-shot | 72.9 | 64.5 | 79.6 | 61.7 | 63.3 | 73.1 | 83.2 | 71.7 | 80.4 | 66.8 | 85.3 | 82.1 | 62.6 |
| 18 | RELATE (RelGNN backbone) | task-specific | 72.8 | 68.9 | 81.2 | 66.2 | 66.1 | 67.1 | 81.1 | 68.9 | 69.0 | 69.4 | 90.1 | 86.6 | 58.4 |
| 19 | TabICL v1.1 + DFS (ICL, 1,024-example context) | zero-shot | 72.4 | 64.8 | 78.9 | 64.4 | 61.8 | 70.0 | 80.8 | 71.7 | 80.6 | 66.6 | 85.4 | 83.0 | 60.8 |
| 20 | HGT+PE (Laplacian positional encodings) | task-specific | 72.2 | 66.2 | 78.0 | 65.0 | 64.6 | 65.4 | 81.6 | 71.2 | 76.3 | 65.7 | 88.2 | 85.7 | 59.2 |
| 21 | HGT | task-specific | 71.8 | 66.4 | 78.0 | 64.3 | 63.8 | 65.0 | 82.5 | 70.8 | 70.8 | 67.0 | 88.5 | 86.1 | 58.4 |
| 22 | PluRel (synthetic + real) | zero-shot | 71.8 | 65.0 | 72.5 | 63.4 | 47.9 | 76.0 | 81.0 | 81.0 | 88.4 | 66.0 | 86.2 | 82.0 | 51.8 |
| 23 | RT (zero-shot, leave-one-DB-out) | zero-shot | 71.1 | 64.0 | 70.9 | 61.8 | 59.5 | 72.6 | 83.6 | 81.2 | 89.3 | 62.8 | 75.7 | 80.1 | 51.8 |
| 24 | GAT | task-specific | 70.8 | 63.2 | 70.0 | 64.8 | 65.8 | 68.2 | 82.0 | 70.3 | 60.0 | 64.7 | 89.6 | 84.5 | 66.2 |
| 25 | RELATE (HGT+PE backbone) | task-specific | 69.6 | 65.5 | 75.1 | 62.6 | 64.3 | 72.3 | 85.1 | 66.5 | 47.8 | 65.2 | 88.0 | 82.3 | 59.8 |
| 26 | PluRel (synthetic only) | zero-shot | 68.2 | 64.4 | 71.0 | 63.5 | 45.9 | 53.1 | 80.1 | 76.7 | 82.6 | 63.7 | 82.4 | 81.4 | 53.8 |
| 27 | LightGBM (raw entity features) | task-specific | 63.7 | 52.2 | 62.5 | 53.0 | 53.6 | 68.0 | 79.9 | 68.6 | 73.9 | 55.2 | 63.4 | 63.4 | 70.1 |
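
For readers who want to sanity-check entries in the AUROC table, the sketch below computes AUROC from its pairwise-ranking definition and recomputes one Mean entry. It assumes the Mean column is an unweighted average over the 12 task columns, which matches row 1 arithmetically but is not stated in the table itself. The function names `auroc` and `macro_mean` are illustrative only and do not come from any benchmark codebase.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(P*N) form is for clarity,
    not speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_mean(per_task_scores):
    """Unweighted average over tasks (assumed definition of the Mean column)."""
    return sum(per_task_scores) / len(per_task_scores)


# Toy binary task: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Per-task values copied from row 1 of the AUROC table (KumoRFM, fine-tuned).
row_1 = [70.5, 82.8, 78.3, 66.8, 80.6, 89.4, 82.6, 99.6, 71.2, 90.7, 89.9, 71.2]
row_1_mean = round(macro_mean(row_1), 1)
```

Under that assumption, `row_1_mean` reproduces the 81.1 reported in the Mean column. Because the per-task values are already rounded to one decimal, recomputed means for other rows may differ from the table in the last digit.
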

Metric: NMAE (MAE divided by the standard deviation of the train-split targets) on the official test split (lower is better).

| # | Method | Regime | Mean | rel-amazon user-ltv | rel-amazon item-ltv | rel-avito ad-ctr | rel-event user-attendance | rel-f1 driver-position | rel-hm item-sales | rel-stack post-votes | rel-trial study-adverse | rel-trial site-success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RT (pretrained + fine-tuned) | task-specific | 0.2328 | 0.2569 | 0.0804 | 0.4319 | 0.0303 | 0.3757 | 0.0948 | 0.1455 | 0.1275 | 0.5519 |
| 2 | PluRel (pretrained + fine-tuned) | task-specific | 0.2370 | 0.2672 | 0.0840 | 0.3923 | 0.0708 | 0.3745 | 0.0966 | 0.1472 | 0.1240 | 0.5766 |
| 3 | KumoRFM (fine-tuned) | task-specific | 0.2604 | 0.2474 | 0.0824 | 0.3554 | 0.3110 | 0.3887 | 0.0686 | 0.1273 | 0.1304 | 0.6325 |
| 4 | RelGNN | task-specific | 0.2854 | 0.2475 | 0.0825 | 0.3867 | 0.3110 | 0.5406 | 0.1090 | 0.1273 | 0.1311 | 0.6325 |
| 5 | PluRel (synthetic + real) | zero-shot | 0.2898 | 0.2852 | 0.1041 | 0.4182 | 0.0878 | 0.4835 | 0.1555 | 0.1654 | 0.1731 | 0.7350 |
| 6 | KumoRFM-2 (in-context) | zero-shot | 0.2913 | 0.2421 | 0.0795 | 0.3554 | 0.3071 | 0.4062 | 0.0686 | 0.1254 | 0.1277 | 0.9099 |
| 7 | RelGT | task-specific | 0.2920 | 0.2481 | 0.0828 | 0.3606 | 0.3270 | 0.5575 | 0.1082 | 0.1281 | 0.1297 | 0.6857 |
| 8 | GelGT | task-specific | 0.2951 | 0.2479 | 0.0833 | 0.3784 | 0.3167 | 0.5315 | 0.1131 | 0.1270 | 0.1255 | 0.7324 |
| 9 | RelAgent (GPT-5.2 agent) | task-specific | 0.2958 | 0.2426 | 0.0707 | 0.3449 | 0.3150 | 0.5720 | 0.0707 | 0.1254 | 0.1097 | 0.8112 |
| 10 | KumoRFM (in-context) | zero-shot | 0.3036 | 0.2810 | 0.0935 | 0.3658 | 0.3450 | 0.3910 | 0.0808 | 0.1273 | 0.1717 | 0.8763 |
| 11 | Rel-LLM (Llama-3.2-1B + GNN soft prompts, fine-tuned) | task-specific | 0.3106 | 0.2450 | 0.0816 | 0.3867 | 0.3280 | 0.5646 | 0.1050 | 0.1215 | 0.1288 | 0.8343 |
| 12 | PluRel (synthetic only) | zero-shot | 0.3110 | 0.3388 | 0.1154 | 0.4252 | 0.0878 | 0.5426 | 0.1749 | 0.1800 | 0.1889 | 0.7457 |
| 13 | RT (from scratch) | task-specific | 0.3159 | 0.2590 | 0.0845 | 0.4064 | 0.5040 | 0.4775 | 0.1001 | 0.1471 | 0.1306 | 0.7341 |
| 14 | Data Scientist + LightGBM | task-specific | 0.3202 | 0.2422 | 0.0696 | 0.4599 | 0.3712 | 0.5641 | 0.0727 | 0.1273 | 0.1197 | 0.8553 |
| 15 | RDL (GraphSAGE) | task-specific | 0.3204 | 0.2489 | 0.0847 | 0.4285 | 0.3372 | 0.5725 | 0.1131 | 0.1273 | 0.1311 | 0.8406 |
| 16 | GIN | task-specific | 0.3214 | 0.2490 | 0.0848 | 0.4285 | 0.3450 | 0.5796 | 0.1110 | 0.1273 | 0.1309 | 0.8364 |
| 17 | Data Scientist + AutoGluon | task-specific | 0.3325 | 0.2504 | 0.0768 | 0.4703 | 0.3346 | 0.6051 | 0.0868 | 0.1332 | 0.1318 | 0.9036 |
| 18 | GAT | task-specific | 0.3382 | 0.2891 | 0.0997 | 0.4494 | 0.3437 | 0.6075 | 0.1595 | 0.1332 | 0.1357 | 0.8259 |
| 19 | LightGBM (raw entity features) | task-specific | 0.3412 | 0.2919 | 0.1025 | 0.4285 | 0.3450 | 0.5935 | 0.1534 | 0.1332 | 0.1298 | 0.8931 |
| 20 | RT (zero-shot, leave-one-DB-out) | zero-shot | 0.3461 | 0.3277 | 0.1029 | 0.6235 | 0.0662 | 0.4310 | 0.1719 | 0.2128 | 0.2233 | 0.9552 |
| 21 | HGT | task-specific | 0.3464 | 0.2680 | 0.0945 | 0.4829 | 0.3444 | 0.6015 | 0.1294 | 0.1330 | 0.1332 | 0.9305 |
| 22 | HGT+PE (Laplacian positional encodings) | task-specific | 0.3504 | 0.2759 | 0.0945 | 0.5048 | 0.3412 | 0.6251 | 0.1290 | 0.1332 | 0.1258 | 0.9238 |
| 23 | Griffin (fine-tuned) | task-specific | 0.3686 | 0.3409 | 0.1130 | 0.4526 | 0.4846 | 0.5596 | 0.1205 | 0.2733 | 0.1743 | 0.7988 |
| 24 | Entity Median | task-specific | 0.4278 | 0.3030 | 0.1124 | 0.4808 | 0.3516 | 1.2125 | 0.1575 | 0.1352 | 0.1708 | 0.9267 |
| 25 | Entity Mean | task-specific | 0.4551 | 0.3314 | 0.1327 | 0.4808 | 0.3973 | 1.2100 | 0.2241 | 0.2077 | 0.1708 | 0.9414 |
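
The NMAE normalization used in the table above can be sketched as follows. The caption only says "train-split std"; whether the population or the sample standard deviation is meant is not specified, so this sketch assumes the population form (`statistics.pstdev`). The function name `nmae` is illustrative.

```python
import statistics


def nmae(y_true, y_pred, train_targets):
    """MAE on the evaluation split divided by the std of the training targets.

    Assumption: population standard deviation (ddof = 0). Swap in
    statistics.stdev for the sample version if that is the intended one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    scale = statistics.pstdev(train_targets)
    if scale == 0:
        raise ValueError("training targets are constant; NMAE is undefined")
    return mae / scale


# Toy data: the training targets have population std 2.0 and the MAE is 1.0.
train_targets = [2, 4, 4, 4, 5, 5, 7, 9]
score = nmae([1, 2, 3, 4], [2, 2, 5, 3], train_targets)
```

Dividing by the training-split spread puts tasks with very different target scales on a comparable footing, which is what makes averaging across the task columns meaningful.
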

Metric: MAP (%) on the official test split (higher is better).

| # | Method | Regime | Mean | rel-amazon user-item-purchase | rel-amazon user-item-rate | rel-amazon user-item-review | rel-avito user-ad-visit | rel-f1 driver-circuit-compete | rel-hm user-item-purchase | rel-stack user-post-comment | rel-stack post-post-related | rel-trial condition-sponsor-run | rel-trial site-sponsor-run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ID-GNN (4 layers) | task-specific | 14.0 | 0.1 | 0.1 | 0.1 | 3.9 | 76.2 | 2.9 | 13.8 | 12.5 | 11.3 | 19.0 |
| 2 | ID-GNN (2 layers) | task-specific | 12.3 | 0.1 | 0.1 | 0.1 | 3.6 | 62.3 | 2.8 | 12.7 | 10.7 | 11.4 | 19.0 |
| 3 | LightGBM (entity features + heuristic ranks) | task-specific | 7.3 | 0.1 | 0.2 | 0.1 | 0.1 | 57.8 | 0.4 | 0.0 | 1.9 | 4.5 | 8.2 |
| 4 | Global Popularity | task-specific | 5.9 | 0.2 | 0.1 | 0.1 | 0.0 | 50.1 | 0.3 | 0.0 | 1.5 | 2.5 | 3.8 |
| 5 | Past Visit | task-specific | 5.3 | 0.1 | 0.1 | 0.0 | 1.9 | 20.8 | 0.9 | 1.4 | 1.7 | 8.4 | 17.3 |
| 6 | GraphSAGE (two-tower) (4 layers) | task-specific | 3.4 | 0.9 | 1.0 | 0.6 | 0.1 | 16.6 | 0.7 | 0.2 | 0.1 | 2.7 | 11.1 |
| 7 | GraphSAGE (two-tower) (2 layers) | task-specific | 2.6 | 0.7 | 0.8 | 0.5 | 0.0 | 9.7 | 0.8 | 0.2 | 0.0 | 3.1 | 10.4 |
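
As a reference point for the MAP numbers above, here is a minimal sketch of one common MAP@k variant: per-entity average precision over the top-k ranked items, normalized by `min(k, number of relevant items)`, then averaged over entities. The cutoff `k` and the exact normalization used by the benchmark are not given in the table, so both are assumptions here, and the function names are illustrative.

```python
def average_precision_at_k(ranked_items, relevant_items, k):
    """AP@k for one source entity (one common variant, assumed here)."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cutoff
    return precision_sum / min(k, len(relevant))


def mean_average_precision(rankings, ground_truth, k):
    """Mean of AP@k over all entities that appear in ground_truth."""
    scores = [average_precision_at_k(rankings.get(entity, []), items, k)
              for entity, items in ground_truth.items()]
    return sum(scores) / len(scores)


# Toy example with two entities and hypothetical item ids.
rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
ground_truth = {"u1": ["a", "c"], "u2": ["y"]}
map_percent = 100 * mean_average_precision(rankings, ground_truth, k=3)
```

In this toy case the per-entity scores are 5/6 and 1/2, so the mean is 2/3, or about 66.7 on the 0 to 100 scale used in the table.
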