Description
There is a bug in select_primary_ligands_in_df (posebench/analysis/inference_analysis.py) that causes the true ligand coordinates (mol_true_frag) to be unintentionally modified in place during fragment matching. As a result, the reference mol_true is already superimposed onto a predicted pose before evaluation, which yields artificially low RMSD values and overestimates model performance.
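The failure mode is ordinary Python aliasing: if a scoring helper superimposes its first argument onto each candidate by overwriting that argument's coordinates, the caller's object changes too. Below is a stdlib-only sketch of the pattern. `ToyFrag`, `align_in_place`, and `most_similar` are hypothetical stand-ins, not PoseBench or RDKit code, and alignment is reduced to a centroid translation to keep it short. In RDKit, functions such as `rdMolAlign.AlignMol` likewise transform the probe molecule's coordinates in place; which call find_most_similar_frag actually uses is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyFrag:
    """Hypothetical stand-in for a molecule: a list of (x, y, z) tuples."""
    coords: list


def centroid(coords):
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def rmsd(a, b):
    # Plain RMSD over atoms in the same order, with no superposition.
    return math.sqrt(
        sum(sum((p[i] - q[i]) ** 2 for i in range(3)) for p, q in zip(a, b)) / len(a)
    )


def align_in_place(probe, ref):
    # Toy alignment: translate probe onto ref's centroid by OVERWRITING probe.coords.
    cp, cr = centroid(probe.coords), centroid(ref.coords)
    shift = tuple(cr[i] - cp[i] for i in range(3))
    probe.coords = [tuple(p[i] + shift[i] for i in range(3)) for p in probe.coords]
    return rmsd(probe.coords, ref.coords)


def most_similar(true_frag, pred_frags):
    # Picks the best-matching prediction, but moves true_frag as a side effect.
    scores = [align_in_place(true_frag, pred) for pred in pred_frags]
    return pred_frags[scores.index(min(scores))]


true_frag = ToyFrag([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
pred = ToyFrag([(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)])  # same shape, placed 5 Å away
before = list(true_frag.coords)
chosen = most_similar(true_frag, [pred])
print(rmsd(before, pred.coords))            # 5.0: the real placement error
print(rmsd(true_frag.coords, pred.coords))  # 0.0: the reference was moved onto the prediction
```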
Reproduction (example: PDB 1G9V_RQ3)
I added debug prints of Chem.MolToMolBlock(mol_true_frags[0]) before and after the list comprehension that calls:
find_most_similar_frag(mol_true_frag, mol_pred_frags)[0]
if row.pdb_id == "1G9V_RQ3":
    print(Chem.MolToMolBlock(mol_true_frags[0]))
mol_pred_frags = [
    find_most_similar_frag(mol_true_frag, mol_pred_frags)[0]
    # find_most_similar_frag(Chem.Mol(mol_true_frag), mol_pred_frags)[0]
    for mol_true_frag in mol_true_frags
]
if row.pdb_id == "1G9V_RQ3":
    print(Chem.MolToMolBlock(mol_true_frags[0]))
    exit()
- Before (original true coordinates):
1G9V_RQ3_A_801
RDKit 3D
25 26 0 0 0 0 0 0 0 0999 V2000
16.1500 22.3350 42.8110 C 0 0 0 0 0 0 0 0 0 0 0 0
15.3160 21.8930 41.9800 O 0 0 0 0 0 0 0 0 0 0 0 0
16.9980 23.2070 42.5060 O 0 0 0 0 0 0 0 0 0 0 0 0
16.1410 21.7710 44.2030 C 0 0 0 0 0 0 0 0 0 0 0 0
15.1160 22.5540 45.0230 C 0 0 0 0 0 0 0 0 0 0 0 0
17.5310 21.9290 44.8610 C 0 0 0 0 0 0 0 0 0 0 0 0
15.7170 20.3840 44.1390 O 0 0 0 0 0 0 0 0 0 0 0 0
16.5230 19.2430 43.6250 C 0 0 0 0 0 0 0 0 0 0 0 0
16.2460 17.9700 44.1170 C 0 0 0 0 0 0 0 0 0 0 0 0
16.9520 16.8770 43.6600 C 0 0 0 0 0 0 0 0 0 0 0 0
17.9400 17.0290 42.7100 C 0 0 0 0 0 0 0 0 0 0 0 0
17.5250 19.4050 42.6630 C 0 0 0 0 0 0 0 0 0 0 0 0
18.2350 18.2860 42.2070 C 0 0 0 0 0 0 0 0 0 0 0 0
18.6450 15.8760 42.2400 C 0 0 0 0 0 0 0 0 0 0 0 0
17.8550 15.0410 41.2140 C 0 0 0 0 0 0 0 0 0 0 0 0
16.7450 15.4130 40.8360 O 0 0 0 0 0 0 0 0 0 0 0 0
18.2860 13.8040 41.0020 N 0 0 0 0 0 0 0 0 0 0 0 0
17.7310 12.8920 40.1270 C 0 0 0 0 0 0 0 0 0 0 0 0
16.3370 12.7960 39.9850 C 0 0 0 0 0 0 0 0 0 0 0 0
15.7870 11.7440 39.2710 C 0 0 0 0 0 0 0 0 0 0 0 0
16.6210 10.7910 38.7010 C 0 0 0 0 0 0 0 0 0 0 0 0
18.5490 11.9350 39.5470 C 0 0 0 0 0 0 0 0 0 0 0 0
18.0040 10.8950 38.8420 C 0 0 0 0 0 0 0 0 0 0 0 0
14.3030 11.6400 39.1100 C 0 0 0 0 0 0 0 0 0 0 0 0
18.9090 9.8760 38.2190 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
1 3 2 0
1 4 1 0
4 5 1 0
4 6 1 0
4 7 1 0
7 8 1 0
8 9 2 0
9 10 1 0
10 11 2 0
8 12 1 0
11 13 1 0
12 13 2 0
11 14 1 0
14 15 1 0
15 16 2 0
15 17 1 0
17 18 1 0
18 19 2 0
19 20 1 0
20 21 2 0
18 22 1 0
21 23 1 0
22 23 2 0
20 24 1 0
23 25 1 0
M END
- After (coordinates changed):
1G9V_RQ3_A_801
RDKit 3D
25 26 0 0 0 0 0 0 0 0999 V2000
7.9435 24.1721 34.3343 C 0 0 0 0 0 0 0 0 0 0 0 0
8.5512 23.6262 35.2904 O 0 0 0 0 0 0 0 0 0 0 0 0
7.4807 25.3348 34.4147 O 0 0 0 0 0 0 0 0 0 0 0 0
7.7445 23.3855 33.0704 C 0 0 0 0 0 0 0 0 0 0 0 0
8.9890 23.5655 32.2015 C 0 0 0 0 0 0 0 0 0 0 0 0
6.5146 23.9093 32.2937 C 0 0 0 0 0 0 0 0 0 0 0 0
7.6213 21.9802 33.4129 O 0 0 0 0 0 0 0 0 0 0 0 0
6.4473 21.3349 34.0619 C 0 0 0 0 0 0 0 0 0 0 0 0
6.2312 19.9793 33.8276 C 0 0 0 0 0 0 0 0 0 0 0 0
5.1680 19.3303 34.4195 C 0 0 0 0 0 0 0 0 0 0 0 0
4.3058 20.0134 35.2512 C 0 0 0 0 0 0 0 0 0 0 0 0
5.5758 22.0346 34.9025 C 0 0 0 0 0 0 0 0 0 0 0 0
4.4993 21.3632 35.4978 C 0 0 0 0 0 0 0 0 0 0 0 0
3.2212 19.3119 35.8667 C 0 0 0 0 0 0 0 0 0 0 0 0
3.6414 18.4619 37.0812 C 0 0 0 0 0 0 0 0 0 0 0 0
4.8093 18.4687 37.4675 O 0 0 0 0 0 0 0 0 0 0 0 0
2.7802 17.5339 37.4788 N 0 0 0 0 0 0 0 0 0 0 0 0
2.9539 16.6703 38.5416 C 0 0 0 0 0 0 0 0 0 0 0 0
4.2116 16.1015 38.8012 C 0 0 0 0 0 0 0 0 0 0 0 0
4.3288 15.0811 39.7307 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1988 14.6282 40.3989 C 0 0 0 0 0 0 0 0 0 0 0 0
1.8372 16.2098 39.2215 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9543 15.2004 40.1397 C 0 0 0 0 0 0 0 0 0 0 0 0
5.6670 14.4759 40.0170 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7336 14.7236 40.8665 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
1 3 2 0
1 4 1 0
4 5 1 0
4 6 1 0
4 7 1 0
7 8 1 0
8 9 2 0
9 10 1 0
10 11 2 0
8 12 1 0
11 13 1 0
12 13 2 0
11 14 1 0
14 15 1 0
15 16 2 0
15 17 1 0
17 18 1 0
18 19 2 0
19 20 1 0
20 21 2 0
18 22 1 0
21 23 1 0
22 23 2 0
20 24 1 0
23 25 1 0
M END
The atom order and bond table are identical, but every coordinate has changed: the fragment has been shifted and rotated into a different frame. In other words, mol_true_frag was modified in place during the call to find_most_similar_frag.
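One way to confirm that the change is a rigid-body motion rather than a different conformer is to compare interatomic distances, which any rotation plus translation preserves. The sketch below spot-checks atoms 1, 2, 3, 4, and 25, copied from the two MolBlocks above; it checks a subset of atoms and is not a full superposition.

```python
import itertools
import math

# Atoms 1, 2, 3, 4 and 25 of 1G9V_RQ3_A_801, copied from the MolBlocks above.
before = {
    1: (16.1500, 22.3350, 42.8110),
    2: (15.3160, 21.8930, 41.9800),
    3: (16.9980, 23.2070, 42.5060),
    4: (16.1410, 21.7710, 44.2030),
    25: (18.9090, 9.8760, 38.2190),
}
after = {
    1: (7.9435, 24.1721, 34.3343),
    2: (8.5512, 23.6262, 35.2904),
    3: (7.4807, 25.3348, 34.4147),
    4: (7.7445, 23.3855, 33.0704),
    25: (0.7336, 14.7236, 40.8665),
}

# Absolute positions move by many angstroms...
max_shift = max(math.dist(before[i], after[i]) for i in before)

# ...but every pairwise distance is unchanged (to the printed precision),
# which is what a rotation plus translation would produce.
max_dist_change = max(
    abs(math.dist(before[i], before[j]) - math.dist(after[i], after[j]))
    for i, j in itertools.combinations(before, 2)
)
print(f"largest atom displacement: {max_shift:.2f} Å")
print(f"largest change in a pairwise distance: {max_dist_change:.4f} Å")
```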
Impact
Downstream RMSD calculations therefore use a mol_true that has already been aligned to the prediction, so reported pose accuracy (especially the success rate at RMSD ≤ 2 Å) is biased substantially upward. This affects evaluations on Astex Diverse, PoseBusters, and DockGen whenever select_most_similar_pred_frag=True (the default).
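Why the bias is so large: docking RMSD is typically computed in the fixed protein frame with no superposition, so it penalizes a pose placed in the wrong spot. Pre-aligning the reference onto the prediction erases that rigid placement error and leaves only internal conformational error. The toy sketch below, with hypothetical coordinates and translation-only alignment (real alignment would also remove rotational error), shows how a success rate at 2 Å jumps once the reference has been moved.

```python
import math


def centroid(coords):
    n = len(coords)
    return [sum(c[k] for c in coords) / n for k in range(3)]


def rmsd_no_fit(a, b):
    """Docking-style RMSD: same atom order, no superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def translate_onto(probe, ref):
    """Return probe shifted so its centroid matches ref's (toy alignment)."""
    cp, cr = centroid(probe), centroid(ref)
    return [tuple(p[k] + cr[k] - cp[k] for k in range(3)) for p in probe]


def success_rate(refs, preds, cutoff=2.0):
    hits = sum(rmsd_no_fit(r, p) <= cutoff for r, p in zip(refs, preds))
    return hits / len(refs)


# Hypothetical toy data: one reference ligand and three predicted poses with the
# right shape but placed 1, 4 and 8 Å away along x.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
preds = [[(x + d, y, z) for x, y, z in ref] for d in (1.0, 4.0, 8.0)]

honest = success_rate([ref] * 3, preds)
# Buggy evaluation: the reference was already moved onto each prediction.
biased = success_rate([translate_onto(ref, p) for p in preds], preds)
print(honest, biased)
```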
Fix
Pass a copy of the true fragment so the original is never modified:
find_most_similar_frag(Chem.Mol(mol_true_frag), mol_pred_frags)
Alternatively, make the copy with Chem.Mol(mol_true_frag) inside find_most_similar_frag before any RMSD computation.
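A minimal sketch of the second option plus a regression check, assuming the helper aligns its first argument in place. `ToyFrag`, `align_in_place`, and `find_most_similar_frag_safe` are hypothetical; `copy.deepcopy` plays the role that `Chem.Mol(mol_true_frag)` (RDKit's copy constructor) would play in the real code.

```python
import copy
import math
from dataclasses import dataclass, field


@dataclass
class ToyFrag:
    """Hypothetical stand-in for an RDKit molecule (coordinates only)."""
    coords: list = field(default_factory=list)


def align_in_place(probe, ref):
    """Toy centroid alignment that overwrites probe.coords (the risky pattern).

    Assumes probe and ref have the same number of atoms in the same order.
    """
    n = len(probe.coords)
    shift = [
        sum(r[k] for r in ref.coords) / n - sum(p[k] for p in probe.coords) / n
        for k in range(3)
    ]
    probe.coords = [tuple(p[k] + shift[k] for k in range(3)) for p in probe.coords]
    return math.sqrt(
        sum(math.dist(p, r) ** 2 for p, r in zip(probe.coords, ref.coords)) / n
    )


def find_most_similar_frag_safe(true_frag, pred_frags):
    """Hypothetical fixed helper: align a private copy, never the caller's object."""
    scores = [align_in_place(copy.deepcopy(true_frag), pred) for pred in pred_frags]
    best = min(range(len(pred_frags)), key=scores.__getitem__)
    return pred_frags[best], scores[best]


true_frag = ToyFrag([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
preds = [ToyFrag([(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)])]
snapshot = copy.deepcopy(true_frag.coords)

best, score = find_most_similar_frag_safe(true_frag, preds)

# Regression check: the reference coordinates must be exactly as before the call.
assert true_frag.coords == snapshot, "reference fragment was modified in place"
```

Copying inside the helper protects every caller, while copying at the call site (as in the one-line fix) only protects that one call.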
After applying this fix
With the fix applied, the AlphaFold3 (AF3) success rate (RMSD ≤ 2 Å) on Astex Diverse apo docking drops to approximately 60%, a much more realistic value than the inflated number obtained before the fix.
I would appreciate it if this could be fixed to ensure fair benchmarking. I'm happy to provide more details or a PR if needed.
Thanks for the great benchmark!