Prediction of Protein Binding Regions in Disordered Proteins


How to use


Human calpastatin (UniProt ID: P20810)


Calpastatin is a fully disordered protein that inhibits calpain. It contains an N-terminal domain (1-136) followed by four inhibitory domains (137-277, 278-426, 427-563 and 564-708) that can bind to calpain. Each inhibitory domain contains three binding segments: A, B and C. All three have transient structures even in the unbound form, with segments A and C exhibiting a strong alpha-helical preference, while segment B shows a type-I beta structural preference. The figure shows the N-terminal part of calpastatin containing the first inhibitory domain, with all three preformed binding elements (A: 148-166, B: 186-206, C: 223-241) correctly predicted.

Human calcium/calmodulin-dependent protein kinase IV (UniProt ID: Q98TZ2)


Calcium/calmodulin-dependent kinase IV binds to calmodulin via the basic 1-8-14 motif (a subclass of 1-14 CaM binding motifs) near its C-terminal end (residues 327-346). This region is correctly identified by ANCHOR, as shown in the figure, which also marks the binding motif ([RK][RK][RK][FILVW]......[FAILVW].....[FILVW]).
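The motif quoted above is written in regular-expression syntax, so it can be scanned against a sequence directly. The following is a minimal sketch of such a scan, not the motif search used by the server; the function name find_motif and the toy sequence are assumptions made for illustration, and the toy sequence is not the real kinase sequence.

```python
import re

# Basic 1-8-14 CaM binding motif, exactly as quoted in the text.
MOTIF = "[RK][RK][RK][FILVW]......[FAILVW].....[FILVW]"


def find_motif(sequence, pattern=MOTIF):
    """Return (start, end, matched_text) for every hit, including overlapping ones.

    Positions are 1-based and inclusive, as is usual for residue numbering.
    """
    # A lookahead with a capture group lets the scan report overlapping matches.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in lookahead.finditer(sequence)
    ]


# Hypothetical toy sequence built to contain exactly one motif instance.
toy = "GSGS" + "KRKL" + "AAAAAA" + "I" + "GGGGG" + "W" + "SGSG"
hits = find_motif(toy)
for start, end, text in hits:
    print(start, end, text)
```

A pattern match alone says nothing about whether the site is functional, which is why the examples on this page show motif hits alongside the predicted binding regions.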

Human p53 (UniProt ID: P04637)


p53 is a tumor suppressor protein involved in apoptosis, cell cycle control and transcription regulation. It consists of the disordered N-terminal and C-terminal parts and the central largely ordered DNA binding domain.
The N-terminal part is known to bind to at least three different globular protein partners. The segment between residues 17-27 binds to MDM2 via an MDM2 binding motif and a preformed alpha-helix. The other two binding sites overlap: residues 33-56 bind to RPA 70N and residues 45-58 bind to the B subunit of RNA polymerase II.
The C-terminal part contains a tetramerization domain (325-356) and a regulatory region (360-391) that is able to bind to a number of globular proteins with overlapping binding sites (USP7 via the USP7 binding motif: 359-363, SET9: 369-374, Sirtuin: 372-384, S100bb: 375-388, CyclinA via the cyclin binding motif: 378-386, CBP: 380-385). This region is able to adopt all three secondary structures upon binding.
All of the above binding sites are correctly recovered in the prediction. The figure also shows the motif search results for the three known binding motifs that p53 contains. Note, however, that some of the hits are false positives; only three of them are known true positive instances (19-26 for the MDM2 binding motif, 359-368 for the USP7 binding motif and 381-385 for the cyclin binding motif).
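To make the distinction between motif hits and known instances concrete, the sketch below labels a list of hits against the three true positive instances quoted above. The helper label_hits and the example hit list are hypothetical; in particular, the hit at 101-105 is invented for illustration and is not a result from the server.

```python
# Known true positive instances quoted in the text (1-based, inclusive).
KNOWN_INSTANCES = {
    "MDM2": (19, 26),
    "USP7": (359, 368),
    "cyclin": (381, 385),
}


def label_hits(hits, known=KNOWN_INSTANCES):
    """Tag each (motif, start, end) hit as 'known' or 'unconfirmed'."""
    labelled = []
    for motif, start, end in hits:
        # A hit counts as known only if it matches the documented range exactly.
        status = "known" if known.get(motif) == (start, end) else "unconfirmed"
        labelled.append((motif, start, end, status))
    return labelled


# Hypothetical hit list: two documented instances plus one invented extra hit.
example_hits = [("MDM2", 19, 26), ("cyclin", 381, 385), ("cyclin", 101, 105)]
labelled = label_hits(example_hits)
for row in labelled:
    print(row)
```

An "unconfirmed" label here only means the hit is not among the three documented instances; it does not by itself prove the hit is a false positive.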

Wiskott-Aldrich syndrome protein (WASp) (UniProt ID: P42768)


WASp is composed of various functional domains: it contains an ordered WH1 domain near the N-terminus (residues 39-148), the disordered GTPase-binding domain (GBD, 230-310) that binds to CDC42 and EspFU, a polyproline-rich region and a disordered C-terminal verprolin homology/central region/acidic region (VCA, 430-502) domain that interacts with actin, the Arp2/3 complex and FBAA. These binding sites are correctly identified by ANCHOR.
There is experimental evidence that WASp acts as a hub, interacting with many more partners, including RAC, NCK, FYN, SRC kinase FGR, BTK, ABL, PSTPIP1, WIP, and the p85 subunit of PLC-gamma. However, the location of many of these binding regions is not known.
WASp binds to SRC Homology 3 (SH3) domains through one of its proline-rich regions (157-192 and 309-416), although the exact binding site is not known. The interaction with SH3 domains is usually mediated by an SH3 binding motif that is present in the interaction partner. The figure shows that the motif hits are clustered in two separate regions, mainly falling into the proline-rich regions. As linear motifs have been shown to reside preferentially in disordered regions, it is plausible to expect ANCHOR to be able to recognize the SH3 binding region of WASp. This prediction can restrict the candidate sequence regions for SH3 binding and can guide experimental studies to localize true binding sites.
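Restricting candidate sites in this way amounts to keeping only the motif hits that lie inside a region of interest. The sketch below illustrates the idea using the two proline-rich regions quoted above; the helper inside and the hit coordinates are made up for illustration and are not output of ANCHOR or of the motif search.

```python
# Proline-rich regions of WASp as quoted in the text (1-based, inclusive).
PROLINE_RICH = [(157, 192), (309, 416)]


def inside(hit, regions):
    """Return True if the hit lies entirely within any one of the regions."""
    start, end = hit
    return any(r_start <= start and end <= r_end for r_start, r_end in regions)


# Hypothetical motif hits given as (start, end); not real search results.
hits = [(160, 166), (200, 206), (312, 318), (410, 420)]
kept = [h for h in hits if inside(h, PROLINE_RICH)]
print(kept)
```

The same filter could be applied with predicted binding regions in place of the proline-rich regions. Requiring full containment is one possible design choice; accepting partial overlap would keep more candidates.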

Human glycophorin-A (UniProt ID: P02724)


Glycophorin-A is a single-pass transmembrane protein that contains a signal peptide at its N-terminus (1-19) and a transmembrane segment (92-114). The figure shows that both regions would be predicted to be disordered binding regions because of their strongly hydrophobic composition and their relatively disordered sequential environment. However, the built-in filter of ANCHOR removes these false positive hits, leaving no predicted binding regions in the final prediction (no solid blue boxes in the 'Binding regions' bar).

Human hemoglobin subunit alpha (UniProt ID: P69905)


Hemoglobin subunits are globular proteins that lack disordered regions and therefore lack disordered binding regions as well. The figure shows that the IUPred prediction (red line) is consistent with this, predicting no disorder in the protein. There is, however, one short segment for which the ANCHOR prediction (blue line) is positive, although only very slightly. This false positive hit is removed by one of the built-in filters of ANCHOR, yielding a correct final prediction of no disordered binding sites at all (no solid blue boxes in the 'Binding regions' bar).

Bálint Mészáros, István Simon, Zsuzsanna Dosztányi (2009)
Prediction of Protein Binding Regions in Disordered Proteins
PLoS Comput Biol 5(5): e1000376. doi:10.1371/journal.pcbi.1000376