Prediction of Protein Binding Regions in Disordered Proteins


Sequence input

There are two ways to input a protein sequence:

I - If the protein is deposited in the UniProt database (in either SwissProt or TrEMBL), you can specify its accession number or ID in the "Enter SWISS-PROT/TrEMBL identifier or accession number" field. The ANCHOR server is always linked to the newest version of UniProt. The header of the UniProt entry will be displayed as the title on the results page.
II - Type or cut and paste your sequence into the "paste the amino acid sequence" field. The amino acid sequence must be in the standard single-letter code. Spaces and other non-standard characters within the pasted sequence are permitted; they are removed, and the remaining sequence is treated as a single continuous chain. If the first line starts with the ">" character (e.g. a FASTA header), it will be used as the title on the results page. The minimum sequence length is 6 residues. The recommended sequence format is:

>Name of the sequence
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIE
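The input rules above can be sketched in Python as follows. This is an illustrative sketch, not the server's code: the function name `parse_input` is hypothetical, and it assumes that only the 20 standard amino acid letters are kept and that lowercase input is upper-cased before cleaning.

```python
import re
from typing import Optional, Tuple

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # assumption: only the 20 standard residues are kept
MIN_LENGTH = 6  # minimum sequence length accepted by the server


def parse_input(text: str) -> Tuple[Optional[str], str]:
    """Split pasted text into an optional title and a cleaned single-chain sequence."""
    lines = text.strip().splitlines()
    title = None
    if lines and lines[0].startswith(">"):
        # A leading FASTA-style header becomes the title of the results page.
        title = lines[0][1:].strip()
        lines = lines[1:]
    # Join all remaining lines into one chain; upper-casing is an assumption.
    raw = "".join(lines).upper()
    # Remove spaces and any other non-standard characters.
    seq = re.sub(f"[^{STANDARD_AA}]", "", raw)
    if len(seq) < MIN_LENGTH:
        raise ValueError(f"sequence has {len(seq)} residues; at least {MIN_LENGTH} required")
    return title, seq
```

For example, `parse_input(">Name of the sequence\nMEEPQ SDP\nSVEPP")` returns the title `"Name of the sequence"` and the chain `"MEEPQSDPSVEPP"`.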


Multiple sequences

It is possible to input more than one sequence using the multiple sequence version of ANCHOR. Sequences can be input either in multiple FASTA format or as a list of UniProt IDs/ACs. It is also possible to upload a file containing the sequences or a list of UniProt IDs/ACs. Motif searches are also supported, and motifs can likewise be uploaded as a file. The output is provided in text-only form via a temporary link, which is shown on the results page and also sent to the email address supplied.
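A batch of input could be split into records along the lines of the sketch below. It assumes (this is not documented server behaviour) that input whose first non-empty line begins with ">" is multiple FASTA, and that anything else is a whitespace-separated list of UniProt IDs/ACs; `parse_batch` is a hypothetical name.

```python
from typing import List, Tuple


def parse_batch(text: str) -> Tuple[str, list]:
    """Return ("fasta", [(header, sequence), ...]) or ("ids", [id, ...])."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        records: List[List[str]] = []
        for ln in lines:
            if ln.startswith(">"):
                # Each ">" line opens a new record.
                records.append([ln[1:].strip(), ""])
            else:
                # Sequence lines are appended to the most recent record.
                records[-1][1] += ln
        return "fasta", [(header, seq) for header, seq in records]
    # Assumption: identifiers are separated by any whitespace.
    return "ids", [tok for ln in lines for tok in ln.split()]
```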

Motif search

The prediction of disordered binding regions can be complemented by motif searches. Motifs are specified as standard regular expressions. The format is

motif [name]
where name is optional. There should be only one motif per line. For example:
F...W..[LIV]  MDM2
[RK].L.{0,1}[FYLIVMP]	CYCLIN_1
[PA][^P][^FYWIL]S[^P] USP7_1
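A single line of this format can be split into its pattern and optional name as sketched below. The sketch assumes that the pattern itself contains no whitespace, so the first run of spaces or tabs separates pattern from name; `parse_motif_line` is a hypothetical helper, and Python's `re` module stands in for whatever regular expression engine the server actually uses.

```python
import re
from typing import Optional, Tuple


def parse_motif_line(line: str) -> Tuple[str, Optional[str]]:
    """Split one 'motif [name]' line into (pattern, name); name may be None."""
    # Split on the first run of whitespace (spaces or a tab).
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty motif line")
    pattern = parts[0]
    name = parts[1].strip() if len(parts) > 1 else None
    re.compile(pattern)  # raises re.error if the pattern is not a valid regular expression
    return pattern, name
```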

Complete lists of the current ELM motifs from the ELM database and of calmodulin-binding motifs from the Calmodulin Target Database, converted into this format, are also available. You can copy and paste them into the appropriate field. For ELM motifs it is also possible to give just the name of the motif and omit the pattern itself; hence, instead of:
[RKY]..P..P	LIG_SH3_1
P..P.[KR]	LIG_SH3_2
...[PV]..P	LIG_SH3_3
KP..[QK]...	LIG_SH3_4
P..DY	LIG_SH3_5
it is possible to write just:
LIG_SH3_1
LIG_SH3_2
LIG_SH3_3
LIG_SH3_4
LIG_SH3_5
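The name-only shortcut amounts to a lookup in a table of ELM patterns. In the sketch below, `ELM_PATTERNS` is a hypothetical table holding only the five SH3 entries listed above, `expand_motifs` is a hypothetical name, and the rule that a line consisting of a single known ELM name is expanded (rather than treated as a literal pattern) is an assumption about how the server resolves names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup table; a real one would hold the full ELM motif list.
ELM_PATTERNS: Dict[str, str] = {
    "LIG_SH3_1": "[RKY]..P..P",
    "LIG_SH3_2": "P..P.[KR]",
    "LIG_SH3_3": "...[PV]..P",
    "LIG_SH3_4": "KP..[QK]...",
    "LIG_SH3_5": "P..DY",
}


def expand_motifs(lines: List[str],
                  table: Dict[str, str] = ELM_PATTERNS) -> List[Tuple[str, Optional[str]]]:
    """Turn motif lines into (pattern, name) pairs, expanding bare ELM names."""
    motifs: List[Tuple[str, Optional[str]]] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) == 1 and parts[0] in table:
            # Assumption: a lone known ELM name is replaced by its pattern.
            motifs.append((table[parts[0]], parts[0]))
        else:
            motifs.append((parts[0], " ".join(parts[1:]) or None))
    return motifs
```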

Other motifs can also be specified. For example, a motif for finding proline-rich regions could be:
P+.?P{2,}.?P+     Poly-Proline
The server returns the start and end positions of each hit of every motif searched, together with the matched sequence. If a hit is a known true-positive instance of an ELM, the UniProt ID of the protein containing that true-positive instance is also returned. If the graphical output mode is selected (see the "Output type" section below), the results of the motif search are shown as colored boxes: known true-positive hits are indicated by red boxes and all other hits by orange boxes (see e.g. p53 in the "Examples" section).
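The reported hit information (start, end, matched sequence) can be reproduced with Python's `re` module as sketched below. Positions are 1-based and inclusive; the lookahead wrapper reports overlapping hits, which is an assumption about the server's behaviour, and the check against known ELM true positives is not modelled. `search_motifs` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

Hit = Tuple[Optional[str], int, int, str]  # (motif name, start, end, matched sequence)


def search_motifs(seq: str, motifs: List[Tuple[str, Optional[str]]]) -> List[Hit]:
    """Return every hit of every (pattern, name) motif in seq."""
    hits: List[Hit] = []
    for pattern, name in motifs:
        # Wrapping the pattern in a capturing lookahead lets finditer try every
        # start position, so overlapping hits are reported too (assumption).
        for m in re.finditer(f"(?=({pattern}))", seq):
            matched = m.group(1)
            if not matched:
                continue  # skip empty matches from patterns such as "P*"
            start = m.start(1) + 1  # convert 0-based index to 1-based position
            end = m.end(1)          # exclusive 0-based end equals inclusive 1-based end
            hits.append((name, start, end, matched))
    return hits
```

For the example sequence shown in the "Sequence input" section, the MDM2 motif `F...W..[LIV]` matches `FSDLWKLL` at positions 19-26.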


Output type

Generate plot:
The graphical image (a PNG file) is generated using the JpGraph software. Long sequences are split into smaller fragments for plotting; the user can change the window size of this plot. The server generates a plot with the profiles calculated by IUPred, a general disorder prediction method (in red), and by ANCHOR, a prediction of disordered binding regions (in blue). Below the profiles, predicted binding regions are indicated by horizontal bars, each shaded according to its prediction score. Regions that are filtered out are marked by empty bars. If motifs were specified, the matching motifs are also indicated with colored boxes. The text output is also appended.

Raw data only:
This option produces plain-text output composed of several parts. The first part lists the predicted binding regions; regions that were filtered out, if any, are listed separately. The hits of the specified motifs are given next. Finally, the prediction profile is returned. For each residue, it gives the sequential number, the residue type, and the score for being in a disordered binding region; this score ranges from 0 to 1. An additional column marks residues in predicted binding regions with 1 and all other residues with 0; this column takes the results of filtering into account.
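The residue profile can be read back and turned into region boundaries as sketched below. The exact layout of the text output is not specified here, so the sketch assumes whitespace-separated columns in the order listed above (position, residue, score, binding flag) and skips any line that does not fit that shape; `parse_profile` and `regions_from_flags` are hypothetical names.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, float, int]  # (position, residue, score, binding flag)


def parse_profile(lines: List[str]) -> List[Row]:
    """Keep only lines that look like 'position residue score flag'."""
    rows: List[Row] = []
    for line in lines:
        fields = line.split()
        # Assumed layout; header or comment lines are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            rows.append((int(fields[0]), fields[1], float(fields[2]), int(fields[3])))
    return rows


def regions_from_flags(rows: List[Row]) -> List[Tuple[int, int]]:
    """Collapse runs of flag == 1 into (start, end) regions, 1-based and inclusive."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for pos, _residue, _score, flag in rows:
        if flag == 1 and start is None:
            start = pos  # a new binding region begins
        elif flag != 1 and start is not None:
            regions.append((start, prev))  # the region ended at the previous residue
            start = None
        prev = pos
    if start is not None:
        regions.append((start, prev))  # region running to the end of the sequence
    return regions
```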

Filtering

Currently there are two filtering criteria: regions shorter than 6 residues and regions with an average IUPred score below 0.1 are filtered out (see the predictions for hemoglobin and glycophorin in the "Examples" section for a demonstration of the effect of the two criteria).
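The two filtering criteria can be written as a short sketch. It assumes 1-based, inclusive region boundaries and a list of per-residue IUPred scores indexed from 0; `filter_regions` is a hypothetical name, and the thresholds are taken from the text above.

```python
from typing import List, Tuple

MIN_REGION_LENGTH = 6  # regions shorter than this are filtered out
MIN_MEAN_IUPRED = 0.1  # regions with a lower average IUPred score are filtered out

Region = Tuple[int, int]  # (start, end), 1-based and inclusive


def filter_regions(regions: List[Region],
                   iupred: List[float]) -> Tuple[List[Region], List[Region]]:
    """Split predicted regions into (kept, filtered_out)."""
    kept: List[Region] = []
    removed: List[Region] = []
    for start, end in regions:
        length = end - start + 1
        # Slice with 0-based indices covering positions start..end.
        mean_disorder = sum(iupred[start - 1:end]) / length
        if length < MIN_REGION_LENGTH or mean_disorder < MIN_MEAN_IUPRED:
            removed.append((start, end))
        else:
            kept.append((start, end))
    return kept, removed
```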

Examples

Six sample runs of ANCHOR are provided to demonstrate the application of the server.

References:

Bálint Mészáros, István Simon and Zsuzsanna Dosztányi (2009)
Prediction of Protein Binding Regions in Disordered Proteins
PLoS Comput Biol 5(5): e1000376. doi:10.1371/journal.pcbi.1000376

Zsuzsanna Dosztányi, Bálint Mészáros and István Simon (2009)
ANCHOR: web server for predicting protein binding regions in disordered proteins
Bioinformatics 25(20): 2745-2746.