The online interactive tool SAPS (Statistical Analysis of Protein Sequences) evaluates a wide variety of protein sequence properties by statistical criteria. Here's what the program returned when the amino acid sequence of bovine prion protein was run on August 27:
| A | 10 (3.8%) | C | 3 (1.1%) | D- | 6 (2.3%) | E | 8 (3.0%) | F | 7 (2.7%) |
| G++ | 51 (19.3%) | H+ | 12 (4.5%) | I | 11 (4.2%) | K | 11 (4.2%) | L- | 11 (4.2%) |
| M | 9 (3.4%) | N | 11 (4.2%) | P | 18 (6.8%) | Q | 17 (6.4%) | R | 11 (4.2%) |
| S | 15 (5.7%) | T | 12 (4.5%) | V | 18 (6.8%) | W+ | 10 (3.8%) | Y | 13 (4.9%) |
| + charge | 22 (8.3%) | - charge | 14 (5.3%) | total charge | 36 (13.6%) | net + charge | 8 (3.0%) |
| FIKMNY | 62 (23.5%) | AGP+ | 79 (29.9%) | ser-thr | 27 (10.2%) | hydrophobics | 56 (21.2%) |
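Each percentage is a residue count divided by the sequence length, which is 264 here (the sum of the twenty counts), and the grouped rows are sums over residue classes. The Python sketch below recomputes them from the counts in the table. The class memberships, in particular reading "hydrophobics" as L, V, I, F and M, are an assumption inferred from the reported totals rather than taken from SAPS documentation, and the names `COUNTS`, `CLASSES` and `composition_report` are illustrative.

```python
# Residue counts for bovine prion protein, copied from the SAPS composition table.
COUNTS = {
    "A": 10, "C": 3, "D": 6, "E": 8, "F": 7, "G": 51, "H": 12, "I": 11,
    "K": 11, "L": 11, "M": 9, "N": 11, "P": 18, "Q": 17, "R": 11, "S": 15,
    "T": 12, "V": 18, "W": 10, "Y": 13,
}

# Residue classes; memberships are inferred from the reported totals (an assumption).
CLASSES = {
    "+ charge": "KR",
    "- charge": "DE",
    "total charge": "KRDE",
    "FIKMNY": "FIKMNY",
    "AGP": "AGP",
    "ser-thr": "ST",
    "hydrophobics": "LVIFM",
}


def percent(count: int, length: int) -> float:
    """Share of the sequence in percent, rounded to one decimal as in the table."""
    return round(100.0 * count / length, 1)


def composition_report(counts: dict) -> dict:
    """Map each residue and residue class to (count, percent)."""
    length = sum(counts.values())
    report = {aa: (n, percent(n, length)) for aa, n in counts.items()}
    for name, members in CLASSES.items():
        total = sum(counts[aa] for aa in members)
        report[name] = (total, percent(total, length))
    # Net charge: positive residues minus negative residues.
    net = report["+ charge"][0] - report["- charge"][0]
    report["net + charge"] = (net, percent(net, length))
    return report
```

For example, `composition_report(COUNTS)["G"]` gives `(51, 19.3)`, matching the G++ entry.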
1 00+0000000 000000000- 0000+++0+0 000000000+ 0000000000 +000000000
61 0000000000 0000000000 0000000000 0000000000 0000000000 0+00+0+000
121 +000000000 0000000000 000000+000 0000-0--+0 0+-000+000 0000+00-00
181 00000000-0 00000+-000 0000+0-000 -0-0+00-+0 0-00000000 +-000000+0
241 0000000000 0000000000 0000
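The charge display is consistent with a simple per-residue recoding. Reading the first 40 positions against the PRIO_BOVIN sequence listed further down, K and R appear as "+", D and E as "-", and every other residue (His included) as "0", printed 60 per line in blocks of ten with the starting position on the left. A minimal Python sketch of that recoding and layout, assuming this reading of the symbols; the function names are illustrative:

```python
def charge_symbol(residue: str) -> str:
    """'+' for Lys/Arg, '-' for Asp/Glu, '0' for every other residue (His included)."""
    if residue in "KR":
        return "+"
    if residue in "DE":
        return "-"
    return "0"


def charge_pattern(sequence: str) -> str:
    """Recode a one-letter protein sequence into the charge alphabet."""
    return "".join(charge_symbol(aa) for aa in sequence.upper())


def format_lines(pattern: str, per_line: int = 60, block: int = 10) -> list:
    """Lay the pattern out as in the display: start position, then blocks of ten."""
    lines = []
    for start in range(0, len(pattern), per_line):
        chunk = pattern[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{start + 1} " + " ".join(blocks))
    return lines
```

Applied to the first 40 residues of PRIO_BOVIN, `format_lines(charge_pattern(...))` reproduces the first four blocks of the line that starts at position 1.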
Positive charge clusters (cmin = 8/30 or 11/45 or 13/60): none
Negative charge clusters (cmin = 6/30 or 8/45 or 10/60): none
Mixed charge clusters (cmin = 11/30 or 15/45 or 18/60): none
High scoring positive charge segments: none
High scoring negative charge segments: none
High scoring mixed charge segments: none
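The cmin values read naturally as a minimum count of charged residues within a window, for example 8 positive residues within any 30-residue stretch. The Python sketch below slides a window of each length along the sequence and reports the windows that reach the threshold. It is a simplified, assumed reading: it only counts residues and does not implement SAPS's scoring of high-scoring charge segments. `charge_clusters` and `POSITIVE_CMIN` are hypothetical names.

```python
# Thresholds from the SAPS output: (minimum charged residues, window length).
POSITIVE_CMIN = [(8, 30), (11, 45), (13, 60)]


def charge_clusters(sequence: str, charged: str, thresholds) -> list:
    """Return 1-based (start, end) windows holding at least cmin residues from `charged`.

    Window lengths longer than the sequence are skipped.
    """
    flags = [1 if aa in charged else 0 for aa in sequence.upper()]
    hits = []
    for cmin, width in thresholds:
        if width > len(flags):
            continue
        count = sum(flags[:width])  # charged residues in the first window
        for start in range(len(flags) - width + 1):
            if start > 0:
                # Slide one residue: drop the old left end, add the new right end.
                count += flags[start + width - 1] - flags[start - 1]
            if count >= cmin:
                hits.append((start + 1, start + width))
    return hits
```

With 6 K/R residues in the first 40 positions of bovine PrP, no 30-residue window reaches 8, in line with the "none" reported above.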
Aligned matching blocks:
[56-70] GGGGWGQPHGGGWGQ
[97-109] GGGGWGQ__GGTHGQ
Simple tandem repeat:
[57-64] _GGGWGQPH
[65-72] _GGGWGQPH
[73-80] _GGGWGQPH
[81-88] _GGGWGQPH
[89-96] _GGGWGQPH
[97-103] GGGGWGQ__
[104-105] GG
Nothing significant found.
1. Total number of amino acid multiplets: 26 (expected range: 5-30)
2. Histogram of spacings between consecutive amino acid multiplets: (1-5) 11, (6-10) 8, (11-20) 5, (>=21) 3
3. Clusters of amino acid multiplets: none
4. Significant specific amino acid altplets (e.g., RG, EAEAEA, etc.), given as AA: observed (critical number):
   Leu-Ile: 7 (5); positions 10, 149, 244, 252, 254, 258, 261
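Items 1 and 2 count runs of a repeated residue ("multiplets") and bin the gaps between consecutive runs into the classes 1-5, 6-10, 11-20 and >=21. The Python sketch below shows that bookkeeping. The minimum run length `min_len` and the gap definition (next start minus previous end) are hypothetical choices, and SAPS may also count the stretches to the sequence ends (the histogram above sums to 27 for 26 multiplets), so the sketch is not expected to reproduce the exact numbers.

```python
import re

# Spacing classes used in the SAPS histogram; None marks the open-ended ">=21" class.
BINS = [(1, 5), (6, 10), (11, 20), (21, None)]


def multiplets(sequence: str, min_len: int = 2) -> list:
    """1-based (start, end) of runs of at least min_len identical residues.

    min_len is a hypothetical parameter; SAPS's own run-length rules are not reproduced.
    """
    run = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start() + 1, m.end()) for m in run.finditer(sequence.upper())]


def spacing_histogram(runs: list) -> list:
    """Bin the gaps (next start minus previous end) between consecutive runs."""
    counts = [0] * len(BINS)
    for (_, prev_end), (next_start, _) in zip(runs, runs[1:]):
        gap = next_start - prev_end
        for i, (low, high) in enumerate(BINS):
            if gap >= low and (high is None or gap <= high):
                counts[i] += 1
                break
    return counts
```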
Nothing significant was found:
1. Total number of charge multiplets: 2 (expected range: 0-7)
   1 +plets (f+: 8.3%), 1 -plets (f-: 5.3%)
   Total number of charge altplets: 5 (critical number: 9)
2. Histogram of spacings between consecutive charge multiplets: (1-5) 0, (6-10) 0, (11-20) 0, (>=21) 3
AMINO ACID ALPHABET
Location Period Element
31- 102 6 G.G... 10 4
60- 99 8 WGQPHGGG 5 5 !
90- 110 3 G.. 6 4
130- 145 4 G... 4 4
HYDROPHOBICITY ALPHABET {* = KRED; i = LVIF; 0 = all other residues}
Location Period Element
251- 263 1 LVIF-type 12 7 ! 1
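The hydrophobicity alphabet recodes the sequence into three symbols before the periodicity search. The Python sketch below performs that recoding and then finds the longest uninterrupted run of "i" symbols, a crude stand-in for the period-1 "LVIF-type" element reported at 251-263. SAPS's actual periodicity criteria are not reproduced, and the function names are illustrative.

```python
import re

# Hydrophobicity alphabet from the header: '*' = K, R, E, D; 'i' = L, V, I, F; '0' = other.
HYDRO_ALPHABET = {**{aa: "*" for aa in "KRED"}, **{aa: "i" for aa in "LVIF"}}


def reduce_hydrophobicity(sequence: str) -> str:
    """Recode a one-letter sequence into the three-symbol alphabet."""
    return "".join(HYDRO_ALPHABET.get(aa, "0") for aa in sequence.upper())


def longest_i_run(reduced: str) -> tuple:
    """1-based (start, end) of the longest uninterrupted run of 'i'; (0, 0) if none."""
    runs = [(m.start() + 1, m.end()) for m in re.finditer(r"i+", reduced)]
    return max(runs, key=lambda r: r[1] - r[0], default=(0, 0))
```

On the first 20 residues of PRIO_BOVIN, the longest run is the ILVLFV stretch at positions 10-15.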
| Location (Quartile) | Spacing | Rank | P-value | Interpretation |
| 0-157 (2.) | E(157)E | 1 of 9 | 0.0056 | large 1. maximal spacing |
| 10-150 (2.) | I(140)I | 1 of 12 | 0.0023 | large maximal spacing |
| 20-155 (2.) | -(135)- | 1 of 15 | 0.0005 | large maximal spacing |
| 21-123 (2.) | V(102)V | 1 of 19 | 0.0020 | large maximal spacing |
| 34-60 (1.) | W(26)W | 2 of 11 | 0.9959 | small 2. maximal spacing |
| 51-112 (2.) | *(61)* | 1 of 37 | 0.0013 | large maximal spacing |
| 110-265 (3.) | W(155)W | 1 of 11 | 0.0012 | large 1. maximal spacing |
| 153-206 (3.) | G(53)G | 1 of 52 | 0.0001 | large 1. maximal spacing |
| 163-197 (3.) | E(34)E | 2 of 9 | 0.9694 | small 2. maximal spacing |
| 206-240 (4.) | G(34)G | 2 of 52 | 0.0000 | large 2. maximal spacing |
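Judging by the table, each row is a large gap between consecutive occurrences of a residue type, with the two sequence ends acting as extra boundaries: E(157)E spans 0 to 157, W(155)W ends at 265 (one past the last of the 264 residues), and each rank denominator is one more than the residue count (9 spacings for 8 Glu, 52 for 51 Gly). The Python sketch below reproduces only that spacing bookkeeping under this inferred boundary convention; the p-values and quartiles are not modeled, and the function names are illustrative.

```python
def spacings(sequence: str, residues: str) -> list:
    """(left, right, gap) for consecutive occurrences of any residue in `residues`.

    Positions are 1-based; 0 and len(sequence) + 1 act as virtual occurrences at
    the two ends, so k occurrences give k + 1 spacings.
    """
    positions = [i for i, aa in enumerate(sequence.upper(), start=1) if aa in residues]
    bounds = [0] + positions + [len(sequence) + 1]
    return [(a, b, b - a) for a, b in zip(bounds, bounds[1:])]


def maximal_spacing(sequence: str, residues: str) -> tuple:
    """The widest spacing; ties go to the one nearest the N-terminus."""
    return max(spacings(sequence, residues), key=lambda s: s[2])
```

A residue class such as "*" in the table corresponds to passing several letters, e.g. `spacings(seq, "KRED")`.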
64_73    GGTNWG..QP .....HP... .........
74_83    GGSNWG..QP .....HP... .........
84_92    GGSSWG..QP .....H.... .........
93_100   GGSNWG..QG .......... .........
101_129  GYNKWKPDKP KTNLKHVAGA AAAGAVVGG

57_66    GWGH...... .P........ QGGGT..... ....
67_76    NWGQ...... .P........ HPGGS..... ....
77_86    NWGQ...... .P........ HPGGS..... ....
87_95    SWGQ...... .P........ HGGS...... ....
96_129   NWGQGGYNKW KPDKPKTNLK HVAGAAAAGA VVGG

56_75    PGWGHPQGGG TNWG..QP.. ...HPGG... .......
76_94    SNWGQPHPGG SSWG..QP.. ...HGG.... .......
95_129   SNWGQ..GGY NKWKPDKPKT NLKHVAGAAA AGAVVGG

4_16     IQLGYWILVL FIV
245_257  VTLLFLSFLI FLI

18_47    WSDLGLCKKP KPRPGGG..W NSGGSNRY.P GQP
78_110   WGQPHPGGSS WGQPHGGSNW GQGGYNKWKP DKP

29_70    PRPGGGWNSG GSNRYPGQPG SPGGNRYPGW GHPQGGGTNW GQ
71_99    PHPG...... GSN..WGQP. HPGGS...SW GQPHGG.SNW GQ

37_45    SGGSNRYPG
49_57    SPGGNRYPG

42_57    RYPGQPGSPG GNRYPG
161_176  RYPNQVMYRP IDQYSS

139_149  MSRPVIHFGN E
167_177  MYRPIDQYSS Q

143_167  VIHFGNEYED ...RYYRENQ YRYPNQVM
220_247  ITQYQAEYEA AAQRAYNMAF FSAPPVTL
The sets are ordered by the number of repeating fragments they contain. Finally, for each set the repeat segments are multiply aligned. Repeat fragments in different sets can be similar but cover different sequence ranges (i.e., the repeats are shifted by one to a few residues).
repeat 1   length = 20   score = 89.00   mean score = 4.45
 56 PGWGHPQGGG TNWGQPHPGG 75
      || |  ||   ||||| |
 76 SNWGQPHPGG SSWGQPHGGS 95

repeat 2   length = 43   score = 82.00   mean score = 1.91
 29 PRPGGG-WNS GGSNRYPGQP GSPGGNRYPG WGHPQGGGTN WGQ 70
    | |||  |          |||   |||      || |  || | |||
 71 PHPGGSNW-- -------GQP -HPGG---SS WGQPH-GGSN WGQ 99

repeat 3   length = 10   score = 58.00   mean score = 5.80
 67 NWGQPHPGGS 76
    ||||||||||
 77 NWGQPHPGGS 86

repeat 4   length = 33   score = 47.00   mean score = 1.42
 18 WSD--LGLCK KPKPRPGGGW NSGGSNRY-P GQP 47
    |     |       |  |  |   || |   |   |
 78 WGQPHPGGSS WGQPHGGSNW GQGGYNKWKP DKP 110

repeat 5   length = 10   score = 45.00   mean score = 4.50
 77 NWGQPHPGGS 86
     ||||| |
 87 SWGQPHGGSN 96

repeat 6   length = 18   score = 42.00   mean score = 2.33
 74 GGSNWGQPHP GGSSWGQP 91
    |||||||    |   |
 93 GGSNWGQ--G GYNKWKPD 108

repeat 7   length = 39   score = 39.00   mean score = 1.00
 56 PGWGHPQGGG TNWGQPHPGG SNWGQPHPGG SSWGQPHGG 94
      ||  |||    |    |           |     |   ||
 95 SNWG--QGGY NKWKPDKPKT NL--KHVAGA AAAGAVVGG 129

repeat 8   length = 8   score = 38.00   mean score = 4.75
 84 GGSSWGQP 91
    ||| |||
 93 GGSNWGQG 100

repeat 9   length = 28   score = 37.00   mean score = 1.32
143 VIHFGNEYE- --DRYYRENQ YRYPNQVM 167
          |||     | |        |
220 ITQYQAEYEA AAQRAYNMAF FSAPPVTL 247

repeat 10   length = 9   score = 36.00   mean score = 4.00
 37 SGGSNRYPG 45
    | | |||||
 49 SPGGNRYPG 57

repeat 11   length = 11   score = 31.00   mean score = 2.82
139 MSRPVIHFGN E 149
    | ||
167 MYRPIDQYSS Q 177

repeat 12   length = 16   score = 30.00   mean score = 1.88
 42 RYPGQPGSPG GNRYPG 57
    ||| |         |
161 RYPNQVMYRP IDQYSS 176

repeat 13   length = 13   score = 30.00   mean score = 2.31
  4 IQLGYWILVL FIV 16
      |        |
245 VTLLFLSFLI FLI 257
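In these listings a "|" marks a column where the two aligned segments carry the same residue, and the mean score is the alignment score divided by the alignment length (89.00 / 20 = 4.45 for repeat 1; the same ratio holds for every repeat listed). The Python sketch below rebuilds both from the printed strings, keeping the blocks of ten. The raw score depends on a substitution matrix and gap penalties that the output does not name, so it is taken as an input rather than recomputed; the function names are illustrative.

```python
def identity_line(top: str, bottom: str) -> str:
    """Mark with '|' each column where both aligned segments have the same residue.

    Block-separating spaces and gap characters ('-') never count as matches.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned segments must have the same length")
    marks = ("|" if a == b and a.isalpha() else " " for a, b in zip(top, bottom))
    return "".join(marks).rstrip()


def mean_score(score: float, length: int) -> float:
    """Alignment score per aligned column, rounded to two decimals as printed."""
    return round(score / length, 2)
```

For repeat 1, `identity_line("PGWGHPQGGG TNWGQPHPGG", "SNWGQPHPGG SSWGQPHGGS")` yields the match line shown under it (without the leading position column).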
Search performed with the following values:
pI = 9.4
Mw = 25000
delta-pI = 0.3
delta-Mw = 5000
OS or OC = MAMMALIA
---------------------
154 proteins found (only a few are retained here)
ABME_RABIT (P47855)
[pI = 9.17; Mw = 27719.0]
MASEK GPSNK DYTLR RRIEP WEFEV FFDPQ ELRKE ACLLY
ACRO_PIG (P08001)
[pI = 9.66; Mw = 43837.24]
[pI = 8.95; Mw = 2611]
[pI = 9.35; Mw = 2790]
MLPTAVLLVLAVSVAARDNATCDGPCGLRFRQKLESGMRV
ARL4_HUMAN (P40617)
[pI = 9.26; Mw = 22615.0]
MGNGLSDQTSILSNLPSFQSFHIVILGLDCAGKTTVLYRL
ARL4_RAT (P41275)
[pI = 9.26; Mw = 22588.0]
MGNGLSDQTSILSSLPSFQSFHIVILGLDCAGKTTVLYRL
ASH2_RAT (P19360)
[pI = 9.32; Mw = 27694.7]
MESHFNWYGVPRLQKASDACPRESCSSALPEAREGANVHF
ATP6_HALGR (P38591)
[pI = 9.30; Mw = 24862.6]
MNENLFASFTTPTMMGLPIVILIVLFPSILFPSPDRLINN
ATP6_PHOVI (Q00521)
[pI = 9.69; Mw = 24793.5]
MNENLFASFATPTMMGLPIVILIVLFPSILFPSPDRLINN
ATP6_RAT (P05504)
[pI = 9.60; Mw = 25050.1]
MNENLFASFITPTMMGLPIVVTIIMFPSILFPSSERLISN
ATPF_BOVIN (P13619)
[pI = 9.14; Mw = 24668.5]
PVPPLPEHGGKVRFGLIPEEFFQFLYPKTGVTGPYVLGTG
MIMSSYLMDSNYIDPKFPPCEEYSQNSYIPEHSPEYYGRT
PRIO_BOVIN (P10279)
[pI = 9.49; Mw = 25939.9]
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPGGGWNTGGSR
PRIO_HUMAN (P04156)
[pI = 9.39; Mw = 25235.2]
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPG
PRIO_MACFA (P40254)
[pI = 9.45; Mw = 25250.3]
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPG
PRIO_MESAU (P04273)
[pI = 9.58; Mw = 22973.4]
MANLSYWLLALFVAMWTDVGLCKKRPKPGGWNTGGSRYPG
PRIO_MOUSE (P04925)
[pI = 9.56; Mw = 25477.4]
MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPG
PRIO_MUSVI (P40244)
[pI = 9.45; Mw = 25074.1]
MVKSHIGSWLLVLFVATWSDIGFCKKRPKPGGGWNTGGSR
SPC4_RAT (P42667)
[pI = 9.15; Mw = 20599.1]
MLSLDFLDDVRRMNKRQLYYQVLNFGMIVSSALMIWKGLM
; Mw = 25791.3]
MGVFCLGPWGLGRKLRTPGKGPLQLLSRLCGDHLQAIPAK
VATD_BOVIN (P39942)
[pI = 9.44; Mw = 28333.6]
MSGKDRIEIFPSRMAQTIMKARLKGAQTGRNLLKKKSDAL
VATE_MOUSE (P50518)
[pI = 9.28; Mw = 26587.8]
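The search parameters define a box around the query: pI within 9.4 ± 0.3 and Mw within 25000 ± 5000, restricted to mammals (OS or OC = MAMMALIA). A minimal Python window filter over a few entries copied from the listing is sketched below. The tool's handling of precursor versus processed chains and of the taxonomy restriction is not modeled, and `Entry` and `search` are illustrative names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    pi: float
    mw: float


# A few entries copied from the listing above.
ENTRIES = [
    Entry("PRIO_BOVIN", 9.49, 25939.9),
    Entry("PRIO_HUMAN", 9.39, 25235.2),
    Entry("ACRO_PIG", 9.66, 43837.24),
    Entry("SPC4_RAT", 9.15, 20599.1),
]


def in_window(entry: Entry, pi: float, mw: float, delta_pi: float, delta_mw: float) -> bool:
    """True when both pI and Mw fall inside the query window."""
    return abs(entry.pi - pi) <= delta_pi and abs(entry.mw - mw) <= delta_mw


def search(entries, pi=9.4, mw=25000.0, delta_pi=0.3, delta_mw=5000.0) -> list:
    """Names of the entries inside the window, in input order."""
    return [e.name for e in entries if in_window(e, pi, mw, delta_pi, delta_mw)]
```

ACRO_PIG drops out of this toy filter because its listed Mw (43837.24) lies well outside 25000 ± 5000.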
| Rank | Score | Entry | pI | Mw | Description |
| 1 | 8 | PRIO_HUMAN | 9.39 | 25235 | PRION PROTEIN. |
| 2 | 73 | C1QA_HUMAN | 9.34 | 23688 | COMPLEMENT C1Q SUBCOMPONENT, A CHAIN. |
| 3 | 193 | HXD4_HUMAN | 9.44 | 27895 | HOMEOBOX PROTEIN HOX-D4 (HOX-4B) (HOX-5 |
| 4 | 230 | CAP7_HUMAN | 9.53 | 24051 | AZUROCIDIN. |
| 5 | 231 | MCP1_HUMAN | 9.60 | 25030 | CHYMASE. |
| 6 | 231 | UNG1_HUMAN | 9.28 | 25791 | URACIL-DNA GLYCOSYLASE. |
| 7 | 257 | C24A_HUMAN | 9.61 | 20827 | CYTOCHROME B-245 LIGHT CHAIN (P22 |
| 8 | 278 | MYP0_HUMAN | 9.57 | 24763 | MYELIN P0 PROTEIN. |
| 9 | 286 | RS3_HUMAN | 9.68 | 26688 | 40S RIBOSOMAL PROTEIN S3. |
| 10 | 287 | BTF3_HUMAN | 9.41 | 22168 | BTF3A. |
| 11 | 321 | OC3B_HUMAN | 9.39 | 30084 | OCTAMER-BINDING TRANSCRIPTION FACTOR 3B |
| 12 | 327 | HXA5_HUMAN | 9.32 | 29359 | HOMEOBOX PROTEIN HOX-A5 (HOX-1C). |
| 13 | 343 | CD28_HUMAN | 9.39 | 23085 | T-CELL-SPECIFIC SURFACE GLYCOPROTEIN CD28 |
| 14 | 344 | RU2B_HUMAN | 9.72 | 25486 | U2 SMALL NUCLEAR RIBONUCLEOPROTEIN B". |
| 15 | 345 | TCF1_HUMAN | 9.62 | 30264 | T-CELL-SPECIFIC TRANSCRIPTION FACTOR 1 |
| 16 | 345 | T2FB_HUMAN | 9.24 | 28380 | TRANSCRIPTION INITIATION FACTOR IIF, |
| 17 | 352 | CDX1_HUMAN | 9.58 | 28124 | HOMEOBOX PROTEIN CDX-1. |
| 18 | 355 | G11_HUMAN | 9.30 | 28885 | G11 PROTEIN. |
| 19 | 356 | S5A2_HUMAN | 9.47 | 28393 | 3-OXO-5-ALPHA-STEROID 4-DEHYDROGENASE 2 |
| 20 | 363 | HXC5_HUMAN | 9.57 | 24976 | HOMEOBOX PROTEIN HOX-C5 (HOX-3D) (CP11) |
...
rat prion 914
chicken prion 443
fruit fly hnRNP 151
human fibrinogen alpha-1 chain precursor 137
...
* Run with the chicken prion protein, on the theory that it might be more closely related to older homologous proteins.
...
rat prion 392
slime mold annexin 204
human annexin 147
...
* Test against a fragment of the chicken hexapeptide repeat: PSGGGWGAGS HRQPSYPRQP GYPHN PGYPH is interesting because other prions are not near neighbors.
...
slime mold annexin
human coup transcription
...
Moser M, Oesch B, Bueler H. "An anti-prion protein?" [letter]. Nature 362: 213-4 (1993).
Goldgaber D. "Anticipating the anti-prion protein?" [letter]. Nature 351: 106 (1991).
Windl O, et al. Gene 159: 181-186 (1995).
Signal peptide prediction using neural networks trained on eukaryotic signal peptide data.
prion: first 43 residues of consensus sequence
| # pos | aa | C | S | Y |
| 1 | M | 0.011 | 0.919 | 0.000 |
| 2 | V | 0.011 | 0.942 | 0.000 |
| 3 | K | 0.011 | 0.945 | 0.000 |
| 4 | S | 0.011 | 0.945 | 0.000 |
| 5 | H | 0.013 | 0.949 | 0.000 |
| 6 | I | 0.010 | 0.940 | 0.000 |
| 7 | G | 0.012 | 0.928 | 0.000 |
| 8 | S | 0.012 | 0.962 | 0.000 |
| 9 | W | 0.012 | 0.950 | 0.000 |
| 10 | L | 0.011 | 0.971 | 0.005 |
| 11 | L | 0.013 | 0.982 | 0.016 |
| 12 | V | 0.012 | 0.981 | 0.022 |
| 13 | L | 0.012 | 0.978 | 0.034 |
| 14 | F | 0.014 | 0.966 | 0.046 |
| 15 | V | 0.013 | 0.920 | 0.055 |
| 16 | A | 0.017 | 0.929 | 0.071 |
| 17 | T | 0.033 | 0.910 | 0.111 |
| 18 | W | 0.042 | 0.920 | 0.138 |
| 19 | S | 0.047 | 0.907 | 0.157 |
| 20 | D | 0.201 | 0.832 | 0.347 |
| 21 | L | 0.068 | 0.820 | 0.212 |
| 22 | G | 0.136 | 0.799 | 0.312 |
| 23 | L | 0.111 | 0.436 | 0.292 |
| 24 | C | 0.074 | 0.395 | 0.236 |
| 25 | K | 0.656 | 0.153 | 0.694 |
| 26 | K | 0.075 | 0.129 | 0.224 |
| 27 | R | 0.106 | 0.098 | 0.255 |
| 28 | P | 0.016 | 0.079 | 0.094 |
| 29 | K | 0.060 | 0.057 | 0.167 |
| 30 | P | 0.013 | 0.049 | 0.071 |
| 31 | G | 0.020 | 0.041 | 0.080 |
| 32 | G | 0.025 | 0.034 | 0.078 |
| 33 | G | 0.017 | 0.035 | 0.055 |
| 34 | W | 0.050 | 0.044 | 0.072 |
| 35 | N | 0.017 | 0.030 | 0.035 |
| 36 | T | 0.014 | 0.028 | 0.022 |
| 37 | G | 0.013 | 0.029 | 0.017 |
| 38 | G | 0.011 | 0.029 | 0.012 |
| 39 | S | 0.014 | 0.036 | 0.009 |
| 40 | R | 0.014 | 0.031 | 0.005 |
| 41 | Y | 0.017 | 0.030 | 0.000 |
| 42 | P | 0.012 | 0.037 | 0.000 |
| 43 | G | 0.011 | 0.037 | 0.000 |
Signal peptide?
C max = 0.656 at pos. 25
S max = 0.982 at pos. 11: YES (cutoff = 0.87)
Y max = 0.694 at pos. 25: YES (cutoff = 0.31)
S mean (from 1 to 24) = 0.849: YES (cutoff = 0.50)
Most likely cleavage site between pos. 24 and 25
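The YES calls compare each summary score with its cutoff, and the cleavage site is placed just before the position of maximal Y. The Python sketch below reproduces that decision step from per-position C, S and Y scores. The cutoffs are the ones printed above; the ">= cutoff" rule and the S-mean window (residue 1 through the residue before the Y maximum, read off the "from 1 to 24" label) are assumptions, and `summarize` is an illustrative name.

```python
# Cutoffs printed in the prediction summary above.
CUTOFFS = {"S max": 0.87, "Y max": 0.31, "S mean": 0.50}


def _argmax(scores: list) -> tuple:
    """Return (1-based position, value) of the highest score; ties go to the first."""
    i = max(range(len(scores)), key=scores.__getitem__)
    return i + 1, scores[i]


def summarize(c: list, s: list, y: list) -> dict:
    """Summarize per-residue C, S and Y scores (list index 0 = residue 1)."""
    c_pos, c_max = _argmax(c)
    s_pos, s_max = _argmax(s)
    y_pos, y_max = _argmax(y)
    # Assumed S-mean window: residue 1 through the residue just before the Y maximum.
    window = s[: max(y_pos - 1, 1)]
    s_mean = sum(window) / len(window)
    values = {"S max": s_max, "Y max": y_max, "S mean": s_mean}
    return {
        "C max": (c_max, c_pos),
        "S max": (s_max, s_pos),
        "Y max": (y_max, y_pos),
        "S mean": s_mean,
        # Assumed rule: a measure says YES when it reaches its cutoff.
        "calls": {name: values[name] >= cut for name, cut in CUTOFFS.items()},
        # Cleavage predicted between the residue before the Y maximum and the Y maximum.
        "cleavage": (y_pos - 1, y_pos),
    }
```

Fed the C, S and Y columns of the table, this returns the same maxima (0.656 at 25, 0.982 at 11, 0.694 at 25) and the 24/25 cleavage site. Its plain average of S over positions 1-24 comes out near 0.884 rather than the reported 0.849, so the predictor's own averaging evidently differs from this assumed window.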