Optimal Homology probes

The normal function of prion protein is not currently understood. By comparing the sequence of prion protein to the existing databases of known sequences, homologous proteins of known function in other species might be found. In organisms such as yeast, the genome is entirely sequenced and there are very rapid methods for identifying protein function.

These homologous proteins might have a slightly different biological role in their host yet still have binding sites for the same or similar small ligands. That is all that is needed to construct therapeutic substrate analogues that could covalently and irreversibly inactivate the rogue form of prion protein.

However, homology searches with the entire prion amino acid sequence do not return matches other than prion proteins themselves. The reason for this is partly that the prion gene is single copy and without 'close' members of its superfamily. Also, homology search engines such as BLASTp are poor at inserting gaps -- and alignments of the prion gene already require numerous small gaps within the mammals.

The octapeptide repeat has a variable expansion length. This region also amplifies slightly different palindromes in older lines. Its high glycine content further misleads BLASTp into returning, at high priority, false matches to unrelated proteins with poly-glycine stretches. The repeat generating region is a better candidate for searches than the repeats themselves.
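One way to see why the repeats mislead a search is to scan a candidate probe for glycine-rich windows before submitting it. The sketch below is illustrative only: the function name, the 8-residue window and the 50% threshold are assumptions, and the octapeptide consensus phgggwgq is supplied as a commonly cited example rather than taken from the text above.

```python
def glycine_rich_windows(seq, window=8, threshold=0.5):
    """Return (start, end, fraction) for every window whose glycine
    fraction is at least `threshold`. Coordinates are 0-based and
    end-exclusive. Window size and threshold are arbitrary choices."""
    seq = seq.lower()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        fraction = chunk.count("g") / window
        if fraction >= threshold:
            hits.append((start, start + window, fraction))
    return hits


# Assumed example input: four copies of a commonly cited octapeptide consensus.
repeat_region = "phgggwgq" * 4
# Pre-repeat probe P1 as listed in this document.
pre_repeat = "rpkgggwntggsnrypgqpgspggnryph"

for name, seq in [("repeats", repeat_region), ("pre-repeat P1", pre_repeat)]:
    flagged = glycine_rich_windows(seq)
    total = len(seq) - 8 + 1
    print(f"{name}: {len(flagged)} of {total} windows are glycine-rich")
```

A probe in which most windows are flagged is likely to attract poly-glycine hits; trimming or masking those windows is one possible response, though where to cut remains a judgment call.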

It also makes no sense to include the N-terminal signal peptide, which is found in many unrelated proteins and quite variable in length and sequence, not requiring a canonical sequence. (Note however that the basic residues C-terminal to the cleavage are strongly conserved.)

For the same reasons, the GPI sequence should not be used in deep homology probes: it weakens the signal used by the homology search engine. There are separate compilations of all known GPI proteins that can be accessed by a Medline search and examined individually.

Since we are looking for distant homology (to nematode, fruit fly, zebrafish, yeast, or bacteria) or weak paralogous homology within mammals, the search should avoid regions that are already hyper-variable within the placental mammals, such as the post-helix H3 stretch. Special emphasis should be given to invariant (or quasi-invariant) residues, especially those that are conserved in marsupial and chicken. Where the placental mammal sequence disagrees with chicken and marsupial but a common placental substitution does agree, that variant should rule in the probe. Residues that are not quite invariant may exhibit only conservative substitutions (e.g., valine to leucine) -- information that yields partial agreement in comparisons.

Proteins are commonly composed of domains. In some cases, domains are seemingly assembled from disparate sources by recombination and transposition; here, however, the prion protein is encoded by a single exon. The search should still reflect domains identified in the NMR structure as well as apparent exposed hinges inferred from hydrolytic cleavage by proteases. Anchors such as the cysteines and substituted asparagines are also of value.

In short, a series of probes that span single or adjacent well-conserved structural features offers the best chance of finding homologies. Candidates can then be further scrutinized by more subtle testing. On this basis, the best probes for prion protein are:

Recommended Probe Sequence                          Probe Name
rpkgggwntggsnrypgqpgspggnryph                       pre-repeat P1
kpskpktnlkhvagaaaagavvgglggymlgsams                 core P2a
-----pktnlkhvagaaaagavvgglggymlg----                strong core P2b
msrpiihfgneyedryyrenmyrypnqvyyrp                    Helix H1 P3
yssqnnfvhdcvnitvkqhtvttttkgenftetdikimervveqmcitqy  Helix H2 P4

Post-signal, pre-repeat: 
rpkgggwntggs-r---ypgq-gspggnryph: probe aligned
rpkgggwntggs-r---ypgq-gspggnrypp: mammal
rp-gggwnsggsnr---ypgqpgspggnryph: marsupial
kpsgggw-gagshrqpsyprqpg------yph: chicken

Mid-core past beta B1:
kpsk-pktnlkhvagaaaagavvgglggymlgsams: probe aligned
kpwkppktnfkhvagaaaagavvgglg-yamgrvms: chicken
kpdk-pktnlkhvagaaaagavvgglggymlgsams: marsupial
kpsk-pktnmkhmagaaaagavvgglggymlgsams: mammal
---------l--v-----------------------: relevant mammalian variants
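The mid-core block can be used to illustrate the rule for choosing probe residues. The sketch below encodes one possible reading of that rule: keep the mammal residue when chicken or marsupial agrees with it; otherwise use the listed mammalian variant if that variant matches chicken or marsupial; otherwise fall back to the mammal residue. The function name and the "agrees with either outgroup" interpretation are assumptions, and the rule is not claimed to reproduce the other blocks, which involve further judgment calls.

```python
def derive_probe(mammal, variants, outgroups):
    """Pick one residue per aligned column.

    mammal    -- aligned placental mammal row
    variants  -- row of common mammalian variants ('-' means none listed)
    outgroups -- list of aligned rows (here chicken and marsupial)
    """
    rows = [variants, *outgroups]
    if any(len(row) != len(mammal) for row in rows):
        raise ValueError("all rows must have the same aligned length")

    probe = []
    for i, residue in enumerate(mammal):
        seen = {row[i] for row in outgroups}
        alt = variants[i]
        if residue in seen:
            probe.append(residue)   # mammal agrees with an outgroup
        elif alt != "-" and alt in seen:
            probe.append(alt)       # a mammalian variant agrees instead
        else:
            probe.append(residue)   # no agreement: keep the mammal residue
    return "".join(probe)


# Rows copied from the mid-core alignment above.
CHICKEN = "kpwkppktnfkhvagaaaagavvgglg-yamgrvms"
MARSUPIAL = "kpdk-pktnlkhvagaaaagavvgglggymlgsams"
MAMMAL = "kpsk-pktnmkhmagaaaagavvgglggymlgsams"
VARIANTS = "---------l--v-----------------------"

probe = derive_probe(MAMMAL, VARIANTS, [CHICKEN, MARSUPIAL])
print(probe)                   # aligned form, gaps retained
print(probe.replace("-", ""))  # ungapped form, ready for a search
```

For this block the derived row matches the "probe aligned" row, and removing the gap gives core probe P2a.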

Helix H1 through beta B2:
msrpiihfgneyedryyrenmyrypnqvyyrd: probe aligned
msgmnyhfdspdeyrwwsensarypnrvyyrd: chicken
msrpvihfgneyedryyrenqyrypnqvmyrp: marsupial
msrpiihfgsdyedryyrenmhrypnqvyyrp: mammal
---------ne----------y----------: mammal variants

Mid-helix H2 through helix H3:
yss---qnnfvhdcvnitv---------kqhtvttttkgenftetdikime-rvv-----eqmcitqy: probe aligned
ysspvpqdvfvadcfnitvteysigpaakkntsea-vaaanqteve---menkvvtkvire-mcvqqy: chicken
yss---qnnfvhdcvnitv---------kqhttttttkgenftetdikime-rvv-----eqmcitqy: marsupial
ysn---qnnfvhdcvniti---------kqhtvttttkgenftetdvkmme-rvv-----eqmcitqy: mammal
--s---------------v---------------------------i-i---------------v---: rel.mammal variants

An all-in-one probe:

rpkgggwntggsnrypgqpgspggnryph
kpskpktnlkhvagaaaagavvgglggymlg
samsmsrpiihfgneyedryyrenmyrypnqvyyrpyssqnnfvhdcvnitvkqhtvttttkgenftetdikimervveqmcitqy
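The all-in-one probe appears to be P1, P2a, P3 and P4 joined end to end. A minimal sketch of assembling it and writing it in FASTA form for pasting into a search form follows; the record name, the helper name and the 60-column line width are assumptions.

```python
# Probe sequences as listed in the table of recommended probes.
PROBES = {
    "P1": "rpkgggwntggsnrypgqpgspggnryph",
    "P2a": "kpskpktnlkhvagaaaagavvgglggymlgsams",
    "P3": "msrpiihfgneyedryyrenmyrypnqvyyrp",
    "P4": "yssqnnfvhdcvnitvkqhtvttttkgenftetdikimervveqmcitqy",
}


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    seq = seq.upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Join the four probes in N- to C-terminal order.
all_in_one = "".join(PROBES[key] for key in ("P1", "P2a", "P3", "P4"))

print(to_fasta("prion_all_in_one", all_in_one))
for key, seq in PROBES.items():
    print(to_fasta("prion_" + key, seq))
```

Submitting the pieces separately as well as joined keeps a weak hit to one domain from being diluted by the rest of the sequence.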

A DNA probe for exon 1:
tcccccgcgttgtcggatcagcagaccgattctgggcgctgcgtcgcatcggtggcag

A DNA probe for exon 2:
gactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagactg

A DNA probe for exon 3, UTR 5' leader:
ttttgcag agaagtcatc ATG [end of intron, beginning of exon 3, beginning of ORF]

A DNA probe for exon 3, UTR 3' trailer:
gggaggccttcctgcttgttccttcgcatttctcgtggtctaggctgggggaggggttatcc

11 Feb 1997 Homology searches:
Exon 1 Blastn results:
Hamster (Syrian golden) 

Query:     1 TCCCCCGCGTTGTCGGATCAGCAGACCGA 29
             |||||||||  ||| || |||||||||||
Sbjct:   345 TCCCCCGCGGCGTCCGAGCAGCAGACCGA 373
tccccc gcggcgtccg agcagcagac cgagaaggca catcgagtcc actcgtcgcg tcggtggcag	M.auratus match
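The strength of the hamster match can be checked directly from the two aligned rows. The helper below is a minimal sketch (the function name is an assumption) that counts identical positions in an ungapped pairwise alignment.

```python
def identity(query, subject):
    """Count identical positions in two equal-length aligned strings.
    Gap characters ('-') are never counted as matches."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


# Rows copied from the exon 1 Blastn alignment above.
query = "TCCCCCGCGTTGTCGGATCAGCAGACCGA"
sbjct = "TCCCCCGCGGCGTCCGAGCAGCAGACCGA"

matches, length = identity(query, sbjct)
print(f"{matches}/{length} identical ({100 * matches / length:.0f}%)")
```

The count agrees with the number of bars in the match line of the alignment.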

Exon 2 Blastn results:
sequence essentially unique to prion promoters.

Exon 3 Blastn results:
sequence a little short for good probabilities.

Exon 3 3' trailer Blastn results:
some human matches but none significant.

C. elegans to GenBank similarities
C18H7 - Chromosome 4 - Finisher Aye Mon Tin 960924
Production: Bill Fronick
Bases: 40978
Genes: 13
cDNAs: 11
111.00  SW:PRIO_CHICK P27177 MAJOR PRION PROTEIN HOMOLOG PRECURSOR(ACETYLCHOLINE RECEPTOR-INDUCING ACTIVITY) 
740.00  TR:G559703 MRNA (KIAA0068) FOR ORF, PARTIAL CDS
110.00  SW:FES_FSVST P00543 TYROSINE-PROTEIN KINASE TRANSFORMING PROTEIN FES 
515.00  TR:G395145 CUTICULAR COLLAGEN

f11g11/950504
F11G11 - finishers Phil Latreille Rebecca Deadman 950504
Bases: 34348
Genes: 13
tRNAs: 0
cDNAs: 5
93.00  KCRB_CHICK P05122 CREATINE KINASE, B CHAIN
233.00  CC13_CAEEL P20631 CUTICLE COLLAGEN 13 PRECURSOR
94.00  THTR_CHICK P25324 THIOSULFATE SULFURTRANSFERASE
26.00  PRIO_CHICK P27177 MAJOR PRION PROTEIN HOMOLOG PRECURSOR

C. elegans BLAST search 16 Feb 1997
>F25E2
  Length = 29,976

  Plus Strand HSPs:

 Score = 101 (27.9 bits), Expect = 5.4, P = 1.0
 Identities = 33/49 (67%), Positives = 33/49 (67%), Strand = Plus / Plus

Query:      1 ATGGTGAAAATCCACATAGGCAGCTGGATCCTGGTTCTCTTTGTGGCCA 49
              |||||||||||||  ||| | || ||||  |  || |   ||  | |||
Sbjct:  19395 ATGGTGAAAATCCGGATAAGGAGTTGGAAACCCGTACAAGTTCGGACCA 19443


>F47C12
  Length = 34,425

  Minus Strand HSPs:

 Score = 108 (29.8 bits), Expect = 3.2, P = 0.96
 Identities = 28/36 (77%), Positives = 28/36 (77%), Strand = Minus / Plus

Query:     47 CTTCAGCTCGGTTGAAATGGTTCAGTTCTGAAATAT 12
              |||||| | || || ||||||||||| | | ||| |
Sbjct:  30343 CTTCAGATAGGGTGGAATGGTTCAGTGCCGCAATGT 30378
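In a "Minus / Plus" hit the query coordinates run backwards (here 47 down to 12) because the report shows the reverse complement of the probe. The sketch below, with an assumed helper name, recovers the displayed query row from the exon 2 probe given earlier.

```python
# Translation table for complementing DNA in either case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Exon 2 probe as listed in this document.
exon2 = ("gactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgcc"
         "ttcctagtggtaccagtccaatttaggagagccaagcagactg")

# Report positions 12..47 are 1-based and inclusive, so slice [11:47].
segment = exon2[11:47]
minus_strand_query = reverse_complement(segment).upper()
print(minus_strand_query)
```

The printed string is the Query row of the F47C12 alignment, which confirms that the hit lies on the strand opposite to the probe as entered.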



>B0205
  Length = 40,951

  Minus Strand HSPs:

 Score = 115 (31.8 bits), Expect = 0.021, P = 0.020
 Identities = 31/41 (75%), Positives = 31/41 (75%), Strand = Minus / Plus

Query:     62 GGATAACCCCTCCCCCAGCCTAGACCACGAGAAATGCGAAG 22
              ||| | || | | | | ||||  ||||||||||||||| ||
Sbjct:  26547 GGACAGCCTCACTCACGGCCTCCACCACGAGAAATGCGCAG 26587


>T27D1
  Length = 22,559

  Minus Strand HSPs:

 Score = 96 (26.5 bits), Expect = 8.2, P = 1.0
 Identities = 24/30 (80%), Positives = 24/30 (80%), Strand = Minus / Plus

Query:     51 CCCCCAGCCTAGACCACGAGAAATGCGAAG 22
              | | | ||||  ||||||||||||||| ||
Sbjct:  18677 CACACGGCCTCCACCACGAGAAATGCGCAG 18706
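The Expect and P columns in these reports are two views of the same statistic. Assuming the usual Poisson relation used by BLAST reports of this kind, P = 1 - exp(-E), the short sketch below (function name assumed) converts the printed Expect values so they can be compared with the printed P values.

```python
import math


def expect_to_p(expect):
    """Probability of at least one chance hit this good, given the
    expected number of such hits (Poisson assumption)."""
    return -math.expm1(-expect)  # numerically stable 1 - exp(-expect)


# (cosmid, Expect, P) as printed in the reports above.
reported = [
    ("F25E2", 5.4, 1.0),
    ("F47C12", 3.2, 0.96),
    ("B0205", 0.021, 0.020),
    ("T27D1", 8.2, 1.0),
]

for cosmid, expect, p_printed in reported:
    p = expect_to_p(expect)
    print(f"{cosmid}: E = {expect}, computed P = {p:.3f}, printed P = {p_printed}")
```

The computed values agree with the printed ones up to the rounding of E in the report. For small E the two numbers are nearly equal, so only B0205 comes close to a hit that would be unexpected by chance.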

------------
Feb 18 1996 yeast search with chicken core probe
 Score = 46 (21.1 bits), Expect = 0.35, P = 0.29
 Identities = 11/27 (40%), Positives = 16/27 (59%)

Query:     1 PKTNLKHVAGAAAAGAVVGGLGGYMLG 27
             P+T L+ +AG   +G  +GGL  Y  G
Sbjct:  1407 PQTPLRSLAGLIDSGIPLGGLTLYGSG 1433

YDR420W 1306301 1311709 HKR1  Hansenula mrakii killer toxin-resistance protein 
Annotation : ann-05369

Gene_info HKR1 
Summary References available in SGD for HKR1 
Reference Yabe, T., et al. (1996) HKR1 encodes a     
            cell surface protein that regulates both 
            cell wall beta-glucan synthesis and      
            budding pattern in the yeast             
            Saccharomyces cerevisiae. J Bacteriol    
            178:477-483                              
          Kasahara, S., et al. (1994) Cloning of     
            the Saccharomyces cerevisiae gene whose  
            overexpression overcomes the effects of  
            HM-1 killer toxin, which inhibits        
            beta-glucan synthesis. J Bacteriol       
            176:1488-1499                            

results for search with entry wp:k04h4.1>k04h4.1 ce00246 emb-9: collagen (cambridge) sw:p17139

msrlsllgltaavvllssfcqdrihvdaaaackgcappcvcpgtkgergnpgfggepghpgapgqdgpegapgapgmfgaegdfgdmgskgargdrglpgspghpglqgldglpglkgeegipgcngtdvsdlsksdicniihlsdvvsvlrvslecpdlldlqgnldktetlddqdspdhqekevsihkdakelkenledqefqvfqgnsgypglkgakgdpgpyglpgfpgvsglkgrmgvrtsgvkgekglpgppgppgqpgsypwaskpiemevlqglsdqlvgvkgekgrdgpvgppgmlgldgppgypglkgqkgdlgdagqrgkrgkdgvpgnygekgsqgeqglggtpgypgtkggagepgypgrpgfegdcgpegplgegtgapgqpgidgmpgytekgdrgedgypgfagepglpgepgdcgypgedglpgydiqgppgldgqsgrdgfpgipgdigdpgysgekgfpgtgvnkvgppgmtglpgepgmpgrigvdgypgppgnngergedcgycpdgvpgnagdpgfpgmngypgppgpngdhgdcgmpgapgkprsagsdglsgspglpgipgypgmkgeageivgpmenpagipglkgdhglpglpgrpgsdglpgypggpgqngfpglqgepglagidgkrgrqgslgipglqgppgdsfpgqpgtpgykgergadglpglpgaqgprgipaplrivnqvagqpgvdgmpglpgdrgadglpglpgpvgpdgypgtpgergmdglpgfpglhgepgmrgqqgevgfngidgdcgepgldgypgatrapgapgetgfgfpgqvgypgpngdagaaglpgpdgypgrdglpgtpgypgeagmngqdgapgqpgsrgesglvgidgkkgrdgtpgtrgqdggpgysgeagapgqngmdgypgapgdqgypgspgqdgypgpsgipgedglvgfpglrgehgdnglpglegecgeegsrgldgvpgypgehgtdglpglpgadgqpgfvgeagepgtpgyrgqpgepgnlaypgqpgdvgypgpdgppglpgqdglpglngergdngdsypgnpglsgqpgdagydgldgvpgppgypgitgmpglkgesglpglpgrqgndgipgqpglegecgedgfpgspgqpgypgqqgregekgypgipgenglpglrgqdgqpglkgengldgqpgypgsagqlgtpgdvgypgapgengdngnqgrdgqpglrgesgqpgqpglpgrdgqpgpvgppgddgypgapgqdiygptgqagqdgypgldglpgapglngepgspgqygmpglpggpgesglpgypgerglpgldgkrghdglpgapgvpgvegvpglegdcgedgypgapgapgsngypgerglpgvpgqqgrsgdngypgapgqpgikgprgddgfpgrdgldglpgrpgreglpgpmamavrnppgqpgengypgekgypglpgdnglsgppgkagypgapgtdgypgppglsgmpghggdqgfqgaagrtgnpglpgtpgypgspggwapsrgftfakhsqttavpqcppgasqlwegysllyvqgngrasgqdlgqpgsclskfntmpfmfcnmnsvchvssrndysfwlstdepmtpmmnpvtgtairpyisrcavcevptqiiavhsqdtsvpqcpqgwsgmwtgysfvmhtaagaegtgqslqspgscleefravpfiechgrgtcnyyatnhgfwlsivdqdkqfrkpmsqtlkagglkdrvsrcqvclknr