Grantee Research Project Results
Final Report: Risk Assessment of Food Allergenicity by a Data Base Approach
EPA Grant Number: R833137
Title: Risk Assessment of Food Allergenicity by a Data Base Approach
Investigators: Braun, Werner
Institution: The University of Texas Medical Branch - Galveston
EPA Project Officer: Hahn, Intaek
Project Period: October 1, 2006 through September 30, 2009
Project Amount: $600,000
RFA: Biotechnology: Potential Allergenicity of Genetically Engineered Foods (2006)
Research Category: Human Health
Objective:
The allergenic potential of genetically engineered food products needs to be carefully assessed before they enter the market. As the number and complexity of these bioengineered foods increase, the Agency supports the development of scientific resources for determining potential allergenicity, including up-to-date bioinformatics tools for making such assessments. In this project we focused on further developing bioinformatics tools to increase their specificity and sensitivity in predicting the allergenicity of food proteins.
Summary/Accomplishments (Outputs/Outcomes):
Achievements:
Maintaining our Structural Database of Allergenic Proteins (SDAP).
The SDAP Scientific Advisory Board. To obtain outside advice on the content and user interface of SDAP, we established a Scientific Advisory Board of prominent allergy researchers: Heimo Breiteneder, Head, Division of Medical Biotechnology, Medical University of Vienna, Chairman, IUIS Allergen Nomenclature Sub-committee; Martin D. Chapman, President, INDOOR Biotechnologies, Member, IUIS Allergen Standardization, Member, IUIS Allergen Nomenclature Sub-committee; Soheila J. Maleki, USDA-ARS-SRRC, and Tulane School of Medicine.
Current state of the content of SDAP. Updating the content and format of the Structural Database of Allergenic Proteins (SDAP 2.0) is a continuous process: new allergens are added when we find publications describing them in the literature or when they appear in the IUIS allergen database. The database component of SDAP contains the allergen name, source, sequence, structure, IgE epitopes, protein family based on the Pfam classification, and literature references. The allergen nomenclature is developed and maintained by the International Union of Immunological Societies (IUIS) Allergen Nomenclature Sub-committee (www.allergen.org). This systematic nomenclature is based on the Linnaean system and provides a unique and unambiguous classification of allergenic proteins. The IUIS allergen database is the major source of primary data for SDAP, and we actively coordinate our updates with those of the IUIS database. After a new allergen is added to SDAP, we identify and collect sequence and structure data from general databases, such as gene databases (GenBank), protein sequence databases (UniProt), protein structure databases (PDB), and protein family databases (Pfam). Currently, the SDAP database consists of:
- 1396 Allergens and isoallergens
- 1221 Protein sequences for allergens and isoallergens
- 70 Allergens with experimentally determined three-dimensional structures
- 582 3D models for allergens and isoallergens
- 28 Allergens with IgE epitope sets
- 130 Pfam allergen classes
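The fields listed above can be pictured as one record per allergen. The sketch below is a minimal, hypothetical Python data model: the class name, the field names and the `summarize` helper are assumptions for illustration, not the actual SDAP MySQL schema. The example entry uses the three Jun a 1 IgE epitopes described later in this report and omits the sequence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AllergenEntry:
    """Hypothetical SDAP-style record; field names are illustrative only."""
    name: str                                   # IUIS allergen name, e.g. "Jun a 1"
    source: str                                 # species the allergen comes from
    sequence: str = ""                          # one-letter amino acid sequence
    pdb_ids: List[str] = field(default_factory=list)       # experimental 3D structures
    model_file: Optional[str] = None            # homology model file, if one passed QC
    ige_epitopes: List[str] = field(default_factory=list)  # linear IgE epitope peptides
    pfam_family: Optional[str] = None           # Pfam family the allergen belongs to
    references: List[str] = field(default_factory=list)


def summarize(entries: List[AllergenEntry]) -> Dict[str, int]:
    """Count entries in categories similar to the SDAP content summary."""
    return {
        "allergens": len(entries),
        "with_sequence": sum(1 for e in entries if e.sequence),
        "with_experimental_structure": sum(1 for e in entries if e.pdb_ids),
        "with_3d_model": sum(1 for e in entries if e.model_file),
        "with_ige_epitopes": sum(1 for e in entries if e.ige_epitopes),
        "pfam_classes": len({e.pfam_family for e in entries if e.pfam_family}),
    }


# Example entry (sequence, structures and references omitted).
jun_a_1 = AllergenEntry(
    name="Jun a 1",
    source="Juniperus ashei (mountain cedar) pollen",
    ige_epitopes=["AFNQFGPNAGQR", "MPRARYGL", "WRSTRDAFING"],
    pfam_family="Pectate lyase",
)
print(summarize([jun_a_1]))
```

Keeping each external link (PDB, Pfam, references) as a list on the record mirrors the cross-referenced design described above, while the summary counts can be regenerated whenever entries are added.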
Update of the SDAP software. Because the SDAP database grows constantly with the increasing number of publications on allergens, we updated the SDAP software to allow faster searches of the MySQL database. Other software updates were needed because of changes in external protein databases and bioinformatics servers. Because Swiss-Prot is now distributed as part of UniProt, we changed all corresponding links so that they point to UniProt records. BLAST and FASTA allergen searches in GenBank, PIR and Swiss-Prot were updated to reflect changes in the corresponding servers. The current protein viewer in SDAP is Jmol, because it is actively maintained and developed; we dropped other viewers because they are no longer maintained and we observed connectivity problems with them. Allergen 3D models generated with MPACK are now included in SDAP; users may download them or examine their structure with Jmol. MOLMOL images of all allergen models are provided, and a new list of allergen models was added to SDAP.
Publications on bioinformatics tools in SDAP:
Three publications give a detailed account of the capabilities and software tools implemented in our Structural Database of Allergenic Proteins (SDAP):
- Schein, C.H., Ivanciuc, O. and Braun, W. Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunol. Allergy Clin. North Am., 27(1):1-27, 2007.
- Ivanciuc, O., Schein, C.H., Garcia, T.I., Oezguen, N., Negi, S.S. and Braun, W. Structural analysis of linear and conformational epitopes of allergens. Regul. Toxicol. Pharmacol., 54(3 Suppl):S11-19, 2009.
- Schein, C.H., Ivanciuc, O., Midoro-Horiuti, T., Goldblum, R.M. and Braun, W. An allergen portrait gallery: representative structures and an overview of IgE binding surfaces. Bioinform. Biol. Insights, 4, 113-125, 2010.
Use of the SDAP Web server. SDAP is heavily used by the scientific community, as indicated by an analysis of the SDAP web server log files. Figure 1 shows statistics for the last nine months (February to October 2010): the monthly number of sites (individual computers) that used SDAP, the data transferred from SDAP in Gbytes, the number of work sessions, and the number of pages served. There are about 1,500 users per month, with more than 25,000 work sessions each month, which indicates that SDAP is a mature server with a consistent user base.
Figure 1. SDAP usage statistics for February-October 2010: (a) number of sites (number of unique IP addresses/hostnames that made requests to SDAP); (b) data transferred from SDAP in Gbytes; (c) number of visits (work sessions); (d) number of pages served.
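Counts like those in Figure 1 can be derived from web server access logs. The sketch below assumes Apache Common/Combined Log Format lines, treats each distinct client address as a "site", starts a new visit after an assumed 30-minute gap between requests from the same site, and counts HTML/CGI/PHP requests and directory URLs as "pages". These definitions are illustrative assumptions; the report does not state which log analyzer or definitions produced Figure 1, and the sample lines are made up.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# host ident user [timestamp] "METHOD path PROTOCOL" status bytes
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) (\d+|-)')
SESSION_GAP = timedelta(minutes=30)                      # assumed visit timeout
PAGE_SUFFIXES = (".html", ".htm", ".php", ".cgi", "/")   # assumed "page" URLs


def monthly_stats(lines):
    """Return {"YYYY-MM": {"sites", "gbytes", "visits", "pages"}} for log lines."""
    requests = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        host, stamp, path, _status, size = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        requests.append((when, host, path, 0 if size == "-" else int(size)))

    months = defaultdict(lambda: {"sites": set(), "bytes": 0, "visits": 0, "pages": 0})
    last_seen = {}  # host -> timestamp of its previous request
    for when, host, path, size in sorted(requests):
        month = months[when.strftime("%Y-%m")]
        month["sites"].add(host)
        month["bytes"] += size
        if path.split("?")[0].endswith(PAGE_SUFFIXES):
            month["pages"] += 1
        previous = last_seen.get(host)
        if previous is None or when - previous > SESSION_GAP:
            month["visits"] += 1  # first request, or return after a long pause
        last_seen[host] = when

    return {
        key: {"sites": len(v["sites"]), "gbytes": v["bytes"] / 1e9,
              "visits": v["visits"], "pages": v["pages"]}
        for key, v in months.items()
    }


sample = [
    '10.0.0.1 - - [01/Feb/2010:10:00:00 -0600] "GET /SDAP/index.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Feb/2010:10:05:00 -0600] "GET /SDAP/ HTTP/1.1" 200 100',
    '10.0.0.1 - - [01/Feb/2010:10:10:00 -0600] "GET /SDAP/logo.png HTTP/1.1" 200 1000',
    '10.0.0.1 - - [01/Feb/2010:11:00:00 -0600] "GET /SDAP/search.cgi?q=x HTTP/1.1" 200 2048',
]
print(monthly_stats(sample))
```

Sorting requests by time before counting visits is what makes the inactivity-gap rule work even when log lines arrive out of order.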
Comprehensive 3D modeling of allergenic proteins. Similarities in the sequences and 3D structures of allergenic proteins provide vital clues for identifying clinically relevant IgE cross-reactivities, yet experimental 3D structures are available for only a small fraction of allergens. We performed large-scale homology modeling of all allergen sequences without an experimentally determined 3D structure and added to SDAP the 582 3D allergen models that passed several quality criteria. These models may be viewed or downloaded from SDAP and can be used for epitope mapping or for the design of hypoallergenic proteins.
As an example of their use, experimentally derived “continuous IgE epitopes” were mapped onto three experimentally determined structures and 13 of our 3D models of allergenic proteins. Large portions of these continuous sequences are buried rather than on the protein surface and therefore cannot interact with IgE or other proteins. Only the surface-exposed residues are constituents of “conformational IgE epitopes”, which are not always continuous in sequence. The amino acid composition of the surface-exposed parts of the experimentally determined continuous IgE epitopes showed a distinct statistical distribution compared with typical protein-protein interfaces: Ala, Ser, Asn, Gly and particularly Lys have a high propensity to occur in IgE binding sites. The 3D models will facilitate further analysis of the common properties of IgE binding sites of allergenic proteins. These results are published in Oezguen et al., Mol. Immunol., 45(14):3740-3747, 2008.
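The propensity analysis described above can be sketched as follows: keep only the surface-exposed epitope residues, then divide each amino acid's frequency among them by its frequency in a reference set (for example, residues of protein-protein interfaces). The 0.25 relative-accessibility cutoff and the toy input data are assumptions for illustration, not the criteria or data used by Oezguen et al. A propensity above 1 means the residue is over-represented in the epitope set.

```python
from collections import Counter

EXPOSURE_CUTOFF = 0.25  # assumed relative solvent accessibility threshold


def exposed(residues):
    """Keep one-letter codes of residues whose relative accessibility exceeds the cutoff."""
    return [aa for aa, rsa in residues if rsa > EXPOSURE_CUTOFF]


def propensities(epitope_aas, reference_aas):
    """Frequency of each residue in the epitope set divided by its reference frequency."""
    epi, ref = Counter(epitope_aas), Counter(reference_aas)
    n_epi, n_ref = sum(epi.values()), sum(ref.values())
    return {
        aa: (epi[aa] / n_epi) / (ref[aa] / n_ref)
        for aa in epi
        if ref[aa] > 0  # undefined if the residue never occurs in the reference
    }


# Toy data: (residue, relative accessibility) pairs along a mapped epitope.
epitope = [("K", 0.60), ("A", 0.40), ("L", 0.05), ("K", 0.50), ("S", 0.30)]
reference = "AAKLLSSGGE"  # toy reference composition, not a real interface set
print(propensities(exposed(epitope), reference))
```

Here the buried Leu is dropped before counting, which is the point made above: only surface-exposed residues can contribute to a conformational epitope.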
AllerML: A New Markup Language for Exchanging Data between Databases. Information exchange between the major allergen services, databases and users is currently hindered by the absence of a common standard language that software systems could use to compare databases and to import, export and process information. We began developing a new allergen markup language (AllerML), a general and uniform system for representing structural biology data on allergens. AllerML is proposed as a standard computer-readable format for storing and exchanging allergen data between databases, bioinformatics servers, and users. Using the allergen information and computational tools of SDAP as a blueprint, we developed a comprehensive set of AllerML tags, as well as rules for encoding allergens and isoallergens, protein sequences, PDB structures, 3D models, IgE epitopes, MotifMate motifs, references, and cross-references to other protein databases. AllerML may be extended with new tags to accommodate information from other allergen databases. In the next stage we will implement AllerML in SDAP, thus providing computer-readable access to our allergen database. A manuscript describing the AllerML tags and their usage is in preparation.
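To illustrate what a computer-readable allergen record could look like, the sketch below serializes an allergen to XML and reads its epitopes back with Python's standard `xml.etree.ElementTree`. All element and attribute names here (`allergen`, `source`, `sequence`, `pfam`, `ige_epitopes`, `epitope`, `type`) are hypothetical placeholders; the actual AllerML tag set is described in the manuscript mentioned above and is not reproduced here.

```python
import xml.etree.ElementTree as ET


def allergen_to_xml(name, source, sequence, epitopes, pfam=None):
    """Build a hypothetical AllerML-like XML string for one allergen."""
    root = ET.Element("allergen", attrib={"name": name})
    ET.SubElement(root, "source").text = source
    ET.SubElement(root, "sequence").text = sequence
    if pfam is not None:
        ET.SubElement(root, "pfam").text = pfam
    epitope_list = ET.SubElement(root, "ige_epitopes")
    for peptide in epitopes:
        ET.SubElement(epitope_list, "epitope", attrib={"type": "linear"}).text = peptide
    return ET.tostring(root, encoding="unicode")


def epitopes_from_xml(xml_text):
    """Parse the record and return the linear epitope peptides it lists."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter("epitope") if e.get("type") == "linear"]


record = allergen_to_xml(
    "Jun a 1", "Juniperus ashei", "",  # sequence omitted in this example
    ["AFNQFGPNAGQR", "MPRARYGL", "WRSTRDAFING"],
)
print(record)
print(epitopes_from_xml(record))
```

The round trip (write, then parse back) is the property an exchange format needs: any consumer that knows the tag set can recover the same data without SDAP-specific code.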
Validation of the PD index for cross-reactive peptides. We previously proposed a new sequence similarity measure, the property distance (PD) index, to locate potential IgE epitopes of related allergens that might be responsible for clinically observed cross-reactivity between allergens. Identifying the structural basis of cross-reactivity for proteins from genetically modified organisms could lead to more precise predictions of the potential allergenicity of novel recombinant food products and plant-incorporated protectants. The PD score is computed from the amino acid descriptors E1-E5, which represent the principal components of a dataset of 237 amino acid properties. To demonstrate the usefulness of the PD score in detecting peptides that could cross-react with known IgE epitopes, we designed a peptide array starting from three IgE epitopes (AFNQFGPNAGQR, MPRARYGL, and WRSTRDAFING) of Jun a 1, the dominant allergen of mountain cedar pollen. For each epitope, we generated peptides with PD values between 0 and 10 and also selected sequences from SDAP with PD values in this range. The peptides were synthesized on a membrane that was probed with a pool of sera from patients allergic to Jun a 1, and the results showed that a low PD value (less than 5) is a good predictor of cross-reactivity (Ivanciuc et al., Mol. Immunol., 46(5):873-883, 2009).
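The idea behind the PD score can be sketched as a position-by-position distance between two equal-length peptides in E1-E5 descriptor space, averaged over the peptide length, combined with a sliding-window scan that reports windows of a protein below a PD cutoff. This is a simplified illustration under stated assumptions: the published PD index may weight the five components and scale the result differently, and the descriptor table below contains toy values, not the published E1-E5 scales, so the threshold of 5 reported above does not apply to it.

```python
import math


def pd_index(pep_a, pep_b, descriptors, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Mean weighted Euclidean distance between aligned residues of two peptides.

    descriptors maps a one-letter code to its five descriptor values (E1..E5).
    Equal weights are an assumption; the published index may weight components.
    """
    if len(pep_a) != len(pep_b):
        raise ValueError("peptides must have the same length")
    total = 0.0
    for a, b in zip(pep_a, pep_b):
        da, db = descriptors[a], descriptors[b]
        total += math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, da, db)))
    return total / len(pep_a)


def scan_protein(protein, epitope, descriptors, max_pd):
    """Return (start, window, PD) for every window with PD <= max_pd, best first."""
    n = len(epitope)
    hits = []
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        score = pd_index(epitope, window, descriptors)
        if score <= max_pd:
            hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2])


# Toy descriptor table (NOT the published E1-E5 values), for four residues only.
TOY = {
    "A": (0.0, 0.0, 0.0, 0.0, 0.0),
    "G": (1.0, 0.0, 0.0, 0.0, 0.0),
    "S": (0.0, 1.0, 0.0, 0.0, 0.0),
    "K": (0.0, 0.0, 2.0, 0.0, 0.0),
}
print(pd_index("AGS", "AGK", TOY))          # sqrt(5) / 3
print(scan_protein("KAGSA", "AGS", TOY, 0.5))
```

Because the distance is computed on property vectors rather than residue identity, a window with substitutions between physicochemically similar residues still scores low, which is what lets the scan find cross-reactive candidates that identity-based searches miss.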
A new analysis showed that we can also extract position-specific property information from our data. We adopted a procedure widely used in QSAR (quantitative structure-activity relationship) modeling, namely PLS (partial least squares). The peptide sequences are translated into an array of E1-E5 descriptors, and principal components (latent variables) are then extracted with the PLS algorithm. These components are used to build sequence-activity models that highlight the main physicochemical properties at each amino acid position that determine cross-reactivity with Jun a 1 (manuscript in preparation).
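The encoding and model-building steps can be illustrated with a one-component PLS1 fit (NIPALS-style) written in plain Python: each peptide is flattened into a vector of per-position descriptors, and the PLS weight vector indicates which position/descriptor combinations covary most with the measured activity. This is a minimal sketch under stated assumptions: a real analysis would use several components, cross-validation and a tested implementation such as scikit-learn's `PLSRegression`, and the one-dimensional descriptors and activities below are toy values.

```python
def encode(peptide, descriptors):
    """Flatten a peptide into per-position descriptor values (E1..E5 in the real setting)."""
    return [value for aa in peptide for value in descriptors[aa]]


def pls1_one_component(X, y):
    """Fit a single-latent-variable PLS1 model; return (weights, predict)."""
    m, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / m for j in range(p)]
    y_mean = sum(y) / m
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered descriptors
    yc = [v - y_mean for v in y]                                  # centered activity

    # Weight vector: direction in descriptor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(m)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        raise ValueError("descriptors do not covary with activity")
    w = [v / norm for v in w]

    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(m)]  # latent scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return w, predict


# Toy example: two-residue peptides with one-dimensional toy descriptors.
TOY = {"A": (0.0,), "G": (1.0,), "S": (2.0,), "K": (3.0,)}
peptides = ["AA", "AG", "AS", "AK"]     # only position 2 varies
activity = [1.0, 3.0, 5.0, 7.0]         # toy activity rising with position-2 value
X = [encode(pep, TOY) for pep in peptides]
weights, predict = pls1_one_component(X, activity)
print(weights)              # position 2 carries all the weight
print(predict(encode("AK", TOY)))
```

Reading the weight vector position by position is what turns the regression into a statement about which physicochemical property at which residue position drives cross-reactivity.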
Using a combined sequence/structural analysis in IgE epitope prediction for walnut and peanut allergens. The PD scale is the first validated algorithm for predicting IgE epitopes. Given a known linear IgE binding sequence, the automated PD tool in SDAP finds peptides in other allergens that have a low physicochemical distance to it, i.e., peptides with similar physicochemical properties. This was demonstrated for peptides with low PD values relative to known IgE epitopes of the cedar pollen allergen Jun a 1. In recent work we asked whether combining the PD tool with structural visualization of the known linear IgE epitopes catalogued in SDAP can increase the success rate in identifying linear sequences likely to bind patient IgE. This work revealed that structurally similar domains of walnut and peanut allergens may contribute to cross-reactivity between these nuts (manuscript in preparation).
Determining allergen-specific motifs. Allergen-specific motifs, defined as short sequences common to many related known allergens, provide an alternative way to predict IgE cross-reactivity. We therefore defined motifs in families of allergenic proteins (i.e., groups of allergens that belong to the same protein family as defined in the Pfam database) that distinguish them from non-allergenic proteins of the same Pfam family. These PCP-motifs are specific sequence regions with common physicochemical properties that may distinguish allergenic proteins. We made a comprehensive assignment of all known allergens according to the classification scheme available on our SDAP web site. All major allergens belong to about 30 structural families, consistent with the results of others. We showed in three examples that the motifs we defined as characteristic of allergens in a given Pfam family coincided with previously determined IgE epitopes. The motifs thus represent a promising way to identify linear IgE epitopes that are likely to be responsible for IgE cross-reactivities. All sequence motifs for the major Pfam families containing allergens can be obtained from our web server MotifMate. Thus the PD scale and PCPMer motifs can be used to screen novel proteins for sequences similar to those found in known allergens, in order to assess the potential allergenic risk of recombinant food products.
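As a rough illustration of motif screening, a motif can be written as a string of residue classes and scanned over a query sequence. The four-class grouping below (positive, negative, hydrophobic, polar/small), the class symbols and the example motif are simplifying assumptions: PCPMer motifs are defined on quantitative physicochemical property scales, and MotifMate's own motif format is not reproduced here.

```python
import re

# Coarse physicochemical residue classes (an assumption for illustration).
CLASSES = {
    "+": "KRH",                    # positively charged
    "-": "DE",                     # negatively charged
    "h": "AVILMFWC",               # hydrophobic
    "p": "STNQGPY",                # polar or small
    "x": "ACDEFGHIKLMNPQRSTVWY",   # any standard residue
}


def motif_to_regex(motif):
    """Translate a class string such as '+hxp' into a regular expression."""
    return "".join(f"[{CLASSES[symbol]}]" for symbol in motif)


def find_motif(sequence, motif):
    """Return (start, segment) for every match, including overlapping ones."""
    # A capturing group inside a lookahead lets matches overlap.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


print(find_motif("KAGSRIES", "+hxp"))   # [(0, 'KAGS'), (4, 'RIES')]
```

Matching on classes rather than exact residues is what lets a single motif cover sequences such as "KAGS" and "RIES" that share physicochemical character but not identity.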
Determining conformational epitopes. We also developed a fully automated search method, EpiSearch, that predicts the likely location of conformational epitopes on the surface of an antigen. The algorithm takes peptide sequences from phage display experiments as input and ranks all surface-exposed patches according to the frequency distribution of similar residues in the peptides and in the patch. We tested the performance of EpiSearch on six phage display data sets. In all these examples, the conformational epitopes determined from the X-ray crystal structures of the antibody-antigen complexes were found within the highest-scoring EpiSearch patches, which in most cases covered more than 50% of the residues of the experimentally observed conformational epitopes (Negi, S.S. and Braun, W. Automated Detection of Conformational Epitopes using Phage Display Peptide Sequences. Bioinform. Biol. Insights, 3:71-81, 2009).
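The patch-ranking idea can be sketched as comparing residue-class frequencies in the pooled phage display peptides with those of each surface patch. The scoring function below (a histogram intersection over four assumed residue classes), the patch names and the toy sequences are illustrative assumptions, not the published EpiSearch score; in practice a patch would be a surface residue together with its exposed neighbours in the 3D structure.

```python
from collections import Counter

# Assumed grouping of "similar" residues; EpiSearch's own classes may differ.
GROUP = {
    aa: name
    for name, members in {"pos": "KRH", "neg": "DE",
                          "hyd": "AVILMFWC", "pol": "STNQGPY"}.items()
    for aa in members
}


def group_freqs(residues):
    """Relative frequency of each residue class in a string of residues."""
    counts = Counter(GROUP[aa] for aa in residues)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def patch_score(peptides, patch):
    """Overlap (0..1) between class distributions of the peptides and the patch."""
    pep = group_freqs("".join(peptides))
    pat = group_freqs(patch)
    return sum(min(pep.get(g, 0.0), pat.get(g, 0.0)) for g in set(pep) | set(pat))


def rank_patches(peptides, patches):
    """Patch names ordered from best to worst score."""
    return sorted(patches, key=lambda name: patch_score(peptides, patches[name]),
                  reverse=True)


phage_peptides = ["KRHK", "KRAD"]                  # toy phage-selected peptides
surface_patches = {"patch1": "KKRE", "patch2": "LLVS"}
print(rank_patches(phage_peptides, surface_patches))
```

Pooling all selected peptides before scoring reflects the fact that individual phage peptides are only partial mimics; the consensus composition is what is matched against the antigen surface.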
Experimental test with phage display data from the cockroach allergen Bla g 2. As an additional test, we assessed EpiSearch predictions made from phage display data for the cockroach allergen Bla g 2 and compared them with the antibody binding site determined by X-ray crystallography. Peptides binding to the monoclonal antibody mAb 7C11 were obtained in the laboratory of Dr. R. Goldblum from a series of phage panning experiments: mAb 7C11 was used to screen a random 12-mer peptide phage display library, and after three rounds of panning, 40 phage clones were isolated and their peptides sequenced. The relative affinities of the mAb for the phage-borne peptides were tested by ELISA. Using EpiSearch, the peptides were then mapped onto the surface of the known 3D structure of Bla g 2 to predict the conformational epitope most consistent with these random peptide sequences. Only after we had completed our prediction was the X-ray crystal structure of the Bla g 2/mAb 7C11 complex made known to us, and the EpiSearch prediction was consistent with this experimental result. Precise determination of conformational IgE epitopes is a key step in predicting potential cross-reactivities among related allergens, and we propose that phage display methods can be used as an alternative strategy to identify potential cross-reactivity (manuscript in preparation).
In silico screening of cross-reactive epitopes in a 3D database of allergens. In the past, computational methods to predict clinically observed cross-reactivity were based entirely on amino acid sequence analysis of the proteins involved. We developed a new computational method that also includes 3D structural information on conformational epitopes in the prediction of cross-reactivity. IgE antibodies recognize a small patch of amino acids on the folded protein surface, known as a conformational epitope. Experimentally, conformational epitopes can be determined by X-ray crystallography or phage display experiments. Our new method predicts cross-reactivity between allergenic proteins by measuring the structural similarity of their surface-exposed patches. It successfully predicted the cross-reactivity of eight independent allergenic proteins, consistent with known clinical observations. This result shows that the structural features of an allergenic protein play an important role in determining its cross-reactivity. A manuscript describing the method is in preparation.
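One simple way to compare two allergens by their surfaces is to describe each surface patch by its amino acid composition and keep the best-matching pair of patches. The cosine-similarity measure and the toy patches below are assumptions for illustration; the report does not specify the similarity measure or any cross-reactivity cutoff used by the new method.

```python
import math
from collections import Counter


def composition(patch):
    """Fraction of each amino acid in a patch (string of one-letter codes)."""
    counts = Counter(patch)
    return {aa: n / len(patch) for aa, n in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse composition vectors."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def best_patch_similarity(patches_a, patches_b):
    """Return (similarity, patch_a, patch_b) for the most similar pair of patches."""
    best = (0.0, None, None)
    for pa in patches_a:
        for pb in patches_b:
            score = cosine(composition(pa), composition(pb))
            if score > best[0]:
                best = (score, pa, pb)
    return best


allergen_1 = ["KKDE", "LLLV"]   # toy surface patches of one allergen
allergen_2 = ["KDEK", "SSTT"]   # toy surface patches of another
print(best_patch_similarity(allergen_1, allergen_2))
```

Taking the maximum over patch pairs encodes the idea that a single shared epitope-like surface can be enough for IgE cross-reactivity, even when the rest of the two surfaces differ.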
Experimental validation of the computational tools for mountain cedar allergens. In the last year of this grant we also continued our efforts to validate experimentally the computational predictions of allergenicity and IgE epitopes. The phage display studies described above have provided new data for validating the combination of optimized phage display, to identify peptide mimics of epitopes, with EpiSearch mapping on the experimentally resolved 3D surface of allergens. However, the very limited number of co-crystal structures of allergens with antibody fragments (Fabs) has hindered a more detailed understanding of the structural nature of IgE epitopes. Further, IgE epitopes mapped by the approach described for Bla g 2 require experimental validation before we can be highly confident that they are mapped correctly. Site-directed mutagenesis is the classical method to achieve this. However, for our model allergen system, Jun a 1 from mountain cedar pollen, expression of properly folded wild-type protein had not been achieved. We have now identified a method for expressing Jun a 1 in tobacco leaves, using a tobacco mosaic virus vector that targets rJun a 1 to the apoplastic space, from which it can be readily isolated in reasonable quantities.
Conclusions:
Understanding the characteristics of allergens is important for avoiding new potential allergens. Recent progress in the biochemical classification and three-dimensional structure determination of allergens and allergen–IgE complexes has enhanced our understanding of the molecular determinants of allergenicity. Our studies revealed common physicochemical and structural features of allergens that can account for their induction of IgE antibody responses and allergic inflammation.
The Structural Database of Allergenic Proteins (SDAP) (http://fermi.utmb.edu/SDAP/) provides rapid, cross-referenced access to the sequences, structures, and IgE epitopes of more than 900 allergens. SDAP, designed to enable rapid sequence searches that aid in assessing the potential allergenic risk of new food products, is heavily used by the allergy community. SDAP now contains a broad array of bioinformatics and computational tools that can: (1) evaluate the overall sequence similarity to known allergens based on FASTA alignments; (2) evaluate the WHO/FAO rules; (3) find regions identical or similar in sequence using the PD tool or PCPMer motifs; and (4) use 3D structural similarities of experimentally determined structures or 3D models to identify the amino acids that are important in IgE binding. In addition, we assessed our software tools with experimental data from cedar pollen and cockroach allergens.
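The sequence criteria of the 2001 FAO/WHO expert consultation flag a protein if it shares more than 35% identity with a known allergen over a window of 80 amino acids, or an identical stretch of six contiguous amino acids. The sketch below checks both criteria using ungapped windows only, which is a simplification: the guideline and SDAP's implementation rely on alignments (e.g., FASTA) that allow gaps, and queries shorter than the window need separate handling. The example sequences are toy strings.

```python
def windows(seq, size):
    """All contiguous substrings of the given length."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def shares_exact_stretch(query, allergen, k=6):
    """True if some k-residue stretch of the query occurs verbatim in the allergen."""
    allergen_kmers = set(windows(allergen, k))
    return any(kmer in allergen_kmers for kmer in windows(query, k))


def best_window_identity(query, allergen, size=80):
    """Highest fraction of identical positions over any pair of ungapped windows."""
    best = 0.0
    for qw in windows(query, size):
        for aw in windows(allergen, size):
            same = sum(1 for a, b in zip(qw, aw) if a == b)
            best = max(best, same / size)
    return best  # stays 0.0 when either sequence is shorter than the window


def flagged(query, allergen, min_identity=0.35, size=80, k=6):
    """Apply both criteria (ungapped simplification)."""
    return (best_window_identity(query, allergen, size) > min_identity
            or shares_exact_stretch(query, allergen, k))


print(flagged("ACDEFGHIKL", "WWACDEFGWW"))   # True: both contain "ACDEFG"
print(best_window_identity("ACDE", "ACDF", size=4))
```

The six-residue exact-match rule is known to produce many false positives by chance, which is one motivation for the property-based PD and motif tools described above.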
Our studies help provide a solid scientific foundation for the general discussion on the potential risk of genetically modified (GM) foods. The statistical results and the novel bioinformatics tools can help the EPA formulate more specific bioinformatics guidelines for companies that would like to bring new recombinant crops to the marketplace. Because food allergies can result in fatal reactions, the allergenic potential of genetically engineered food products needs to be carefully assessed before they enter the market. There is a vital need for faster and more reliable methods to evaluate the potential allergenicity of proteins that have not previously been part of the food supply. Our novel approaches can reduce some of the uncertainty for crops that may be allergenic for some sensitive subpopulations.
Journal Articles on this Report: 13
- Bonds RS, Midoro-Horiuti T, Goldblum R. A structural basis for food allergy: the role of cross-reactivity. Current Opinion in Allergy and Clinical Immunology 2008;8(1):82-86. [R833137 (Final)]
- Fujimura T, Futamura N, Midoro-Horiuti T, Togawa A, Goldblum RM, Yasueda H, Saito A, Shinohara K, Masuda K, Kurata K, Sakaguchi M. Isolation and characterization of native Cry j 3 from Japanese cedar (Cryptomeria japonica) pollen. Allergy 2007;62(5):547-553. [R833137 (Final)]
- Ivanciuc O, Schein CH, Garcia T, Oezguen N, Negi SS, Braun W. Structural analysis of linear and conformational epitopes of allergens. Regulatory Toxicology and Pharmacology 2009;54(3 Suppl):S11-S19. [R833137 (2008), R833137 (Final)]
- Ivanciuc O, Garcia T, Torres M, Schein CH, Braun W. Characteristic motifs for families of allergenic proteins. Molecular Immunology 2009;46(4):559-568. [R833137 (2008), R833137 (Final)]
- Ivanciuc O, Midoro-Horiuti T, Schein CH, Xie L, Hillman GR, Goldblum RM, Braun W. The property distance index PD predicts peptides that cross-react with IgE antibodies. Molecular Immunology 2009;46(5):873-883. [R833137 (2008), R833137 (Final)]
- Liu Z, Bhattacharyya S, Ning B, Midoro-Horiuti T, Czerwinski EW, Goldblum RM, Mort A, Kearney CM. Plant-expressed recombinant mountain cedar allergen Jun a 1 is allergenic and has limited pectate lyase activity. International Archives of Allergy and Immunology 2010;153(4):347-358. [R833137 (Final)]
- Moehnke MH, Midoro-Horiuti T, Goldblum RM, Kearney CM. The expression of a mountain cedar allergen comparing plant-viral apoplastic and yeast expression systems. Biotechnology Letters 2008;30(7):1259-1264. [R833137 (Final)]
- Negi SS, Braun W. Automated detection of conformational epitopes using phage display peptide sequences. Bioinformatics and Biology Insights 2009;3:71-81. [R833137 (2008), R833137 (Final)]
- Oezguen N, Zhou B, Negi SS, Ivanciuc O, Schein CH, Labesse G, Braun W. Comprehensive 3D-modeling of allergenic proteins and amino acid composition of potential conformational IgE epitopes. Molecular Immunology 2008;45(14):3740-3747. [R833137 (2008), R833137 (Final)]
- Schein CH, Ivanciuc O, Braun W. Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunology and Allergy Clinics of North America 2007;27(1):1-27. [R833137 (2007), R833137 (Final)]
- Schein CH, Ivanciuc O, Midoro-Horiuti T, Goldblum RM, Braun W. An allergen portrait gallery: representative structures and an overview of IgE binding surfaces. Bioinformatics and Biology Insights 2010;4:113-125. [R833137 (Final), R834066 (Final)]
- Tiwari R, Negi SS, Braun B, Braun W, Pomes A, Chapman MD, Goldblum RM, Midoro-Horiuti T. Validation of a phage display and computational algorithm by mapping a conformational epitope of Bla g 2. International Archives of Allergy and Immunology 2012;157(4):323-330. [R833137 (Final), R834823 (2012), R834823 (2013), R834823 (Final)]
- Varshney S, Goldblum RM, Kearney C, Watanabe M, Midoro-Horiuti T. Major mountain cedar allergen, Jun a 1 contains conformational as well as linear IgE epitopes. Molecular Immunology 2007;44(10):2781-2785. [R833137 (Final)]
Supplemental Keywords:
Health, Scientific Discipline, Health Risk Assessment, Risk Assessments, Allergens/Asthma, Biochemistry, Biology, food allergenicity, genetically engineered food, dietary proteins, human exposure, oral allergy syndrome, bioinformatics, data base development, allergic response

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.