Grantee Research Project Results

2005 Progress Report: Development of a Virulence Factor Biochip and its Validation for Microbial Risk Assessment in Drinking Water

EPA Grant Number: R831628
Title: Development of a Virulence Factor Biochip and its Validation for Microbial Risk Assessment in Drinking Water
Investigators: Rose, Joan B., Whittam, Thomas S., Gulari, Erdogan, Hashsham, Syed
Institution: Michigan State University, University of Michigan
EPA Project Officer: Packard, Benjamin H
Project Period: November 1, 2004 through October 31, 2007 (Extended to April 30, 2009)
Project Period Covered by this Report: November 1, 2004 through October 31, 2005
Project Amount: $600,000
RFA: Microbial Risk in Drinking Water (2003)
Research Category: Drinking Water, Water

Objective:

The concept of using genetic databases to identify microbial risks in water, termed Virulence-Factor Activity Relationships (VFARs), was first developed by the Committee on Drinking Water Contaminants, National Research Council, as an approach to screen microorganisms for their occurrence in water and/or their ability to cause harm and waterborne disease. We currently are developing a bioinformatics pilot program for the assessment of VFARs and have developed a biochip known as GeneScreen for the detection of Escherichia coli bacteria. The biochip exploits the sequence variability inherent in the 16S and 23S rRNAs, the spacer region, and virulence and functional genes, which can provide identification (taxonomy) to the genus and species level as well as information on pathogenicity and potential risk.

In our original proposal, we proposed the development and validation of a high-density biochip for some 15 groups or genera/species of targeted bacteria and viruses, some of which are on the Contaminant Candidate List and others that are important for the assessment of the microbial safety of water. We targeted traditional indicator organisms, pathogenic bacterial strains and viruses, as well as key virulence factors. Our specific goals were to: (1) select gene targets to encompass the microorganisms of interest to water safety; (2) design probes to uniquely identify each of the above microorganisms and provide reliable detection; (3) synthesize microfluidic biochips containing the above set of probes in replicate with positive and negative controls (biochip synthesis); (4) validate and field test the synthesized biochips using standard individual targets, appropriate target mixtures, and field samples (validation and field testing); and (5) undertake a pilot risk analysis of a water system(s), testing a variety of computational approaches for interpreting the results of the biochip (analysis).

In this report, we summarize the work done and results achieved thus far in three areas: (1) the design, construction, and validation of a bacterial indicator chip; (2) the design, construction, and validation of a viral microarray; and (3) the design of an Enterococcus indicator biochip.

Progress Summary:

Indicator Chip

We have designed an in situ synthesized indicator biochip containing 8,000 probes (45-mers). The probes target gene sequences obtained from the organisms shown in Table 1. An earlier version of the biochip contained only 16S rRNA genes from these organisms. However, evaluation of that chip with environmental samples containing fecal matter did not yield a human-specific marker. This was somewhat expected, considering the poor resolution of the 16S rRNA gene as a marker of host (Figure 1).

Table 1. List of Potential Indicator Organisms

Genus		No. of functional and isr genes*	No. of species	No. of sequences	Species names
Bacteroides		7	5	43	distasonis, forsythus, fragilis, fragsin, vulgatus
Bifidobacterium		2	13	41	adolescentis, angulatum, animalis, bifidum, breve, cuniculi, dentium, infantis, lactis, longum, magnum, pseudolongum
Butyrivibrio		1	1	12	fibrisolvens
Clostridium		9	3	36	beijerinckii, perfringens, tyrobutyricum
Enterococcus		10	20	70	aerogenes, avium, casseliflavus, cecorum, durans, columbae, pseudoavium, faecalis, faecium, faeciumium, hirae, malodoratus, mundtii, pallens, raffinosus, ratti, saccharolyticus, solitarius, sulfureus, villorum
Escherichia		8	1	63	coli
Eubacterium		1	1	1	ramulus
Fusobacterium		1	10	29	canifelinum, mortiferum, naviforme, necrogenes, nucleatum, periodonticum, russii, simiae, ulcerans, varium
Lactobacillus		5	22	77	acidophilus, amylovorus, casei, crispatus, curvatus, delbrueckii, fermentum, frumenti, gasseri, hamsteri, helveticus, johnsonii, paracasei, paraplantarum, pentosus, plantarum, reuteri, rhamnosus, ruminis, sakei, suntoryeus, zeae
Lactococcus		1	1	4	lactis
Ruminococcus		6	4	15	albus, flavefaciens, gnavus, hansenii
Salmonella		1	2	2	enterica, typhimurium
Shigella		4	3	10	boydii, flexneri, sonnei
Streptococcus		8	23	84	agalactiae, alactolyticus, anginosus, bovis, canis, constellatus, dysgalactiae, equinus, gordonii, intermedius, lutetiensis, macedonicus, mitis, mutans, oralis, parasanguinis, parasanguis, pneumoniae, porcinus, pyogenes, salivarius, suis, thermophilus
Total: 14 genera		64	109	487

* isr: 16S-23S intergenic spacer region

Figure 1. Phylogenetic Tree (Sequence Relationship Among the 16S rRNA Genes) of Emerging Potential Indicator Organisms

The new chip contains genes that have higher potential to yield human-specific markers because it uses gene sequences related to specific functions. Bioinformatics tools to search for unique marker genes among the potential indicator organisms are not available at present. Hence, the gene sequences listed in Table 2 have been manually selected. We are continuously updating this list to include other genes of interest as markers. Our next step is to test the validity of the above chip using real-world samples. By following an iterative procedure involving subtractive hybridization with respect to samples, we expect that the probes serving as human-specific markers can be identified. After hybridization and validation with environmental samples, we expect to obtain a few markers that are unique to human fecal matter and are always absent in other types of samples. An example of the hybridization signal with subtractive hybridization is shown in Figure 2. Figure 2 is illustrative only; the actual hybridization signals will differ. The validation step also will be the most time-consuming step in the overall scheme because it will require collecting, processing, and hybridizing many samples, followed by data analysis to extract unique signals. From the experience gained with the 16S rRNA gene indicator chip, statistical tools also have emerged that can predict the reliability of a detected signal based on replication and signal intensity.
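The screening logic behind this expectation (markers present in human fecal samples and always absent from other sample types) can be sketched in a few lines of Python. This is an illustration only, not the project's analysis pipeline: the sample format, the probe names, the signal values, and the positivity threshold are all hypothetical.

```python
def human_specific_probes(samples, threshold):
    """Return probe IDs positive in every human sample and in no other sample.

    samples: list of (source_label, {probe_id: signal}) pairs (hypothetical format).
    threshold: signal at or above which a probe is called positive (assumed value).
    """
    human_hits = None      # probes positive in all human samples seen so far
    other_hits = set()     # probes positive in at least one non-human sample
    for source, signals in samples:
        positive = {p for p, s in signals.items() if s >= threshold}
        if source == "human":
            human_hits = positive if human_hits is None else human_hits & positive
        else:
            other_hits |= positive
    return sorted((human_hits or set()) - other_hits)


# Hypothetical signals for three made-up probes across four samples.
samples = [
    ("human", {"bft_01": 900, "esp_03": 850, "recA_07": 700}),
    ("human", {"bft_01": 950, "esp_03": 120, "recA_07": 650}),
    ("cattle", {"bft_01": 80, "esp_03": 60, "recA_07": 720}),
    ("gull", {"bft_01": 40, "esp_03": 30, "recA_07": 90}),
]
candidates = human_specific_probes(samples, threshold=500)
print(candidates)  # ['bft_01']
```

Each additional non-human sample can only shrink the candidate list, which is why the procedure is iterative and why many samples are needed before a marker can be called unique.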

Table 2. List of Potential Indicator Marker Genes

Gene name and No. of sequences*
Bacteroides		Bifidobacterium		Clostridium		Enterococcus		Lactobacillus		Ruminococcus		Streptococcus
bft	7	isr	19	cirA	2	VanD	7	acdA	1	albB	1	cylE	4
bspA	1	recA	22	cloA	1	ace	6	acdT	2	celA	2	gtfB	7
cfiA	17			cpa	6	as-48	3	curA	4	cesA	1	isr	41
cfxA	2			cpb	2	enlA	1	isr	63	endB	1	mutB	3
nanH	3			cpb2	7	entP	4	recA	7	recA	1	ply	4
pm	1			cpe	6	esp	4			rumA	9	recA	16
recA	12			etx	5	gelE	2					sda	3
				isr	5	hyl	1					var	6
				recA	2	isr	26
						recA	16

* Includes genes related to specific functions, marker genes, and 16S-23S intergenic spacer region

Figure 2. A Microfluidic Biochip Showing a Few Positive Spots That Are Unique to One Sample (Green) Versus Many That Are Present in Other Types of Samples (Red)

Virus Microarray

The virus microarray was designed to detect sequences from the group of viruses that are known or suspected to cause disease through drinking water. This group consists primarily of the enteroviruses but also includes several other viruses, for example, hepatitis A virus, hepatitis E virus, sapovirus, Norwalk virus, and rotavirus. Table 3 shows the 23 major families of viral pathogens chosen for the chip, the reference genetic sequences used to perform the analysis (listed by accession number), and the sequence length. Wherever possible, complete viral genomes were chosen for probe design; however, in a few instances, such information was lacking and whatever sequence information was available was used instead (e.g., the three human rotavirus families were represented by their viral proteins 4 and 7 [VP4 and VP7, respectively], toroviruses were represented by the genes encoding hemagglutinin-esterase and nucleocapsid protein mRNA, and picobirnaviruses were represented by their RNA-dependent RNA polymerase sequence and the first segment of an as-yet-unknown gene).

This approach of using representative genome sequences for probe design was adopted to provide as broad a chance of identifying a target virus from among these 23 target groups as possible, while at the same time being able to differentiate between these groups as specifically as possible.

Table 3. Virus Classes, Sequence Description, and Genbank Accession No.

Virus	Sequence description	Type of genome	Accession no.	Sequence length (bp)
Hepatitis A virus	complete genome	ssRNA positive no DNA stage	NC_001489	7478
Hepatitis E virus	complete genome	ssRNA positive no DNA stage	NC_001434	7176
Human adenovirus A	complete genome	dsDNA	NC_001460	34125
Human adenovirus B	complete genome	dsDNA	NC_004001	34794
Human adenovirus C	complete genome	dsDNA	NC_001405	35937
Human adenovirus D	complete genome	dsDNA	NC_002067	35100
Human adenovirus E	complete genome	dsDNA	NC_003266	35994
Human adenovirus F	complete genome	dsDNA	NC_001454	34214
Norwalk virus	complete genome	ssRNA positive no DNA stage	NC_001959	7654
Sapovirus	complete genome	ssRNA positive no DNA stage	NC_010624	7458
Human enterovirus A	complete genome	ssRNA positive no DNA stage	NC_001612	7413
Human enterovirus B	complete genome	ssRNA positive no DNA stage	NC_001472	7389
Human enterovirus C	complete genome	ssRNA positive no DNA stage	NC_001428	7401
Human enterovirus D	complete genome	ssRNA positive no DNA stage	NC_001430	7390
Human enterovirus E	complete genome	ssRNA positive no DNA stage	NC_003988	7374
Poliovirus	complete genome	ssRNA positive no DNA stage	NC_002058	7440

Table 3. Virus Classes, Sequence Description, and Genbank Accession No. (Cont.)

Virus	Sequence description	Type of genome	Accession no.	Sequence length (bp)
	VP7		AB071404	1062
rotavirus B	VP4	dsRNA virus	AY539857	2306
	VP7	dsRNA virus	AY539856	814
rotavirus C	VP4	dsRNA virus	AB008670	2283
	VP7	dsRNA virus	AB008671	1063
coronavirus	complete genome	ssRNA positive no DNA stage	NC_002645	27317
cytomegalovirus (HHV-5)	complete genome	dsDNA virus	NC_006273	235645
torovirus	hemagglutinin-esterase	ssRNA positive no DNA stage	AF159585	1251
torovirus	Human torovirus nucleocapsid protein mRNA	ssRNA positive no DNA stage	AF024539	219
picobirnavirus	RNA dependent RNA pol	dsRNA virus	AF246940	1674
picobirnavirus	segment 1 unknown gene	dsRNA virus	AF246941	1572

Detection of Viral Pathogens in Water. An advantage of the microarray approach over conventional detection methods is that a microarray can potentially detect the gene sequences of a large number of viral targets in a single reaction. The use of multiple probes for a single viral target has the twofold benefit of increasing specificity while reducing the likelihood that a mutation in the viral genome will result in a false negative. Unlike PCR, in which the specificity of detection results from the selective binding of primers to nucleic acid sequences followed by amplification, our approach in the viral microarray is to use random six-base nucleotide primers (random hexamers) to nonspecifically label the sample with aminoallyl 2'-deoxyuridine 5'-triphosphate (aminoallyl-dUTP), which then can be coupled to fluorescent dye molecules. These fluorescently labeled strands of nucleic acid then can be specifically detected using probes bound to the silica-based microarray.

Probe Design. Probes were designed using the OligoArray version 2.1 software available from http://berry.engin.umich.edu/oligoarray2_1/.

The probes were designed to conform to the following specifications:

Maximum melting temperature (Tm): 75°C (except for torovirus: 80°C)
Minimum melting temperature (Tm): 70°C (except for torovirus: 65°C)
Maximum GC: 60%
Minimum GC: 40%
Maximum temperature for secondary structure: 45°C
Maximum temperature for cross-hybridization: 45°C
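The specifications above amount to a set of range checks on each candidate probe. The sketch below shows one way such a filter could be written. It is not OligoArray code: the class and function names are hypothetical, the temperature values are assumed to be supplied by the design software rather than computed here, and the last two limits are interpreted as ceilings on the melting temperature of any predicted secondary structure or cross-hybrid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSpec:
    """Design limits from the list above (temperatures in deg C, GC in percent)."""
    tm_min: float = 70.0
    tm_max: float = 75.0
    gc_min: float = 40.0
    gc_max: float = 60.0
    max_secondary_tm: float = 45.0
    max_crosshyb_tm: float = 45.0


# Relaxed melting-temperature window stated for torovirus probes.
TOROVIRUS_SPEC = ProbeSpec(tm_min=65.0, tm_max=80.0)


def gc_percent(seq):
    """Percentage of G and C bases in a probe sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes(seq, tm, secondary_tm, crosshyb_tm, spec=ProbeSpec()):
    """True if a candidate probe satisfies every limit in spec.

    tm, secondary_tm, and crosshyb_tm are assumed inputs (for example,
    values reported by the design software); they are not computed here.
    """
    return (spec.tm_min <= tm <= spec.tm_max
            and spec.gc_min <= gc_percent(seq) <= spec.gc_max
            and secondary_tm <= spec.max_secondary_tm
            and crosshyb_tm <= spec.max_crosshyb_tm)


# A short made-up sequence with 50 percent GC.
example = "ACGTACGTAC"
print(passes(example, tm=72.0, secondary_tm=30.0, crosshyb_tm=40.0))
print(passes(example, tm=78.0, secondary_tm=30.0, crosshyb_tm=40.0,
             spec=TOROVIRUS_SPEC))
```

Keeping the torovirus exception as a second spec object, rather than a special case inside the check, keeps the rule for every target in one place.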

Probes were designed from the positive strand of the genetic sequence. The local BLAST database against which the probes were compared comprised all the probe sequences in both positive- and negative-sense strands, as well as the sequences that showed a large degree of homology to nonspecific sequences as determined by a MEGABLAST search using the following criteria (database: nr; E value: 10; word size: 11). This allowed the OligoArray software to filter out nonspecific gene sequences from the probes designed.

A total of 690 specific probes were designed targeting the 23 viral families (approximately 30 probes per viral family target). Generating multiple probes for each target family is intended to enhance the reliability of detection.

Microarray Construction. An initial batch of three microarray chips was synthesized by the University of Michigan Engineering Machine Shop to specifications determined by Dr. Gulari. The microarray chip format used for the virus chip was a 68 by 119 array with a maximum capacity of 8092 wells; 3450 wells were randomly populated with five copies of each of the 690 designed probes, representing approximately 42 percent of the chip capacity. Multiple copies of probes were used to provide technical replication of the signals.
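The well counts follow directly from the array dimensions and the replication scheme, and a random replicate layout is simple to generate. The sketch below is illustrative only: the function name and the fixed seed are assumptions, and it does not reproduce the layout actually used on the chips.

```python
import random

ROWS, COLS = 68, 119          # array format stated in the report
N_PROBES, COPIES = 690, 5     # designed probes and replicates per probe


def random_layout(seed=0):
    """Assign every replicate of every probe to a distinct, randomly chosen well.

    Returns a dict mapping (row, col) to a probe index in range(N_PROBES).
    The seed is an assumption, used here only to make the sketch repeatable.
    """
    wells = [(r, c) for r in range(ROWS) for c in range(COLS)]
    rng = random.Random(seed)
    chosen = rng.sample(wells, N_PROBES * COPIES)   # distinct wells, random order
    return {well: i % N_PROBES for i, well in enumerate(chosen)}


layout = random_layout()
capacity = ROWS * COLS
occupied = len(layout)
print(capacity, occupied, round(100 * occupied / capacity, 1))  # 8092 3450 42.6
```

Scattering the five replicates across the chip, rather than placing them side by side, helps separate a genuine probe signal from a localized artifact such as uneven flow or a surface defect.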

Probes were synthesized in situ in an automated process similar to making oligonucleotides on a DNA synthesizer. The major difference in the process is the use of a photo-generated acid, rather than a conventionally delivered acid, in the dimethoxytrityl (DMT) deprotection step to control the parallel synthesis. This deprotection is initiated by directing light at selected three-dimensional nano-chambers in microfluidic chips. In a synthesis cycle, upon light activation, acid forms in seconds, removing the DMT group. An incoming phosphoramidite nucleoside monomer then is coupled to the growing oligonucleotide chain. The synthesis cycle is repeated for each additional monomer until an array of thousands of oligonucleotides in a microfluidic chip is formed.

Arrays of oligonucleotides are made by in-situ coupling of DMT-protected nucleotide monomers at selected reaction sites according to the sequences of the oligonucleotides at each synthesis cycle. The process uses computer-generated light patterns to control a projection device (similar to a seminar presentation using a PowerPoint file), which in turn projects a light pattern onto the chip at each reaction cycle to create a specific chip reaction pattern. The localized light energy generates the photo-generated reagent allowing selective deprotection; only these deprotected sites couple with the incoming monomer.

These synthesis cycles are repeated to produce the desired oligonucleotide arrays. This digital photolithography process avoids the expensive and time-consuming photomasks used in conventional photolithographic processes and, more importantly, it enables flexibility and enhances efficiency for oligonucleotide array synthesis (Figure 4).
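The cycle-by-cycle logic of light-directed synthesis can be sketched as a schedule of light patterns: in each cycle one monomer is delivered, and only the sites whose next required base is that monomer are illuminated and deprotected. The sketch below is a simplified model, not the synthesizer's control software. The fixed A, C, G, T delivery order and the function name are assumptions, and sequences are taken to be written in the order in which bases are added.

```python
def synthesis_schedule(probes, base_order="ACGT"):
    """Return a list of (base, illuminated_sites) pairs, one per synthesis cycle.

    probes: list of sequences, written in synthesis order (assumed).
    base_order: order in which monomers are offered (assumed, not from the report).
    """
    for seq in probes:
        if set(seq) - set(base_order):
            raise ValueError(f"unexpected base in {seq!r}")
    progress = [0] * len(probes)       # bases already coupled at each site
    schedule = []
    while any(done < len(seq) for done, seq in zip(progress, probes)):
        for base in base_order:
            # Light pattern: sites whose next base is the incoming monomer.
            sites = [i for i, seq in enumerate(probes)
                     if progress[i] < len(seq) and seq[progress[i]] == base]
            if sites:                  # skip deliveries that no site needs
                schedule.append((base, sites))
                for i in sites:
                    progress[i] += 1
    return schedule


# Three made-up 3-mers standing in for the 45-mer probes.
probes = ["ACG", "AGT", "CCA"]
schedule = synthesis_schedule(probes)
for base, sites in schedule:
    print(base, sites)
```

Because each light pattern is just a list of sites computed from the sequence file, changing the probe set changes only the data fed to the projector, which is the flexibility the maskless process offers over fixed photomasks.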

Figure 4. Light Beams From a UV-Vis Lamp Are Directed by a Microprocessor to Selected Portions of the Chip. Incidence of a light beam on a section of a chip causes the formation of acid ions that deprotect the site and allow probe elongation. Incremental addition of probe bases results in the synthesis of complete probes of the desired length and base sequence.

Sample Processing. Viruses were released from cell culture supernatant by three quick cycles of freeze-thaw. The viruses were concentrated using an Amicon Ultra 100k™ ultrafiltration column (Millipore Inc., Billerica, MA) following the manufacturer's instructions. Viral nucleic acids were extracted using the Ultrasens Viral Nucleic Extraction kit from Qiagen, which extracts both viral RNA and viral DNA.

Viral nucleic acid extracts then were divided in half to be processed for RNA and for DNA viruses. RNA was labeled using a modified BioPrime labeling protocol and The Institute for Genomic Research microarray protocol (http://pga.tigr.org/sop/M004_1a.pdf). Briefly, between 2 and 5 μg of template RNA is used to generate a first-strand cDNA molecule using reverse transcriptase and incubation at 45°C. Reverse transcriptase uses RNA as a template and synthesizes a complementary strand of DNA (complementary DNA, or cDNA). The synthesized cDNA strand then is coupled to N-hydroxysuccinimide (NHS)-ester cyanine (Cy) dye available from Amersham Biosciences.

DNA targets were generated in a manner similar to that used for RNA targets. However, the large fragment of the DNA polymerase I enzyme (Klenow fragment) was used instead of reverse transcriptase to generate the modified DNA daughter strands (Figure 5). Less template is required for DNA targets (between 0.5 and 1 μg of template DNA) and incubation is carried out at 37°C as opposed to 45°C. The modified DNA daughter strand then is coupled to NHS-ester Cy dye. Different colored Cy dyes are used to differentially label RNA and DNA targets.

The labeled DNA and RNA targets then were hybridized together or separately to the microarray. They can then be detected using a microarray scanner where they will fluoresce at their own specific wavelengths.

Microarray Hybridization. Microarray hybridization is performed using a Xeotron™ microfluidic hybridization station. Hybridization is initially carried out at 20°C for 16-18 hours to allow the target DNA to bind to the probes on the chip. The array then is washed at successively higher temperatures, in 1°C increments, at a flow rate of 500 μl/minute for 1.4 minutes per wash in the presence of hybridization wash buffer (10 mM Na₂HPO₄, 5 mM EDTA, pH 6.6). The microarray is scanned between wash cycles to generate a melting curve.
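The wash parameters imply a fixed buffer volume per temperature step, which can be tabulated as a simple schedule. The sketch below is a convenience calculation under assumptions: the function name is hypothetical, and the 25°C to 60°C span is borrowed from the melting curve range reported under Initial Results rather than stated in this paragraph.

```python
def wash_schedule(start_c, stop_c, step_c=1, flow_ul_per_min=500.0, wash_min=1.4):
    """Return (temperature in deg C, wash volume in microliters) for each wash step.

    start_c and stop_c are inclusive; both are supplied by the caller
    (assumed values, see the note above the code).
    """
    volume_ul = flow_ul_per_min * wash_min    # buffer used in one wash
    return [(t, volume_ul) for t in range(start_c, stop_c + 1, step_c)]


steps = wash_schedule(25, 60)
total_ml = sum(volume for _, volume in steps) / 1000.0
print(len(steps), "washes,", round(steps[0][1]), "microliters each,",
      round(total_ml, 1), "ml in total")
```

With one scan between wash cycles, the number of steps also sets the number of images collected for each melting curve.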

Microarray Scanning and Data Collection. Microarray scanning is carried out using the GenePix 4000B Microarray Scanner (Molecular Devices Inc., Sunnyvale, CA). The software used to analyze the scanned images was GenePix Pro 5.0. Data were collected and normalized using filters built into the software. Data then were graphed using the Microsoft Excel spreadsheet program.

Initial Results. Preliminary results were obtained for the hybridization of labeled poliovirus LSC-1 to the viral microarray. The 30 probes generated for poliovirus were found to be completely specific (30/30 hybridized). There was no significant cross-hybridization with nonpoliovirus probes (0/660) throughout the melting curve temperature range (25°C to 60°C). Signal intensities for poliovirus-specific probes were between 2.5 and 75 times greater than those for nonpoliovirus probes within the 30°C to 48°C temperature range (Figure 6). Two additional repeat hybridizations generated similar results.
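A fold difference of this kind can be computed at each temperature of the melting curve as the mean signal of the target-specific probes divided by the mean signal of all other probes. The sketch below shows that calculation under assumptions: the report does not state how the ratios were derived, and the probe names and intensities are hypothetical.

```python
from statistics import mean


def specificity_ratio(signals, target_probes):
    """Mean signal of target-specific probes divided by mean signal of the rest.

    signals: {probe_id: intensity} for one scan (hypothetical data format).
    target_probes: set of probe IDs designed for the hybridized virus.
    """
    target = [v for p, v in signals.items() if p in target_probes]
    background = [v for p, v in signals.items() if p not in target_probes]
    if not target or not background:
        raise ValueError("need at least one target and one non-target probe")
    background_mean = mean(background)
    if background_mean == 0:
        raise ValueError("background signal is zero; ratio is undefined")
    return mean(target) / background_mean


# Made-up intensities for one scan; not measured data.
scan = {"polio_1": 1500, "polio_2": 1500, "adeno_1": 20, "hav_1": 20}
ratio = specificity_ratio(scan, {"polio_1", "polio_2"})
print(ratio)  # 75.0
```

Repeating the calculation for the scan taken after each wash gives one ratio per temperature, which is the quantity a plot such as Figure 6 summarizes.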

Figure 5. Top: Experimental Steps for the Labeling of RNA Targets to be Hybridized to the Array. Bottom: Experimental Steps for the Labeling of DNA Targets to be Hybridized to the Array.

Figure 6. Melt Curve for Poliovirus Hybridization Experiment. The poliovirus probes were completely specific, with signal intensities between 2.5 and 75 times those of nonpoliovirus probes.

Enterococcus Indicator Microarray

The aim of this work is to develop a microarray that includes probes for Enterococcus sequences to characterize the occurrence of these bacteria in water. Enterococcus bacteria are members of the Group D streptococci and are characterized by their ability to grow at low and elevated temperatures (10°C and 45°C), at elevated pH (9.5), and in 6.5 percent NaCl. This group includes 27 species, of which E. faecalis and E. faecium are the most prevalent in water. Sources of enterococci include the feces of mammals and birds, and they also have been isolated from algal mats and plants.

A dataset has been developed that includes 147 Enterococcus DNA sequences for input to a microarray aimed at species and source identification of Enterococcus in water. These sequences include six genus-specific sequences and 141 sequences from 14 different Enterococcus species, the majority of sequences being from E. faecium (65 sequences) and E. faecalis (53 sequences). These sequences were drawn from the published literature and also from the National Center for Biotechnology Information database. The microarray dataset includes the source of the sequence and, where available, the link to the source and/or related publication. Sequences included are related to a variety of genes and functions, with a focus on bacterial identification, virulence and pathogenicity. Examples include sequences coding for antibiotic resistance, enterococcal surface protein (Esp), and putative pathogenicity islands.

The Esp Protein as a Target. A putative human-derived marker that is associated with the presence of fecal contamination from human sources has been identified (Scott, et al., 2005). The enterococcal surface protein (Esp), originally found in E. faecalis, has been associated with increased virulence in human infections. E. faecalis is a leading etiological agent of urinary tract infections, and Esp has been shown to contribute to colonization and persistence of E. faecalis in the urinary tract. The presence of this marker sequence in a water sample thus indicates the contribution of a human source of contamination. The next steps in the development of this microarray are to design the probes from this sequence dataset, determine their placement on the microarray chip, and then perform microchip validation.

Future Activities:

The bacterial indicator microarray and the Enterococcus microarray both require further development and validation. The virus microarray has been partially validated using the poliovirus type strain LSC-1 and will be further validated using adenovirus type strain 41 and other virus type strains from the American Type Culture Collection (ATCC) and from donated positive patient specimens. Validation of the virus microarray also will be carried out using environmental isolates. Thus far, we have selected gene targets to encompass the microorganisms of interest to water safety. Probe design for the unique identification of each target microorganism has been undertaken for the virus microarray and will be completed shortly for the bacterial indicator microarray and the Enterococcus microarray. Three copies of the virus microarray have been synthesized, and initial hybridization experiments have been conducted to begin testing the chip. The virus microarray has been tested against poliovirus LSC-1 and has been found to be very specific. A methodology for processing samples for analysis on the viral microarray also has been developed. The next set of experiments will begin the empirical testing of the virus microarray with sewage samples.

References:

Scott, et al. Potential use of a host associated molecular marker in Enterococcus faecium as an index of human fecal pollution. Environmental Science & Technology 2005;39:283-287.

Journal Articles:

No journal articles submitted with this report.

Supplemental Keywords:

RFA, Health, Scientific Discipline, PHYSICAL ASPECTS, Water, Ecosystem Protection/Environmental Exposure & Risk, Environmental Chemistry, Health Risk Assessment, Risk Assessments, Monitoring/Modeling, Environmental Monitoring, Physical Processes, Drinking Water, microbial contamination, monitoring, measurement, microbial risk assessment, biochip, microbiological organisms, detection, exposure and effects, virulence factor activity relationships, virulence factor biochip, bacteria monitoring, exposure, other - risk assessment, E. coli, human exposure, microbial risk management, microorganism, assessment technology, drinking water contaminants, other - risk management

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.