I am working to sequence negative-strand RNA viral genomes de novo (i.e. not the specific-primer based approach that works so very well) and am having trouble with host contamination. I began the project expecting data with ~90% host sequence, but I am now seeing ~99.5% host and less than 0.5% target sequence! At this rate a full lane of Illumina sequencing is not enough to provide a good draft sequence for a 20 kb genome.
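A rough back-of-the-envelope sketch of why the target fraction matters so much: mean depth scales linearly with the fraction of reads that come from the virus, so dropping from the ~10% I expected to ~0.5% costs about 20-fold in depth for the same lane. The function names and the uniform-coverage assumption here are purely illustrative (real coverage is uneven), and the lane yield and read length are left as inputs to fill in with your own numbers.

```python
def mean_depth(total_reads, read_length, target_fraction, genome_size):
    """Mean depth over the target genome, assuming coverage is uniform."""
    target_bases = total_reads * read_length * target_fraction
    return target_bases / genome_size


def fold_loss(expected_fraction, observed_fraction):
    """How many times less depth you get for the same sequencing effort."""
    return expected_fraction / observed_fraction


# Fractions taken from the post: ~10% target expected, ~0.5% observed.
loss = fold_loss(0.10, 0.005)
print(f"Depth is reduced about {loss:.0f}-fold for the same lane")
```

To get an absolute depth, call `mean_depth` with your own read count and read length, e.g. `mean_depth(total_reads, read_length, 0.005, 20_000)`.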
Current protocol (in brief):
1. Virus is grown in Vero E6 cells and the Trizol cell lysate is shipped here
2. Chloroform extraction of the Trizol
3. cDNA synthesis from total RNA
4. Illumina/454 library prep (adaptors added, etc.) and sequencing
I've thought about shearing some monkey DNA (the cell line is Vero E6), binding it to streptavidin beads, using those beads to pull out the host DNA, and then synthesizing cDNA only from the unbound nucleic acid. After speaking with some co-workers (and reps at Invitrogen), this seems unlikely to work well.
Currently I am leaning towards running the RNA extracts on a denaturing agarose gel (containing formamide or urea, of course) and cutting out the largest fraction, on the assumption that the viral genomic RNA is unsheared and larger than the host mRNA/rRNA.
Does this sound like a viable path forward, or does anyone have a better suggestion? Thanks in advance!