Is DNA sequencing coverage a function of sample purity?

How is coverage affected by the purity of a sample? And can coverage for a sample be affected by other things, like the library preparation or manner in which the sample was stored?

So, these are the factors that go into coverage in short-read sequencing (and Sanger sequencing):

  • Depth is the obvious one. All else being equal, you get a bang-on Poisson distribution of coverage over a genome, and the mean of that distribution scales with the number of bases sequenced, which is why more sequencing = more coverage.

  • The human genome and transcriptome are not uniform. Most Illumina protocols ultimately depend on PCR amplification, so any factor that influences how well a fragment amplifies will also affect sequencing depth. The two obvious ones in this respect are: 1) regions of extreme GC content and 2) regions containing secondary structures (mainly in RNA).

  • Alignment is the next major reason regions get 'dropped', depending on how your aligner deals with multi-mapping reads. Some genomes have very young transposable elements that are highly similar to one another and very abundant. For most aligners, that means those regions can show either huge or very low coverage, depending on how the genome is assembled.

  • For low-input sequencing techniques, contaminating DNA/RNA from the enzymes, reagents and sample preparation becomes a major problem. This can largely be minimized by starting with 'clean' samples and using large amounts of input DNA/RNA.
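A minimal sketch of the "depth" point, assuming the idealized Poisson model described above (reads placed uniformly and independently, no GC, PCR or mapping bias). Mean depth is C = N × L / G (reads × read length / genome size), and under a Poisson model the fraction of bases with zero coverage is e^(-C). The function names and the run parameters in the example are hypothetical, chosen only for illustration.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G (Lander-Waterman style estimate)."""
    return num_reads * read_length / genome_size


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a base is covered exactly k times under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_at_least(k: int, mean: float) -> float:
    """Expected fraction of bases covered at least k times."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


if __name__ == "__main__":
    # Hypothetical run: 600 million 150 bp reads over a ~3.1 Gb genome.
    c = expected_coverage(600_000_000, 150, 3_100_000_000)
    print(f"mean depth: {c:.1f}x")
    print(f"fraction of bases with zero coverage: {poisson_pmf(0, c):.2e}")
    print(f"fraction of bases with >= 10x: {fraction_at_least(10, c):.4f}")
```

Real data is overdispersed relative to this model precisely because of the GC, structure, mapping and contamination effects listed above, so treat these numbers as a best case.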

Depending on the sequences you're studying, read depth may be the only factor you need to consider; if you have difficult regions or specific questions, you may have to deal with a whole swath of problems.
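One quick way to find "difficult regions" of the extreme-GC kind is to scan the reference in sliding windows and flag windows whose GC fraction is very low or very high. This is a minimal sketch: `gc_fraction` and `flag_extreme_gc` are hypothetical helpers, and the window size, step and the 25%/75% thresholds are illustrative assumptions rather than cut-offs tied to any particular library prep.

```python
import math
from typing import Iterator, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (NaN if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def flag_extreme_gc(seq: str, window: int = 100, step: int = 50,
                    low: float = 0.25, high: float = 0.75) -> Iterator[Tuple[int, float]]:
    """Yield (start, gc) for windows whose GC fraction falls outside [low, high].

    Thresholds are illustrative assumptions, not protocol-specific values.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if not math.isnan(gc) and (gc < low or gc > high):
            yield start, gc


if __name__ == "__main__":
    toy = "AT" * 100 + "GC" * 100  # 400 bp toy sequence: AT-only half, GC-only half
    for start, gc in flag_extreme_gc(toy):
        print(start, round(gc, 2))
```

Windows flagged this way are candidates to check for depressed coverage in your own data; whether they actually drop out depends on the polymerase and PCR conditions used.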

So, in answer to your question: yes, DNA quality will directly affect coverage. Sometimes you can 'just sequence more' to get over this problem; sometimes you need to go back to the drawing board.
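The "just sequence more" case can be made concrete for purity. Assuming contaminating reads simply fail to align to the target genome (for example reagent or bacterial DNA), a sample in which only a fraction p of reads are on-target gets its mean depth scaled by p, so you need roughly 1/p times as many reads to reach the same depth. This is a hypothetical sketch; it does not model degradation, GC bias or mapping losses, which extra reads will not fix.

```python
import math


def effective_coverage(num_reads: int, read_length: int, genome_size: int,
                       target_fraction: float) -> float:
    """Mean depth contributed only by reads that come from the target genome."""
    return num_reads * target_fraction * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int,
                 target_fraction: float) -> int:
    """Total reads required so that on-target reads reach target_depth on average."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return math.ceil(target_depth * genome_size / (read_length * target_fraction))


if __name__ == "__main__":
    # Hypothetical 30x target, 150 bp reads, ~3.1 Gb genome.
    for purity in (1.0, 0.5):
        print(purity, reads_needed(30, 150, 3_100_000_000, purity))
```

Halving the on-target fraction doubles the reads required, which is why a heavily contaminated sample can quickly become cheaper to re-prepare than to over-sequence.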

The largest contributor to sequencing coverage (on a HiSeq, I assume) is how many reads you get, which mostly depends on what share of a flow cell you give the sample. Obviously the other factors you mentioned contribute, and purity might be a big issue if you are looking for a subset of cells you tried to isolate out of a larger pool, but the easiest way to get more coverage is to hog more real estate on the flow cell.
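The "hog more real estate" argument is just proportional arithmetic: mean depth scales linearly with the share of the flow cell a sample gets. A minimal sketch follows; `clusters_per_lane` is a placeholder you would take from your instrument's specifications or past runs, and the numbers in the example are hypothetical, not HiSeq figures.

```python
def coverage_from_lane_share(clusters_per_lane: float, lanes: int, share: float,
                             read_length: int, genome_size: int,
                             paired: bool = True) -> float:
    """Mean depth for a sample given its share of one or more lanes.

    Assumes every cluster yields one read (single-end) or two reads (paired-end)
    of read_length bases, all usable and on-target.
    """
    clusters = clusters_per_lane * lanes * share
    bases = clusters * read_length * (2 if paired else 1)
    return bases / genome_size


if __name__ == "__main__":
    # Hypothetical lane yield; substitute your own instrument's numbers.
    for share in (0.25, 0.5, 1.0):
        depth = coverage_from_lane_share(300_000_000, 1, share, 100, 3_100_000_000)
        print(f"share {share:.2f}: {depth:.1f}x")
```

Because depth is linear in `share`, doubling your slice of the flow cell doubles mean depth, whereas the purity and bias factors only scale or reshape what that slice delivers.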