Wouldn't it be wonderful if you knew exactly what measures you could take to stave off, or even prevent, the onset of disease? Wouldn't it be a relief to know that you are not allergic to the drugs your doctor just prescribed? Wouldn't it be a comfort to know that the treatment regimen you are undergoing has a good chance of success because it was designed just for you? With millions of single nucleotide polymorphisms (SNPs) now available, biomedical researchers believe that such exciting medical advances are not far away.
With the advance of next-generation sequencing (NGS) techniques, scientists can detect many more SNPs from short reads than was possible with first-generation techniques. One major disadvantage of NGS is that it can produce a high rate of sequencing errors. Controlling the false positive rate caused by these errors while maintaining high power is therefore an essential issue in detecting SNPs.
Existing methods for detecting SNPs usually specify an arbitrary threshold value and declare a suspicious nucleotide locus to be a SNP if a defined score exceeds that threshold. Such methods cannot control the false positive rate at a nominal level.
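To make the problem with a fixed threshold concrete, here is a minimal sketch under simplifying assumptions that are not taken from the talk: the score is the number of reads that disagree with the reference at a locus, sequencing errors are independent with a known rate, and the function names are hypothetical. It computes the exact false positive rate of a fixed count threshold at a given read depth, and, for comparison, the count threshold that a plain binomial test would need to stay below a nominal level. It is not the method proposed in the talk, which accounts for dependent errors.

```python
from math import comb


def binom_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fixed_threshold_fpr(depth, error_rate, threshold):
    """False positive rate at a non-SNP locus when a SNP is called
    whenever at least `threshold` reads mismatch the reference.
    Assumes independent errors (an illustrative simplification)."""
    return binom_tail(depth, error_rate, threshold)


def calibrated_threshold(depth, error_rate, alpha):
    """Smallest mismatch count k such that P(X >= k) <= alpha
    under the same independent-error assumption."""
    for k in range(depth + 2):
        if binom_tail(depth, error_rate, k) <= alpha:
            return k


if __name__ == "__main__":
    error_rate = 0.01  # assumed per-read error rate
    for depth in (10, 50, 200):
        fpr = fixed_threshold_fpr(depth, error_rate, threshold=3)
        k = calibrated_threshold(depth, error_rate, alpha=0.001)
        print(f"depth={depth:4d}  FPR(threshold=3)={fpr:.2e}  calibrated k={k}")
```

Under these assumptions, the same threshold gives very different false positive rates as the read depth changes, which is why a single arbitrary cutoff cannot guarantee a nominal level across loci.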
In this talk, I will introduce a rigorous statistical method for detecting SNPs from short reads that accounts for the dependence among sequencing errors. A simulation study shows that the proposed method controls the false positive rate well and has satisfactory power.