Closing gaps in draft genomes leads to more complete and continuous genome assemblies. Ubiquitous genomic repeats challenge existing gap-closing methods, which are based on either the k-mer representation of the de Bruijn graph or the overlap-layout-consensus paradigm. In addition, chimeric reads cause erroneous k-mers in the former and false overlaps of reads in the latter. We propose a novel local assembly approach to gap closing, called RegCloser. It represents read coordinates and their overlaps as the parameters and observations, respectively, of a linear regression model. The optimal overlap is searched only in the restricted range consistent with insert sizes. Under this linear regression framework, local DNA assembly becomes a robust parameter estimation problem. We solved the problem with a customized robust regression procedure that resists the influence of false overlaps by optimizing a convex global Huber loss function. The global optimum is obtained by iteratively solving a sparse system of linear equations. On both simulated and real datasets, RegCloser outperformed other popular methods in accurately resolving the copy number of tandem repeats, and achieved superior completeness and contiguity. Applying RegCloser to a plateau zokor draft genome that had already been improved by long reads further increased the contig N50 threefold. We also tested the robust regression approach on layout generation of long reads. RegCloser is a competitive gap-closing tool. The robust regression approach could potentially be incorporated into the layout module of long-read assemblers.

Closing gaps in draft genomes, an important step in the de novo assembly pipeline, remains a challenge due to the ubiquitous repetitive elements in genomes. Repetitive elements, as well as chimeric reads, cause ambiguities such as false alignments that disrupt the local assembly in gaps. In particular, it is very difficult to resolve the copy number of a tandem repeat whenever one occurs. Third Generation Sequencing (TGS) technology is expected to resolve the repeat problem if its long reads can span the repeat regions. Nevertheless, many genomes have been or are still sequenced, partially or fully, by Second Generation Sequencing (SGS) technology, due to its high accuracy, low cost, and wide availability. It is desirable to reconstruct at least some regions of repetitive elements from the SGS reads. Notably, the insert size of paired-end or mate-pair reads carries genomic information that can be used to facilitate assembly. Most existing gap-closing methods, including Sealer, GapCloser, and GapFiller, are based on the de Bruijn graph (DBG) approach.
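To make the regression view of local assembly concrete, here is a minimal sketch and not RegCloser's actual implementation. It assumes each overlap between reads i and j gives one observation d of the offset x[j] - x[i], pins read 0 at coordinate 0, and minimizes the Huber loss by iteratively reweighted least squares, which is one standard way to do so. The names `solve_linear` and `huber_layout`, the threshold `delta`, and the toy offsets are all hypothetical. A small dense Gaussian elimination stands in for the sparse solver a real tool would need.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def huber_layout(n_reads, overlaps, delta=5.0, n_iter=50):
    """Estimate read start coordinates from pairwise offsets.

    overlaps: list of (i, j, d), each an observation x[j] - x[i] = d + noise.
    Read 0 is pinned at coordinate 0; the overlap graph must be connected.
    With n_iter=1 the result is the ordinary least-squares estimate.
    """
    weights = [1.0] * len(overlaps)
    x = [0.0] * n_reads
    m = n_reads - 1  # unknowns are x[1], ..., x[n_reads - 1]
    for _ in range(n_iter):
        # Build the weighted normal equations (A^T W A) x = A^T W d.
        M = [[0.0] * m for _ in range(m)]
        b = [0.0] * m
        for w, (i, j, d) in zip(weights, overlaps):
            row = ((i, -1.0), (j, 1.0))  # nonzero entries of this row of A
            for p, sp in row:
                if p == 0:
                    continue  # pinned read contributes no unknown
                b[p - 1] += w * sp * d
                for q, sq in row:
                    if q != 0:
                        M[p - 1][q - 1] += w * sp * sq
        x = [0.0] + solve_linear(M, b)
        # Huber weights: 1 inside the threshold, delta/|r| outside it.
        residuals = [d - (x[j] - x[i]) for i, j, d in overlaps]
        weights = [1.0 if abs(r) <= delta else delta / abs(r)
                   for r in residuals]
    return x


# Toy data: five reads truly spaced 40 bases apart, plus one false overlap
# (0, 4, 40), as a repeat might produce.
overlaps = [(0, 1, 40), (1, 2, 40), (2, 3, 40), (3, 4, 40),
            (0, 2, 80), (1, 3, 80), (2, 4, 80),
            (0, 4, 40)]
x_ls = huber_layout(5, overlaps, n_iter=1)   # plain least squares
x_huber = huber_layout(5, overlaps)          # Huber via reweighting
print("least squares:", [round(v, 1) for v in x_ls])
print("Huber:        ", [round(v, 1) for v in x_huber])
```

On this toy input, plain least squares places the last read near coordinate 96, far from its true position of 160, while the Huber estimate ends near 154. The false overlap still exerts a pull under the Huber loss, but that pull is bounded by `delta`, which is why the estimate stays close to the consistent overlaps.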