TY - JOUR
T1 - Tackling the challenges of FASTQ referential compression
AU - Guerra, Aníbal
AU - Lotero, Jaime
AU - Aedo, José Édinson
AU - Isaza, Sebastián
N1 - Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Administrative Department of Science, Technology and Innovation of Colombia (COLCIENCIAS), Call 757, Grant BEC17-2-27, and by the University of Antioquia through multiple CODI instruments.
Publisher Copyright:
© The Author(s) 2019.
PY - 2019
Y1 - 2019
AB - The exponential growth of genomic data has recently motivated the development of compression algorithms to tackle the storage capacity limitations in bioinformatics centers. Referential compressors could theoretically achieve a much higher compression than their non-referential counterparts; however, the latest tools have not been able to harness such potential yet. To reach such goal, an efficient encoding model to represent the differences between the input and the reference is needed. In this article, we introduce a novel approach for referential compression of FASTQ files. The core of our compression scheme consists of a referential compressor based on the combination of local alignments with binary encoding optimized for long reads. Here we present the algorithms and performance tests developed for our reads compression algorithm, named UdeACompress. Our compressor achieved the best results when compressing long reads and competitive compression ratios for shorter reads when compared to the best programs in the state of the art. As an added value, it also showed reasonable execution times and memory consumption, in comparison with similar tools.
UR - http://www.scopus.com/inward/record.url?scp=85069471444&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069471444&partnerID=8YFLogxK
U2 - 10.1177/1177932218821373
DO - 10.1177/1177932218821373
M3 - Research Article
AN - SCOPUS:85069471444
SN - 1177-9322
VL - 13
JO - Bioinformatics and Biology Insights
JF - Bioinformatics and Biology Insights
ER -