We present an experimental performance comparison of lossless compression programs for DNA raw data in FASTQ format files. General-purpose (PBZIP2, P7ZIP and PIGZ) and domain-specific compressors (SCALCE, QUIP, FASTQZ and DSRC) were analyzed in terms of compression ratio, execution speed, parallel scalability and memory consumption. Results showed that domain-specific tools increased the compression ratios up to 70 %, while reducing the runtime of general-purpose tools up to 7 × during compression and up to 3 × during decompression. Parallelism scaled performance up to 13 × when using 20 threads. Our analysis indicates that QUIP, DSRC and PBZIP2 are the best tools in their respective categories, with acceptable memory requirements. Nevertheless, the end user must consider the features of available hardware and define the priorities among its optimization objectives (compression ratio, runtime during compression or decompression, scalability, etc.) to properly select the best application for each particular scenario.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Information Systems
- Hardware and Architecture