Performance comparison of sequential and parallel compression applications for DNA raw data

Aníbal Guerra, Jaime Lotero, Sebastián Isaza

Research output: Contribution to journal › Research article › peer-reviewed

8 Citations (Scopus)

Abstract

We present an experimental performance comparison of lossless compression programs for DNA raw data in FASTQ format files. General-purpose (PBZIP2, P7ZIP and PIGZ) and domain-specific compressors (SCALCE, QUIP, FASTQZ and DSRC) were analyzed in terms of compression ratio, execution speed, parallel scalability and memory consumption. Results showed that domain-specific tools increased the compression ratios up to 70 %, while reducing runtime relative to general-purpose tools up to 7 × during compression and up to 3 × during decompression. Parallelism scaled performance up to 13 × when using 20 threads. Our analysis indicates that QUIP, DSRC and PBZIP2 are the best tools in their respective categories, with acceptable memory requirements. Nevertheless, end users must consider the features of the available hardware and define the priorities among their optimization objectives (compression ratio, runtime during compression or decompression, scalability, etc.) to properly select the best application for each particular scenario.

Original language: American English
Pages (from-to): 4696-4717
Number of pages: 22
Journal: Journal of Supercomputing
Volume: 72
Issue number: 12
DOI
Status: Published - Dec 1, 2016
Externally published

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
