malva.xopen module

Adapted from xopen: Open compressed files transparently.

This version modifies the original slightly so that “bgzip” can also be used for faster reading of fastq.gz files.

malva.xopen.xopen(filename, mode='r', compresslevel=None, threads=None, *, encoding='utf-8', errors=None, newline=None, format=None)

A replacement for the “open” function that can also read and write compressed files transparently. The supported compression formats are gzip, bzip2, xz and zstandard. If the filename is ‘-’, standard output (mode ‘w’) or standard input (mode ‘r’) is returned. Filename can be a string or a file object. (See https://docs.python.org/3/glossary.html#term-file-object.)

When writing, the file format is chosen based on the file name extension:

- .gz uses gzip compression
- .bz2 uses bzip2 compression
- .xz uses xz/lzma compression
- .zst uses zstandard compression
- otherwise, no compression is used
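The extension rule can be illustrated with a minimal standard-library sketch. This is not the malva implementation: the names OPENERS, open_by_extension and roundtrip are hypothetical, and zstandard is omitted because it is assumed not to be available in the standard library of the Python version in use.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical extension-to-opener table (illustration only, zstd omitted).
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Pick an opener from the file name extension, defaulting to open()."""
    # Treat 'r', 'w' and 'a' as text modes, as described for xopen.
    if "b" not in mode and "t" not in mode:
        mode += "t"
    _, ext = os.path.splitext(path)
    opener = OPENERS.get(ext, open)
    if "b" in mode:
        return opener(path, mode)
    return opener(path, mode, encoding="utf-8")


def roundtrip(suffix, text="ACGT\n"):
    """Write text to a temporary file with this suffix, then read it back.

    Returns the first two raw bytes on disk and the decoded text.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reads" + suffix)
        with open_by_extension(path, "w") as handle:
            handle.write(text)
        with open(path, "rb") as raw:
            magic = raw.read(2)
        with open_by_extension(path, "r") as handle:
            return magic, handle.read()
```

Calling roundtrip(".gz") writes a gzip stream, while roundtrip(".txt") writes the text unchanged; in both cases the text read back is identical.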

When reading, the format is detected from the file name extension if one is available; otherwise, it is detected from the file contents.
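Detection from contents usually relies on the leading "magic" bytes of each format. The sketch below shows that idea in isolation; the function name detect_format and the MAGIC table are hypothetical, and how malva itself performs the check is not stated here.

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each compressed format.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zst",
}


def detect_format(header):
    """Return a format name for the leading bytes, or None if none match."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


# Compress a small FASTQ-like payload in memory and inspect the result.
payload = b"@read1\nACGT\n+\nIIII\n"
detected = {
    "gzip": detect_format(gzip.compress(payload)),
    "bzip2": detect_format(bz2.compress(payload)),
    "xz": detect_format(lzma.compress(payload)),
    "plain": detect_format(payload),
}
```

Only the first six bytes of a file are needed for this check, so a reader can peek at the header before choosing a decompressor.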

mode can be: ‘rt’, ‘rb’, ‘at’, ‘ab’, ‘wt’, or ‘wb’. Also, the ‘t’ can be omitted, so instead of ‘rt’, ‘wt’ and ‘at’, the abbreviations ‘r’, ‘w’ and ‘a’ can be used.

compresslevel is the compression level for writing to gzip, xz and zst files. This parameter is ignored for the other compression formats. If set to None, a default depending on the format is used: gzip: 6, xz: 6, zstd: 3.
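The default-level rule can be written down as a small lookup. This is only a sketch of the rule as described above; DEFAULT_LEVELS and resolve_level are hypothetical names, and returning None for formats that ignore the level is an assumption made for illustration.

```python
import gzip

# Defaults as stated in the description: gzip 6, xz 6, zstd 3.
DEFAULT_LEVELS = {"gz": 6, "xz": 6, "zst": 3}


def resolve_level(fmt, compresslevel=None):
    """Return the level to use for fmt, or None if the format ignores it."""
    if fmt not in DEFAULT_LEVELS:
        return None  # e.g. bz2 or uncompressed output
    if compresslevel is None:
        return DEFAULT_LEVELS[fmt]
    return compresslevel


# Apply the resolved level with the standard-library gzip module.
data = b"ACGT" * 1000
compressed = gzip.compress(data, compresslevel=resolve_level("gz"))
restored = gzip.decompress(compressed)
```

Lower levels generally trade compression ratio for speed, which can matter when writing large sequencing files.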

When threads is None (the default), compressed file formats are read or written using a pipe to a subprocess running an external tool such as pbzip2 or gzip (see PipedGzipWriter, PipedGzipReader, etc.). If the external tool supports multiple threads, threads can be set to an int specifying the number of threads to use. If no external tool supporting the compression format is available, the file is opened by calling the appropriate Python function (that is, no subprocess is spawned).

Set threads to 0 to force opening the file without using a subprocess.
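The choice between a subprocess and in-process decompression can be sketched as follows. This is a simplified illustration, not the code behind PipedGzipReader: read_gzip_bytes and demo are hypothetical helpers, the whole file is read into memory rather than streamed, and the thread count is only used to decide whether a subprocess is allowed.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def read_gzip_bytes(path, threads=None):
    """Decompress path with an external gzip if allowed and available."""
    tool = shutil.which("gzip")
    if threads != 0 and tool is not None:
        # Subprocess route: the external program does the decompression
        # and its standard output is captured through a pipe.
        result = subprocess.run(
            [tool, "-dc", path], stdout=subprocess.PIPE, check=True
        )
        return result.stdout
    # In-process route: threads == 0, or no external gzip was found.
    with gzip.open(path, "rb") as handle:
        return handle.read()


def demo(data=b"@read1\nACGT\n+\nIIII\n"):
    """Write a small .gz file and read it back through both routes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reads.fastq.gz")
        with gzip.open(path, "wb") as handle:
            handle.write(data)
        return read_gzip_bytes(path), read_gzip_bytes(path, threads=0)
```

Both routes return the same bytes; the subprocess route moves decompression work into a separate process, which is why it can be faster for large inputs.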

encoding, errors and newline are used when opening a file in text mode. The parameters have the same meaning as in the built-in open function, except that the default encoding is always UTF-8 instead of the preferred locale encoding.

format overrides the autodetection of input and output formats. This can be useful when compressed output needs to be written to a file without an extension. Possible values are “gz”, “xz”, “bz2”, “zst”.
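To show why an explicit format matters, the sketch below writes gzip data to a file with no extension using only the standard library, then confirms from the contents that the file is compressed. The helper write_gzip_without_extension is hypothetical; going by the signature above, the corresponding xopen call would presumably pass format="gz".

```python
import gzip
import os
import tempfile


def write_gzip_without_extension(directory, text):
    """Write text as gzip to a file whose name carries no extension."""
    path = os.path.join(directory, "reads")  # no .gz suffix to infer from
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out_path = write_gzip_without_extension(tmp, "ACGT\n")
    with open(out_path, "rb") as raw:
        header = raw.read(2)  # gzip streams start with 1f 8b
    with gzip.open(out_path, "rt", encoding="utf-8") as handle:
        recovered = handle.read()
```

Because the name gives no hint, a writer that relies on the extension alone would leave this file uncompressed, so the format has to be stated explicitly.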

Return type:

IO