Re: kernel decompressor interface

From: "H. Peter Anvin" <hpa@zytor.com>
To: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: Ferenc Wagner <wferi@niif.hu>, Alain Knaff <alain@knaff.lu>,
	linux-kernel@vger.kernel.org
Subject: Re: kernel decompressor interface
Date: Wed, 31 Mar 2010 10:45:10 -0700
Message-ID: <4BB38A26.4070903@zytor.com>
In-Reply-To: <4BB29675.9050507@lougher.demon.co.uk>

On 03/30/2010 05:25 PM, Phillip Lougher wrote:
> Ferenc Wagner wrote:
>> Hi,
>>
>> While working with SquashFS code recently, I got the impression that the
>> current decompress_fn interface isn't best suited for general use: it
>> rules out real scatter/gather operation, which -- one hopes -- is a
>> general feature of stream decompressors.  For example, if one has to
>> decompress data from a series of buffer_heads into a bunch of (cache)
>> pages (typical operation in compressed file systems), the inflate
>> interface in zlib.h provides the possibility of changing input and
>> output buffer addresses, but decompress_fn does not, necessitating extra
>> memory copying.  On the other hand, the latter is admittedly simpler.
> 
> The decompress_fn interface is rather limited; however, it must
> be borne in mind that it was adequate for the original intended
> users (initramfs/initrd decompression). Squashfs (and other filesystems), on
> the other hand, can certainly make use of a much better multi-call interface.
> My strategy in adding LZMA support to Squashfs has been to get an implementation
> using the current interface mainlined and, once this has been done, to look at
> improving the decompress_fn interface.

Well, it's adequate for the *current form* of initramfs decompression,
which is rather crippled: we fail to progressively free the memory used,
simply because we have no way to track it.

This is, in my opinion, a major shortcoming of the current implementation.

> LZMA decompressors have a quirk in that they use the output buffer
> as the history buffer (e.g. look for peek_old_byte() in decompress_unlzma.c).
> This means any multi-call interface such as zlib's, which modifies the output
> buffer pointer dynamically (without allowing the decompressor to look back at
> previously passed-in buffers), won't work. A multi-call interface that
> passes the output buffers in an iovec-style array should work, though
> (incidentally, this is why Squashfs passes the output buffers as an array
> to the decompressor wrapper even though LZMA cannot as yet make use of it).

inflate has exactly the same behavior, except that the standard zlib
implementation maintains this state internally instead of relying on
being able to peek into the output buffer.  The restriction is thus a
property of the implementation, not an inherent property of the
compression algorithm.
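
A minimal sketch of the "maintain the state internally" approach, to
make the distinction concrete.  This is not zlib's or the kernel's
code: struct hist_window, emit_byte() and copy_match() are hypothetical
names, and the 32 KiB window is simply the largest back-reference
distance deflate allows.  The point is only that once the decoder keeps
its own history, the caller may change the output pointer between calls.

```c
#include <assert.h>
#include <stddef.h>

#define WIN_SIZE 32768u	/* hypothetical: deflate's maximum distance, power of two */

/* Decoder-private history, independent of where the output goes. */
struct hist_window {
	unsigned char buf[WIN_SIZE];
	size_t pos;		/* total number of bytes emitted so far */
};

/* Record one byte in the private window and store it at *out. */
static void emit_byte(struct hist_window *w, unsigned char c,
		      unsigned char **out)
{
	w->buf[w->pos & (WIN_SIZE - 1)] = c;
	w->pos++;
	*(*out)++ = c;
}

/*
 * Expand a back-reference of len bytes starting dist bytes back.
 * The source is always the private window, never the output buffer,
 * so *out may point into a different buffer than earlier calls used.
 */
static void copy_match(struct hist_window *w, size_t dist, size_t len,
		       unsigned char **out)
{
	assert(dist >= 1 && dist <= w->pos && dist <= WIN_SIZE);
	while (len--)
		emit_byte(w, w->buf[(w->pos - dist) & (WIN_SIZE - 1)], out);
}
```

The cost is the window itself: the decoder carries WIN_SIZE bytes of
state for as long as the stream is open, in addition to whatever the
caller provides for output.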

The requirement that the output can't be processed incrementally is
another major disadvantage, and I'm not sure how to address it.  LZMA
requires insane amounts of memory if you don't let it use its output as
its look-behind buffer, so we waste tons of memory either way: for
small outputs on a separate look-behind buffer, and for large outputs
on a "decompress all at once" buffer.
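
For reference, the iovec-style alternative Phillip mentions could look
roughly like the sketch below: the output stays scattered across
caller-owned segments, and the decoder reads its history back through
the segment array instead of a private window.  All names here
(struct out_seg, seg_byte(), seg_copy_match()) are hypothetical and do
not correspond to any existing kernel interface; the linear segment
walk is only for clarity.

```c
#include <stddef.h>

/* One caller-owned output segment, e.g. a page. */
struct out_seg {
	unsigned char *base;
	size_t len;
};

/* Map an absolute output offset to an address, or NULL if out of range. */
static unsigned char *seg_byte(const struct out_seg *v, size_t n, size_t off)
{
	for (size_t i = 0; i < n; i++) {
		if (off < v[i].len)
			return v[i].base + off;
		off -= v[i].len;
	}
	return NULL;
}

/*
 * Copy len bytes from dist bytes back, starting at output offset *pos.
 * History is read from earlier segments, so nothing is buffered twice.
 * Returns 0 on success, -1 if the match or the output runs out of range.
 */
static int seg_copy_match(const struct out_seg *v, size_t n, size_t *pos,
			  size_t dist, size_t len)
{
	if (dist == 0 || dist > *pos)
		return -1;
	while (len--) {
		unsigned char *src = seg_byte(v, n, *pos - dist);
		unsigned char *dst = seg_byte(v, n, *pos);

		if (!src || !dst)
			return -1;
		*dst = *src;
		(*pos)++;
	}
	return 0;
}
```

This avoids the separate look-behind buffer, but only as long as every
earlier segment stays mapped and unmodified for as far back as the
dictionary reaches, so it does not by itself allow the output to be
consumed and freed incrementally.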

	-hpa
