public inbox for linux-kernel@vger.kernel.org
From: Phillip Lougher <phillip@lougher.demon.co.uk>
To: linux-kernel@vger.kernel.org
Cc: pmarques@grupopie.com
Subject: Re: Compression filter for Loopback device
Date: Fri, 23 Jul 2004 19:20:29 +0100
Message-ID: <410156ED.40102@lougher.demon.co.uk>

On Fri, 2004-07-23 Paulo Marques wrote:
 >
 >I did start working on something like that a while ago. I even
 >registered for a project on sourceforge:
 >
 >http://sourceforge.net/projects/zloop/
 >
 >    - The block device doesn't understand anything about files. This is
 >an advantage because it will compress the filesystem metadata
 >transparently, but it is bad because it compresses "unused" blocks of
 >data. This could probably be avoided with a patch I saw floating around
 >a while ago that zeroed deleted ext2 files. Zeroed blocks didn't occupy
 >any space at all in my compressed scheme, only metadata (only 2 bytes
 >per block).
 >

The fact that the block device doesn't understand anything about the
filesystem is a *major* disadvantage.  Cloop takes an I/O and seeking
performance hit because it doesn't understand the filesystem, and this
will be far worse for write compression.  Every time your block layer
sees a block update you'll have to recompress the block.  It is going
to be difficult to cache the block because you're below the block
cache (any writes you see shouldn't be cached).  If you use a
compressed block size larger than the block size, you'll also have to
decompress each compressed block to obtain the missing data before you
can recompress.  Obviously Linux I/O scheduling has a large part to
play, and you had better hope to see bursts of writes to consecutive
disk blocks.
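
To make the cost concrete, here is a minimal sketch of that
read-modify-write cycle.  It is a toy in-memory model, not how cloop or
zloop work: the class name, the 64K compression unit, and the use of
zlib are all assumptions chosen for illustration.

```python
import zlib

FS_BLOCK = 4096          # size of each write the block layer sees (assumed)
COMP_BLOCK = 64 * 1024   # hypothetical compression unit, larger than FS_BLOCK


class CompressedStore:
    """Toy in-memory compressed block device (illustrative only)."""

    def __init__(self, nblocks):
        zero = zlib.compress(bytes(COMP_BLOCK))
        self.blocks = [zero] * nblocks
        self.bytes_decompressed = 0
        self.bytes_recompressed = 0

    def write(self, offset, data):
        """Apply one FS_BLOCK-sized, FS_BLOCK-aligned write."""
        assert len(data) == FS_BLOCK and offset % FS_BLOCK == 0
        index, inner = divmod(offset, COMP_BLOCK)
        # The rest of the unit only exists in compressed form, so the
        # whole unit has to be decompressed first ...
        raw = bytearray(zlib.decompress(self.blocks[index]))
        self.bytes_decompressed += len(raw)
        raw[inner:inner + FS_BLOCK] = data
        # ... and then recompressed in full.
        self.blocks[index] = zlib.compress(bytes(raw))
        self.bytes_recompressed += len(raw)

    def read(self, offset, length):
        """Read a range that lies within a single compression unit."""
        index, inner = divmod(offset, COMP_BLOCK)
        return zlib.decompress(self.blocks[index])[inner:inner + length]


store = CompressedStore(nblocks=4)
store.write(8192, b"x" * FS_BLOCK)
print("bytes recompressed per 4K write:", store.bytes_recompressed)
```

With these assumed sizes, each 4K update touches a full 64K unit on
both the decompress and the recompress side, unless consecutive writes
to the same unit can be batched.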

 >I did a proof of concept using a nbd server. This way I could test
 >everything in user space.
 >
 >With this NBD server I tested the compression ratios that my scheme
 >could achieve, and they were much better than those achieved by cramfs,
 >and close to tar.gz ratios. This I wasn't expecting, but it was a nice
 >surprise :)

I'm very surprised you got ratios better than cramfs and close to
tar.gz.  Cramfs is actually quite efficient in its use of metadata;
what lets cramfs down is that it compresses in units of the page size,
or 4K blocks.  Cloop/Squashfs/tar.gz use much larger blocks, which
obtain much better compression ratios.
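
The effect of block size on ratio is easy to try in user space.  The
sketch below compresses the same buffer in independent chunks of
different sizes and prints the resulting ratio.  The helper names, the
generated sample data, and the choice of zlib are assumptions for
illustration; real filesystem contents will give different numbers.

```python
import random
import zlib

WORDS = [b"kernel", b"block", b"inode", b"loopback", b"compress",
         b"filesystem", b"metadata", b"device", b"cache", b"page",
         b"sector", b"buffer"]


def sample_data(size, seed=0):
    """Deterministic pseudo-text; a stand-in for real file contents."""
    rng = random.Random(seed)
    out = bytearray()
    while len(out) < size:
        out += rng.choice(WORDS) + b" "
    return bytes(out[:size])


def compressed_size(data, block_size):
    """Total size when each block_size chunk is compressed on its own."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )


data = sample_data(512 * 1024)
for block_size in (4096, 32768, 131072):
    total = compressed_size(data, block_size)
    print(f"{block_size:>7} byte blocks: {total / len(data):.3f} of original")
```

Each chunk is compressed with no knowledge of its neighbours, so small
chunks pay the per-stream overhead more often and have less history to
find matches in.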

What block size did you compress with, and which compression
algorithm did you use?  There is a dramatic performance trade-off here.
If you used blocks larger than 4K, then every time your compressing
block device is presented with a (probably 4K) block update it has to
decompress the larger compression block, which is very slow.  If you
used 4K blocks, then I cannot see how you obtained better compression
than cramfs.

Phillip

