public inbox for linux-kernel@vger.kernel.org
* Re: Compression filter for Loopback device
@ 2004-07-23 18:20 Phillip Lougher
  2004-07-26 12:38 ` Paulo Marques
  0 siblings, 1 reply; 9+ messages in thread
From: Phillip Lougher @ 2004-07-23 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: pmarques

On Thu, 2004-07-23 Paulo Marques wrote:
 >
 >I did start working on something like that a while ago. I even
 >registered for a project on sourceforge:
 >
 >http://sourceforge.net/projects/zloop/
 >
 >    - The block device doesn't understand anything about files. This is
 >an advantage because it will compress the filesystem metadata
 >transparently, but it is bad because it compresses "unused" blocks of
 >data. This could probably be avoided with a patch I saw floating around
 >a while ago that zeroed deleted ext2 files. Zeroed blocks didn't occupy
 >any space at all in my compressed scheme, only metadata (just 2 bytes
 >per block).
 >

The fact that the block device doesn't understand anything about the 
filesystem is a *major* disadvantage.  Cloop takes an I/O and seeking 
performance hit because it doesn't understand the filesystem, and this 
will be far worse for write compression.  Every time a block update is 
seen by your block layer you'll have to recompress the block, and it is 
going to be difficult to cache the result because you're below the block 
cache (any writes you see shouldn't be cached).  If you use a compressed 
block size larger than the filesystem block size, you'll also have to 
decompress each compressed block to obtain the missing data before 
recompressing.  Obviously Linux I/O scheduling has a large part to play 
here, and you'd better hope to see bursts of writes to consecutive disk 
blocks.
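
Just to make that cost concrete, here is a minimal user-space sketch (in
Python, assuming plain zlib and an illustrative 32K compression unit; this
is not cloop or zloop code) of what a compressing block device below the
block cache has to do for every 4K update it is handed:

import zlib

UNIT = 32 * 1024   # illustrative compression unit, bigger than the 4K updates

def apply_write(compressed_unit, offset_in_unit, new_block):
    """Apply one (e.g. 4K) filesystem-block update inside a compressed unit."""
    data = bytearray(zlib.decompress(compressed_unit))   # inflate the whole unit
    assert len(data) == UNIT and offset_in_unit + len(new_block) <= UNIT
    data[offset_in_unit:offset_in_unit + len(new_block)] = new_block
    return zlib.compress(bytes(data))                    # deflate the whole unit again

Every 4K write costs a full decompress and recompress of the surrounding
unit, which is why merging consecutive writes before recompression matters
so much.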

 >I did a proof of concept using an NBD server. This way I could test
 >everything in user space.
 >
 >With this NBD server I tested the compression ratios that my scheme
 >could achieve, and they were much better than those achieved by cramfs,
 >and close to tar.gz ratios. This I wasn't expecting, but it was a nice
 >surprise :)

I'm very surprised you got ratios better than cramfs, and close to 
tar.gz.  Cramfs is actually quite efficient in its use of metadata; 
what lets cramfs down is that it compresses in units of the page size, 
i.e. 4K blocks.  Cloop/Squashfs/tar.gz use much larger blocks, which 
obtain much better compression ratios.

What block size did you compress with, and what compression algorithm 
did you use?  There is a dramatic performance trade-off here.  If you 
used blocks larger than 4K, then every time your compressing block 
device is presented with a (probably 4K) block update you need to 
decompress the larger compression block, which is very slow.  If you 
used 4K blocks, then I cannot see how you obtained better compression 
than cramfs.
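
The block-size effect is easy to measure in user space.  A quick sketch
(assuming plain zlib, nothing specific to cramfs, cloop or zloop) that
compares the ratio obtained when compressing the same file in 4K, 32K and
128K chunks:

import sys
import zlib

def ratio(path, block_size):
    """Total compressed size / total input size for one chunk size."""
    total_in = total_out = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_in += len(chunk)
            total_out += len(zlib.compress(chunk, 6))
    return float(total_out) / total_in

if __name__ == "__main__":
    for size in (4 * 1024, 32 * 1024, 128 * 1024):
        print("%4dK blocks: %.3f" % (size // 1024, ratio(sys.argv[1], size)))

Run that against something like an ext2 image and the difference between
4K and 32K chunks should be obvious.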

Phillip


^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: Compression filter for Loopback device
@ 2004-07-26 12:48 Lei Yang
  2004-08-27 10:34 ` BAIN
  0 siblings, 1 reply; 9+ messages in thread
From: Lei Yang @ 2004-07-26 12:48 UTC (permalink / raw)
  To: pmarques, Phillip Lougher; +Cc: linux-kernel

Hmm, I am a bit surprised to see this...
Since I am the one who posted the question, could anyone please give me
a clue about what is going on, or something like a summary?  Many many
thanks!!

Lei

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paulo Marques
Sent: Monday, July 26, 2004 8:38 AM
To: Phillip Lougher
Cc: linux-kernel@vger.kernel.org
Subject: Re: Compression filter for Loopback device

On Fri, 2004-07-23 at 19:20, Phillip Lougher wrote:
> On Thu, 2004-07-23 Paulo Marques wrote:
>  >
>  >I did start working on something like that a while ago. I even
>  >registered for a project on sourceforge:
>  >
>  >http://sourceforge.net/projects/zloop/
>  >
>  >    - The block device doesn't understand anything about files. This is
>  >an advantage because it will compress the filesystem metadata
>  >transparently, but it is bad because it compresses "unused" blocks of
>  >data. This could probably be avoided with a patch I saw floating around
>  >a while ago that zeroed deleted ext2 files. Zeroed blocks didn't occupy
>  >any space at all in my compressed scheme, only metadata (just 2 bytes
>  >per block).
>  >
> 
> The fact that the block device doesn't understand anything about the
> filesystem is a *major* disadvantage.  Cloop takes an I/O and seeking
> performance hit because it doesn't understand the filesystem, and this
> will be far worse for write compression.  Every time a block update is
> seen by your block layer you'll have to recompress the block, and it is
> going to be difficult to cache the result because you're below the block
> cache (any writes you see shouldn't be cached).  If you use a compressed
> block size larger than the filesystem block size, you'll also have to
> decompress each compressed block to obtain the missing data before
> recompressing.  Obviously Linux I/O scheduling has a large part to play
> here, and you'd better hope to see bursts of writes to consecutive disk
> blocks.

Yes, I agree it is a major disadvantage. That is why I listed this as
one of the reasons to drop the project altogether :)

Anyway, my main concern was compression ratio, not performance. 

Seek times are very bad for live CD distros, but are not so bad for
flash or ram media. 

>  >I did a proof of concept using an NBD server. This way I could test
>  >everything in user space.
>  >
>  >With this NBD server I tested the compression ratios that my scheme
>  >could achieve, and they were much better than those achieved by cramfs,
>  >and close to tar.gz ratios. This I wasn't expecting, but it was a nice
>  >surprise :)
> 
> I'm very surprised you got ratios better than cramfs, and close to
> tar.gz.  Cramfs is actually quite efficient in its use of metadata;
> what lets cramfs down is that it compresses in units of the page size,
> i.e. 4K blocks.  Cloop/Squashfs/tar.gz use much larger blocks, which
> obtain much better compression ratios.
> 
> What block size did you compress with, and what compression algorithm
> did you use?  There is a dramatic performance trade-off here.  If you
> used blocks larger than 4K, then every time your compressing block
> device is presented with a (probably 4K) block update you need to
> decompress the larger compression block, which is very slow.  If you
> used 4K blocks, then I cannot see how you obtained better compression
> than cramfs.

You are absolutely correct. I was using a 32k block size, with a 512-byte
"sector size". A 32k block would have to compress into an integer number
of 512-byte sectors. Most of my wasted space comes from this, but I was
assuming that this would have to work over a real block device, so I
tried as much as possible to keep every read/write request to the
underlying file aligned to 512-byte blocks.

The compression algorithm was simply the standard zlib deflate.
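
Roughly, the packing looked like this (a simplified Python sketch rather
than the real zloop code, and assuming the 2 bytes of per-block metadata
hold a sector count, with all-zero blocks stored as a count of 0 and no
payload):

import zlib

BLOCK = 32 * 1024   # uncompressed block size
SECTOR = 512        # allocation granularity on the underlying device

def pack_block(block):
    """Return (sector_count, payload) for one 32k block."""
    if block == b"\0" * len(block):
        return 0, b""                      # zero block: metadata only, no payload
    comp = zlib.compress(block)
    sectors = -(-len(comp) // SECTOR)      # round up to whole 512-byte sectors
    return sectors, comp.ljust(sectors * SECTOR, b"\0")

The padding up to a sector boundary is where most of the wasted space comes
from.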

But as I said before, my major concern was compression ratio. 

I left the block size selectable in mk.zloop, so that I could test
several block sizes and measure compression ratio vs. performance. 

From what I remember, 4k block sizes really hurt the compression ratio; 32k
was almost as good as 128k or higher.

What I would really like to know is whether anyone has real-world
applications for a compression scheme like this, or whether this is just a
waste of time...

-- 
Paulo Marques - www.grupopie.com
"In a world without walls and fences who needs windows and gates?"


^ permalink raw reply	[flat|nested] 9+ messages in thread
* Compression filter for Loopback device
@ 2004-07-22 19:27 Lei Yang
  2004-07-22 19:44 ` Luiz Fernando N. Capitulino
  2004-07-23 11:16 ` Paulo Marques
  0 siblings, 2 replies; 9+ messages in thread
From: Lei Yang @ 2004-07-22 19:27 UTC (permalink / raw)
  To: linux-kernel

Hi all,

Is there anything like 'losetup', which allows choosing an encryption
algorithm for a loopback device, that could be used with compression
algorithms instead? In other words, when data passes through the loopback
device to a real storage device, it would be filtered, with the filter
being compression instead of encryption; when the kernel asks for data
from a storage device mounted through such a compressing loopback device,
it would be filtered again -- decompressed.

If there is no ready-to-use method for this, is there any way I can
implement the idea?

TIA! Really appreciate any comments.

Lei

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-08-28  7:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-23 18:20 Compression filter for Loopback device Phillip Lougher
2004-07-26 12:38 ` Paulo Marques
  -- strict thread matches above, loose matches on Subject: below --
2004-07-26 12:48 Lei Yang
2004-08-27 10:34 ` BAIN
2004-08-27 10:38   ` BAIN
     [not found]     ` <412F3210.3030506@nec-labs.com>
2004-08-28  7:57       ` BAIN
2004-07-22 19:27 Lei Yang
2004-07-22 19:44 ` Luiz Fernando N. Capitulino
2004-07-23 11:16 ` Paulo Marques
