From: Paulo Marques <pmarques@grupopie.com>
To: Guillaume@Lacote.name
Cc: "Jörn Engel" <joern@wohnheim.fh-wedel.de>,
	linux-kernel@vger.kernel.org, Linux@glacote.com
Subject: Re: Using compression before encryption in device-mapper
Date: Wed, 14 Apr 2004 13:44:33 +0100
Message-ID: <407D3231.2080605@grupopie.com>
In-Reply-To: 200404141202.07021.Guillaume@Lacote.name

Guillaume Lacôte wrote:

>>>Oops! I thought it was possible to guarantee with the Huffman encoding
>>>(which is more basic than Lempel-Ziv) that the compressed data uses no
>>>more than 1 extra bit for every byte (i.e. 12.5% more space).

WTF??

Zlib gives a maximum increase of 0.1% + 12 bytes (from the zlib manual), which 
for a 512-byte block is a worst-case increase of about 2.4%.

I think that zlib already does the "if this is bigger than original, just mark 
the block type as uncompressed" algorithm internally, so the increase is minimal 
in the worst case.
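
To make the arithmetic above concrete, here is a small sketch using Python's 
standard zlib module. The bound is the one quoted from the zlib manual; the 
seeded pseudo-random block is only my stand-in for incompressible data, and the 
helper name is made up for illustration.

```python
import math
import random
import zlib

BLOCK_SIZE = 512  # bytes, the block size used in the estimate above


def zlib_worst_case_overhead(n: int) -> float:
    """Overhead quoted from the zlib manual: 0.1% of the input plus 12 bytes."""
    return n * 0.001 + 12


# Pseudo-random bytes stand in for incompressible data (assumption).
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))

compressed = zlib.compress(block)
overhead = zlib_worst_case_overhead(BLOCK_SIZE)
overhead_pct = 100.0 * overhead / BLOCK_SIZE
# A buffer must hold whole bytes, so round the bound up.
bound_bytes = BLOCK_SIZE + math.ceil(overhead)

print(f"worst-case overhead: {overhead:.3f} bytes ({overhead_pct:.2f}%)")
print(f"random block: {BLOCK_SIZE} -> {len(compressed)} bytes "
      f"(bound {bound_bytes})")
```

The incompressible block grows by only a few bytes of framing, which is 
consistent with zlib falling back to storing the data uncompressed.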

A while ago I started working on a proof of concept: a network block device 
server that compressed the data sent to it.

I think that if we want to go ahead with this, we really should make the extra 
effort to have actual compression, and use the extra space.

From my experience it is possible to get "near tar.gz" compression ratios on a 
scheme like this.

Biggest problems:

1 - We really need an extra layer of metadata. This is the worst part. Not only 
does it make the whole thing much more complex, it also brings new problems, 
like making sure that the order in which data is written to disk is transparent 
to the upper layers and won't wreck the journal on a journaling file system.

2 - The compression layer should report a large block size upwards, and use a 
small block size downwards, so that compression is as efficient as possible. 
Good results are obtained with a 32kB / 512 byte ratio. This can cause extra 
read-modify-write cycles for the upper layers.

3 - If we use a fixed size partition to store compressed data, the apparent 
uncompressed size of the block device would have to change. Filesystems aren't 
prepared to have a block device that changes size on-the-fly. If we can solve 
this problem, then this compression layer could be really useful. Otherwise it 
can only be used over loopback on files that can grow in size (this could still 
be useful this way).

4 - The block device has no knowledge of which blocks are actually being used 
by the filesystem above, so it has to compress and store blocks that are 
actually "unused". This is not a real problem; it is just that it could be a 
little more efficient if it could ignore unused blocks.
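
As a rough illustration of problem 2, the sketch below compares compressing one 
32kB logical block in a single pass against compressing each 512-byte sector on 
its own, and counts how many 512-byte sectors the result would occupy. The 
sample data and the helper name are made up for the example; real ratios depend 
entirely on the data.

```python
import math
import zlib

LOGICAL_BLOCK = 32 * 1024  # block size reported upwards (from the text)
SECTOR = 512               # block size used downwards (from the text)


def sectors_needed(payload_len: int) -> int:
    """Whole 512-byte sectors required to hold payload_len bytes."""
    return math.ceil(payload_len / SECTOR)


# Hypothetical sample data: repetitive text cut to exactly one logical block.
line = b"Jan  1 00:00:00 host kernel: sample log line for illustration\n"
data = (line * (LOGICAL_BLOCK // len(line) + 1))[:LOGICAL_BLOCK]

# Strategy 1: compress the whole 32kB logical block at once.
whole = len(zlib.compress(data))

# Strategy 2: compress every 512-byte sector independently.
per_sector = sum(
    len(zlib.compress(data[i:i + SECTOR]))
    for i in range(0, LOGICAL_BLOCK, SECTOR)
)

print(f"whole block : {whole} bytes, {sectors_needed(whole)} sector(s)")
print(f"per sector  : {per_sector} bytes in total")
print(f"uncompressed: {LOGICAL_BLOCK // SECTOR} sectors")
```

Compressing sector by sector pays the per-stream framing 64 times and cannot 
reuse matches across sectors, which is why the larger upward block size helps.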

When I did the tests I mounted an ext2 filesystem over the network block device. 
At least with ext2, the requests were gathered so that the server would often 
receive requests of 128kB, so the big block size problem is not too serious 
(performance will be bad anyway, this is a clear space/speed trade-off). This 
was kernel 2.4. I don't know enough about the kernel internals to know which 
layer is responsible for this "gathering".
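
To show why small writes hurt and why gathered requests help, here is a minimal 
sketch of the read-modify-write cycle for a single 512-byte write into a 
compressed 32kB logical block. The function name and the in-memory "storage" 
are assumptions for illustration only, not how my test server worked.

```python
import zlib

LOGICAL_BLOCK = 32 * 1024  # compressed unit
SECTOR = 512               # size of the incoming write


def write_sector(stored: bytes, sector_index: int, new_data: bytes) -> bytes:
    """Update one sector inside a compressed logical block."""
    assert len(new_data) == SECTOR
    # Read: the whole logical block has to be decompressed first.
    plain = bytearray(zlib.decompress(stored))
    # Modify: patch only the 512 bytes that were actually written.
    start = sector_index * SECTOR
    plain[start:start + SECTOR] = new_data
    # Write: the whole logical block is recompressed and stored again.
    return zlib.compress(bytes(plain))


stored = zlib.compress(bytes(LOGICAL_BLOCK))  # a logical block of zeros
stored = write_sector(stored, 3, b"\xab" * SECTOR)
plain = zlib.decompress(stored)
print(f"block is {len(plain)} bytes, stored as {len(stored)} bytes")
```

A request that covers the whole logical block can skip the read and decompress 
steps, which is what the gathered 128kB requests make possible.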

On the up side, having an extra metadata layer already provides the "is not 
compressed" bit, so that we never need more than a block of disk to store one 
block of uncompressed data.

Anyway, I really think that if there is no solution for problem 3, there is 
little point in pushing this forward.

For a "better encryption only" setup, we could use a much simpler scheme, like 
having a number of reserved blocks at the start of the block device to hold a 
bitmap of all the blocks. In this bitmap a 1 means that the block is 
uncompressed, so that if, after compression, the block is bigger than the 
original we can store it uncompressed.
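
A toy model of that bitmap scheme might look like the following. It keeps 
everything in memory, and the class and method names are hypothetical; the only 
part taken from the description above is "one bit per block, 1 means stored 
uncompressed".

```python
import random
import zlib

BLOCK = 512  # bytes per block (assumed)


class BitmapStore:
    """In-memory sketch: one bitmap bit per block, 1 = stored uncompressed."""

    def __init__(self, n_blocks: int) -> None:
        self.bitmap = bytearray((n_blocks + 7) // 8)
        self.blocks = [b""] * n_blocks

    def is_raw(self, i: int) -> bool:
        return bool(self.bitmap[i // 8] & (1 << (i % 8)))

    def write(self, i: int, data: bytes) -> None:
        assert len(data) == BLOCK
        packed = zlib.compress(data)
        if len(packed) >= BLOCK:
            # Compression did not help: set the bit and keep the original.
            self.bitmap[i // 8] |= 1 << (i % 8)
            self.blocks[i] = data
        else:
            # Compression helped: clear the bit and keep the packed form.
            self.bitmap[i // 8] &= ~(1 << (i % 8)) & 0xFF
            self.blocks[i] = packed

    def read(self, i: int) -> bytes:
        raw = self.blocks[i]
        return raw if self.is_raw(i) else zlib.decompress(raw)


store = BitmapStore(16)
zeros = bytes(BLOCK)
rng = random.Random(1)
noise = bytes(rng.getrandbits(8) for _ in range(BLOCK))  # incompressible
store.write(0, zeros)
store.write(1, noise)
print("block 0 raw?", store.is_raw(0), "block 1 raw?", store.is_raw(1))
```

With this layout a block never takes more than its original size on disk, and 
the only extra space is the bitmap itself.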

-- 
Paulo Marques - www.grupopie.com
"In a world without walls and fences who needs windows and gates?"

