[RFC] Breaking data compatibility with userspace bzlib

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC] Breaking data compatibility with userspace bzlib
@ 2003-06-20 18:59 Jörn Engel
  2003-06-20 19:09 ` Jeff Garzik
  2003-06-20 19:45 ` [RFC] Breaking data compatibility with userspace bzlib David S. Miller
  0 siblings, 2 replies; 13+ messages in thread
From: Jörn Engel @ 2003-06-20 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: jmorris, davem, David Woodhouse

Hi!

I'm almost finished with adding bzlib support to the kernel.  You
could also say, I'm already 150% finished, as there is only cleanup
left to do.  Since bzlib is new to the kernel, I don't see a point in
keeping the uglies in the userspace interface for the kernel as well
and have already killed most of them.

However, one of the uglies, blockSize100k, is also part of the data
format as well, being one field in the header.  So now I have to
decide whether to kill this wart or to keep it for compatibility.

The whole interface of the bzlib was modelled after the zlib
interface.  blockSize100k is supposed to imitate the zlib compression
levels, the valid range is from 1 to 9 as well.  The semantic is that
a block of 100k * blockSize100k is compressed at a time.

Now, the cost of the underlying BWT is O(n) in memory and O(n*ln(n))
in time.  That given, I consider it odd to use a linear semantic of
blockSizeXXXX and would prefer an exponential one, as the zlib uses
here and there.  Thus blockSizeBits would now give the blockSize as
1 << blockSizeBits, allowing to go well below 100k, resulting in lower
memory consumption for some and well above 900k, giving better
compression ratios.

Long intro, short question: Jay O Nay?

The change would make bzlib much more useful for jffs2, assuming it is
useful for jffs2 at all.  But if any kernel bzlib users have to
interface with a userspace bzlib, things will break.  That might be a
good reason to postpone the change for a while and rename the whole
thing when it gets done,,,, so my personal tendency is Nay.

Jörn

PS: Kept the surplus commata as a personal reminder that 2.5.72 still
has problems with my keyboard.  Should check Andrews Must-Fix List and
add this one it not yet present.

-- 
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague. 
-- Edsger W. Dijkstra

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 18:59 [RFC] Breaking data compatibility with userspace bzlib Jörn Engel
@ 2003-06-20 19:09 ` Jeff Garzik
  2003-06-20 19:45   ` Jörn Engel
  2003-06-20 20:27   ` [RFC] Breaking data compatibility with userspace bz2lib Nicholas Wourms
  2003-06-20 19:45 ` [RFC] Breaking data compatibility with userspace bzlib David S. Miller
  1 sibling, 2 replies; 13+ messages in thread
From: Jeff Garzik @ 2003-06-20 19:09 UTC (permalink / raw)
  To: J?rn Engel; +Cc: linux-kernel, jmorris, davem, David Woodhouse

On Fri, Jun 20, 2003 at 08:59:15PM +0200, J?rn Engel wrote:
> Now, the cost of the underlying BWT is O(n) in memory and O(n*ln(n))
> in time.  That given, I consider it odd to use a linear semantic of
> blockSizeXXXX and would prefer an exponential one, as the zlib uses
> here and there.  Thus blockSizeBits would now give the blockSize as
> 1 << blockSizeBits, allowing to go well below 100k, resulting in lower
> memory consumption for some and well above 900k, giving better
> compression ratios.
> 
> 
> Long intro, short question: Jay O Nay?

The big question is whether the bzip2 better compression is actually
useful in a kernel context?  Patches to do bzip2 for initrd, for
example, have been around for ages:

	http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz

But the compression and decompression overhead is _much_ larger
than gzip.  It was so huge for maximal compression that dialing back
compression reaching a point of diminishing returns rather quickly,
when compared to gzip memory usage and compression.

I talked a bit with the bzip2 author a while ago about memory usage.
He eventually added the capability to only require small blocks
for decompression (64K IIRC?), but there was a significant loss in
compression factor.

So... even in 2003, I really don't know of many (any?) tasks which
would benefit from bzip2, considering the additional memory and
cpu overhead.

	Jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 18:59 [RFC] Breaking data compatibility with userspace bzlib Jörn Engel
  2003-06-20 19:09 ` Jeff Garzik
@ 2003-06-20 19:45 ` David S. Miller
  2003-06-20 19:56   ` Jörn Engel
  1 sibling, 1 reply; 13+ messages in thread
From: David S. Miller @ 2003-06-20 19:45 UTC (permalink / raw)
  To: joern; +Cc: linux-kernel, jmorris, dwmw2

   From: Jörn Engel <joern@wohnheim.fh-wedel.de>
   Date: Fri, 20 Jun 2003 20:59:15 +0200

   The whole interface of the bzlib was modelled after the zlib
   interface.

The zlib interface is actually problematic, it doesn't allow
handling of scatter-gather lists on input or output for example.

Therefore we couldn't make the compress cryptolib interface
take scatterlists elements either, which is a huge problem.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 19:09 ` Jeff Garzik
@ 2003-06-20 19:45   ` Jörn Engel
  2003-06-20 19:48     ` David Lang
  2003-06-20 20:27   ` [RFC] Breaking data compatibility with userspace bz2lib Nicholas Wourms
  1 sibling, 1 reply; 13+ messages in thread
From: Jörn Engel @ 2003-06-20 19:45 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, jmorris, davem, David Woodhouse

On Fri, 20 June 2003 15:09:57 -0400, Jeff Garzik wrote:
> 
> The big question is whether the bzip2 better compression is actually
> useful in a kernel context?  Patches to do bzip2 for initrd, for
> example, have been around for ages:
> 
> 	http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz
> 
> But the compression and decompression overhead is _much_ larger
> than gzip.  It was so huge for maximal compression that dialing back
> compression reaching a point of diminishing returns rather quickly,
> when compared to gzip memory usage and compression.
> 
> I talked a bit with the bzip2 author a while ago about memory usage.
> He eventually added the capability to only require small blocks
> for decompression (64K IIRC?), but there was a significant loss in
> compression factor.

You are puzzling me a bit.  What exactly do you consider to be the
overhead?  Code size?  Memory consumption?  CPU time?

When it comes to compression, the combination of compressed data and
decompression code should be larger than uncompressed data, everything
else is secondary.  The secondary stuff might still affect some users,
but it is not a general problem.

> So... even in 2003, I really don't know of many (any?) tasks which
> would benefit from bzip2, considering the additional memory and
> cpu overhead.

That overhead will become less important with time.  And if we will
merge bzlib anyway, we may as well do it today.

But I do agree with you that some choices made by the maintainer were
rather stupid.  And one of them is the reason for this thread and
appears to be the whole point behind your argument.

Jörn

-- 
The cost of changing business rules is much more expensive for software
than for a secretaty.
-- unknown

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 19:45   ` Jörn Engel
@ 2003-06-20 19:48     ` David Lang
  2003-06-20 20:05       ` Jörn Engel
  0 siblings, 1 reply; 13+ messages in thread
From: David Lang @ 2003-06-20 19:48 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Jeff Garzik, linux-kernel, jmorris, davem, David Woodhouse

he is saying that the memory and CPU requirements to do the bzip
uncompression are so much larger then what is nessasary to do the gzip
uncompression that the small amount of space saved is almost never worth
the cost.

David Lang


 On Fri, 20 Jun 2003, Jörn Engel wrote:

> Date: Fri, 20 Jun 2003 21:45:17 +0200
> From: Jörn Engel <joern@wohnheim.fh-wedel.de>
> To: Jeff Garzik <jgarzik@pobox.com>
> Cc: linux-kernel@vger.kernel.org, jmorris@intercode.com.au, davem@redhat.com,
>      David Woodhouse <dwmw2@infradead.org>
> Subject: Re: [RFC] Breaking data compatibility with userspace bzlib
>
> On Fri, 20 June 2003 15:09:57 -0400, Jeff Garzik wrote:
> >
> > The big question is whether the bzip2 better compression is actually
> > useful in a kernel context?  Patches to do bzip2 for initrd, for
> > example, have been around for ages:
> >
> > 	http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz
> >
> > But the compression and decompression overhead is _much_ larger
> > than gzip.  It was so huge for maximal compression that dialing back
> > compression reaching a point of diminishing returns rather quickly,
> > when compared to gzip memory usage and compression.
> >
> > I talked a bit with the bzip2 author a while ago about memory usage.
> > He eventually added the capability to only require small blocks
> > for decompression (64K IIRC?), but there was a significant loss in
> > compression factor.
>
> You are puzzling me a bit.  What exactly do you consider to be the
> overhead?  Code size?  Memory consumption?  CPU time?
>
> When it comes to compression, the combination of compressed data and
> decompression code should be larger than uncompressed data, everything
> else is secondary.  The secondary stuff might still affect some users,
> but it is not a general problem.
>
> > So... even in 2003, I really don't know of many (any?) tasks which
> > would benefit from bzip2, considering the additional memory and
> > cpu overhead.
>
> That overhead will become less important with time.  And if we will
> merge bzlib anyway, we may as well do it today.
>
> But I do agree with you that some choices made by the maintainer were
> rather stupid.  And one of them is the reason for this thread and
> appears to be the whole point behind your argument.
>
> Jörn
>
> --
> The cost of changing business rules is much more expensive for software
> than for a secretaty.
> -- unknown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 19:45 ` [RFC] Breaking data compatibility with userspace bzlib David S. Miller
@ 2003-06-20 19:56   ` Jörn Engel
  2003-06-20 20:23     ` David S. Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Jörn Engel @ 2003-06-20 19:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, jmorris, dwmw2

On Fri, 20 June 2003 12:45:10 -0700, David S. Miller wrote:
>    From: Jörn Engel <joern@wohnheim.fh-wedel.de>
>    Date: Fri, 20 Jun 2003 20:59:15 +0200
>    
>    The whole interface of the bzlib was modelled after the zlib
>    interface.
> 
> The zlib interface is actually problematic, it doesn't allow
> handling of scatter-gather lists on input or output for example.
> 
> Therefore we couldn't make the compress cryptolib interface
> take scatterlists elements either, which is a huge problem.

Is there a reason to use the zlib and nothing but the zlib for the
cryptolib?  RFCs 1950 - 1952?  Or would any form of compression do, in
principle at least?

In the worst case, I consider it not too hard to add a wrapper
interface to the zlib to do the scatter-gather handling.  Actually
going deeply into the guts of the zlib is not good for a kernel
hackers sanity though.  The massive use of macros with magic knowledge
of the surrounding functions is - well - interesting.

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 19:48     ` David Lang
@ 2003-06-20 20:05       ` Jörn Engel
  2003-06-20 21:53         ` Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: Jörn Engel @ 2003-06-20 20:05 UTC (permalink / raw)
  To: David Lang; +Cc: Jeff Garzik, linux-kernel, jmorris, davem, David Woodhouse

On Fri, 20 June 2003 12:48:51 -0700, David Lang wrote:
> 
> he is saying that the memory and CPU requirements to do the bzip
> uncompression are so much larger then what is nessasary to do the gzip
> uncompression that the small amount of space saved is almost never worth
> the cost.

Which is why kernel images usually come as .bz2 today. ;)

Read my first posting, read the source (grep for BZMALLOC, there are
just a few).  It is impossible in it's current state to use less than
roughly 1MB of memory, even though the algorithm doesn't give that
restriction at all.  Drop that down to 280k, the current zlib value
and you won't see a difference in compression ratios for jffs2 at
least.

This might boil down to bzlib merge 100% (minus testing and sending
patches) done and <nifty_name>, which is based on bzlib, just started.

Comments?

Jörn

-- 
Beware of bugs in the above code; I have only proved it correct, but
not tried it.
-- Donald Knuth

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 19:56   ` Jörn Engel
@ 2003-06-20 20:23     ` David S. Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2003-06-20 20:23 UTC (permalink / raw)
  To: joern; +Cc: linux-kernel, jmorris, dwmw2

   From: Jörn Engel <joern@wohnheim.fh-wedel.de>
   Date: Fri, 20 Jun 2003 21:56:58 +0200

   On Fri, 20 June 2003 12:45:10 -0700, David S. Miller wrote:
   > Therefore we couldn't make the compress cryptolib interface
   > take scatterlists elements either, which is a huge problem.

   Is there a reason to use the zlib and nothing but the zlib for the
   cryptolib?  RFCs 1950 - 1952?  Or would any form of compression do, in
   principle at least?

Because it's not very sensible to have 2 pieces of code that
do exactly the same thing? :-)

   In the worst case, I consider it not too hard to add a wrapper
   interface to the zlib to do the scatter-gather handling.

It's not that simple.

Is the wrapper called in IRQ or BH or user context?  This matters
to determine what kind of kmap() operations you must use in the
wrapper.

That's merely the beginning of the problems with this kind of idea.

Don't let my statements stall your work, it was just a FYI and
something we should take care of in the 2.7.x timeframe not now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bz2lib
  2003-06-20 19:09 ` Jeff Garzik
  2003-06-20 19:45   ` Jörn Engel
@ 2003-06-20 20:27   ` Nicholas Wourms
  2003-06-20 20:51     ` Jörn Engel
  1 sibling, 1 reply; 13+ messages in thread
From: Nicholas Wourms @ 2003-06-20 20:27 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, jmorris, davem, David Woodhouse

Jeff Garzik wrote:
> On Fri, Jun 20, 2003 at 08:59:15PM +0200, J?rn Engel wrote:
> The big question is whether the bzip2 better compression is actually
> useful in a kernel context?  Patches to do bzip2 for initrd, for
> example, have been around for ages:
> 
> 	http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz
> 

Not to mention the more current ongoing work by Christian Ludwig for 
2.4.2x support at:

http://shepard.kicks-ass.net/~cc/

Cheers,
Nicholas


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bz2lib
  2003-06-20 20:27   ` [RFC] Breaking data compatibility with userspace bz2lib Nicholas Wourms
@ 2003-06-20 20:51     ` Jörn Engel
  2003-06-20 21:34       ` Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: Jörn Engel @ 2003-06-20 20:51 UTC (permalink / raw)
  To: Nicholas Wourms
  Cc: Jeff Garzik, linux-kernel, jmorris, davem, David Woodhouse

On Fri, 20 June 2003 16:27:24 -0400, Nicholas Wourms wrote:
> Jeff Garzik wrote:
> >On Fri, Jun 20, 2003 at 08:59:15PM +0200, J?rn Engel wrote:
> >The big question is whether the bzip2 better compression is actually
> >useful in a kernel context?  Patches to do bzip2 for initrd, for
> >example, have been around for ages:
> >
> >	http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz
> >
> 
> Not to mention the more current ongoing work by Christian Ludwig for 
> 2.4.2x support at:
> 
> http://shepard.kicks-ass.net/~cc/

Hmm, both of them are already behind my work, and I just started this
week. :)

Ok, this isn't fair as I mainly concentrated on the bzlib itself,
jffs2 is likely the easiest user of compression in the kernel.  No
personal offense please.

Jeff didn't copy lib/inflate.c, a horrible hack that has to go, so his
patch is much better from that perspective.  But he kept the bzlib
pretty much verbatim, incl. _WIN32, functions working with FILE* and
so on.  Once you cut all that out, you start to see the real beauty of
the bzlib, it is much nicer than the older zlib.

Still, the blockSize100k part is just plain stupid and should die. 

Oh yes, thanks for the link.  Didn't know that patch.

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bz2lib
  2003-06-20 20:51     ` Jörn Engel
@ 2003-06-20 21:34       ` Jeff Garzik
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2003-06-20 21:34 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Nicholas Wourms, linux-kernel, jmorris, davem, David Woodhouse

My bzlib initrd stuff was for 2.2.x era kernel, and was a 5-minute hack. 
  Don't assume thought was put into it, I just wanted something that 
worked :)  There are at least 3-4 bzlib patches floating around, all 
developed independently.  Mine, C Ludwig's, and a couple others.

	Jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 20:05       ` Jörn Engel
@ 2003-06-20 21:53         ` Jeff Garzik
  2003-06-20 21:55           ` Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2003-06-20 21:53 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Lang, linux-kernel, jmorris, davem, David Woodhouse

Jörn Engel wrote:
> On Fri, 20 June 2003 12:48:51 -0700, David Lang wrote:
>>he is saying that the memory and CPU requirements to do the bzip
>>uncompression are so much larger then what is nessasary to do the gzip
>>uncompression that the small amount of space saved is almost never worth
>>the cost.

exactly


> just a few).  It is impossible in it's current state to use less than
> roughly 1MB of memory, even though the algorithm doesn't give that
> restriction at all.  Drop that down to 280k, the current zlib value
> and you won't see a difference in compression ratios for jffs2 at
> least.

Well, if you drop that down to 280k, and you do not beat zlib 
compression by a "comfortable margin", it's just pointless code churn in 
addition to swapping out the faster algorithm for the slower algorithm.

If you drop that down to 280k with much better compression than zlib, 
that's fantastic and useful, and people won't mind the slower algorithm :)

The issue of memory usage at high compression rates was the main reason 
why I didn't push the patch upstream and pursue bzlib further.

	Jeff




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Breaking data compatibility with userspace bzlib
  2003-06-20 21:53         ` Jeff Garzik
@ 2003-06-20 21:55           ` Jeff Garzik
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2003-06-20 21:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jörn Engel, David Lang, jmorris, davem, David Woodhouse

Jeff Garzik wrote:
> If you drop that down to 280k with much better compression than zlib, 
> that's fantastic and useful, and people won't mind the slower algorithm :)

Just so people don't misinterpret this, I'm just talking about the 
_addition_ of bzlib as an option.  zlib isn't going anywhere.

	Jeff




^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2003-06-20 21:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-06-20 18:59 [RFC] Breaking data compatibility with userspace bzlib Jörn Engel
2003-06-20 19:09 ` Jeff Garzik
2003-06-20 19:45   ` Jörn Engel
2003-06-20 19:48     ` David Lang
2003-06-20 20:05       ` Jörn Engel
2003-06-20 21:53         ` Jeff Garzik
2003-06-20 21:55           ` Jeff Garzik
2003-06-20 20:27   ` [RFC] Breaking data compatibility with userspace bz2lib Nicholas Wourms
2003-06-20 20:51     ` Jörn Engel
2003-06-20 21:34       ` Jeff Garzik
2003-06-20 19:45 ` [RFC] Breaking data compatibility with userspace bzlib David S. Miller
2003-06-20 19:56   ` Jörn Engel
2003-06-20 20:23     ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox