From: John Richard Moser <nigelenki@comcast.net>
To: "Jörn Engel" <joern@wohnheim.fh-wedel.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Compressed filesystems: Better compression?
Date: Wed, 29 Sep 2004 13:33:25 -0400
Message-ID: <415AF1E5.6010101@comcast.net>
In-Reply-To: <20040929123637.GA17952@wohnheim.fh-wedel.de>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jörn Engel wrote:
| On Tue, 28 September 2004 23:46:54 -0400, John Richard Moser wrote:
|
|>In my own personal tests, I've gotten a 6.25% increase in compression
|>ratio over bzip2 using the above lzma code. These were very weak tests
|>involving simply bunzipping a 32MiB tar.bz2 of the Mozilla 1.7 source
|>tree and recompressing it with lzma, which produced a 30MiB tar.lzma. I
|>tried, but could not get it to compress much better than that (I think I
|>touched 29.5 at some point but not sure, it was a while ago).
|
|
| Sounds sane. bzip2 is really hurt by the hard limit of 900k for block
| sorting.
|
| Inside the kernel, other things start to matter, though. If you
| really want to impress me, take some large test data (your mozilla.tar
| or whatever), cut it up into chunks of 4k and compress each chunk
| individually. Does lzma still beat gzip?
|
I'll try that. I'm more interested in 32-128k chunks, however. Based
on prior experience, I've come to rely on 32-64k being "optimal" for
compression; bigger block sizes don't seem to produce much of a gain
(some, but nothing amazing). These are also the ranges that would be
used for compressed filesystems such as squashfs. For filesystems such
as zisofs, it would be possible to split files up into blocks as well,
to lower the memory footprint and increase seek speed through the file.
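The chunked comparison discussed above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses Python's standard zlib module as a stand-in for gzip (both use DEFLATE) and Python's lzma module (liblzma) rather than the lzma code mentioned earlier in the thread, and the names chunked_size and compare are made up for this sketch. Each chunk carries its own container header, which weighs more heavily at 4k than at 64k.

```python
import lzma
import zlib


def chunked_size(data, chunk_size, compress):
    """Total compressed size when every chunk is compressed on its own."""
    return sum(
        len(compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


def compare(data, chunk_sizes=(4096, 32768, 65536)):
    """Return {chunk_size: {"zlib": total_bytes, "lzma": total_bytes}}."""
    results = {}
    for size in chunk_sizes:
        results[size] = {
            # zlib level 9 as a stand-in for "gzip -9" (assumption).
            "zlib": chunked_size(data, size, lambda b: zlib.compress(b, 9)),
            # FORMAT_ALONE is the legacy .lzma container.
            "lzma": chunked_size(
                data, size, lambda b: lzma.compress(b, format=lzma.FORMAT_ALONE)
            ),
        }
    return results
```

Running compare() on the bytes of an uncompressed tarball would give per-chunk-size totals for both compressors; no results are claimed here.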
[BlkSz][DictSz][CompressedData...........]
By storing each block's compressed size in the block itself, and the
uncompressed block size elsewhere (in the file header, etc.), a reader
can seek through the compressed data quickly, decompressing at most one
block instead of the entire stream.
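A minimal sketch of that layout, to make the seeking argument concrete. Everything specific here is an assumption for illustration, not part of any existing filesystem format: the two header fields are taken to be 32-bit little-endian integers, raw LZMA2 from Python's lzma module is used as the codec, the uncompressed block size is passed in by the caller (standing in for "the file header"), and pack and read_block are hypothetical names.

```python
import lzma
import struct

# Assumed header: [BlkSz][DictSz], both 32-bit little-endian.
HEADER = struct.Struct("<II")


def _filters(dict_size):
    # Raw LZMA2 stream with the given dictionary size (illustrative choice).
    return [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]


def pack(data, block_size=64 * 1024, dict_size=64 * 1024):
    """Compress data in independent blocks, each prefixed by its header."""
    out = bytearray()
    for off in range(0, len(data), block_size):
        payload = lzma.compress(
            data[off:off + block_size],
            format=lzma.FORMAT_RAW,
            filters=_filters(dict_size),
        )
        out += HEADER.pack(len(payload), dict_size) + payload
    return bytes(out)


def read_block(packed, index):
    """Skip over `index` blocks using only their headers, then decompress one."""
    pos = 0
    for _ in range(index):
        blk_sz, _unused = HEADER.unpack_from(packed, pos)
        pos += HEADER.size + blk_sz  # jump past the payload without decoding it
    blk_sz, dict_sz = HEADER.unpack_from(packed, pos)
    start = pos + HEADER.size
    return lzma.decompress(
        packed[start:start + blk_sz],
        format=lzma.FORMAT_RAW,
        filters=_filters(dict_sz),
    )
```

To read the byte at uncompressed offset N, a caller would compute N // block_size to pick the block and N % block_size to index into the result of read_block, so only that one block is ever decompressed.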
| If you can at least get it to compress better for 64k chunks, that's
| already quite interesting. But excellent compression with infinite
| chunk-size and infinite memory is quite pointless inside the kernel.
| Such things should be left in userspace where they belong.
|
Yes, this needs to be practically useful; compressing 800M files in the
kernel using 16G of memory is NOT practical. :)
| Jörn
|
- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBWvHlhDd4aOud5P8RAjkLAJ9YQa4dAA8cbEJZwOSm1AqDho24bQCeNsqA
eTvya0mNXt2JJb4Fi95IeEY=
=pe0m
-----END PGP SIGNATURE-----