From: Eli Carter <eli.carter@inet.com>
To: John Bradford <john@grabjohn.com>
Cc: jw schultz <jw@pegasys.ws>, linux-kernel@vger.kernel.org
Subject: Re: Transparent compression in the FS
Date: Fri, 17 Oct 2003 11:22:51 -0500
Message-ID: <3F90175B.2000502@inet.com>
In-Reply-To: 200310171527.h9HFRvU2001448@81-2-122-30.bradfords.org.uk
John Bradford wrote:
>>>The upshot of all that would be that if you needed space, it would be
>>>there, (just overwrite the uncompressed versions of files), but until
>>>you do, you can access the uncompressed data quickly.
>>>
>>>You could even take it one step further, and compress files with gzip
>>>by default, and re-compress them with bzip2 after long periods of
>>>inactivity.
>>
>>Note that a file compressed with bzip2 is not necessarily smaller than
>>the same file compressed with gzip. (It can be quite a bit larger in fact.)
>
>
> Have you noticed that with real-life data, or only test cases?
Real-life data. I don't remember the exact details for certain, but as
best as I can recall: I was dealing with copies of output from build
logs, telnet sessions, messages files, or the like (i.e. text) that were
(many,) many MB in size (and probably highly repetitititititive). I
wound up with a loop that compressed each file into a gzip and a bzip2,
compared the sizes, and killed the larger. There were a number of .gz's
that won. (I have also read that gzip is better at text compression
whereas bzip2 is better at binary compression. No, I don't remember the
source.)
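A minimal sketch of that kind of loop body, in Python rather than whatever shell I used at the time. The function name is made up for illustration, and it works on an in-memory buffer instead of files on disk:

```python
import bz2
import gzip
from typing import Tuple


def smaller_of_gzip_bzip2(data: bytes) -> Tuple[str, bytes]:
    """Compress data with both algorithms and keep whichever is smaller.

    Hypothetical helper for illustration; returns (algorithm name, payload).
    """
    candidates = {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
    }
    # Pick the candidate with the fewest bytes; on a tie, gzip wins
    # simply because it is listed first.
    winner = min(candidates, key=lambda name: len(candidates[name]))
    return winner, candidates[winner]
```

Which algorithm wins depends entirely on the input, which is the point: you have to measure rather than assume.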
But that is immaterial... You have to deal with the case where the
'better' algorithm gives 'worse' results (by size). Keep in mind that
some data won't compress at all (for a given algorithm), and winds up
needing more space in the compressed form. (In which case we add a byte
to say "this is not compressed" and keep the original form.)
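To make the "add a byte" idea concrete, here is a rough sketch in Python. The tag values and the pack/unpack names are assumptions for illustration only, not a proposed on-disk format:

```python
import gzip

# Hypothetical one-byte tags; the actual values are arbitrary.
RAW = b"\x00"      # payload is stored exactly as given
GZIPPED = b"\x01"  # payload is gzip-compressed


def pack(data: bytes) -> bytes:
    """Store data compressed only if that actually saves space."""
    compressed = gzip.compress(data)
    if len(compressed) < len(data):
        return GZIPPED + compressed
    # Compression did not help: keep the original and say so in the tag.
    return RAW + data


def unpack(blob: bytes) -> bytes:
    """Recover the original data from a tagged blob."""
    tag, payload = blob[:1], blob[1:]
    if tag == GZIPPED:
        return gzip.decompress(payload)
    if tag == RAW:
        return payload
    raise ValueError("unknown tag byte: %r" % tag)
```

The worst case is then one byte of overhead, no matter how badly the data compresses.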
By far the normal case would be:
    uncompressed -> gzip
    gzip -> bzip2
But, sometimes gzip can't get it any smaller, or would increase the
size. (Keep in mind we may be storing a file that is already compressed...)
So your scheme needs to note when compression fails so it doesn't try
again, so we see:
    uncompressed -> gzip or uncompressed(gzip failed)
    gzip -> bzip2 or gzip(bzip2 failed)
    uncompressed(gzip failed) -> bzip2 or uncompressed(bzip2 failed)
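Those transitions could be sketched roughly as follows, in Python. The Stored record, the ORDER list, and the promote/plain names are all hypothetical; the only thing taken from the scheme above is that a failed attempt is remembered so it is never retried:

```python
import bz2
import gzip
from dataclasses import dataclass, field

# Assumed ordering: the cheap algorithm first, the expensive one later.
ORDER = ["gzip", "bzip2"]
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


@dataclass
class Stored:
    payload: bytes
    form: str = "uncompressed"                # or "gzip" / "bzip2"
    failed: set = field(default_factory=set)  # algorithms that did not help


def plain(item: Stored) -> bytes:
    """Return the original, uncompressed bytes of an item."""
    if item.form == "uncompressed":
        return item.payload
    return CODECS[item.form][1](item.payload)


def promote(item: Stored) -> Stored:
    """Try the next untried algorithm once; record the failure if it loses."""
    start = 0 if item.form == "uncompressed" else ORDER.index(item.form) + 1
    for name in ORDER[start:]:
        if name in item.failed:
            continue  # already known not to help, so do not try again
        candidate = CODECS[name][0](plain(item))
        if len(candidate) < len(item.payload):
            return Stored(candidate, name, set(item.failed))
        return Stored(item.payload, item.form, item.failed | {name})
    return item  # nothing left to try
```

Each call to promote() attempts at most one algorithm, so it maps onto one arrow in the list above. Adding a third algorithm would only mean extending ORDER and CODECS.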
If it were me, I'd do it with one compression algorithm as a
proof-of-concept, then add a second, and then generalize it to N cases
(which would not be hard once the two-algorithm case was done).
But I must say, I like your idea of keeping the uncompressed form around
until we need the space. (I'd also want to track reads separately from
writes.)
Eli
--------------------. "If it ain't broke now,
Eli Carter \ it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------