From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Tomasz Chmielewski <tch@virtall.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: compression disk space saving - what are your results?
Date: Wed, 2 Dec 2015 08:03:27 -0500
Message-ID: <565EEC1F.7070600@gmail.com>
In-Reply-To: <4082684905f25f921ae4564b1c8a892e@admin.virtall.com>
On 2015-12-02 04:46, Tomasz Chmielewski wrote:
> What are your disk space savings when using btrfs with compression?
>
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
> text files (logs), mostly multi-gigabyte files.
>
>
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
> # df -h
> Filesystem Size Used Avail Use% Mounted on
> (...)
> /dev/xvdb 200G 124G 76G 62% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 153G /var/log/remote/
>
>
> From these numbers (124 GB used where data size is 153 GB), it appears
> that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better
> with that algorithm, although I understand that we have several
> limitations when applying that on a filesystem level.
This is actually an excellent question. A couple of things to note
before I share what I've seen:
1. Text compresses better than most other data with any compression
algorithm. It is by
nature highly patterned and moderately redundant data, which is what
benefits the most from compression.
2. When BTRFS does in-line compression, it uses 128k blocks. Because of
this, there are diminishing returns for smaller files when using
compression.
3. The best compression ratio I've ever seen from zlib on real data is
about 65-70%, and that was using SquashFS, which is designed to take up
as little room as possible.
4. LZO gets a worse compression ratio than zlib (around 40-50% if you're
lucky), but is a _lot_ faster.
5. By playing around with the -c option for defrag, you can compress or
uncompress different parts of the filesystem, and get a rough idea of
what compresses best.
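
As a rough illustration of point 2, the sketch below compresses data in
independent 128k chunks with Python's zlib module and rounds each chunk
up to a whole allocation unit. The 4k allocation unit, the compression
level, and the function names are assumptions made for this example;
they are not taken from the BTRFS code, and the model ignores metadata
and small files stored inline.

```python
import zlib

CHUNK = 128 * 1024   # compression unit mentioned in point 2
SECTOR = 4 * 1024    # assumed allocation unit (illustrative only)


def round_up(nbytes: int) -> int:
    """Round a byte count up to a whole number of allocation units."""
    return -(-nbytes // SECTOR) * SECTOR


def stored_size(data: bytes, level: int = 3) -> int:
    """Estimate bytes on disk when every 128k chunk is compressed alone."""
    total = 0
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        raw = round_up(len(chunk))
        packed = round_up(len(zlib.compress(chunk, level)))
        # Keep the chunk uncompressed when compression gains nothing.
        total += min(raw, packed)
    return total


def saving(data: bytes) -> float:
    """Fraction of space saved relative to storing the data uncompressed."""
    if not data:
        return 0.0
    return 1 - stored_size(data) / round_up(len(data))
```

In this model a file smaller than one allocation unit saves nothing no
matter how well it compresses, while a large, highly repetitive file
approaches the limit set by the chunk and allocation sizes. Real log
data will land somewhere in between.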
Now, to my results. These are all from my desktop system, with no
deduplication, and the data for zlib is somewhat outdated (I've not used
it since LZO support stabilized).
For the filesystems I have on traditional hard disks:
1. For /home (mostly text files, some SQLite databases, and a couple of
git repositories), I get about 15-20% space savings with zlib, and about
a 2-4% performance hit. I get about 5-10% space savings with lzo, but
performance is about 5-8% better than uncompressed.
2. For /usr/src (50/50 mix of text and executable code), I get about 25%
space savings with zlib with a 5-7% hit to performance, and about 10%
with lzo with a 7% boost in performance relative to uncompressed.
3. For /usr/portage and /var/lib/layman (lots of small text files, a
number of VCS repos, and about 2000 compressed source archives), I get
about 25% space savings with zlib, with a 15% performance hit (yes,
seriously 15%), and with lzo I get about 25% space savings with no
measurable performance difference relative to uncompressed.
For the filesystems I have on SSDs:
1. For /var/tmp (huge assortment of different things, but usually
similar to /usr/src because this is where packages get built), I get
almost no space savings with either type of compression, and see a
performance reduction of about 5% for both.
2. For /var/log (lots of text files; notably, I don't compress rotated
logs, and I don't have systemd's insane binary log files), I get about
30% space savings with zlib, but it makes the _whole_ system run about
5% slower, and I get about 20% space savings with lzo, with no
measurable performance difference relative to uncompressed.
3. For /var/spool (lots of really short text files, mostly stuff from
postfix and CUPS), I actually see higher disk usage with both types of
compression, but almost zero performance impact from either of them.
4. For /boot (a couple of big binary files that already have built-in
compression), I see no net space savings, and don't have any numbers
regarding performance impact.
5. For / (everything that isn't on one of the other filesystems I listed
above), I see about 10-20% space savings from zlib, with a roughly 5%
performance hit, and about 5-15% space savings with lzo, with no
measurable performance difference.
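
For comparison with the percentages above, the figures in the quoted
question (153 GB reported by du, 124 GB used according to df) can be
turned into a saving as follows. This is only a sketch: the function
name is made up for the example, and it assumes du reflects the
uncompressed size while df reflects what is actually allocated, which
also includes metadata.

```python
def space_saving(apparent_gb: float, used_gb: float) -> float:
    """Fraction of space saved: 1 - (space used / uncompressed size)."""
    return 1 - used_gb / apparent_gb


# Numbers from the quoted df and du output.
print(f"{space_saving(153, 124):.1%}")  # prints 19.0%
```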