From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Tomasz Chmielewski <tch@virtall.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: compression disk space saving - what are your results?
Date: Wed, 2 Dec 2015 08:03:27 -0500
Message-ID: <565EEC1F.7070600@gmail.com>
In-Reply-To: <4082684905f25f921ae4564b1c8a892e@admin.virtall.com>

On 2015-12-02 04:46, Tomasz Chmielewski wrote:
> What are your disk space savings when using btrfs with compression?
>
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
> text files (logs), mostly multi-gigabyte files.
>
>
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb       200G  124G   76G  62% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
>
>
> From these numbers (124 GB used where data size is 153 GB), it appears
> that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better
> with that algorithm, although I understand that we have several
> limitations when applying that on a filesystem level.
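Taking the two figures at face value, as the question does, the saving is 1 - 124/153. A trivial Python check follows; the function name is hypothetical, and df's Used figure may also include filesystem metadata, so treat the result as a rough estimate only.

```python
def savings_fraction(logical_gb: float, used_gb: float) -> float:
    """Fraction of space saved: 1 - used / logical (hypothetical helper)."""
    if logical_gb <= 0:
        raise ValueError("logical size must be positive")
    return 1.0 - used_gb / logical_gb


if __name__ == "__main__":
    # Figures from the question: du reports 153G, df reports 124G used.
    frac = savings_fraction(153, 124)
    print(f"saved {frac:.1%}, ratio {153 / 124:.2f}:1")  # saved 19.0%, ratio 1.23:1
```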

This is actually an excellent question.  A couple of things to note 
before I share what I've seen:
1. Text compresses better than most other data with any compression 
algorithm.  It is by nature highly patterned and moderately redundant 
data, which is what benefits the most from compression.
2. When BTRFS does in-line compression, it uses 128k blocks.  Because of 
this, there are diminishing returns for smaller files when using 
compression.
3. The best compression ratio I've ever seen from zlib on real data is 
about 65-70%, and that was using SquashFS, which is designed to take up 
as little room as possible.
4. LZO gets a worse compression ratio than zlib (around 40-50% if you're 
lucky), but is a _lot_ faster.
5. By playing around with the -c option of 'btrfs filesystem defragment' 
(for example -czlib or -clzo), you can compress or uncompress different 
parts of the filesystem, and get a rough idea of what compresses best.
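A rough way to see points 1 and 2 without touching a filesystem is to model compression per 128 KiB chunk. This is a simplified sketch, not btrfs's actual code path: it assumes each 128 KiB of a file is compressed independently with userspace zlib at level 6 (the kernel's level may differ), that each result is rounded up to a 4 KiB sector, and that a chunk is kept raw when compression frees no space. It ignores metadata, inline extents, and btrfs's own heuristics.

```python
import os
import zlib
from typing import Tuple

CHUNK = 128 * 1024  # assumption: each 128 KiB of file data is compressed on its own
SECTOR = 4096       # assumption: space is allocated in 4 KiB sectors


def round_up(n: int, unit: int = SECTOR) -> int:
    """Round n up to a whole number of allocation units."""
    return -(-n // unit) * unit


def estimated_allocation(data: bytes, level: int = 6) -> Tuple[int, int]:
    """Return (uncompressed, compressed) allocation in bytes under the model."""
    plain = round_up(len(data))
    packed = 0
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        compressed = round_up(len(zlib.compress(chunk, level)))
        # Store the chunk raw if compressing it does not free a sector.
        packed += min(compressed, round_up(len(chunk)))
    return plain, packed


def savings(data: bytes) -> float:
    """Fraction of allocated space saved under the model."""
    plain, packed = estimated_allocation(data)
    return 1 - packed / plain


if __name__ == "__main__":
    log = b"".join(
        b"Dec  2 08:03:27 host sshd[%d]: Accepted publickey for user\n" % i
        for i in range(20000)
    )
    noise = os.urandom(len(log))
    print(f"log-like text: {savings(log):.0%} saved")
    print(f"random bytes:  {savings(noise):.0%} saved")
    print("13-byte file:", estimated_allocation(b"hello, spool\n"))  # (4096, 4096)
```

Random bytes show no savings because no chunk compresses below its raw size, and a tiny file occupies one full sector either way, which is one reason small files gain little.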

Now, to my results.  These are all from my desktop system, with no 
deduplication, and the data for zlib is somewhat outdated (I've not used 
it since LZO support stabilized).

For the filesystems I have on traditional hard disks:
1. For /home (mostly text files, some SQLite databases, and a couple of 
git repositories), I get about 15-20% space savings with zlib, and about 
a 2-4% performance hit.  I get about 5-10% space savings with lzo, but 
performance is about 5-8% better than uncompressed.
2. For /usr/src (50/50 mix of text and executable code), I get about 25% 
space savings with zlib with a 5-7% hit to performance, and about 10% 
with lzo with a 7% boost in performance relative to uncompressed.
3. For /usr/portage and /var/lib/layman (lots of small text files, a 
number of VCS repos, and about 2000 compressed source archives), I get 
about 25% space savings with zlib, with a 15% performance hit (yes, 
seriously 15%), and with lzo I get about 25% space savings with no 
measurable performance difference relative to uncompressed.
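For trees like the ones above, the same simplified model can be run over a directory to get a ballpark size figure before converting anything with defragment. This is a hypothetical script (call it estimate.py or anything else); it assumes the same 128 KiB chunk / 4 KiB sector model and userspace zlib at level 6, skips symlinks and unreadable files, and says nothing about performance.

```python
import os
import sys
import zlib
from typing import Tuple

CHUNK = 128 * 1024  # assumption: per-chunk compression unit
SECTOR = 4096       # assumption: allocation granularity


def round_up(n: int, unit: int = SECTOR) -> int:
    return -(-n // unit) * unit


def file_estimate(path: str, level: int = 6) -> Tuple[int, int]:
    """(uncompressed, compressed) allocation for one file under the model."""
    plain = packed = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            raw = round_up(len(chunk))
            plain += raw
            packed += min(raw, round_up(len(zlib.compress(chunk, level))))
    return plain, packed


def tree_estimate(root: str, level: int = 6) -> Tuple[int, int]:
    """Sum the per-file estimates for every regular file under root."""
    plain = packed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                p, c = file_estimate(path, level)
            except OSError:
                continue  # unreadable file: leave it out of the totals
            plain += p
            packed += c
    return plain, packed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: estimate.py DIRECTORY")
    else:
        plain, packed = tree_estimate(sys.argv[1])
        if plain:
            print(f"{plain} -> {packed} bytes, ~{1 - packed / plain:.0%} saved")
        else:
            print("no regular files found")
```

Run it as, for example, 'python3 estimate.py /usr/src'; it only reads files, so it is safe to point at a live tree, though it can take a while on large ones.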

For the filesystems I have on SSDs:
1. For /var/tmp (huge assortment of different things, but usually 
similar to /usr/src because this is where packages get built), I get 
almost no space savings with either type of compression, and see a 
performance reduction of about 5% for both.
2. For /var/log (lots of text files; notably, I don't compress rotated 
logs, and I don't have systemd's insane binary log files), I get about 
30% space savings with zlib, but it makes the _whole_ system run about 
5% slower, and I get about 20% space savings with lzo, with no 
measurable performance difference relative to uncompressed.
3. For /var/spool (lots of really short text files, mostly stuff from 
postfix and CUPS), I actually see higher disk usage with both types of 
compression, but almost zero performance impact from either of them.
4. For /boot (a couple of big binary files that already have built-in 
compression), I see no net space savings, and don't have any numbers 
regarding performance impact.
5. For / (everything that isn't on one of the other filesystems I listed 
above), I see about 10-20% space savings from zlib, with a roughly 5% 
performance hit, and about 5-15% space savings with lzo, with no 
measurable performance difference.
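The performance numbers above come from real workloads and can't be reproduced in a few lines, but the CPU side of the size/speed trade-off can be eyeballed in userspace. Python's standard library has no LZO binding, so this sketch swaps in zlib level 1 as a stand-in for a faster, weaker codec and compares it with levels 6 and 9 on synthetic log-like text; it measures only compression CPU time, not btrfs I/O.

```python
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int) -> Tuple[float, float, float]:
    """Return (compressed/original size, compress seconds, decompress seconds)."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    restored = zlib.decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise RuntimeError("round trip failed")
    return len(packed) / len(data), t1 - t0, t2 - t1


if __name__ == "__main__":
    # Synthetic, log-like sample; real results depend heavily on the data.
    sample = b"".join(
        b"Dec  2 08:03:27 host postfix/smtpd[%d]: connect from unknown\n" % i
        for i in range(50000)
    )
    for level in (1, 6, 9):  # level 1 stands in for a faster, weaker codec
        ratio, ct, dt = measure(sample, level)
        print(f"zlib level {level}: {ratio:.1%} of original, "
              f"compress {ct * 1e3:.1f} ms, decompress {dt * 1e3:.1f} ms")
```

Each figure is a single run including Python call overhead, so repeat it a few times before drawing conclusions.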


