Re: zstd compression

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: zstd compression
Date: Thu, 16 Nov 2017 13:43:24 +0000 (UTC)
Message-ID: <pan$a1486$84968dd3$dcbd0f46$977f6603@cox.net>
In-Reply-To: 37eb6ee9-2f7e-de42-3f7c-32db11d7648a@gmail.com

Austin S. Hemmelgarn posted on Thu, 16 Nov 2017 07:30:47 -0500 as
excerpted:

> On 2017-11-15 16:31, Duncan wrote:
>> Austin S. Hemmelgarn posted on Wed, 15 Nov 2017 07:57:06 -0500 as
>> excerpted:
>> 
>>> The 'compress' and 'compress-force' mount options only impact newly
>>> written data.  The compression used is stored with the metadata for
>>> the extents themselves, so any existing data on the volume will be
>>> read just fine with whatever compression method it was written with,
>>> while new data will be written with the specified compression method.
>>>
>>> If you want to convert existing files, you can use the '-c' option to
>>> the defrag command to do so.
>> 
>> ... Being aware of course that using defrag to recompress files like
>> that will break 100% of the existing reflinks, effectively (near)
>> doubling data usage if the files are snapshotted, since the snapshot
>> will now share 0% of its extents with the newly compressed files.
> Good point, I forgot to mention that.
>> 
>> (The actual effect shouldn't be quite that bad, as some files are
>> likely to be uncompressed due to not compressing well, and I'm not sure
>> if defrag -c rewrites them or not.  Further, if there's multiple
>> snapshots data usage should only double with respect to the latest one,
>> the data delta between it and previous snapshots won't be doubled as
>> well.)
> I'm pretty sure defrag is equivalent to 'compress-force', not
> 'compress', but I may be wrong.

But... compress-force doesn't actually force compression _all_ the time.  
Rather, it forces btrfs to continue checking whether compression is worth 
it for each "block"[1] of the file, instead of giving up if the first 
quick try at the beginning says that block won't compress.  
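
To make the difference concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the kernel's logic: the 128 KiB chunk size, the use of zlib, and the function names are all assumptions made for illustration, and "giving up" is simplified to "stop trying after the first chunk that does not shrink".

```python
import os
import zlib

CHUNK = 128 * 1024  # assumed chunk size, for this toy model only


def compresses(chunk):
    """Return True if zlib makes the chunk smaller."""
    return len(zlib.compress(chunk)) < len(chunk)


def chunks_stored_compressed(data, force):
    """Per-chunk decision list: True = stored compressed, False = stored raw.

    force=False models plain 'compress': stop trying once a chunk fails.
    force=True models 'compress-force': test every chunk, but still store
    a chunk raw when compression does not make it smaller.
    """
    decisions = []
    gave_up = False
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        if gave_up:
            decisions.append(False)
            continue
        ok = compresses(chunk)
        decisions.append(ok)
        if not ok and not force:
            gave_up = True  # plain 'compress' stops trying for this file
    return decisions


# An incompressible first chunk followed by two highly compressible ones.
data = os.urandom(CHUNK) + b"\0" * (2 * CHUNK)
print(chunks_stored_compressed(data, force=False))
print(chunks_stored_compressed(data, force=True))
```

In this model the plain-compress run stores all three chunks raw, while the forced run still stores the random first chunk raw but compresses the other two.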

So what I'm saying is that if the snapshotted data is already compressed 
(think pre-compressed tarballs, or image files such as JPEG, which are 
unlikely to /easily/ compress further and might well end up _bigger_ 
once the compression algorithm is run over them), defrag -c will likely 
fail to compress it further, even if it's the equivalent of 
compress-force.  It thus /should/ leave that data as-is, not breaking 
the reflinks of the snapshots and thus not doubling the data usage for 
that file or, more exactly, for that extent of that file.

Tho come to think of it, is defrag -c that smart, to actually leave the 
data as-is if it doesn't compress further, or does it still rewrite it 
even if it doesn't compress, thus breaking the reflink and doubling the 
usage regardless?
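
The space accounting behind that question can be sketched with a little arithmetic. The Python below is a rough model under stated assumptions, not a description of what defrag actually does: the function name, the extent sizes, and the `rewrite_incompressible` switch (which stands in for the unanswered question above) are all hypothetical.

```python
MiB = 1024 * 1024


def usage_after_defrag(extents, snapshot_exists, rewrite_incompressible):
    """Return total bytes allocated after a recompressing defrag.

    extents: list of (old_size, new_size) pairs, where new_size is None
    when the extent does not compress any further.
    """
    total = 0
    for old_size, new_size in extents:
        rewritten = new_size is not None or rewrite_incompressible
        if not rewritten:
            total += old_size  # left alone, still shared with the snapshot
            continue
        if snapshot_exists:
            total += old_size  # the snapshot keeps the old extent allocated
        # The live file now points at a newly written extent.
        total += new_size if new_size is not None else old_size
    return total


extents = [
    (100 * MiB, 40 * MiB),  # compresses well
    (50 * MiB, None),       # e.g. a JPEG that will not shrink
]

for rewrite in (False, True):
    used = usage_after_defrag(extents, True, rewrite)
    print(f"rewrite_incompressible={rewrite}: {used // MiB} MiB")
```

With these made-up numbers, 150 MiB of snapshotted data grows to 190 MiB if incompressible extents are left alone and to 240 MiB if they are rewritten anyway; without a snapshot the same defrag would shrink usage to 90 MiB.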

---
[1] Block:  I'm not positive it's the usual 4K block in this case.  I 
think I read that it's 16K, but I might be confused on that.  But 
regardless of the size, the point is that with compress-force btrfs 
won't give up the way plain compress will if the first "block" doesn't 
compress; it'll keep trying.

Of course the new compression heuristic changes this a bit too, but the 
same general idea holds: compress-force continues to try for the entire 
file, while compress will give up much faster.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
