Re: Query about proposed dedup patches and behaviours

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Query about proposed dedup patches and behaviours
Date: Fri, 15 Jan 2016 12:18:46 +0000 (UTC)
Message-ID: <pan$5ad99$de23f375$ce80d2dd$892b7fc5@cox.net>
In-Reply-To: CAGkb5vdQU6z0SaMAW+WHTNxdKj8L0mdDe6zOyByzYLiK=CKfsw@mail.gmail.com

James Hogarth posted on Fri, 15 Jan 2016 09:33:44 +0000 as excerpted:

> On 15 January 2016 at 01:47, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Hugo should really explain as he was the one that said that, but 
>> [...]  In general, autodefrag remains bad for reflinks, but
>> apparently not h***-bad, as manual defrag is.
>>
> As I recall it's something like autodefrag will break the reflink pretty
> much to the same extent as if you just started writing to each
> instance.
> 
> http://article.gmane.org/gmane.comp.file-systems.btrfs/51441

That's it, yes.  Thanks.  =:^)

I think I had read it but hadn't yet explained it to anyone myself, 
and explaining is what tends to solidify things in my mind, so I'd 
forgotten enough of the detail that I couldn't easily do so.  Let's see 
if the explanation below fixes that for next time.  =:^)

Tho just writing to the file would normally only copy the 4096-byte 
block, while autodefrag will check how fragmented the file is around that 
block, and if the extents are small enough to trigger a defrag, it'll 
rewrite rather more of the file into a (hopefully) larger single extent.

So autodefrag will break reflinks to a rather larger extent (literally, 
file extent) than will writing to an individual block within a file, but 
(on a reasonably large file, say 100-MiB scale) it should still be a much 
smaller effect (breaking reflinks for a rather smaller part of the file) 
than defragging the entire file, which is what a manual defrag would do.
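
To put rough numbers on that comparison, here's a toy model in Python.  
It is only a sketch: the 4096-byte block and the 100-MiB file come from 
the text above, but extent_size, small_extent_limit and window are 
made-up illustration values, not btrfs's actual autodefrag thresholds.

```python
BLOCK = 4096  # block size named in the message


def bytes_unshared(file_size, mode, extent_size=16 * 1024,
                   small_extent_limit=64 * 1024, window=256 * 1024):
    """Bytes of a reflinked file that stop being shared after a one-block write.

    All size defaults except BLOCK are hypothetical illustration values.
    mode "write":      plain copy-on-write of the single block
    mode "autodefrag": rewrite a window around the block if extents are small
    mode "defrag":     manual defrag, modelled as rewriting the whole file
    """
    if mode == "write":
        broken = BLOCK
    elif mode == "autodefrag":
        broken = window if extent_size < small_extent_limit else BLOCK
    elif mode == "defrag":
        broken = file_size
    else:
        raise ValueError(f"unknown mode: {mode}")
    return min(broken, file_size)


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # the 100-MiB scale file from the text
    for mode in ("write", "autodefrag", "defrag"):
        print(mode, bytes_unshared(size, mode))
```

With those assumed numbers the ordering is the point, not the exact 
values: one block for a plain write, a larger window for autodefrag, 
and the whole file for a manual defrag.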

And as Hugo said, of course if you're rewriting most of the file, it's 
likely all or almost all the file will be reflink-broken, but that would 
be expected anyway, if you're rewriting the file.

So as I said, autodefrag is a bit bad for reflinks, yes, but not h***-bad 
for them, as manual defrag is.

> It does appear that btrfs-progs is only being extended to enable or
> disable dedup on a whole pool rather than to dedup X files

> So I see two things out of this:
> 
> 1) At least a note in the man page (or command output as well preferably)
> reminding that autodefrag will to an extent work against dedupe (and it
> may be worth testing the effect of both enabled and if poor preventing
> one whilst the other is there).

Agreed, a manpage (and wiki mount options page) note explaining that 
autodefrag can partially undo dedup's work, would be useful.

> 2) Qu is there any intention to be able to do btrfs dedup /path1../pathN
> or is the intention for this work only to enable in-band across an
> entire pool (less any files with the proposed attribute changed to say
> nodedup)?
> 
> If there is no intention for 2 then the duperemove packaging is still
> worthwhile to carry out.

Previous discussion has made plain that this is /inline/ dedup: as new 
data is written, it's compared against existing data (via hashes or the 
like) to see if part or all of it can be reflinked instead of written 
separately.

And while not yet part of the patches, a per-file nodedup property is 
intended as well, which if set, will mean the file isn't included as 
existing data when the comparison of new data against existing data is 
made.
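
As a rough illustration of that idea (and emphatically not the patches' 
actual design), here's a toy in-memory model in Python.  The class and 
its names are hypothetical.  It keeps nodedup files out of the index of 
existing data, as described above; that it also skips the lookup when 
writing a nodedup file is my assumption.

```python
import hashlib


class ToyInbandDedup:
    """Hypothetical in-memory model of hash-based inline dedup."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.store = []   # "physical" blocks actually written
        self.index = {}   # digest -> position in self.store
        self.files = {}   # file name -> list of positions in self.store

    def write(self, name, data, nodedup=False):
        ids = []
        for off in range(0, len(data), self.block_size):
            chunk = data[off:off + self.block_size]
            digest = hashlib.sha256(chunk).digest()
            if not nodedup and digest in self.index:
                # Matching existing data: reference it instead of writing.
                ids.append(self.index[digest])
                continue
            self.store.append(chunk)
            bid = len(self.store) - 1
            ids.append(bid)
            if not nodedup:
                # nodedup files are never offered as existing data.
                self.index[digest] = bid
        self.files[name] = ids


if __name__ == "__main__":
    fs = ToyInbandDedup()
    fs.write("a", b"x" * 8192)                 # two identical blocks
    fs.write("b", b"x" * 4096)                 # shares a's block
    fs.write("c", b"y" * 4096, nodedup=True)   # written, but not indexed
    fs.write("d", b"y" * 4096)                 # cannot share c's block
    print(len(fs.store), fs.files)
```

Note that nothing here can dedup a file that's already on disk: the 
comparison only happens on the write path.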

As such, there will indeed be no way to specifically dedup one already 
existing file against another (unless of course you rewrite it, 
triggering the inline dedup), which is precisely where (separate) out-of-
line dedup comes in.

So yes, duperemove, as one option for that separate out-of-line dedup, 
will still be worthwhile, with the two functionalities complementing each 
other.
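
For contrast, the out-of-line side starts by scanning files that already 
exist.  The Python sketch below only finds candidate duplicate blocks; 
it doesn't ask the kernel to share anything, and it isn't how duperemove 
itself is implemented.  The function name and the fixed 4096-byte block 
size are my own choices for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicate_blocks(paths, block_size=4096):
    """Map digest -> [(path, offset), ...] for blocks seen more than once."""
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                digest = hashlib.sha256(chunk).hexdigest()
                seen[digest].append((path, offset))
                offset += len(chunk)
    # Keep only digests that occur at two or more locations.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

A real tool would then hand the matching ranges to the filesystem to be 
shared, which is exactly the step the inline patches don't offer for 
existing files.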

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
