From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Query about proposed dedup patches and behaviours
Date: Fri, 15 Jan 2016 12:18:46 +0000 (UTC)
Message-ID: <pan$5ad99$de23f375$ce80d2dd$892b7fc5@cox.net>
In-Reply-To: CAGkb5vdQU6z0SaMAW+WHTNxdKj8L0mdDe6zOyByzYLiK=CKfsw@mail.gmail.com
James Hogarth posted on Fri, 15 Jan 2016 09:33:44 +0000 as excerpted:
> On 15 January 2016 at 01:47, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Hugo should really explain as he was the one that said that, but
>> [...] In general, autodefrag remains bad for reflinks, but
>> apparently not h***-bad, as manual defrag is.
>>
> As I recall it's something like autodefrag will break the reflink pretty
> much to the same extent as if you just started writing to each
> instance.
>
> http://article.gmane.org/gmane.comp.file-systems.btrfs/51441
That's it, yes. Thanks. =:^)

I think I had read it but hadn't actually explained it to anyone myself
yet, which tends to solidify it in my mind, so forgot enough of the
detail that I couldn't easily do so. Let's see if the below explanation
solves that for next time. =:^)

Tho just writing to the file would normally only copy the 4096-byte
block, while autodefrag will check how fragmented the file is around that
block, and if the extents are small enough to trigger a defrag, it'll
rewrite rather more of the file into a (hopefully) larger single extent.

So autodefrag will break reflinks to a rather larger extent (literally,
file extent) than will writing to an individual block within a file, but
(on a reasonably large file, say 100-MiB scale) it should still be a much
smaller effect (breaking reflinks for a rather smaller part of the file)
than defragging the entire file, which is what a manual defrag would do.

And as Hugo said, of course if you're rewriting most of the file, it's
likely all or almost all the file will be reflink-broken, but that would
be expected anyway, if you're rewriting the file.

So as I said, autodefrag is a bit bad for reflinks, yes, but not h***-bad
for them, as manual defrag is.
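
To put rough numbers on that comparison, here is a toy model in Python. It
is only a sketch, not how btrfs accounts for anything: the 4096-byte block
and the 100-MiB file come from the text above, while AUTODEFRAG_WINDOW is
a made-up illustrative figure, since the amount autodefrag really rewrites
depends on kernel heuristics that aren't spelled out in this thread.

```python
BLOCK = 4096                 # block size named in the message
FILE_SIZE = 100 * 1024**2    # the "100-MiB scale" file from the message
# ASSUMPTION: purely illustrative rewrite window; the real amount that
# autodefrag rewrites depends on kernel heuristics not described here.
AUTODEFRAG_WINDOW = 256 * 1024


def unshared_bytes(case: str) -> int:
    """Bytes of a reflinked file that stop being shared after one small write."""
    if case == "plain-write":
        # Copy-on-write only replaces the block that was written.
        return BLOCK
    if case == "autodefrag":
        # The written block plus its fragmented neighbourhood is rewritten.
        return min(AUTODEFRAG_WINDOW, FILE_SIZE)
    if case == "manual-defrag":
        # The whole file is rewritten, so nothing stays shared.
        return FILE_SIZE
    raise ValueError(case)


for case in ("plain-write", "autodefrag", "manual-defrag"):
    n = unshared_bytes(case)
    print(f"{case:14s} {n:>11d} bytes  ({100 * n / FILE_SIZE:.4f}% of file)")
```

Whatever the real window turns out to be, the ordering is the point: one
block, then a neighbourhood of blocks, then the entire file.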
> It does appear that btrfs-progs is only being extended to enable or
> disable dedup on a whole pool rather than to dedup X files
> So I see two things out of this:
>
> 1) At least a note in the man page (or command output as well preferably)
> reminding that autodefrag will to an extent work against dedupe (and it
> may be worth testing the effect of both enabled and if poor preventing
> one whilst the other is there).
Agreed. A note in the manpage (and on the wiki mount options page)
explaining that autodefrag can partially undo dedup's work would be useful.
> 2) Qu is there any intention to be able to do btrfs dedup /path1../pathN
> or is the intention for this work only to enable in-band across an
> entire pool (less any files with the proposed attribute changed to say
> nodedup)?
>
> If there is no intention for 2 then the duperemove packaging is still
> worthwhile to carry out.
Previous discussion has made plain that this is /inline/ dedup: as new
data is written, it's compared against existing data (by hashes or the
like) to see if part or all of it can be reflinked instead of written
separately.
And while not yet part of the patches, a per-file nodedup property is
intended as well, which if set, will mean the file isn't included as
existing data when the comparison of new data against existing data is
made.
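
As a minimal sketch of that idea (a toy in-memory model, not the proposed
patches), the Python below hashes each incoming block and records a
reflink when the hash is already known. The class name, the data layout
and the choice of SHA-256 are all assumptions for illustration. It also
assumes that a nodedup file's own writes are still compared against
existing data; the discussion only says such a file isn't offered as
existing data for others to match.

```python
import hashlib

BLOCK = 4096


class ToyInlineDedup:
    """Hypothetical model of inline dedup; names and layout are illustrative."""

    def __init__(self):
        self.index = {}        # block hash -> (file name, block number) of existing data
        self.nodedup = set()   # files carrying the (not yet implemented) nodedup property
        self.files = {}        # file name -> list of ("data", bytes) or ("reflink", source)

    def write(self, name, data):
        extents = []
        for offset in range(0, len(data), BLOCK):
            block = data[offset:offset + BLOCK]
            digest = hashlib.sha256(block).digest()
            source = self.index.get(digest)
            if source is not None:
                # Matching data already exists: share it instead of writing it again.
                extents.append(("reflink", source))
            else:
                extents.append(("data", block))
                # A nodedup file is never recorded as existing data.
                if name not in self.nodedup:
                    self.index[digest] = (name, offset // BLOCK)
        self.files[name] = extents
        return extents


fs = ToyInlineDedup()
fs.nodedup.add("private.img")
fs.write("a.bin", b"A" * BLOCK + b"B" * BLOCK)
fs.write("private.img", b"C" * BLOCK)
new = fs.write("b.bin", b"A" * BLOCK + b"C" * BLOCK)
print([kind for kind, _ in new])   # ['reflink', 'data']
```

The "A" block of b.bin is shared with a.bin, while its "C" block is
written out separately, because private.img was never added to the index.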
As such, there will indeed be no way to specifically dedup one already
existing file against another (unless of course you rewrite it,
triggering the inline dedup), which is precisely where (separate)
out-of-line dedup comes in.

So yes, duperemove, as one option for that separate out-of-line dedup,
will still be worthwhile, with the two functionalities complementing each
other.
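
For contrast, the first half of an out-of-line pass can be sketched as
below: scan data that already exists and group identical blocks. This is
not duperemove's actual algorithm, just an illustration of the general
approach; the function name and the in-memory dict standing in for files
on disk are assumptions. A real tool would then ask the filesystem to
share the matching extents, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict

BLOCK = 4096


def find_duplicate_blocks(files):
    """Map block hash -> list of (file name, block number) for blocks seen more than once.

    `files` is a dict of name -> bytes, a stand-in for already-written files.
    """
    seen = defaultdict(list)
    for name, data in files.items():
        for offset in range(0, len(data), BLOCK):
            digest = hashlib.sha256(data[offset:offset + BLOCK]).hexdigest()
            seen[digest].append((name, offset // BLOCK))
    # Only hashes found at more than one location are dedup candidates.
    return {h: locs for h, locs in seen.items() if len(locs) > 1}


existing = {
    "a.bin": b"A" * BLOCK + b"B" * BLOCK,
    "b.bin": b"B" * BLOCK,
}
for locations in find_duplicate_blocks(existing).values():
    print(locations)   # [('a.bin', 1), ('b.bin', 0)]
```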
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman