linux-btrfs.vger.kernel.org archive mirror
From: Mark Fasheh <mfasheh@suse.de>
To: James Hogarth <james.hogarth@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Query about proposed dedup patches and behaviours
Date: Sat, 23 Jan 2016 14:11:16 -0800
Message-ID: <20160123221116.GD24672@wotan.suse.de>
In-Reply-To: <CAGkb5vfzcOmj1Js=pBR6=-Uad7jY2bJy5ht3F-m-1TEOBjwCYA@mail.gmail.com>

On Thu, Jan 14, 2016 at 04:13:00PM +0000, James Hogarth wrote:
> The duperemove[1] tool is in the process of being packaged for Fedora
> at present but I was wondering what future this may have with the 4.5
> dedup patches being proposed.
> 
> Will the btrfs command have the ability to out-of-line dedup files
> similar to duperemove (thus negating the need for it) or will this
> only control in-line dedup with a tool like duperemove still being
> required for periodic only (or restricted path) dedup?

Similar to duperemove? I doubt it. Duperemove is about 12,000 lines at this
point and very little of it is duplicated from btrfs-progs. Much of it is
concerned with efficiently scanning files, making extents from duplicated
blocks, managing a sqlite db, etc. Things that the btrfs command doesn't
need to handle.
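
To give a rough idea of the "making extents from duplicated blocks" step,
here is a minimal Python sketch. It is not duperemove's code: the block
size, the SHA-256 hash, and the function names are all assumptions chosen
for illustration. It hashes fixed-size blocks of two buffers, pairs up
blocks with equal digests, and merges pairs that are contiguous in both
buffers into longer extents.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def hash_blocks(data, block_size=BLOCK_SIZE):
    """Return (offset, length, digest) for each block of `data`."""
    blocks = []
    for off in range(0, len(data), block_size):
        chunk = data[off:off + block_size]
        blocks.append((off, len(chunk), hashlib.sha256(chunk).digest()))
    return blocks


def matching_blocks(blocks_a, blocks_b):
    """Pair every block of A with each equal-sized block of B that has the same digest."""
    index = defaultdict(list)
    for off, length, digest in blocks_b:
        index[(digest, length)].append(off)
    return [
        (off_a, off_b, length)
        for off_a, length, digest in blocks_a
        for off_b in index.get((digest, length), [])
    ]


def merge_extents(matches):
    """Merge matches that are adjacent in both files into (off_a, off_b, length) extents."""
    extents = []
    # Sorting by (off_b - off_a, off_a) puts runs that can be merged next to each other.
    for off_a, off_b, length in sorted(matches, key=lambda m: (m[1] - m[0], m[0])):
        if extents:
            last_a, last_b, last_len = extents[-1]
            if last_a + last_len == off_a and last_b + last_len == off_b:
                extents[-1] = (last_a, last_b, last_len + length)
                continue
        extents.append((off_a, off_b, length))
    return extents
```

A real scanner would stream file contents from disk rather than hold
whole buffers in memory; the sketch only shows the grouping logic.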

Also Ocfs2 should be able to support extent-same at some point and
duperemove will want to run on that FS as well.

We could always have a small wrapper to the ioctl but again the difference
between 'hey dedupe a couple of files' and 'scan terabytes of data to
dedupe' is pretty big if you care about getting it done efficiently.
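
For the "couple of files" case, such a wrapper could look roughly like the
Python sketch below. It assumes the FIDEDUPERANGE request number and the
struct file_dedupe_range / struct file_dedupe_range_info layouts from
<linux/fs.h> (the VFS-level form of the btrfs extent-same ioctl); the
helper names pack_request and dedupe_range are made up for this example.
It handles a single destination range and does no scanning at all.

```python
import fcntl
import struct

# Assumed value of _IOWR(0x94, 54, struct file_dedupe_range) from <linux/fs.h>.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def pack_request(src_offset, length, dest_fd, dest_offset):
    """Build the ioctl argument for one source range and one destination."""
    buf = bytearray(HEADER.size + INFO.size)
    HEADER.pack_into(buf, 0, src_offset, length, 1, 0, 0)
    INFO.pack_into(buf, HEADER.size, dest_fd, dest_offset, 0, 0, 0)
    return buf


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share one range; returns (bytes_deduped, status)."""
    buf = pack_request(src_offset, length, dest_fd, dest_offset)
    # The kernel writes its results back into the same buffer.
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf, True)
    _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
    # status 0 means the ranges were identical and are now shared,
    # 1 means the data differed, and a negative value is an errno.
    return bytes_deduped, status
```

The call raises OSError on filesystems that do not implement the ioctl, so
a caller would want to handle that case.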


> To avoid memory usage bloat if the btrfs command can order dedup of X
> files on the path correctly can it be passed a path to carry the hash
> map in some form (similar to how duperemove can use sqlite for this)
> or is this another use case for the external tool?

I'm not totally clear on what you're asking here. Do you want the duperemove
hashes passed into the kernel? There's no point since we just use that map
to call our ioctl...
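
To illustrate the point, here is a minimal Python sketch of an on-disk
hash map that stays entirely in userspace. The table layout, column names
and function names are hypothetical and are not duperemove's actual
schema. Block digests go into sqlite so they need not all sit in memory,
and the only thing read back out is the list of (file, offset) groups
that share a digest, which is what would then be fed to the ioctl.

```python
import hashlib
import sqlite3


def open_hash_db(path=":memory:"):
    """Open (or create) the hash database; pass a file path to keep it on disk."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks (file TEXT, offset INTEGER, digest BLOB)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS blocks_digest ON blocks (digest)")
    return db


def record_file(db, name, data, block_size=4096):
    """Store one digest row per block of `data` under the given file name."""
    rows = [
        (name, off, hashlib.sha256(data[off:off + block_size]).digest())
        for off in range(0, len(data), block_size)
    ]
    db.executemany("INSERT INTO blocks VALUES (?, ?, ?)", rows)
    db.commit()


def duplicate_groups(db):
    """Return one list of (file, offset) per digest that occurs more than once."""
    dup_digests = db.execute(
        "SELECT digest FROM blocks GROUP BY digest HAVING COUNT(*) > 1 ORDER BY digest"
    ).fetchall()
    groups = []
    for (digest,) in dup_digests:
        rows = db.execute(
            "SELECT file, offset FROM blocks WHERE digest = ? ORDER BY file, offset",
            (digest,),
        ).fetchall()
        groups.append(rows)
    return groups
```

Each group is a set of candidate ranges only; the kernel still compares
the data itself when the dedupe request is made.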


> Finally, what's the present situation with regard to defragmentation
> and deduplication? Is it safe to turn on autodefrag now when using
> snapshots and duperemove? What should the behaviour be with the
> proposed 4.5 dedup patches if both inline dedup and autodefrag are
> enabled as mount options?

Was there ever a reason it was unsafe to do dedupe + autodefrag? To my
knowledge this should be fine.

Thanks,
	--Mark

--
Mark Fasheh
