From: Kai Krakow <hurikhan77@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs dedup - available or experimental? Or yet to be?
Date: Sun, 29 Mar 2015 13:43:08 +0200
Message-ID: <d35lub-tec.ln1@hurikhan77.spdns.de>
In-Reply-To: CAGfcS_k=__EfJssYt9Ra77uAtW6PfqV1fqAX03VfJiPi_T5TBQ@mail.gmail.com

Rich Freeman <r-btrfs@thefreemanclan.net> wrote:

> On Thu, Mar 26, 2015 at 8:07 PM, Martin <m_btrfs@ml1.co.uk> wrote:
>>
>> Anyone with any comments on how well duperemove performs for TB-sized
>> volumes?
> 
> Took many hours but less than a day for a few TB - I'm not sure
> whether it is smart enough to take less time on subsequent scans like
> bedup.
> 
>>
>> Does it work across subvolumes? (Presumably not...)
> 
> As far as I can tell, yes.  Unless you pass a command-line option it
> crosses filesystem boundaries and even scans non-btrfs filesystems
> (like /proc, /dev, etc).  Obviously you'll want to avoid that since it
> only wastes time and I can just imagine it trying to hash kcore and
> such.
> 
> Other than being less-than-ideal intelligence-wise, it seemed
> effective.  I can live with that in an early release like this.

Crossing boundaries is mainly in there to support deduping across different
subvolumes within the same device pool. So I think the idea was neither
less-than-ideal nor unintelligent, and it has nothing to do with performance.

But your warning is still valid: one should take care not to "dedupe"
special filesystems (the same holds for every other tool that supports
recursion, such as rsync or cp). Nor is it very effective for the
deduplication process to cross a boundary onto a non-btrfs device, with a
few exceptions: you may want duperemove to write hashes for a non-btrfs
device and use the result for other purposes outside of duperemove's scope,
or you may be nesting btrfs into non-btrfs into btrfs mounts, or...
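
To keep special filesystems out of a scan, one option is to build the path
list yourself from the mount table. The following is only a rough sketch and
has nothing to do with duperemove's own code: the function name
mount_points_of_type and its parameters are made up for illustration, and
the only thing assumed is the usual /proc/mounts layout (device, mount
point, filesystem type, options, ...).

```python
import re
from typing import List


def _unescape(field: str) -> str:
    # /proc/mounts writes characters such as spaces as octal escapes (\040).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points_of_type(mounts_text: str, fstype: str = "btrfs") -> List[str]:
    """Return the mount points whose filesystem type equals fstype.

    Hypothetical helper: mounts_text is text in /proc/mounts format.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mount_point, kind = fields[:3]
        if kind == fstype:
            points.append(_unescape(mount_point))
    return points
```

On Linux the input would typically come from reading /proc/mounts; which of
the returned paths to hand to the dedupe tool is still the user's decision.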

In conclusion: duperemove should probably not try to become smart about
filesystem boundaries. It should either cross them or not, as it does now;
the choice is left to the user (as is the task of supplying proper
command-line arguments for it).
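
To illustrate what "cross them or not" means in practice, here is a minimal
sketch of a recursive file walker with that choice exposed as a flag. It is
not how duperemove is implemented; walk_files and cross_devices are names
invented for this example. It compares st_dev values the way "find -xdev"
does. Since btrfs reports a different st_dev for each subvolume, a walker
with cross_devices=False would also stop at subvolume boundaries, which is
why crossing is needed to dedupe across subvolumes.

```python
import os
import stat
from typing import Iterator


def walk_files(root: str, cross_devices: bool = True) -> Iterator[str]:
    """Yield regular files below root.

    With cross_devices=False, directories whose st_dev differs from the
    root's are not descended into (hypothetical flag, for illustration).
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        if not cross_devices:
            # Prune in place so os.walk does not enter other devices.
            dirnames[:] = [
                d for d in dirnames
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
            ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(mode):
                yield path
```

The default here mirrors the behaviour described above (cross everything);
a caller who wants to stay on one device passes cross_devices=False.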

With the planned performance improvements, I'm guessing the best way will 
become mounting the root subvolume (subvolid 0) and letting duperemove work 
on that as a whole - including crossing all fs boundaries.

-- 
Replies to list only preferred.


Thread overview: 14+ messages
2015-03-23 23:10 btrfs dedup - available or experimental? Or yet to be? Martin
2015-03-23 23:22 ` Hugo Mills
2015-03-25  1:30   ` Rich Freeman
2015-03-27  0:07     ` Martin
2015-03-27  0:30       ` Rich Freeman
2015-03-29 11:43         ` Kai Krakow [this message]
2015-03-29 12:31           ` Rich Freeman
2015-03-29 14:44             ` Kai Krakow
2015-03-29 17:54               ` Christoph Anton Mitterer
2015-03-29 17:51           ` Christoph Anton Mitterer
2015-03-27 20:51       ` Mark Fasheh
2015-03-27 20:44     ` Mark Fasheh
2015-05-13 16:23   ` Learner Study
2015-05-13 21:08     ` Zygo Blaxell
