linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Fasheh <mfasheh@suse.de>
To: dsterba@suse.cz, Qu Wenruo <quwenruo@cn.fujitsu.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <jbacik@fb.com>,
	btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: About in-band dedupe for v4.7
Date: Thu, 12 May 2016 13:54:26 -0700	[thread overview]
Message-ID: <20160512205426.GL7633@wotan.suse.de> (raw)
In-Reply-To: <20160511173659.GI29353@suse.cz>

On Wed, May 11, 2016 at 07:36:59PM +0200, David Sterba wrote:
> On Tue, May 10, 2016 at 07:52:11PM -0700, Mark Fasheh wrote:
> > Taking your history with qgroups out of this btw, my opinion does not
> > change.
> > 
> > With respect to in-memory only dedupe, it is my honest opinion that such a
> > limited feature is not worth the extra maintenance work. In particular
> > there's about 800 lines of code in the userspace patches which I'm sure
> > you'd want merged, because how could we test this then?
> 
> I like the in-memory dedup backend. It's lightweight, only a heuristic,
> does not need any IO or persistent storage. OTOH I consider it a subpart
> of the in-band deduplication that does all the persistency etc. So I
> treat the ioctl interface from a broader aspect.

Those are all nice qualities, but what do they all get us?

For example, my 'large' duperemove test involves about 750 gigabytes of
general purpose data - quite literally /home off my workstation.

After the run I'm usually seeing between 65-75 gigabytes saved for a total
of only 10% duplicated data. I would expect this to be fairly 'average' -
/home on my machine has the usual stuff - documents, source code, media,
etc.

So if you were writing your whole fs out you could expect about the same
from inline dedupe - 10%-ish. Let's be generous and go with that number
though as a general 'this is how much dedupe we get'.

What the memory backend is doing then is providing a cache of sha256/block
calculations. This cache is very expensive to fill, and every written block
must go through it. On top of that, the cache does not persist between
mounts, and has items regularly removed from it when we run low on memory.
All of this will drive down the amount of duplicated data we can find.

So our best case savings is probably way below 10% - let's be _really_ nice
and say 5%.

Now ask yourself the question - would you accept a write cache which is
expensive to fill and would only have a hit rate of less than 5%?

Oh and there's 800 lines of userspace we'd merge to manage this cache too,
kernel ioctls which would have to be finalized, etc.


> A usecase I find interesting is to keep the in-memory dedup cache and
> then flush it to disk on demand, compared to automatically synced dedup
> (eg. at commit time).

What's the benefit here? We're still going to be hashing blocks on the way
in, and if we're not deduping them at write time then we're just have to
remove the extents and dedupe them later.


> > A couple examples sore points in my review so far:
> > 
> > - Internally you're using a mutex (instead of a spinlock) to lock out queries
> >  to the in-memory hash, which I can see becoming a performance problem in the
> >  write path.
> > 
> > - Also, we're doing SHA256 in the write path which I expect will
> >  slow it down even more dramatically. Given that all the work done gets
> >  thrown out every time we fill the hash (or remount), I just don't see much
> >  benefit to the user with this.
> 
> I had some ideas to use faster hashes and do sha256 when it's going to
> be stored on disk, but there were some concerns. The objection against
> speed and performance hit at write time is valid. But we'll need to
> verify that in real performance tests, which haven't happend yet up to
> my knowledge.

This is the type of thing that IMHO absolutely must be provided with each
code drop of the feature. Dedupe is nice but _nobody_ will use it if it's
slow. I know this from experience. I personally feel that btrfs has had
enough of 'cute' and 'almost working' features. If we want inline dedupe we
should do it correctly and with the right metrics from the beginning.


This is slightly unrelated to our discussion but my other unsolicited
opinion: As a kernel developer and maintainer of a file system for well over
a decade I will say that balancing the number of out of tree patches is
necessary but we should never be accepting of large features just because
'they've been out for a long time'. Again I mention this because other parts
of the discussion felt like they were going in that direction.

Thanks,
	--Mark

--
Mark Fasheh

  reply	other threads:[~2016-05-12 20:54 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-10  7:19 About in-band dedupe for v4.7 Qu Wenruo
2016-05-10 22:11 ` Mark Fasheh
2016-05-11  1:03   ` Qu Wenruo
2016-05-11  2:52     ` Mark Fasheh
2016-05-11  9:14       ` Qu Wenruo
2016-05-11 17:36       ` David Sterba
2016-05-12 20:54         ` Mark Fasheh [this message]
2016-05-13  7:14           ` Duncan
2016-05-13 12:14           ` Austin S. Hemmelgarn
2016-05-13 14:25             ` Qu Wenruo
2016-05-13 16:37             ` Zygo Blaxell
2016-05-16 15:26           ` David Sterba
2016-05-13  6:01         ` Zygo Blaxell
2016-05-11 16:56   ` David Sterba
2016-05-13  3:13   ` Wang Shilong
2016-05-13  3:44     ` Qu Wenruo
2016-05-13  6:21       ` Zygo Blaxell
2016-05-16 16:40     ` David Sterba
2016-05-11  0:37 ` Chris Mason
2016-05-11  1:40   ` Qu Wenruo
2016-05-11  2:26     ` Satoru Takeuchi
2016-05-11  4:22     ` Mark Fasheh
2016-05-11 16:39 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160512205426.GL7633@wotan.suse.de \
    --to=mfasheh@suse.de \
    --cc=clm@fb.com \
    --cc=dsterba@suse.cz \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).