linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: James Pharaoh <james@wellbehavedsoftware.com>
Cc: Mark Fasheh <mfasheh@versity.com>, linux-btrfs@vger.kernel.org
Subject: Re: Announcing btrfs-dedupe
Date: Mon, 14 Nov 2016 13:07:15 -0500	[thread overview]
Message-ID: <20161114180714.GF21290@hungrycats.org> (raw)
In-Reply-To: <345b3aa4-6644-6730-09dd-549d320f58cb@wellbehavedsoftware.com>

[-- Attachment #1: Type: text/plain, Size: 4115 bytes --]

On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:
> Annoyingly I can't find this now, but I definitely remember reading someone,
> apparently someone knowledgable, claim that the latest version of the kernel
> which I was using at the time, still suffered from issues regarding the
> dedupe code.

> This was a while ago, and I would be very pleased to hear that there is high
> confidence in the current implementation! I'll post a link if I manage to
> find the comments.

I've been running the btrfs dedup ioctl 7 times per second on average
over 42TB of test data for most of a year (and at a lower rate for two
years).  I have not found any data corruptions due to _dedup_.  I did find
three distinct data corruption kernel bugs unrelated to dedup, and two
test machines with bad RAM, so I'm pretty sure my corruption detection
is working.

That said, I wouldn't run dedup on a kernel older than 4.4.  LTS kernels
might be OK too, but only if they're up to date with backported btrfs
fixes.

Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
only deduplicate static data (i.e. data you are certain is not being
concurrently modified).  Before 3.12 there are so many bugs you might
as well not bother.

Older kernels are bad for dedup because of non-corruption reasons.
Between 3.13 and 4.4, the following bugs were fixed:

	- false-negative capability checks (e.g. same-inode, EOF extent)
	reduce dedup efficiency

	- ctime updates (older versions would update ctime when a file was
	deduped) mess with incremental backup tools, build systems, etc.

	- kernel memory leaks (self-explanatory)

	- multiple kernel hang/panic bugs (e.g. a deadlock if two threads
	try to read the same extent at the same time, and at least one
	of those threads is dedup; and there was some race condition
	leading to invalid memory access on dedup's comparison reads)
	which won't eat your data, but they might ruin your day anyway.

There is also a still-unresolved problem where the filesystem CPU usage
rises exponentially for some operations depending on the number of shared
references to an extent.  Files which contain blocks with more than a few
thousand shared references can trigger this problem.  A file over 1TB can
keep the kernel busy at 100% CPU for over 40 minutes at a time.

There might also be a correlation between delalloc data and hangs in
extent-same, but I have NOT been able to confirm this.  All I know
at this point is that doing a fsync() on the source FD just before
doing the extent-same ioctl dramatically reduces filesystem hang rates:
several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
or less without.

> James
> 
> On 07/11/16 18:59, Mark Fasheh wrote:
> >Hi James,
> >
> >Re the following text on your project page:
> >
> >"IMPORTANT CAVEAT — I have read that there are race and/or error
> >conditions which can cause filesystem corruption in the kernel
> >implementation of the deduplication ioctl."
> >
> >Can you expound on that? I'm not aware of any bugs right now but if
> >there is any it'd absolutely be worth having that info on the btrfs
> >list.
> >
> >Thanks,
> >    --Mark
> >
> >
> >On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
> ><james@wellbehavedsoftware.com> wrote:
> >>Hi all,
> >>
> >>I'm pleased to announce my btrfs deduplication utility, written in Rust.
> >>This operates on whole files, is fast, and I believe complements the
> >>existing utilities (duperemove, bedup), which exist currently.
> >>
> >>Please visit the homepage for more information:
> >>
> >>http://btrfs-dedupe.com
> >>
> >>James Pharaoh
> >>--
> >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> >>the body of a message to majordomo@vger.kernel.org
> >>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

  parent reply	other threads:[~2016-11-14 18:07 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
2016-11-07 17:48   ` Mark Fasheh
2016-11-07 20:54     ` Adam Borowski
2016-11-08  2:17       ` Darrick J. Wong
2016-11-08 18:59         ` Mark Fasheh
2016-11-08 19:47           ` Darrick J. Wong
2016-11-09 15:02       ` David Sterba
2016-11-08  2:40   ` Christoph Anton Mitterer
2016-11-08  6:11     ` James Pharaoh
2016-11-08 13:26     ` Austin S. Hemmelgarn
2016-11-08 16:57       ` Darrick J. Wong
2016-11-08 17:04         ` Austin S. Hemmelgarn
2016-11-08 18:49     ` Mark Fasheh
2016-11-07 17:59 ` Mark Fasheh
2016-11-07 18:49   ` James Pharaoh
2016-11-07 18:53     ` James Pharaoh
2016-11-14 18:07     ` Zygo Blaxell [this message]
2016-11-14 18:22       ` James Pharaoh
2016-11-14 18:39         ` Austin S. Hemmelgarn
2016-11-14 19:51           ` Zygo Blaxell
2016-11-14 19:56             ` Austin S. Hemmelgarn
2016-11-14 21:10               ` Zygo Blaxell
2016-11-15 12:26                 ` Austin S. Hemmelgarn
2016-11-15 17:52                   ` Zygo Blaxell
2016-11-16 22:24                     ` Niccolò Belli
2016-11-17  3:01                       ` Zygo Blaxell
2016-11-18 10:36                         ` Niccolò Belli
2016-11-14 20:07             ` James Pharaoh
2016-11-14 21:22               ` Zygo Blaxell
2016-11-14 18:43         ` Zygo Blaxell
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38   ` James Pharaoh
2016-11-08 16:57     ` Niccolò Belli
2016-11-08 16:58       ` James Pharaoh
2016-11-08 17:08         ` Niccolò Belli
2016-11-14 18:27   ` Zygo Blaxell
2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24   ` Niccolò Belli
2016-11-09 12:47     ` Saint Germain
2016-11-13 12:45   ` James Pharaoh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161114180714.GF21290@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=james@wellbehavedsoftware.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@versity.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).