From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
Date: Sun, 26 Jun 2016 02:53:03 +0000 (UTC)
Message-ID: <pan$60ea0$54eb541$a3b060fc$181835e5@cox.net>
In-Reply-To: CAJCQCtTXJhgKsmQGUXrppntYj8u6vk3jzP9Q4UMBomJ255QTQw@mail.gmail.com

Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted:

> Wow. So it sees the data strip corruption, uses good parity on disk to
> fix it, writes the fix to disk, recomputes parity for some reason but
> does it wrongly, and then overwrites good parity with bad parity?
> That's fucked. So in other words, if there are any errors fixed up
> during a scrub, you should do a 2nd scrub. The first scrub should make
> sure data is correct, and the 2nd scrub should make sure the bug is
> papered over by computing correct parity and replacing the bad parity.
> 
> I wonder if the same problem happens with balance or if this is just a
> bug in scrub code?

Could this explain the many raid56 mode reports where a btrfs replace of
a first drive appears to succeed just fine, but when people then go to
btrfs replace a second drive, the array crashes as if the first replace
hadn't worked correctly after all?  That would leave two bad devices once
the second replace gets under way, which of course brings down the array.

If so, then it looks like we have our answer as to what has been going 
wrong that has been so hard to properly trace and thus to bugfix.
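
To make that failure mode concrete, here is a minimal Python sketch of
single-parity (XOR) reconstruction.  It is not btrfs code: the strip
contents, the helper name xor_strips, and the way the parity gets
corrupted are all made up for illustration.  It only shows that a parity
strip that was silently overwritten with a wrong value does no visible
harm until a device has to be rebuilt from it.

```python
from functools import reduce


def xor_strips(strips):
    """XOR equal-length byte strings column by column (raid5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Hypothetical two-data-strip stripe (illustrative values only).
d0 = b"AAAA"
d1 = b"BBBB"
good_parity = xor_strips([d0, d1])

# With good parity, a lost d1 can be rebuilt from d0 and the parity.
rebuilt_ok = xor_strips([d0, good_parity])

# Stand-in for a wrongly recomputed parity strip written back to disk.
bad_parity = bytes(b ^ 0xFF for b in good_parity)

# Both data strips are still fine, so normal reads see nothing wrong.
# The damage only shows once d1 must be rebuilt from d0 and the bad parity.
rebuilt_bad = xor_strips([d0, bad_parity])

print(rebuilt_ok == d1)
print(rebuilt_bad == d1)
```

If that is roughly what happens, a replace that has to reconstruct from
such a stripe would produce bad data without any device having failed,
which would fit the two-bad-devices pattern described above.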

Combine that with the raid4 dedicated parity device behavior you're
seeing when the writes are all exactly 128 MB, which may explain the
super-slow replaces, and this thread may have just given us answers to
both of those until-now-untraceable issues.

Regardless, what's /very/ clear by now is that raid56 mode as it 
currently exists is more or less fatally flawed, and a full scrap and 
rewrite to an entirely different raid56 mode on-disk format may be 
necessary to fix it.

And what's even clearer is that people /really/ shouldn't be using raid56 
mode for anything but testing with throw-away data, at this point.  
Anything else is simply irresponsible.

Does that mean we need to put a "raid56 mode may eat your babies" level 
warning in the manpage and require a --force to either mkfs.btrfs or 
balance to raid56 mode?  Because that's about where I am on it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

