linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: linux-btrfs@vger.kernel.org
Subject: Re: Adventures in btrfs raid5 disk recovery - update
Date: Wed, 22 Jun 2016 00:06:13 -0400
Message-ID: <20160622040613.GP15597@hungrycats.org>
In-Reply-To: <20160620034427.GK15597@hungrycats.org>

TL;DR:

	Kernel 4.6.2 causes a world of pain.  Use 4.5.7 instead.

	'btrfs dev stat' doesn't seem to count "csum failed"
	(i.e. corruption) errors in compressed extents.



On Sun, Jun 19, 2016 at 11:44:27PM -0400, Zygo Blaxell wrote:
> Not so long ago, I had a disk fail in a btrfs filesystem with raid1
> metadata and raid5 data.  I mounted the filesystem readonly, replaced
> the failing disk, and attempted to recover by adding the new disk and
> deleting the missing disk.

> I'm currently using kernel 4.6.2

That turned out to be a mistake.  4.6.2 has some severe problems.

Over the past few days I've been upgrading other machines from 4.5.7
to 4.6.2.  This morning I saw the aggregate data coming back from
those machines, and it's all bad:  stalls in snapshot delete, balance,
and sync; some machines just lock up with no console messages; a lot of
watchdog timeouts.  None of the machines could get to an uptime over 26
hours and still be in a usable state.

I switched to 4.5.7 and the crashes, balance/delete hangs, and some of
the data corruption modes stopped.

> I'm
> getting EIO randomly all over the filesystem, including in files that were
> written entirely _after_ the disk failure.

There were actually four distinct corruption modes happening:

1.  There are some number (16500 so far) of "normal" corrupt blocks:  reads
repeatably return EIO, they show up in scrub with sane log messages,
and replacing the files that contain these blocks makes them go away.
These blocks appear to be contained in extents that coincide with the
date of the disk failure.  Interestingly, no matter how many times I
read these blocks, I get no increase in the 'btrfs dev stat' numbers
even though I get kernel csum failure messages.  That looks like a bug.

2.  When attempting to replace corrupted files with rsync, I had used
'rsync --inplace'.  This caused bad blocks to be overwritten within
extents, but did not necessarily replace the _entire_ extent containing a
bad block.  The result is corrupt blocks that show up in scrub, balance,
and device delete, but not when reading files.  It also updated the
timestamps, so a file with old corruption looks "new" to an insufficiently
sophisticated analysis tool.

3.  Files were corrupted while they were written and accessed via NFS.
This created files with correct btrfs checksums, but garbage contents.
This would show up as failures during 'git gc' or rsync checksum
mismatches.  During each of the many VM crashes, any writes in progress at
the time of the crash were lost.  This effectively rewound the filesystem
several minutes each time, as btrfs reverted to the previous committed
tree on the next mount.  4.6.2's hanging issues made this worse by
delaying btrfs commits indefinitely.  The NFS clients were completely
unaware of this, so when the VM rebooted, files ended up with holes,
or would just disappear while in use.

4.  After a VM crash, when the filesystem reverted to the previous
committed tree, files with bad blocks that had been repaired through
the NFS server or with rsync would be "unrepaired" (i.e. the filesystem
would revert to the original corrupted blocks after the mount).
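
As a rough illustration of how the mode 1 blocks can be located from
userspace, here is a minimal sketch that reads a file block by block
and records the offsets where read returns EIO.  The function name
find_eio_blocks and the 4096-byte block size are assumptions made for
this example, not something taken from the report above.  Blocks that
are already in the page cache may be served without touching the disk,
so a clean result from this sketch does not prove the on-disk data is
good.

```python
import errno
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch


def find_eio_blocks(path, block_size=BLOCK_SIZE):
    """Return the byte offsets of blocks whose read fails with EIO.

    Hypothetical helper: it only reports what read() returns for this
    file, it does not inspect btrfs metadata or device statistics.
    """
    bad_offsets = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, block_size):
            try:
                os.pread(fd, block_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # anything other than EIO is unexpected here
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

Comparing the offsets found this way with the 'btrfs dev stat' output
before and after the scan is one way to check whether the counters
move when csum failures are reported.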

Combinations of these could occur as well for extra confusion, and some
corrupted blocks are contained in many files thanks to dedup.
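
For mode 2, the alternative to overwriting in place is to write a
complete new copy and rename it over the damaged file, which is what
rsync does by default when --inplace is not given.  The sketch below
shows that pattern under some assumptions: replace_whole_file is a
hypothetical helper name, good_copy is a known-good source file, and
the data is copied with plain reads and writes so that new extents are
written rather than shared with the old file.  Old extents that are
still referenced elsewhere (snapshots, dedup) are not freed by this.

```python
import os
import shutil
import tempfile


def replace_whole_file(good_copy, target):
    """Replace target with a freshly written copy of good_copy.

    Hypothetical helper: the new file is written next to the target
    and renamed over it, so the target never refers to a mix of old
    and new extents.
    """
    target_dir = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(good_copy, "rb") as src:
            shutil.copyfileobj(src, tmp)  # plain read/write copy
            tmp.flush()
            os.fsync(tmp.fileno())        # data on disk before rename
        os.replace(tmp_path, target)      # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)           # do not leave partial copies
        raise
```

Unlike an in-place overwrite, this changes the inode of the target, so
hard links to the old file keep pointing at the old contents.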

With kernel 4.5.7 there have been no lockups during commit and no VM
crashes, so I haven't seen corruption modes 3 or 4 since switching to
4.5.7.

Balance is now running normally to move the remaining data off the
missing disk.  ETA is 558 hours.  See you in mid-July!  ;)
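
The "mid-July" estimate is just date arithmetic on the figures above.
A small sketch, assuming the 558 hours are counted from the time this
message was sent (the variable names are made up for the example):

```python
from datetime import datetime, timedelta, timezone

# Send time of this message, taken from its Date: header (UTC-4).
sent = datetime(2016, 6, 22, 0, 6, 13,
                tzinfo=timezone(timedelta(hours=-4)))

# 558 hours is 23 days and 6 hours.
eta = sent + timedelta(hours=558)

print(eta.isoformat())  # 2016-07-15T06:06:13-04:00
```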

