From: Tom Arild Naess <tanaess@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Unrecoverable error on raid10
Date: Sun, 7 Feb 2016 01:40:38 +0100
Message-ID: <56B69286.3080205@gmail.com>
In-Reply-To: <CAJCQCtSOv8rhDi69NyR6J4cs91bc9ahh7382+kHz1knAe5BRdg@mail.gmail.com>
On 07. feb. 2016 00:32, Chris Murphy wrote:
> It's probably unrelated to the problem, but given the many bug
> fixes (including in send/receive) since kernel 3.19 and progs 4.1,
> I'd get both systems using the same kernel and progs version. I
> suspect most of upstream's testing before release for send/receive is
> with matching kernel and progs versions. My understanding is most of
> the send code is in the kernel, and most of the receive code is in
> progs (of course, receive also implies writing to a Btrfs volume as
> well which would be kernel code too). I really wouldn't intentionally
> mix and match versions like this, unless you're trying to find bugs as
> a result of mismatching versions.
Ok, that sounds like good advice. My thought was to keep the versions
different to reduce the risk of my data getting nuked on both systems
because of some obscure bug in one specific version, since btrfs is not
100% stable yet. Also, it was much easier to create a small read-only
"NAS OS" to run from a USB stick using Arch Linux than with Ubuntu.
Guess I'll have to re-evaluate this then.
> Note that this is a logical address. The chunk tree will translate
> that into separate physical sectors on the actual drives. This kind of
> corruption suggests that it's not media, or even storage stack related
> like a torn write or anything like that. I'm not sure how it can
> happen, someone else who knows the sequence of data checksumming, data
> allocation being split into two paths for writes, and metadata writes,
> would have to speak up.
Being a logical or physical address is not the point here at all. The
file ended up corrupted because somehow both copies of the file had a
checksum mismatch on the exact same (4k) block of data. This should not
be possible. For now I can only see two explanations - a weird bug
somewhere in btrfs or corrupt RAM, because either the data block or the
checksum must have been corrupted somewhere between the calculation and
writing to the disk. Next step now is a few rounds of memtest.
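To illustrate the RAM explanation, here is a minimal Python sketch of how a single bit flip between checksum calculation and the write could leave both mirrored copies failing verification in the same block. It is a toy model and not how btrfs is implemented: zlib.crc32 stands in for the crc32c that btrfs uses by default, and the names write_mirrored, flip_bit and verify are made up for this example.

```python
import zlib

BLOCK_SIZE = 4096  # the 4k block size mentioned above


def checksum(block: bytes) -> int:
    # Stand-in only: btrfs defaults to crc32c, which uses a different
    # polynomial than zlib.crc32. The argument does not depend on which.
    return zlib.crc32(block)


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with one bit inverted."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def write_mirrored(block: bytes, corrupt_in_ram: bool):
    """Toy write path: checksum first, then the same buffer goes to both mirrors."""
    stored_csum = checksum(block)
    if corrupt_in_ram:
        # Hypothetical bit flip in memory after the checksum was computed.
        block = flip_bit(block, 12345)
    mirror_a = block
    mirror_b = block
    return stored_csum, mirror_a, mirror_b


def verify(stored_csum: int, copy: bytes) -> bool:
    return checksum(copy) == stored_csum


original = bytes(range(256)) * (BLOCK_SIZE // 256)

csum, a, b = write_mirrored(original, corrupt_in_ram=True)
both_bad = not verify(csum, a) and not verify(csum, b)

csum_ok, a_ok, b_ok = write_mirrored(original, corrupt_in_ram=False)
both_good = verify(csum_ok, a_ok) and verify(csum_ok, b_ok)

print("corrupted in RAM, both mirrors fail:", both_bad)
print("clean write, both mirrors pass:", both_good)
```

In this model the two mirrors are identical to each other but neither matches the stored checksum, so redundancy cannot help. The same outcome would follow if the checksum, rather than the data, were the value that got corrupted.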
> Also, the file is still recoverable most likely. You can use btrfs
> restore to extract it from the unmounted file system without
> complaining about checksum mismatches. It's just that the normal read
> path won't hand over data it thinks is corrupt.
I still haven't learned enough about the capabilities of btrfs, so I
wasn't aware of this. And since this was the backup server, I replaced
the file from the main server to see if this would break the incremental
send/receive (and that worked perfectly, since I kept the inode I guess).
>> This is not what I expect from a raid10!
> Technically what you don't expect from raid10 is any notification that
> the file may be corrupt at all. It'd be interesting to extract the
> file with restore, and then compare hashes to a known good copy.
Well, I really don't expect this to be happening at all. If this is a
bug in btrfs, it could just as well have struck the same file on the
server. Without a backup I could not know if it was the data or the
checksum that was bogus. Since this is a 16 GB edited family video and
I had already deleted the original source footage, I could very well end
up with an annoying glitch in the video in a worst-case scenario.
Anyway, I just hope that if there is a bug in the code, this could help
find it.
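For anyone who wants to do the comparison suggested above (a copy extracted with btrfs restore versus a known good copy), the following Python sketch hashes both files and lists which 4 KiB blocks differ. It is only a sketch: the function names are made up for this example, the paths are whatever the two copies happen to be called, and it assumes both files are regular files readable from a mounted location.

```python
import hashlib

BLOCK_SIZE = 4096  # compare in 4 KiB units, matching the block size above


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 16 GB file never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indices of the blocks that differ between two files."""
    mismatches = []
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        index = 0
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                mismatches.append(index)
            index += 1
    return mismatches
```

If the two hashes match, the data on disk was fine and the stored checksum was the bogus part. If they differ, differing_blocks(restored_path, good_path) shows whether the damage is confined to the one block that was reported.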
--
Tom Arild Naess