From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf0-f49.google.com ([209.85.215.49]:35811 "EHLO
	mail-lf0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751154AbcBGAkl (ORCPT );
	Sat, 6 Feb 2016 19:40:41 -0500
Received: by mail-lf0-f49.google.com with SMTP id l143so77426273lfe.2
	for ; Sat, 06 Feb 2016 16:40:40 -0800 (PST)
Subject: Re: Unrecoverable error on raid10
To: Chris Murphy
References: <56B66704.5070505@gmail.com>
Cc: Btrfs BTRFS
From: Tom Arild Naess
Message-ID: <56B69286.3080205@gmail.com>
Date: Sun, 7 Feb 2016 01:40:38 +0100
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 07. feb. 2016 00:32, Chris Murphy wrote:
> It's probably unrelated the problem, but I would given the many bug
> fixes (including in send/receive) since kernel 3.19, and progs 4.1,
> that I'd get both systems using the same kernel and progs version. I
> suspect most of upstream's testing before release for send/receive is
> with matching kernel and progs versions. My understanding is most of
> the send code is in the kernel, and most of the receive code is in
> progs (of course, receive also implies writing to a Btrfs volume as
> well which would be kernel code too). I really wouldn't intentionally
> mix and match versions like this, unless you're trying to find bugs as
> a result of mismatching versions.

Ok, that sounds like good advice. My thought was to keep the versions
different to reduce the risk of my data getting nuked on both systems
because of some obscure bug in one specific version, since btrfs is not
100% stable yet. Also, it was much easier to create a small read-only
"NAS OS" to run from a USB stick using Arch Linux than with Ubuntu.
Guess I'll have to re-evaluate this then.

> Note that this is a logical address. The chunk tree will translate
> that into separate physical sectors on the actual drives.
> This kind of
> corruption suggests that it's not media, or even storage stack related
> like a torn write or anything like that. I'm not sure how it can
> happen, someone else who knows the sequence of data checksumming, data
> allocation being split into two paths for writes, and metadata writes,
> would have to speak up.

Whether it is a logical or a physical address is not the point here at
all. The file ended up corrupted because somehow both copies of the file
had a checksum mismatch on the exact same (4k) block of data. This
should not be possible. For now I can only see two explanations: a weird
bug somewhere in btrfs, or corrupt RAM, because either the data block or
the checksum must have been corrupted somewhere between the calculation
and the write to disk. The next step now is a few rounds of memtest.

> Also, the file is still recoverable most likely. You can use btrfs
> restore to extract it from the unmounted file system without
> complaining about checksum mismatches. It's just that the normal read
> path won't hand over data it thinks is corrupt.

I still haven't learned enough about the capabilities of btrfs, so I
wasn't aware of this. And since this was the backup server, I replaced
the file from the main server to see if this would break the incremental
send/receive (and that worked perfectly, since I kept the inode, I
guess).

>> This is not what I expect from a raid10!
> Technically what you don't expect from raid10 is any notification that
> the file may be corrupt at all. It'd be interesting to extract the
> file with restore, and then compare hashes to a known good copy.

Well, I really don't expect this to be happening at all. If this is a
bug in btrfs, it could just as well have struck the same file on the
server. Without a backup I could not know if it was the data or the
checksum that was bogus.
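
The comparison suggested above could be sketched roughly as follows. This
is only an illustration, not a tested recovery procedure: it assumes the
damaged file has already been extracted (for example with btrfs restore)
and that a known good copy is available. The function names and the
4096-byte block size are assumptions chosen to match the 4k block
mentioned earlier.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed to match the 4k block discussed above


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_blocks(restored, known_good, block_size=BLOCK_SIZE):
    """Return the byte offsets of blocks that differ between two files."""
    offsets = []
    with open(restored, "rb") as a, open(known_good, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

If the two digests match and differing_blocks() returns an empty list,
that would suggest the data on disk was fine and the stored checksum was
the bogus part. A single differing offset would instead point at the
data block itself.
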
This is a 16 GB edited family video, and I had already deleted the
original source footage, so in a worst-case scenario I could very well
end up with some very annoying chop in the video. Anyway, I just hope
that if there is a bug in the code, this could help find it.

--
Tom Arild Naess