* Unrecoverable error on raid10
From: Tom Arild Naess @ 2016-02-06 21:35 UTC
To: linux-btrfs
Hello,
I have quite recently converted my file server to btrfs, and I am in the
process of setting up a new backup server with btrfs so that I can
utilize btrfs send/receive.
File server:
> uname -a
Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> btrfs fi show /store
Label: none uuid: 2d84ca51-ec42-4fe3-888a-777cad6e1921
Total devices 4 FS bytes used 4.35TiB
devid 1 size 3.64TiB used 2.18TiB path /dev/sdc
devid 2 size 3.64TiB used 2.18TiB path /dev/sdd
devid 3 size 3.64TiB used 2.18TiB path /dev/sdb
devid 4 size 3.64TiB used 2.18TiB path /dev/sda
btrfs-progs v4.1 (custom compiled)
> btrfs fi df /store
Data, RAID10: total=4.35TiB, used=4.35TiB
System, RAID10: total=64.00MiB, used=480.00KiB
Metadata, RAID10: total=6.00GiB, used=4.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Backup server:
> uname -a
Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015
x86_64 GNU/Linux
> sudo btrfs fi show /backup
Label: none uuid: 8825ce78-d620-48f5-9f03-8c4568d3719d
Total devices 4 FS bytes used 2.46TiB
devid 1 size 2.73TiB used 1.24TiB path /dev/sdb
devid 2 size 2.73TiB used 1.24TiB path /dev/sda
devid 3 size 2.73TiB used 1.24TiB path /dev/sdd
devid 4 size 2.73TiB used 1.24TiB path /dev/sdc
btrfs-progs v4.3
> btrfs fi df /backup
Data, RAID10: total=2.48TiB, used=2.46TiB
System, RAID10: total=64.00MiB, used=320.00KiB
Metadata, RAID10: total=7.00GiB, used=6.02GiB
Today I balanced and scrubbed the file system on the backup server for
the first time, since I have run several send/receives containing
terabytes of data and have also deleted many subvolumes. The scrub came
up with one uncorrectable error:
> btrfs scrub start -Bd /backup
scrub device /dev/sdb (id 1) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:30:41
total bytes scrubbed: 1.23TiB with 0 errors
scrub device /dev/sda (id 2) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:21
total bytes scrubbed: 1.23TiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd (id 3) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:18
total bytes scrubbed: 1.23TiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 4) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:19
total bytes scrubbed: 1.23TiB with 0 errors
ERROR: there are uncorrectable errors
This is an excerpt from the logs while scrubbing:
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sdd errs: wr 0, rd 0,
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sdd
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sda errs: wr 0, rd 0,
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sdd
What's strange is that the failed file has a checksum error in the
exact same spot on both of the mirrored copies, which means the file is
unrecoverable. This is not what I expect from a raid10! Unfortunately I
only have one snapshot left on the backup server, so I don't know if
any of the other snapshots had the same problem.
The file (called xxxxxxxx for privacy) was created in the last btrfs
send/receive, but I did not notice any errors during the transfer.
This is an excerpt from the logs while trying to read the file afterwards:
Feb 06 13:28:45 backup kernel: BTRFS warning (device sdb): csum failed
ino 127923 off 6936002560 csum 284124578 expected csum 1756277981
Has anyone seen anything like this on their system? I guess this is a
bug, but I have not been able to find any similar reports with Google.
--
Tom Arild Næss
* Re: Unrecoverable error on raid10
From: Chris Murphy @ 2016-02-06 23:32 UTC
To: Tom Arild Naess; +Cc: Btrfs BTRFS
On Sat, Feb 6, 2016 at 2:35 PM, Tom Arild Naess <tanaess@gmail.com> wrote:
> Hello,
>
> I have quite recently converted my file server to btrfs, and I am in the
> process of setting up a new backup server with btrfs so that I can
> utilize btrfs send/receive.
>
> File server:
>> uname -a
> Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> btrfs-progs v4.1 (custom compiled)
> [...]
> Backup server:
>> uname -a
> Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015 x86_64
> GNU/Linux
It's probably unrelated to the problem, but given the many bug fixes
(including in send/receive) since kernel 3.19 and progs 4.1, I'd get
both systems onto the same kernel and progs versions. I suspect most of
upstream's testing of send/receive before a release is done with
matching kernel and progs versions. My understanding is that most of
the send code is in the kernel, and most of the receive code is in
progs (of course, receive also implies writing to a Btrfs volume, which
is kernel code as well). I really wouldn't intentionally mix and match
versions like this, unless you're trying to find bugs caused by
mismatched versions.
> This is an excerpt from the logs while scrubbing:
>
> Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
> 3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode 127923,
> offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
> [...]
>
> What's strange is that the failed file has a checksum error in the exact
> same spot on both of the mirrored copies, which means the file is
> unrecoverable.
Note that this is a logical address. The chunk tree will translate
that into separate physical sectors on the actual drives. This kind of
corruption suggests it's not the media, or even something storage-stack
related like a torn write. I'm not sure how it can happen; someone else
who knows the sequence of data checksumming, data allocation being
split into two paths for writes, and metadata writes would have to
speak up.
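As an aside, a mounted file system can map that logical address back to
whatever path(s) reference it, which is a quick way to double-check
which file owns the block. A sketch, using the address from your dmesg:

    # resolve a btrfs logical address to the file path(s) that reference it
    btrfs inspect-internal logical-resolve 3531011186688 /backup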
Also, the file is most likely still recoverable. You can use btrfs
restore to extract it from the unmounted file system; restore doesn't
verify checksums, so it won't complain about the mismatch. It's just
that the normal read path won't hand over data it thinks is corrupt.
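A rough sketch of what that could look like (the path below is
hypothetical, and --path-regex has to match every parent directory
component, hence the nested alternations):

    # with the file system unmounted, extract just the one file to /mnt/recovery
    umount /backup
    btrfs restore -v --path-regex '^/(|snapshots(|/dir(|/xxxxxxxx)))$' /dev/sdb /mnt/recovery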
> This is not what I expect from a raid10!
Technically, what you don't expect from a raid10 is any notification at
all that the file may be corrupt; a conventional raid10 would have
silently handed you the bad data. It'd be interesting to extract the
file with restore and then compare hashes against a known good copy.
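Something like this would do, with hypothetical paths on both ends and
any strong hash:

    # hash the recovered copy and the original on the file server, then compare
    sha256sum /mnt/recovery/xxxxxxxx
    ssh main 'sha256sum /store/path/to/xxxxxxxx'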
--
Chris Murphy
* Re: Unrecoverable error on raid10
From: Tom Arild Naess @ 2016-02-07 0:40 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
On 7 Feb 2016 00:32, Chris Murphy wrote:
> It's probably unrelated to the problem, but given the many bug fixes
> (including in send/receive) since kernel 3.19 and progs 4.1, I'd get
> both systems onto the same kernel and progs versions. [...] I really
> wouldn't intentionally mix and match versions like this, unless you're
> trying to find bugs caused by mismatched versions.
Ok, that sounds like good advice. My thought was to keep the versions
different to reduce the risk of my data getting nuked on both systems
because of some obscure bug in one specific version, since btrfs is not
100% stable yet. Also, it was much easier to create a small read-only
"NAS OS" to run from a USB stick using Arch Linux than with Ubuntu.
Guess I'll have to re-evaluate this then.
> Note that this is a logical address. The chunk tree will translate
> that into separate physical sectors on the actual drives. This kind of
> corruption suggests it's not the media, or even something storage-stack
> related like a torn write. [...]
Being a logical or physical address is not the point here at all. The
file ended up corrupted because somehow both copies of the file had a
checksum mismatch on the exact same 4 KiB block of data. This should
not be possible. For now I can only see two explanations: a weird bug
somewhere in btrfs, or corrupt RAM, because either the data block or
the checksum must have been corrupted somewhere between the calculation
and the write to disk. The next step now is a few rounds of memtest.
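If booting memtest86+ turns out to be awkward on the headless box, a
partial userspace pass with memtester might be a first step, assuming
it's installed; an offline memtest is still more thorough. Something
like:

    # lock and test 4 GiB of RAM for 3 passes (size and pass count illustrative)
    sudo memtester 4096M 3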
> Also, the file is most likely still recoverable. You can use btrfs
> restore to extract it from the unmounted file system; restore doesn't
> verify checksums, so it won't complain about the mismatch. [...]
I still haven't learned enough about the capabilities of btrfs, so I
wasn't aware of this. And since this was the backup server, I replaced
the file from the main server to see if this would break the incremental
send/receive (and that worked perfectly, since I kept the inode, I guess).
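For reference, the incremental transfer in question is the usual
parent-based send/receive; with hypothetical snapshot names it looks
roughly like:

    # send only the delta between the previous and the new read-only snapshot
    btrfs send -p /store/snapshots/2016-02-05 /store/snapshots/2016-02-06 \
        | ssh backup 'btrfs receive /backup/snapshots'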
>> This is not what I expect from a raid10!
>
> Technically, what you don't expect from a raid10 is any notification at
> all that the file may be corrupt; a conventional raid10 would have
> silently handed you the bad data. It'd be interesting to extract the
> file with restore and then compare hashes against a known good copy.
Well, I really don't expect this to be happening at all. If this is a
bug in btrfs, it could just as well have struck the same file on the
server. Without a backup I could not have known whether it was the data
or the checksum that was bogus. This being a 16GB edited family film
video file whose original source I had already deleted, I could very
well have ended up with some very annoying chop in the video in a
worst-case scenario. Anyway, I just hope that if there is a bug in the
code, this could help find it.
--
Tom Arild Naess
* Re: Unrecoverable error on raid10
From: Chris Murphy @ 2016-02-07 23:57 UTC
To: Tom Arild Naess; +Cc: Chris Murphy, Btrfs BTRFS
On Sat, Feb 6, 2016 at 5:40 PM, Tom Arild Naess <tanaess@gmail.com> wrote:
>>> This is not what I expect from a raid10!
>>
>> Technically, what you don't expect from a raid10 is any notification at
>> all that the file may be corrupt; a conventional raid10 would have
>> silently handed you the bad data. [...]
>
> Well, I really don't expect this to be happening at all. If this is a bug in
> btrfs, it could just as well have struck the same file on the server.
Seems like a bug in that the results are definitely not expected. But
there isn't enough information to isolate the source. I'm not aware of
any sort of Btrfs bug like this.
> This being a 16GB edited family film video file whose original source
> I had already deleted, I could very well have ended up with some very
> annoying chop in the video in a worst-case scenario.
>
> Anyway, I just hope that if there is a bug in the code, this could help
> find it.
There isn't enough information to know what's even going on in this
particular instance, let alone to deduce whether there's a bug in the
code. The entire dmesg would be more helpful than just one snippet,
because invariably the problem occurred before the uncorrectable
corruption messages appear. Whether or not the kernel messages report
the instigator is another question; it may have happened this boot, or
on some other boot you no longer even have messages for.
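Assuming a systemd journal (persistent journaling is needed to keep
messages from earlier boots), something like this captures what's
available:

    # save the full kernel log for the current boot, then see which older boots survive
    journalctl -k -b 0 > btrfs-kernel-log.txt
    journalctl --list-boots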
Chris Murphy