linux-btrfs.vger.kernel.org archive mirror
* Unrecoverable error on raid10
@ 2016-02-06 21:35 Tom Arild Naess
  2016-02-06 23:32 ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Arild Naess @ 2016-02-06 21:35 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have quite recently converted my file server to btrfs, and I am in the 
process of setting up a new backup server with btrfs to be able to 
utilize btrfs send/receive.
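
The transfers are done with incremental send/receive, roughly along 
these lines (the host name and snapshot paths here are just examples):

  # initial full transfer of a read-only snapshot
  btrfs send /store/snapshots/snap-1 | ssh backup btrfs receive /backup
  # later incremental transfers against the previous snapshot
  btrfs send -p /store/snapshots/snap-1 /store/snapshots/snap-2 \
      | ssh backup btrfs receive /backup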

File server:
> uname -a
Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31 
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

> btrfs fi show /store
Label: none  uuid: 2d84ca51-ec42-4fe3-888a-777cad6e1921
     Total devices 4 FS bytes used 4.35TiB
     devid    1 size 3.64TiB used 2.18TiB path /dev/sdc
     devid    2 size 3.64TiB used 2.18TiB path /dev/sdd
     devid    3 size 3.64TiB used 2.18TiB path /dev/sdb
     devid    4 size 3.64TiB used 2.18TiB path /dev/sda

btrfs-progs v4.1 (custom compiled)

> btrfs fi df /store
Data, RAID10: total=4.35TiB, used=4.35TiB
System, RAID10: total=64.00MiB, used=480.00KiB
Metadata, RAID10: total=6.00GiB, used=4.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Backup server:
> uname -a
Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015 
x86_64 GNU/Linux

> sudo btrfs fi show /backup
Label: none  uuid: 8825ce78-d620-48f5-9f03-8c4568d3719d
     Total devices 4 FS bytes used 2.46TiB
     devid    1 size 2.73TiB used 1.24TiB path /dev/sdb
     devid    2 size 2.73TiB used 1.24TiB path /dev/sda
     devid    3 size 2.73TiB used 1.24TiB path /dev/sdd
     devid    4 size 2.73TiB used 1.24TiB path /dev/sdc

btrfs-progs v4.3

> btrfs fi df /backup
Data, RAID10: total=2.48TiB, used=2.46TiB
System, RAID10: total=64.00MiB, used=320.00KiB
Metadata, RAID10: total=7.00GiB, used=6.02GiB


Today I balanced and scrubbed the file system on the backup server for 
the first time, since I have run several send/receives containing 
terabytes of data and have also deleted many subvolumes. The scrub came 
up with one uncorrectable error:

> btrfs scrub start -Bd /backup
scrub device /dev/sdb (id 1) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:30:41
     total bytes scrubbed: 1.23TiB with 0 errors
scrub device /dev/sda (id 2) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:21
     total bytes scrubbed: 1.23TiB with 1 errors
     error details: csum=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd (id 3) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:18
     total bytes scrubbed: 1.23TiB with 1 errors
     error details: csum=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 4) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:19
     total bytes scrubbed: 1.23TiB with 0 errors
ERROR: there are uncorrectable errors


This is an excerpt from the logs while scrubbing:

Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 
3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode 
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 
3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode 
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sdd errs: wr 0, rd 0, 
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sdd
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sda errs: wr 0, rd 0, 
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sdd
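
(The per-device error counters from those "bdev ... errs" lines can also 
be checked after the fact with something like

  btrfs device stats /backup

which, if I understand it correctly, should show the same corruption 
counts per device.)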

What's strange is that the failed file has a checksum error in the 
exact same spot in both of the mirrored copies, which means the file is 
unrecoverable. This is not what I expect from a raid10! Unfortunately I 
only have one snapshot left on the backup server, so I don't know if 
any of the other snapshots had the same problem.

The file (called xxxxxxxx for privacy) was created in the last btrfs 
send/receive, but I did not notice any errors during the transfer.


This is an excerpt from the logs while trying to read the file afterwards:

Feb 06 13:28:45 backup kernel: BTRFS warning (device sdb): csum failed 
ino 127923 off 6936002560 csum 284124578 expected csum 1756277981
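
(That inode number matches the file from the scrub messages; as far as I 
can tell it can be mapped back to a path on the mounted file system with 
something like

  btrfs inspect-internal inode-resolve 127923 /backup

which prints the path(s) for that inode.)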


Has anyone seen anything like this on their system? I guess this is a 
bug, but I have not been able to find any similar reports with Google.



-- 
Tom Arild Næss


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unrecoverable error on raid10
  2016-02-06 21:35 Unrecoverable error on raid10 Tom Arild Naess
@ 2016-02-06 23:32 ` Chris Murphy
  2016-02-07  0:40   ` Tom Arild Naess
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2016-02-06 23:32 UTC (permalink / raw)
  To: Tom Arild Naess; +Cc: Btrfs BTRFS

On Sat, Feb 6, 2016 at 2:35 PM, Tom Arild Naess <tanaess@gmail.com> wrote:
> Hello,
>
> I have quite recently converted my file server to btrfs, and I am in the
> process of setting up a new backup server with btrfs to be able to utilize
> btrfs send/receive.
>
> File server:
>>
>> uname -a
>
> Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
>> btrfs fi show /store
>
> Label: none  uuid: 2d84ca51-ec42-4fe3-888a-777cad6e1921
>     Total devices 4 FS bytes used 4.35TiB
>     devid    1 size 3.64TiB used 2.18TiB path /dev/sdc
>     devid    2 size 3.64TiB used 2.18TiB path /dev/sdd
>     devid    3 size 3.64TiB used 2.18TiB path /dev/sdb
>     devid    4 size 3.64TiB used 2.18TiB path /dev/sda
>
> btrfs-progs v4.1 (custom compiled)
>
>> btrfs fi df /store
>
> Data, RAID10: total=4.35TiB, used=4.35TiB
> System, RAID10: total=64.00MiB, used=480.00KiB
> Metadata, RAID10: total=6.00GiB, used=4.59GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Backup server:
>>
>> uname -a
>
> Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015 x86_64
> GNU/Linux

It's probably unrelated to the problem, but given the many bug fixes
(including in send/receive) since kernel 3.19 and progs 4.1, I'd get
both systems onto the same kernel and progs versions. I suspect most of
upstream's testing of send/receive before release is done with matching
kernel and progs versions. My understanding is that most of the send
code is in the kernel, and most of the receive code is in progs (of
course, receive also implies writing to a Btrfs volume, which is kernel
code as well). I really wouldn't intentionally mix and match versions
like this, unless you're trying to find bugs caused by mismatched
versions.



> This is an excerpt from the logs while scrubbing:
>
> Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
> 3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode 127923,
> offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
> Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
> 3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode 127923,
> offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
> Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
> logical 3531011186688 on dev /dev/sda
> Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sdd errs: wr 0, rd 0, flush
> 0, corrupt 1, gen 0
> Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
> logical 3531011186688 on dev /dev/sdd
> Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sda errs: wr 0, rd 0, flush
> 0, corrupt 1, gen 0
> Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
> logical 3531011186688 on dev /dev/sda
> Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
> logical 3531011186688 on dev /dev/sdd
>
> What's strange is that the failed file has a checksum error in the exact
> same spot in both of the mirrored copies, which means the file is
> unrecoverable.

Note that this is a logical address. The chunk tree will translate
that into separate physical sectors on the actual drives. This kind of
corruption suggests that it's not media related, or even storage stack
related like a torn write. I'm not sure how it can happen; someone who
knows the sequence of data checksumming, data allocation being split
into two paths for writes, and metadata writes would have to speak up.
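
If you want to see where that logical address actually lives on disk,
something along these lines should show the physical location of each
mirror copy (run against any one member device; I haven't double checked
the exact output format):

  btrfs-map-logical -l 3531011186688 /dev/sda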

Also, the file is most likely still recoverable. You can use btrfs
restore to extract it from the unmounted file system; restore does not
complain about checksum mismatches. It's just that the normal read
path won't hand over data it thinks is corrupt.
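
Roughly (with the file system unmounted, and /mnt/recovery just an
example destination):

  btrfs restore -v -i /dev/sdb /mnt/recovery/

-i ignores errors and keeps going; there's also --path-regex if you only
want to pull out that one file rather than the whole tree.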

> This is not what I expect from a raid10!

Technically what you don't expect from raid10 is any notification that
the file may be corrupt at all. It'd be interesting to extract the
file with restore, and then compare hashes to a known good copy.
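
Something like

  sha256sum /mnt/recovery/path/to/file /path/to/known/good/copy

(paths are just placeholders) would tell you whether the data itself is
intact and only the stored checksum is wrong, or whether the data
really was corrupted.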


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unrecoverable error on raid10
  2016-02-06 23:32 ` Chris Murphy
@ 2016-02-07  0:40   ` Tom Arild Naess
  2016-02-07 23:57     ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Arild Naess @ 2016-02-07  0:40 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 07. feb. 2016 00:32, Chris Murphy wrote:

> It's probably unrelated to the problem, but given the many bug fixes
> (including in send/receive) since kernel 3.19 and progs 4.1, I'd get
> both systems onto the same kernel and progs versions. I suspect most of
> upstream's testing of send/receive before release is done with matching
> kernel and progs versions. My understanding is that most of the send
> code is in the kernel, and most of the receive code is in progs (of
> course, receive also implies writing to a Btrfs volume, which is kernel
> code as well). I really wouldn't intentionally mix and match versions
> like this, unless you're trying to find bugs caused by mismatched
> versions.
Ok, that sounds like good advice. My thought was to keep the versions 
different to reduce the risk of my data getting nuked on both systems 
because of some obscure bug in one specific version, since btrfs is not 
100% stable yet. Also, it was much easier to create a small read-only 
"NAS OS" to run from a USB stick using Arch Linux than with Ubuntu. 
Guess I'll have to re-evaluate this then.

> Note that this is a logical address. The chunk tree will translate
> that into separate physical sectors on the actual drives. This kind of
> corruption suggests that it's not media related, or even storage stack
> related like a torn write. I'm not sure how it can happen; someone who
> knows the sequence of data checksumming, data allocation being split
> into two paths for writes, and metadata writes would have to speak up.
Whether it is a logical or a physical address is not really the point 
here. The file ended up corrupted because somehow both copies of the 
file had a checksum mismatch on the exact same (4k) block of data, and 
that should not be possible. For now I can only see two explanations: a 
weird bug somewhere in btrfs, or corrupt RAM, because either the data 
block or the checksum must have been corrupted somewhere between the 
calculation and the write to disk. The next step now is a few rounds of 
memtest.
> Also, the file is most likely still recoverable. You can use btrfs
> restore to extract it from the unmounted file system; restore does not
> complain about checksum mismatches. It's just that the normal read
> path won't hand over data it thinks is corrupt.
I still haven't learned enough about the capabilities of btrfs, so I 
wasn't aware of this. And since this was the backup server, I replaced 
the file with the copy from the main server to see if this would break 
the incremental send/receive (and that worked perfectly, since I kept 
the inode, I guess).

>> This is not what I expect from a raid10!
> Technically what you don't expect from raid10 is any notification that
> the file may be corrupt at all. It'd be interesting to extract the
> file with restore, and then compare hashes to a known good copy.
Well, I really don't expect this to be happening at all. If this is a 
bug in btrfs, it could just as well have struck the same file on the 
server. Without a backup I could not have known whether it was the data 
or the checksum that was bogus. This being a 16GB edited family film 
video file, I had already deleted the original source, so in a worst 
case scenario I could very well have ended up with some very annoying 
chop in the video.

Anyway, I just hope that if there is a bug in the code, this could help 
find it.

-- 
Tom Arild Naess


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unrecoverable error on raid10
  2016-02-07  0:40   ` Tom Arild Naess
@ 2016-02-07 23:57     ` Chris Murphy
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Murphy @ 2016-02-07 23:57 UTC (permalink / raw)
  To: Tom Arild Naess; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Feb 6, 2016 at 5:40 PM, Tom Arild Naess <tanaess@gmail.com> wrote:

>>> This is not what I expect from a raid10!
>>
>> Technically what you don't expect from raid10 is any notification that
>> the file may be corrupt at all. It'd be interesting to extract the
>> file with restore, and then compare hashes to a known good copy.
>
> Well, I really don't expect this to be happening at all. If this is a bug in
> btrfs, it could just as well have struck the same file on the server.

It seems like a bug, in that the results are definitely not expected,
but there isn't enough information to isolate the source. I'm not aware
of any existing Btrfs bug like this.


> This being a 16GB edited family film video file, I had already deleted
> the original source, so in a worst case scenario I could very well have
> ended up with some very annoying chop in the video.
>
> Anyway, I just hope that if there is a bug in the code, this could help find
> it.

There isn't enough information to know what's even going on in this
particular instance, let alone to deduce whether there's a bug in the
code. The entire dmesg would be more helpful than just one snippet,
because invariably the problem occurred before the uncorrectable
corruption messages appeared. Whether or not the kernel messages report
the instigator is another question; it may have happened during this
boot, or during some other boot you no longer have messages for.
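
Since this machine keeps a systemd journal, something along the lines of

  journalctl --list-boots
  journalctl -k -b -0 > kernel-log-current-boot.txt

would capture the full kernel log for the current boot (and -b -1,
-b -2 and so on for earlier boots, if they're still retained).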



Chris Murphy

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-02-07 23:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
2016-02-06 21:35 Unrecoverable error on raid10 Tom Arild Naess
2016-02-06 23:32 ` Chris Murphy
2016-02-07  0:40   ` Tom Arild Naess
2016-02-07 23:57     ` Chris Murphy
