Unrecoverable error on raid10

From: Tom Arild Naess <tanaess@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Unrecoverable error on raid10
Date: Sat, 6 Feb 2016 22:35:00 +0100
Message-ID: <56B66704.5070505@gmail.com>

Hello,

I have quite recently converted my file server to btrfs, and I am in the
process of setting up a new backup server with btrfs to be able to
utilize btrfs send/receive.

File server:
> uname -a
Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31 
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

> btrfs fi show /store
Label: none  uuid: 2d84ca51-ec42-4fe3-888a-777cad6e1921
     Total devices 4 FS bytes used 4.35TiB
     devid    1 size 3.64TiB used 2.18TiB path /dev/sdc
     devid    2 size 3.64TiB used 2.18TiB path /dev/sdd
     devid    3 size 3.64TiB used 2.18TiB path /dev/sdb
     devid    4 size 3.64TiB used 2.18TiB path /dev/sda

btrfs-progs v4.1 (custom compiled)

> btrfs fi df /store
Data, RAID10: total=4.35TiB, used=4.35TiB
System, RAID10: total=64.00MiB, used=480.00KiB
Metadata, RAID10: total=6.00GiB, used=4.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Backup server:
> uname -a
Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015 
x86_64 GNU/Linux

> sudo btrfs fi show /backup
Label: none  uuid: 8825ce78-d620-48f5-9f03-8c4568d3719d
     Total devices 4 FS bytes used 2.46TiB
     devid    1 size 2.73TiB used 1.24TiB path /dev/sdb
     devid    2 size 2.73TiB used 1.24TiB path /dev/sda
     devid    3 size 2.73TiB used 1.24TiB path /dev/sdd
     devid    4 size 2.73TiB used 1.24TiB path /dev/sdc

btrfs-progs v4.3

> btrfs fi df /backup
Data, RAID10: total=2.48TiB, used=2.46TiB
System, RAID10: total=64.00MiB, used=320.00KiB
Metadata, RAID10: total=7.00GiB, used=6.02GiB


Today I balanced and scrubbed the file system on the backup server for
the first time, since I have run several send/receives containing
terabytes of data and also deleted many subvolumes. The scrub came up
with one uncorrectable error:

> btrfs scrub start -Bd /backup
scrub device /dev/sdb (id 1) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:30:41
     total bytes scrubbed: 1.23TiB with 0 errors
scrub device /dev/sda (id 2) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:21
     total bytes scrubbed: 1.23TiB with 1 errors
     error details: csum=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd (id 3) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:18
     total bytes scrubbed: 1.23TiB with 1 errors
     error details: csum=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 4) done
     scrub started at Sat Feb  6 04:14:36 2016 and finished after 03:27:19
     total bytes scrubbed: 1.23TiB with 0 errors
ERROR: there are uncorrectable errors


This is an excerpt from the logs while scrubbing:

Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 
3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode 
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 
3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode 
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sdd errs: wr 0, rd 0, 
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sdd
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sda errs: wr 0, rd 0, 
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at 
logical 3531011186688 on dev /dev/sdd

What's strange is that the failed file has a checksum error in the
exact same spot on both of the mirrored copies, which means the file is
unrecoverable. This is not what I expect from a raid10! Unfortunately I
only have one snapshot left on the backup server, so I don't know if
any of the other snapshots had the same problem.
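
The "same spot on both copies" observation can be checked mechanically
from the scrub messages. The following is a minimal Python sketch, assuming
the wrapped log lines have been joined back into single lines; the
function name devices_by_logical and the regular expression are
illustrative and not part of any btrfs tool.

```python
import re
from collections import defaultdict

# The two "checksum error" lines from the scrub log above, with the
# mail-wrapped continuation lines joined back together.
LOG_LINES = [
    "Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical "
    "3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode "
    "127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)",
    "Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical "
    "3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode "
    "127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)",
]

# Hypothetical pattern written for this message format only.
PATTERN = re.compile(r"checksum error at logical (\d+) on dev ([^,\s]+),")


def devices_by_logical(lines):
    """Map each logical address to the sorted list of devices reporting it."""
    hits = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits[int(match.group(1))].add(match.group(2))
    return {logical: sorted(devs) for logical, devs in hits.items()}


if __name__ == "__main__":
    for logical, devs in devices_by_logical(LOG_LINES).items():
        # More than one device at the same logical address means every
        # copy listed here failed its checksum.
        print(logical, devs)
```

A logical address that shows up with two devices, as it does here, means
that both copies of that 4096-byte block failed verification, which is
why scrub has nothing good to repair from.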

The file (called xxxxxxxx for privacy) was created in the last btrfs
send/receive, but I did not notice any errors during the transfer.


This is an excerpt from the logs while trying to read the file afterwards:

Feb 06 13:28:45 backup kernel: BTRFS warning (device sdb): csum failed 
ino 127923 off 6936002560 csum 284124578 expected csum 1756277981
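
One way to narrow this down would be to read the same 4096-byte block
from the original file on the file server and see which of the two
logged values it produces. The sketch below is only an illustration under
stated assumptions: that the logged csum values are plain CRC-32C
(Castagnoli) results over the 4096-byte block, printed in decimal, and
that the source file is still intact. The helper names (crc32c,
read_block, block_matches) are made up for this example.

```python
def _make_table():
    """Build the lookup table for the reflected CRC-32C polynomial."""
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: initial value 0xFFFFFFFF, final bitwise inversion."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Values taken from the log line above.
BLOCK_SIZE = 4096
FILE_OFFSET = 6936002560
EXPECTED_CSUM = 1756277981


def read_block(path, offset=FILE_OFFSET, size=BLOCK_SIZE):
    """Read one block of the file at the logged offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


def block_matches(block: bytes, expected: int = EXPECTED_CSUM) -> bool:
    """Assumption: the logged csum is the CRC-32C of the whole block."""
    return crc32c(block) == expected
```

Something like block_matches(read_block(path_to_source_copy)) would then
be run against the copy on the file server, not on the backup, where
reading this block is what triggers the warning in the first place. The
offset is block-aligned (6936002560 = 1693360 * 4096), so the read
covers exactly the block named in the log.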


Has anyone seen anything like this on their system? I guess this is a bug,
but I have not been able to find anything like this with Google.



-- 
Tom Arild Næss
