From: Tom Arild Naess <tanaess@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Unrecoverable error on raid10
Date: Sat, 6 Feb 2016 22:35:00 +0100
Message-ID: <56B66704.5070505@gmail.com>
Hello,
I have quite recently converted my file server to btrfs, and I am in the
process of setting up a new backup server with btrfs to be able to
utilize btrfs send/receive.
File server:
> uname -a
Linux main 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> btrfs fi show /store
Label: none uuid: 2d84ca51-ec42-4fe3-888a-777cad6e1921
Total devices 4 FS bytes used 4.35TiB
devid 1 size 3.64TiB used 2.18TiB path /dev/sdc
devid 2 size 3.64TiB used 2.18TiB path /dev/sdd
devid 3 size 3.64TiB used 2.18TiB path /dev/sdb
devid 4 size 3.64TiB used 2.18TiB path /dev/sda
btrfs-progs v4.1 (custom compiled)
> btrfs fi df /store
Data, RAID10: total=4.35TiB, used=4.35TiB
System, RAID10: total=64.00MiB, used=480.00KiB
Metadata, RAID10: total=6.00GiB, used=4.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Backup server:
> uname -a
Linux backup 4.2.5-1-ARCH #1 SMP PREEMPT Tue Oct 27 08:13:28 CET 2015
x86_64 GNU/Linux
> sudo btrfs fi show /backup
Label: none uuid: 8825ce78-d620-48f5-9f03-8c4568d3719d
Total devices 4 FS bytes used 2.46TiB
devid 1 size 2.73TiB used 1.24TiB path /dev/sdb
devid 2 size 2.73TiB used 1.24TiB path /dev/sda
devid 3 size 2.73TiB used 1.24TiB path /dev/sdd
devid 4 size 2.73TiB used 1.24TiB path /dev/sdc
btrfs-progs v4.3
> btrfs fi df /backup
Data, RAID10: total=2.48TiB, used=2.46TiB
System, RAID10: total=64.00MiB, used=320.00KiB
Metadata, RAID10: total=7.00GiB, used=6.02GiB
Today I balanced and scrubbed the file system on the backup server for
the first time, since I have run several send/receives containing
terabytes of data and also deleted many subvolumes. The scrub came up
with one uncorrectable error:
> btrfs scrub start -Bd /backup
scrub device /dev/sdb (id 1) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:30:41
total bytes scrubbed: 1.23TiB with 0 errors
scrub device /dev/sda (id 2) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:21
total bytes scrubbed: 1.23TiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd (id 3) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:18
total bytes scrubbed: 1.23TiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 4) done
scrub started at Sat Feb 6 04:14:36 2016 and finished after 03:27:19
total bytes scrubbed: 1.23TiB with 0 errors
ERROR: there are uncorrectable errors
This is an excerpt from the logs while scrubbing:
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical
3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode
127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sdd errs: wr 0, rd 0,
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sdd
Feb 06 06:21:20 backup kernel: BTRFS: bdev /dev/sda errs: wr 0, rd 0,
flush 0, corrupt 1, gen 0
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sda
Feb 06 06:21:20 backup kernel: BTRFS: unable to fixup (regular) error at
logical 3531011186688 on dev /dev/sdd
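As an illustration only, the scrub messages above can be grouped by
logical address to confirm that both devices report the same block. This
is a minimal sketch: the function names are hypothetical, the regular
expression is an assumption fitted to the message format shown in this
excerpt (with the wrapped lines joined), and other kernel versions may
word the message differently.

```python
import re
from collections import defaultdict

# The two "checksum error" messages from the excerpt, each joined onto one line.
LOG = """\
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 3531011186688 on dev /dev/sdd, sector 3446072048, root 3811, inode 127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
Feb 06 06:21:20 backup kernel: BTRFS: checksum error at logical 3531011186688 on dev /dev/sda, sector 3446072048, root 3811, inode 127923, offset 6936002560, length 4096, links 1 (path: xxxxxxxx)
"""

# Assumed message layout, taken from the lines above.
PATTERN = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,\s]+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+)"
)


def group_by_logical(text):
    """Map each logical address to the set of devices that reported it."""
    devices = defaultdict(set)
    for match in PATTERN.finditer(text):
        devices[int(match.group("logical"))].add(match.group("dev"))
    return dict(devices)


def bad_on_all_copies(text, copies=2):
    """Return logical addresses reported by at least `copies` devices."""
    grouped = group_by_logical(text)
    return sorted(addr for addr, devs in grouped.items() if len(devs) >= copies)
```

Calling bad_on_all_copies(LOG) on the excerpt returns the single logical
address 3531011186688, reported by both /dev/sda and /dev/sdd.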
What's strange is that the failed file has a checksum error in the
exact same spot on both mirrored copies, which means the file is
unrecoverable. This is not what I expect from a raid10! Unfortunately, I
have only one snapshot left on the backup server, so I don't know
whether any of the other snapshots had the same problem.
The file (called xxxxxxxx for privacy) was created in the last btrfs
send/receive, but I did not notice any errors during the transfer.
This is an excerpt from the logs while trying to read the file afterwards:
Feb 06 13:28:45 backup kernel: BTRFS warning (device sdb): csum failed
ino 127923 off 6936002560 csum 284124578 expected csum 1756277981
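For reference, the numbers in this message can be related to the scrub
messages with a little arithmetic. The sketch below is illustrative
only: the helper names are hypothetical, the 4096-byte block size is
taken from "length 4096" in the scrub output, and treating the logged
sector number as a count of 512-byte units is an assumption.

```python
import re

# The read-time warning from the excerpt, joined onto one line.
READ_ERROR = (
    "BTRFS warning (device sdb): csum failed ino 127923 "
    "off 6936002560 csum 284124578 expected csum 1756277981"
)

# Assumed message layout, taken from the line above.
CSUM_FAILED = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>\d+) expected csum (?P<expected>\d+)"
)

BLOCK_SIZE = 4096   # from "length 4096" in the scrub messages
SECTOR_SIZE = 512   # assumption: logged sectors are 512-byte units


def parse_read_error(line):
    """Extract inode, file offset and both checksum values as integers."""
    match = CSUM_FAILED.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def file_block_index(offset, block_size=BLOCK_SIZE):
    """Return (block index within the file, byte remainder inside the block)."""
    return divmod(offset, block_size)


def sector_to_bytes(sector, sector_size=SECTOR_SIZE):
    """Convert a logged sector number to a byte offset on the device."""
    return sector * sector_size
```

With these helpers, parse_read_error(READ_ERROR) gives inode 127923 and
offset 6936002560, the same inode and offset the scrub reported, and
file_block_index(6936002560) shows that the offset is block-aligned.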
Anyone seen anything like this on their system? I guess this is a bug,
but I have not been able to find anything like this with Google.
--
Tom Arild Næss