From: Goffredo Baroncelli <kreijack@inwind.it>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
Date: Sat, 25 Jun 2016 14:21:38 +0200
Message-ID: <8695beeb-f991-28c4-cf6b-8c92339e468f@inwind.it>
Hi all,
Following the thread "Adventures in btrfs raid5 disk recovery", I investigated how well BTRFS can scrub a corrupted raid5 filesystem. To test it, I first found where a file was stored, and then corrupted either one of the data disks or the parity disk while the filesystem was unmounted.
The results showed that sometimes the kernel recomputed the parity incorrectly.
I tested the following kernels:
- 4.6.1
- 4.5.4
and both showed the same behavior.
The test was performed as described below:
1) I created a raid5 filesystem (for both data and metadata) on three 500MB loop devices, 1500MB in total:
truncate -s 500M disk1.img; losetup -f disk1.img
truncate -s 500M disk2.img; losetup -f disk2.img
truncate -s 500M disk3.img; losetup -f disk3.img
sudo mkfs.btrfs -d raid5 -m raid5 /dev/loop[0-2]
sudo mount /dev/loop0 mnt/
2) I created a file with a length of 128KiB (the parenthesized print works with both Python 2 and Python 3):
python -c "print('ad'+'a'*65534+'bd'+'b'*65533)" | sudo tee mnt/out.txt
sudo umount mnt/
3) I looked at the output of 'btrfs-debug-tree /dev/loop0' and was able to find where the file's stripe is located:
/dev/loop0: offset=81788928+16*4096 (64k, second half of the file: 'bdbbbb.....')
/dev/loop1: offset=61865984+16*4096 (64k, first half of the file: 'adaaaa.....')
/dev/loop2: offset=61865984+16*4096 (64k, parity: '\x03\x00\x03\x03\x03.....')
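The offsets and the parity bytes quoted above can be cross-checked with a short Python 3 sketch. It assumes that the stripe holds exactly two 64KiB data strips and that the raid5 parity is the byte-wise XOR of the two. The variable names are illustrative only and do not come from btrfs.

```python
# Sketch: rebuild the 128KiB test file in memory and compute the parity
# expected for its stripe, ASSUMING parity = strip1 XOR strip2
# (raid5 with two data disks and one parity disk).
STRIP = 64 * 1024  # 64KiB per disk, as reported at step 3

# Same content as the file from step 2; 'print' appends the final newline.
content = b"ad" + b"a" * 65534 + b"bd" + b"b" * 65533 + b"\n"
assert len(content) == 2 * STRIP

first_half = content[:STRIP]    # stored on /dev/loop1
second_half = content[STRIP:]   # stored on /dev/loop0
# Expected content of the parity strip on /dev/loop2.
parity = bytes(x ^ y for x, y in zip(first_half, second_half))

# Absolute byte offsets of the strips, from the values at step 3.
offset_loop0 = 81788928 + 16 * 4096
offset_loop12 = 61865984 + 16 * 4096

print(offset_loop0, offset_loop12)  # 81854464 61931520
print(parity[:5])                   # b'\x03\x00\x03\x03\x03'
```

The first five parity bytes match the dump above: 'a' XOR 'b' is 0x03 and 'd' XOR 'd' is 0x00.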
4) I corrupted each disk in turn (one disk per test) and then ran a scrub.
For example, for the disk /dev/loop2:
sudo dd if=/dev/zero of=/dev/loop2 bs=1 \
seek=$((61865984+16*4096)) count=5
sudo mount /dev/loop0 mnt
sudo btrfs scrub start mnt/.
5) I checked the disks at the offsets above to verify that the data and the parity were correct.
However, I found that:
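For readers who prefer to script the corruption from step 4, here is a minimal Python 3 sketch of the same operation as the dd command. The helper name zero_bytes is hypothetical, and the example assumes it is acceptable to write to the backing image file (or to the loop device, as root) while the filesystem is unmounted.

```python
def zero_bytes(path: str, offset: int, count: int = 5) -> bytes:
    """Overwrite `count` bytes at `offset` with zeros and return the old bytes."""
    # "r+b" updates the file in place and never truncates it.
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(count)
        f.seek(offset)
        f.write(b"\x00" * count)
    return old


if __name__ == "__main__":
    import os
    import tempfile

    # Self-contained demo on a scratch file, not on a real disk image.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x03" * 32)
    print(zero_bytes(tmp.name, 8))   # old bytes at offsets 8..12
    with open(tmp.name, "rb") as f:
        print(f.read())              # bytes 8..12 are now zero
    os.remove(tmp.name)
```

Returning the old bytes makes it easy to confirm that the offset really pointed at the expected data before it was overwritten.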
1) if I corrupted the parity disk (/dev/loop2), scrub did not report any corruption, but it recomputed the parity (always correctly);
2) when I corrupted the other disks (/dev/loop[01]), btrfs was able to find the corruption, but I observed two different behaviors:
2.a) the kernel repaired the damage but computed the wrong parity: where the parity should have been, the kernel copied the data of the second disk onto the parity disk;
2.b) the kernel repaired the damage and rebuilt a correct parity.
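The check that distinguishes 2.a) from 2.b) can be sketched in Python 3 as follows. This is not the attached test.sh; the function names are hypothetical, and the sketch assumes a stripe of two data strips whose correct parity is their byte-wise XOR. Because the text does not say which strip ends up on the parity disk in case 2.a), the sketch accepts a copy of either one.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def classify_parity(data1: bytes, data2: bytes, parity: bytes) -> str:
    """Classify the parity strip read back after a scrub."""
    if parity == xor_bytes(data1, data2):
        return "correct"        # behavior 2.b)
    if parity in (data1, data2):
        return "copy-of-data"   # behavior 2.a): a data strip was copied over the parity
    return "other"


def read_strip(path: str, offset: int, length: int = 64 * 1024) -> bytes:
    """Read one strip from a device or image file at the given byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Synthetic strips with the same leading bytes as the test file.
    d1 = b"ad" + b"a" * 6
    d2 = b"bd" + b"b" * 6
    print(classify_parity(d1, d2, xor_bytes(d1, d2)))  # correct
    print(classify_parity(d1, d2, d2))                 # copy-of-data
```

On the real setup, read_strip would be called (as root, with the filesystem unmounted) on /dev/loop1, /dev/loop0 and /dev/loop2 at the offsets found at step 3.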
I have to point out another strange thing: in dmesg I found two kinds of messages:
msg1)
[....]
[ 1021.366944] BTRFS info (device loop2): disk space caching is enabled
[ 1021.366949] BTRFS: has skinny extents
[ 1021.399208] BTRFS warning (device loop2): checksum error at logical 142802944 on dev /dev/loop0, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
[ 1021.399214] BTRFS error (device loop2): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 1021.399291] BTRFS error (device loop2): fixed up error at logical 142802944 on dev /dev/loop0
msg2)
[ 1017.435068] BTRFS info (device loop2): disk space caching is enabled
[ 1017.435074] BTRFS: has skinny extents
[ 1017.436778] BTRFS info (device loop2): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 1017.463403] BTRFS warning (device loop2): checksum error at logical 142802944 on dev /dev/loop0, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
[ 1017.463409] BTRFS error (device loop2): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 1017.463467] BTRFS warning (device loop2): checksum error at logical 142802944 on dev /dev/loop0, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
[ 1017.463472] BTRFS error (device loop2): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 1017.463512] BTRFS error (device loop2): unable to fixup (regular) error at logical 142802944 on dev /dev/loop0
[ 1017.463535] BTRFS error (device loop2): fixed up error at logical 142802944 on dev /dev/loop0
but these seem to be UNrelated to the kernel behavior 2.a) or 2.b).
Another oddity is that scrub sometimes reports
ERROR: there are uncorrectable errors
and sometimes reports
WARNING: errors detected during scrubbing, corrected
but these also seem UNrelated to behavior 2.a) or 2.b), or to msg1 or msg2.
Enclosed you can find the script which I used to trigger the bug. I had to rerun it several times to show the problem, because it doesn't happen every time. Note that the offsets and the loop device names are hard-coded. You must run the script from the directory where it resides, e.g. "bash test.sh".
Br
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
[-- Attachment #2: test.sh --]
[-- Type: application/x-sh, Size: 2819 bytes --]