From: Waxhead <waxhead@online.no>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs scrub failure for raid 6 kernel 4.3
Date: Mon, 28 Dec 2015 00:06:46 +0100 [thread overview]
Message-ID: <56806F06.50309@online.no> (raw)
In-Reply-To: <CAJCQCtSHWTqn7zzCtUtGjpr5=dMqfC1aGWaeNpZyVVhjrCvDHg@mail.gmail.com>
Chris Murphy wrote:
> On Sun, Dec 27, 2015 at 6:59 AM, Waxhead <waxhead@online.no> wrote:
>> Hi,
>>
>> I have a "toy-array" of 6x USB drives hooked up to a hub where I made a
>> btrfs raid 6 data+metadata filesystem.
>>
>> I copied some files to the filesystem, ripped out one USB drive, and
>> ruined it by writing from dd if=/dev/random to various locations on the
>> drive. I put the USB drive back and the filesystem mounts OK.
>>
>> If I start a scrub, I get the following after a few seconds:
>>
>> kernel:[ 50.844026] CPU: 1 PID: 91 Comm: kworker/u4:2 Not tainted
>> 4.3.0-1-686-pae #1 Debian 4.3.3-2
>> kernel:[ 50.844026] Hardware name: Acer AOA150/ , BIOS v0.3310
>> 10/06/2008
>> kernel:[ 50.844026] Workqueue: btrfs-endio-raid56
>> btrfs_endio_raid56_helper [btrfs]
>> kernel:[ 50.844026] task: f642c040 ti: f664c000 task.ti: f664c000
>> kernel:[ 50.844026] Stack:
>> kernel:[ 50.844026] 00000005 f0d20800 f664ded0 f86d0262 00000000
>> f664deac c109a0fc 00000001
>> kernel:[ 50.844026] f79eac40 edb4a000 edb7a000 edb8a000 edbba000
>> eccc1000 ecca1000 00000000
>> kernel:[ 50.844026] 00000000 f664de68 00000003 f664de74 ecb23000
>> f664de5c f5cda6a4 f0d20800
>> kernel:[ 50.844026] Call Trace:
>> kernel:[ 50.844026] [<f86d0262>] ? finish_parity_scrub+0x272/0x560
>> [btrfs]
>> kernel:[ 50.844026] [<c109a0fc>] ? set_next_entity+0x8c/0xba0
>> kernel:[ 50.844026] [<c127d130>] ? bio_endio+0x40/0x70
>> kernel:[ 50.844026] [<f86891fe>] ? btrfs_scrubparity_helper+0xce/0x270
>> [btrfs]
>> kernel:[ 50.844026] [<c107ca7d>] ? process_one_work+0x14d/0x360
>> kernel:[ 50.844026] [<c107ccc9>] ? worker_thread+0x39/0x440
>> kernel:[ 50.844026] [<c107cc90>] ? process_one_work+0x360/0x360
>> kernel:[ 50.844026] [<c10821a6>] ? kthread+0xa6/0xc0
>> kernel:[ 50.844026] [<c1536181>] ? ret_from_kernel_thread+0x21/0x30
>> kernel:[ 50.844026] [<c1082100>] ? kthread_create_on_node+0x130/0x130
>> kernel:[ 50.844026] Code: 6e c1 e8 ac dd f2 ff 83 c4 04 5b 5d c3 8d b6 00
>> 00 00 00 31 c9 81 3d 84 f0 6e c1 84 f0 6e c1 0f 95 c1 eb b9 8d b4 200 00 00
>> 00 0f 0b 8d b4 26 00 00 00 00 8d bc 27 00
>> kernel:[ 50.844026] EIP: [<c1174858>] kunmap_high+0xa8/0xc0 SS:ESP
>> 0068:f664de40
>>
>> This is only a test setup and I will keep this filesystem for a while if it
>> can be of any use...
> Sounds like a bug, but it might also be missing functionality. If
> you can include the reproduce steps, including the exact
> locations+lengths of the random writes, that's probably useful.
>
> More than one thing could be going on. First, I don't know that Btrfs
> even understands the device went missing because it doesn't yet have a
> concept of faulty devices, and then I've seen it get confused when
> drives reappear with new drive designations (not uncommon), and from
> your call trace we don't know if that happened because there's not
> enough information posted. Second, if the damage is too much on a
> device, it almost certainly isn't recognized when reattached. But this
> depends on what locations were damaged. If Btrfs doesn't recognize the
> drive as part of the array, then the scrub request is effectively a
> scrub for a volume with a missing drive which you probably wouldn't
> ever do, you'd first replace the missing device. Scrubs happen on
> normally operating arrays, not degraded ones. So it's uncertain whether
> either Btrfs or the user had any idea what state the volume was actually
> in at the time.
>
> Conversely, mdadm knows in such a case to mark the device as faulty and
> the array automatically goes degraded, but when the drive is reattached
> it is not automatically re-added. When the user re-adds it, a complete
> rebuild typically happens unless there's a write-intent bitmap, which
> isn't a default at create time.
>
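The mdadm flow described above can be sketched as follows. Device names here are hypothetical, and the commands are printed rather than executed so the sequence can be shown without a real array; on a live array you would run each line directly:

```shell
#!/bin/sh
# Hypothetical names: the md array is /dev/md0, the returning member /dev/sdX1.
md=/dev/md0
part=/dev/sdX1

# Build the command sequence as text; on a real array, run each line instead.
# --fail marks the member faulty and the array goes degraded;
# --re-add triggers a full rebuild unless a write-intent bitmap exists;
# --grow --bitmap=internal enables one (it is not a create-time default).
plan="mdadm $md --fail $part
mdadm $md --remove $part
mdadm $md --re-add $part
mdadm --grow $md --bitmap=internal"

printf '%s\n' "$plan"
```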
I am afraid I can't include the exact steps to reproduce.
I do, however, still have the filesystem in a "bad state", so if there is
anything I can do, let me know.
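I can't recover the exact offsets now, but for a future run the random writes could be logged as they are made, which would give the exact locations and lengths Chris asked for. A minimal sketch against a scratch file (the offsets and sizes below are illustrative, not the ones used in this test; on the real rig the target would be the pulled USB drive, and /dev/urandom is used so the reads don't block):

```shell
#!/bin/sh
# Corrupt a scratch image at known offsets, logging each write so the
# damage is reproducible. Offsets/sizes here are made up for illustration.
img=$(mktemp)
log=corruption.log
truncate -s 64M "$img"
: > "$log"
for off_kib in 0 1024 32768; do
    # conv=notrunc keeps the rest of the image intact; status=none quiets dd
    dd if=/dev/urandom of="$img" bs=1024 seek="$off_kib" count=4 \
        conv=notrunc status=none
    echo "corrupted: offset=${off_kib}KiB length=4KiB" >> "$log"
done
cat "$log"
rm -f "$img"
```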
First of all, a "btrfs filesystem show" does list all the drives:
Label: none uuid: 2832346e-0720-499f-8239-355534e5721b
Total devices 6 FS bytes used 8.53GiB
devid 1 size 7.68GiB used 3.08GiB path /dev/sdb1
devid 2 size 7.68GiB used 3.08GiB path /dev/sdc1
devid 3 size 7.68GiB used 3.08GiB path /dev/sdd1
devid 4 size 7.68GiB used 3.08GiB path /dev/sde1
devid 5 size 7.68GiB used 3.08GiB path /dev/sdf1
devid 6 size 7.68GiB used 3.08GiB path /dev/sdg1
mount /dev/sdb1 /mnt/
btrfs filesystem df /mnt
Data, RAID6: total=12.00GiB, used=8.45GiB
System, RAID6: total=64.00MiB, used=16.00KiB
Metadata, RAID6: total=256.00MiB, used=84.58MiB
GlobalReserve, single: total=32.00MiB, used=0.00B
btrfs scrub status /mnt
scrub status for 2832346e-0720-499f-8239-355534e5721b
scrub started at Sun Mar 29 23:21:04 2015 and finished after 00:01:04
total bytes scrubbed: 1.97GiB with 14549 errors
error details: super=2 csum=14547
corrected errors: 0, uncorrectable errors: 14547, unverified errors: 0
Now here is the first worrying part: it says the scrub started on Sun
Mar 29. That is NOT true; the first scrub I ran on this filesystem was
only a few days ago. It also claims there are a lot of uncorrectable
errors. Why? This is, after all, a raid6 filesystem, correct?!
btrfs scrub start -B /mnt
Message from syslogd@a150 at Dec 27 23:44:22 ...
kernel:[ 611.478448] CPU: 0 PID: 1200 Comm: kworker/u4:1 Not tainted
4.3.0-1-686-pae #1 Debian 4.3.3-2
Message from syslogd@a150 at Dec 27 23:44:22 ...
kernel:[ 611.478448] Hardware name: Acer AOA150/ , BIOS
v0.3310 10/06/2008
kernel:[ 611.478448] Workqueue: btrfs-endio-raid56
btrfs_endio_raid56_helper [btrfs]
kernel:[ 611.478448] task: ec403040 ti: ec4a2000 task.ti: ec4a2000
kernel:[ 611.478448] Stack:
kernel:[ 611.478448] 00000005 ecd78800 ec4a3ed0 f8768262 00000000
0000008e 5ead4067 0000008e
kernel:[ 611.478448] 5ead3301 ec5bd000 ec5ce000 ec5fd000 ec62d000
ec5a9000 ec5a8000 f79d27cc
kernel:[ 611.478448] 00000000 ec4a3e68 00000003 ec4a3e74 ec32d700
ec4a3e5c f5ccaba0 ecd78800
kernel:[ 611.478448] Call Trace:
kernel:[ 611.478448] [<f8768262>] ? finish_parity_scrub+0x272/0x560
[btrfs]
kernel:[ 611.478448] [<c127d130>] ? bio_endio+0x40/0x70
kernel:[ 611.478448] [<f87211fe>] ?
btrfs_scrubparity_helper+0xce/0x270 [btrfs]
kernel:[ 611.478448] [<c107ca7d>] ? process_one_work+0x14d/0x360
kernel:[ 611.482350] [<c107ccc9>] ? worker_thread+0x39/0x440
kernel:[ 611.482350] [<c107cc90>] ? process_one_work+0x360/0x360
kernel:[ 611.482350] [<c10821a6>] ? kthread+0xa6/0xc0
kernel:[ 611.482350] [<c1536181>] ? ret_from_kernel_thread+0x21/0x30
kernel:[ 611.482350] [<c1082100>] ? kthread_create_on_node+0x130/0x130
kernel:[ 611.482350] Code: c4 04 5b 5d c3 8d b6 00 00 00 00 31 c9 81
3d 84 f0 6e c1 84 f0 6e c1 0f 95 c1 eb b9 8d b4 26 00 00 00 00 0f 0b 8d
b6 00 00 00 00 <0f> 0b 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89
e5 56 53
kernel:[ 611.482350] EIP: [<c1174860>] kunmap_high+0xb0/0xc0 SS:ESP
0068:ec4a3e40
This is what I got from my ssh login; there is a longer stack trace on
the computer I am testing this on. What I can read on the screen is
(I hope I got all the numbers right):
? print_oops_end_marker+0x41/0x70
? oops_end+0x92/0xd0
? no_context+0x100/0x2b0
? __bad_area_nosemaphore+0xb5/0x140
? dequeue_task_fair+0x4c/0xbd0
? check_preempt_curr+0x7a/0x90
? __do_page_fault+0x460/0x460
? bad_area_nosemaphore+0x17/0x20
? error_code+0x67/0x6c
? alloc_pid+0x5b/0x420
? kthread_data+0xf/0x20
? wq_worker_sleeping+0x10/0x90
? __schedule+0x4e2/0x8c0
? schedule+0x2b/0x80
? do_exit+0x746/0x9f0
? vprintk_default+0x37/0x40
? printk_0x17/0x19
? oops_end+0x92
? do_error_trap+0x8a/0x120
? kunmap_high+0xb0/0xc0
? __alloc_pages_nodemask+0x13b/0x850
? do_overflow+0x30/0x30
? do_invalid_op+0x24/0x30
? error_code+0x67/0x6c
? compact_unblock_should_abort.isra.31+0x7b/0x90
? kunmap_high+0xb0/0xc0
? finish_parity_scrub+0x272/0x556 [btrfs]
? bio_endio+0x40/0x70
? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
? process_one_work+0x14d/0x360
? worker_thread+0x39/0x440
? process_one_work+0x360/0x360
? kthread+0xa6/0xc0
? ret_from_kernel_thread+0x21/0x30
? kthread_create_on_node+0x130/0x130
---[end trace....]
I hope this is of more help. Again, if there is anything I can do, I am
happy to help. I don't need this filesystem, so there is no need to recover it.