Subject: Re: system crash on replace
To: Steven Haigh, linux-btrfs@vger.kernel.org
References: <4805f6ef-a422-ab61-657b-8aafa9c47d1b@crc.id.au>
From: "Austin S. Hemmelgarn"
Message-ID: <22c30021-3939-795d-fd96-b116cd739764@gmail.com>
Date: Thu, 23 Jun 2016 14:35:30 -0400
In-Reply-To: <4805f6ef-a422-ab61-657b-8aafa9c47d1b@crc.id.au>

On 2016-06-23 13:44, Steven Haigh wrote:
> Hi all,
>
> Relative newbie to BTRFS, but long time linux user. I pass the full
> disks from a Xen Dom0 -> guest DomU and run BTRFS within the DomU.
>
> I've migrated my existing mdadm RAID6 to a BTRFS raid6 layout. I have a
> drive that threw a few UNC errors during a subsequent balance, so I've
> pulled that drive, dd /dev/zero to it, then trying to add it back in to
> the BTRFS system via:
> btrfs replace start 4 /dev/xvdf /mnt/fileshare
>
> The command gets accepted and the replace starts - however apart from it
> being at a glacial pace, it seems to hang the system hard with the
> following output:
>
> zeus login: BTRFS info (device xvdc): not using ssd allocation scheme
> BTRFS info (device xvdc): disk space caching is enabled
> BUG: unable to handle kernel paging request at ffffc90040eed000
> IP: [] __memcpy+0x12/0x20
> PGD 7fbc6067 PUD 7ce9e067 PMD 7b6d6067 PTE 0
> Oops: 0002 [#1] SMP
> Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul btrfs
> aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper cryptd
> pcspkr raid6_pq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
> CPU: 1 PID: 2271 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> task: ffff88007c5c83c0 ti: ffff88005b978000 task.ti: ffff88005b978000
> RIP: e030:[] [] __memcpy+0x12/0x20
> RSP: e02b:ffff88005b97bc60 EFLAGS: 00010246
> RAX: ffffc90040eecff8 RBX: 0000000000001000 RCX: 00000000000001ff
> RDX: 0000000000000000 RSI: ffff88004e4ec008 RDI: ffffc90040eed000
> RBP: ffff88005b97bd28 R08: ffffc90040eeb000 R09: 00000000607a06e1
> R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000000000
> R13: 00000000607a26d9 R14: 00000000607a26e1 R15: ffff880062c613d8
> FS: 00007fd08feb88c0(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc90040eed000 CR3: 0000000056004000 CR4: 0000000000042660
> Stack:
> ffffffffa05afc77 ffff88005b97bc90 ffffffff81190a14 0000160000000000
> ffff88005ba01900 0000000000000000 00000000607a06e1 ffffc90040eeb000
> 0000000000000002 000000000000001b 0000000000000000 0000002000000001
> Call Trace:
> [] ? lzo_decompress_biovec+0x237/0x2b0 [btrfs]
> [] ? vmalloc+0x54/0x60
> [] end_compressed_bio_read+0x1d4/0x2a0 [btrfs]
> [] ? kmem_cache_free+0xcc/0x2c0
> [] bio_endio+0x40/0x60
> [] end_workqueue_fn+0x3c/0x40 [btrfs]
> [] normal_work_helper+0xc1/0x300 [btrfs]
> [] ? finish_task_switch+0x82/0x280
> [] btrfs_endio_helper+0x12/0x20 [btrfs]
> [] process_one_work+0x154/0x400
> [] worker_thread+0x11a/0x460
> [] ? rescuer_thread+0x2f0/0x2f0
> [] kthread+0xc9/0xe0
> [] ? kthread_park+0x60/0x60
> [] ret_from_fork+0x3f/0x70
> [] ? kthread_park+0x60/0x60
> Code: 74 0e 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d c3 e8 b4 fc ff ff eb
> eb 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 48
> a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
> RIP [] __memcpy+0x12/0x20
> RSP
> CR2: ffffc90040eed000
> ---[ end trace 001cc830ec804da7 ]---
> BUG: unable to handle kernel paging request at ffffffffffffffd8
> IP: [] kthread_data+0x10/0x20
> PGD 188d067 PUD 188f067 PMD 0
> Oops: 0000 [#2] SMP
> Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul btrfs
> aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper cryptd
> pcspkr raid6_pq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
> CPU: 1 PID: 2271 Comm: kworker/u4:4 Tainted: G D 4.4.13-1.el7xen.x86_64 #1
> task: ffff88007c5c83c0 ti: ffff88005b978000 task.ti: ffff88005b978000
> RIP: e030:[] [] kthread_data+0x10/0x20
> RSP: e02b:ffff88005b97b980 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> RDX: ffffffff81bd5080 RSI: 0000000000000001 RDI: ffff88007c5c83c0
> RBP: ffff88005b97b980 R08: 0000000000000001 R09: 000013f6a632677e
> R10: 000000000000001f R11: 0000000000000000 R12: ffff88007c5c83c0
> R13: 0000000000000000 R14: 00000000000163c0 R15: 0000000000000001
> FS: 00007fd08feb88c0(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000056004000 CR4: 0000000000042660
> Stack:
> ffff88005b97b998 ffffffff81094751 ffff88007f5163c0 ffff88005b97b9e0
> ffffffff8165a34f ffff88007b678448 ffff88007c5c83c0 ffff88005b97c000
> ffff88007c5c8710 0000000000000000 ffff88007c5c83c0 ffff88007cffacc0
> Call Trace:
> [] wq_worker_sleeping+0x11/0x90
> [] __schedule+0x3bf/0x880
> [] schedule+0x35/0x80
> [] do_exit+0x65f/0xad0
> [] oops_end+0x9a/0xd0
> [] no_context+0x10d/0x360
> [] __bad_area_nosemaphore+0x109/0x210
> [] bad_area_nosemaphore+0x13/0x20
> [] __do_page_fault+0x80/0x3f0
> [] ? vmap_page_range_noflush+0x284/0x390
> [] do_page_fault+0x22/0x30
> [] page_fault+0x28/0x30
> [] ? __memcpy+0x12/0x20
> [] ? lzo_decompress_biovec+0x237/0x2b0 [btrfs]
> [] ? vmalloc+0x54/0x60
> [] end_compressed_bio_read+0x1d4/0x2a0 [btrfs]
> [] ? kmem_cache_free+0xcc/0x2c0
> [] bio_endio+0x40/0x60
> [] end_workqueue_fn+0x3c/0x40 [btrfs]
> [] normal_work_helper+0xc1/0x300 [btrfs]
> [] ? finish_task_switch+0x82/0x280
> [] btrfs_endio_helper+0x12/0x20 [btrfs]
> [] process_one_work+0x154/0x400
> [] worker_thread+0x11a/0x460
> [] ? rescuer_thread+0x2f0/0x2f0
> [] kthread+0xc9/0xe0
> [] ? kthread_park+0x60/0x60
> [] ret_from_fork+0x3f/0x70
> [] ? kthread_park+0x60/0x60
> Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 de 81 7a 81 e8 d6 34 fe ff e9
> d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b
> 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> RIP [] kthread_data+0x10/0x20
> RSP
> CR2: ffffffffffffffd8
> ---[ end trace 001cc830ec804da8 ]---
> Fixing recursive fault but reboot is needed!
>
> Current details:
> $ btrfs fi show
> Label: 'BTRFS' uuid: 41cda023-1c96-4174-98ba-f29a3d38cd85
>     Total devices 5 FS bytes used 4.81TiB
>     devid    1 size 2.73TiB used 1.66TiB path /dev/xvdc
>     devid    2 size 2.73TiB used 1.67TiB path /dev/xvdd
>     devid    3 size 1.82TiB used 1.82TiB path /dev/xvde
>     devid    5 size 1.82TiB used 1.82TiB path /dev/xvdg
>     *** Some devices missing
>
> $ btrfs fi df /mnt/fileshare
> Data, RAID6: total=5.14TiB, used=4.81TiB
> System, RAID6: total=160.00MiB, used=608.00KiB
> Metadata, RAID6: total=6.19GiB, used=5.19GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> [ 12.833431] Btrfs loaded
> [ 12.834339] BTRFS: device label BTRFS devid 1 transid 46614 /dev/xvdc
> [ 12.861109] BTRFS: device label BTRFS devid 0 transid 46614 /dev/xvdf
> [ 12.861357] BTRFS: device label BTRFS devid 3 transid 46614 /dev/xvde
> [ 12.862859] BTRFS: device label BTRFS devid 2 transid 46614 /dev/xvdd
> [ 12.863372] BTRFS: device label BTRFS devid 5 transid 46614 /dev/xvdg
> [ 379.629911] BTRFS info (device xvdg): allowing degraded mounts
> [ 379.630028] BTRFS info (device xvdg): not using ssd allocation scheme
> [ 379.630120] BTRFS info (device xvdg): disk space caching is enabled
> [ 379.630207] BTRFS: has skinny extents
> [ 379.631024] BTRFS warning (device xvdg): devid 4 uuid
> 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
> [ 383.675967] BTRFS info (device xvdg): continuing dev_replace from
> (devid 4) to /dev/xvdf @0%
>
> I've currently cancelled the replace with a btrfs replace cancel
> /mnt/fileshare - so at least the system doesn't crash completely in the
> meantime - and mounted it with -o degraded until I can see what the deal
> is here.
>
> Any suggestions?
>
The first thing I would suggest is to follow the advice at the bottom of
the back-trace you got and reboot the system.  If you've hit a recursive
page fault, something has gone seriously wrong, and you really can't
trust the state of the system afterwards.

Now, with that out of the way, I think you've actually hit two bugs.
The first is a known issue (with a currently unknown cause) in the BTRFS
raid6 code that causes rebuilds to take an extremely long time (the only
time I've seen it myself, it took 8 hours to rebuild a filesystem with
only a few hundred GB of data on it, when the same operation with a
raid1 profile took only about 20 minutes).  The second appears to be a
previously unreported issue (at least, I've never seen anything like it
reported) in the in-line compression code.  You may be able to work
around the second bug short term by turning off compression in the mount
options (rough example below), but I'm not 100% certain that will help.

Expanding on this a bit further: after rebooting, check whether you get
the same BUG output again before you turn compression off.  If it
doesn't happen again, then I'd seriously suggest checking your hardware,
as certain kinds of RAM and/or CPU problems can trigger things like
this; this particular symptom only shows up intermittently, but the
primary symptom of such problems is data corruption in RAM.
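As far as actually turning compression off goes, something along these
lines should do it, assuming it was enabled with a 'compress=lzo' (or
similar) mount option; adjust the device, mount point, and any other
options to match your setup:

  # Check which options the filesystem is currently mounted with:
  grep fileshare /proc/mounts

  # Remount without any compress option.  As far as I know there's no
  # 'compress=no' on a kernel this old, so the option just has to be
  # dropped (remove it from /etc/fstab as well so it stays off across
  # reboots); a full unmount/mount is the safest way to be sure it's
  # really gone:
  umount /mnt/fileshare
  mount -o degraded /dev/xvdc /mnt/fileshare

Keep in mind this only stops new writes from being compressed; extents
that are already compressed on disk still go through the same
decompression path when read, which is part of why I'm not certain it
will help.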
As far as the raid6 rebuild issue goes, there's currently no known way
to fix it, let alone any known way to reliably reproduce it (multiple
people have reported it, but I don't know of anyone who has been able to
reproduce it reliably starting from a new filesystem).

Given this and the couple of other known issues in the raid5/6 code (one
being a slightly different form of the traditional 'write hole', and the
other being an issue with how scrub handles device stats), the general
advice here is to avoid using raid5/6 for production systems for the
time being.
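If you do decide to move off raid5/6 at some point, the usual route is a
balance with conversion filters once the array is healthy again (i.e.
after the replace has finished and nothing is missing).  Roughly, and
assuming raid1 is the profile you want:

  # Convert data and metadata chunks to raid1.  This rewrites
  # everything, so expect it to take a long time, and check first that
  # there's enough unallocated space on the devices for two copies of
  # everything:
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/fileshare

  # Watch progress from another terminal:
  btrfs balance status /mnt/fileshare

The (tiny) system chunks can be converted too with '-sconvert=raid1',
which additionally needs '-f' because it reduces the redundancy of that
data.  Do check that the capacity math works out for your devices first,
since raid1 stores two copies of everything and will give you noticeably
less usable space than raid6 does.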