From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jinpu Wang
Subject: [BUG] MD/RAID1 hung forever on bitmap_startwrite+0x122
Date: Mon, 28 Nov 2016 09:45:07 +0100
Message-ID: 
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Return-path: 
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
Cc: NeilBrown , Shaohua Li 
List-Id: linux-raid.ids

Hi folks,

We hit another hung task on our test machines, this time stuck in
bitmap_startwrite.

The setup: MD/RAID1 built over 2 block devices exported via IB, with
bitmap=internal. KVM runs on top of the RAID1 on a compute node, while
the disks live on remote storage nodes. When one storage node
crashed/rebooted, multiple RAID1 arrays on multiple compute nodes had
their KVM processes run into a hung task:

[106204.343870] INFO: task kvm:37669 blocked for more than 180 seconds.
[106204.344138]       Tainted: G IO 4.4.28-1-pserver #1
[106204.344385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[106204.344798] kvm             D ffff882037723710     0 37669      1 0x00000000
[106204.344805]  ffff882037723710 ffff882038f08d00 ffff882029770d00 ffff8820377236d8
[106204.344809]  ffff8820377236d8 ffff882037724000 0000000000308648 0000000000000008
[106204.344813]  ffff880f9bd9e8c0 ffff882037723768 ffff882037723728 ffffffff81811c60
[106204.344818] Call Trace:
[106204.344831]  [] schedule+0x30/0x80
[106204.344841]  [] bitmap_startwrite+0x122/0x190 [md_mod]
[106204.344848]  [] ? bio_clone_bioset+0x11b/0x310
[106204.344853]  [] ? wait_woken+0x80/0x80
[106204.344859]  [] 0xffffffffa0cc5127
[106204.344865]  [] md_set_array_sectors+0xac3/0xe20 [md_mod]
[106204.344871]  [] ? generic_make_request_checks+0x234/0x4c0
[106204.344875]  [] blk_prologue_bio+0x91/0xc0
[106204.344879]  [] generic_make_request+0xfe/0x1e0
[106204.344883]  [] submit_bio+0x62/0x150
[106204.344892]  [] do_blockdev_direct_IO+0x2317/0x2ba0
[106204.344897]  [] ? __remove_hrtimer+0x89/0xa0
[106204.344903]  [] ? udp_poll+0x1f/0xb0
[106204.344908]  [] ? sock_poll+0x57/0x120
[106204.344913]  [] ? I_BDEV+0x10/0x10
[106204.344918]  [] __blockdev_direct_IO+0x3e/0x40
[106204.344922]  [] blkdev_direct_IO+0x47/0x50
[106204.344930]  [] generic_file_direct_write+0xb0/0x170
[106204.344934]  [] __generic_file_write_iter+0xcd/0x1f0
[106204.344943]  [] ? kmem_cache_free+0x78/0x190
[106204.344948]  [] ? bd_unlink_disk_holder+0xf0/0xf0
[106204.344952]  [] blkdev_write_iter+0x87/0x110
[106204.344956]  [] ? bd_unlink_disk_holder+0xf0/0xf0
[106204.344962]  [] aio_run_iocb+0x236/0x2a0
[106204.344966]  [] ? eventfd_ctx_read+0x53/0x200
[106204.344973]  [] ? __fget_light+0x1f/0x60
[106204.344976]  [] ? __fdget+0xe/0x10
[106204.344980]  [] do_io_submit+0x23a/0x4d0
[106204.344985]  [] SyS_io_submit+0xb/0x10
[106204.344989]  [] entry_SYSCALL_64_fastpath+0x12/0x6a

(The same report for kvm:37669 repeats every 180 seconds with an
identical call trace, e.g. at 106384.345330.)

(gdb) l *bitmap_startwrite+0x122
0x121d2 is in bitmap_startwrite (drivers/md/bitmap.c:1396).
1394                    if (unlikely(COUNTER(*bmc) == COUNTER_MAX)) {
1395                            DEFINE_WAIT(__wait);
1396                            /* note that it is safe to do the prepare_to_wait
1397                             * after the test as long as we do it before dropping
1398                             * the spinlock.
1399                             */
1400                            prepare_to_wait(&bitmap->overflow_wait, &__wait,
1401                                            TASK_UNINTERRUPTIBLE);
1402                            spin_unlock_irq(&bitmap->counts.lock);
1403                            schedule();
1404                            finish_wait(&bitmap->overflow_wait, &__wait);
1405                            continue;
1406                    }

So it seems KVM is sleeping on the overflow_wait queue, but nobody ever
wakes it up. While the storage node was rebooting, the RAID1 saw a lot
of I/O errors, so I suspect some part of the error-handling path is
broken.
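For context, the wake-up this waiter depends on should come from the
write-completion side. Below is a trimmed sketch of the per-chunk loop
in bitmap_endwrite() as I read it in the 4.4-era drivers/md/bitmap.c
(not verbatim from our 4.4.28-1-pserver tree, so treat it as an
approximation rather than the exact code):

        /* bitmap_endwrite(): per-chunk completion path (sketch) */
        spin_lock_irqsave(&bitmap->counts.lock, flags);
        bmc = bitmap_get_counter(&bitmap->counts, offset, &blocks, 0);
        if (!bmc) {
                spin_unlock_irqrestore(&bitmap->counts.lock, flags);
                return;
        }

        if (!success && !NEEDED(*bmc))
                *bmc |= NEEDED_MASK;    /* failed write: chunk still needs sync */

        if (COUNTER(*bmc) == COUNTER_MAX)
                wake_up(&bitmap->overflow_wait);    /* the wake-up we never see */

        (*bmc)--;
        spin_unlock_irqrestore(&bitmap->counts.lock, flags);

So my (unverified) suspicion is that during the flood of I/O errors
while the storage node rebooted, some bio on the error path never made
it back to bitmap_endwrite() for its chunk; the counter then stays at
COUNTER_MAX and whoever is parked in the wait shown above sleeps
forever, which would match what we observe.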
I don't have a reproducer yet; I just want to report it to the
community, in case this is a known bug or anyone already has a patch :)

Thanks,
-- 
Jinpu Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:    +49 30 577 008 042
Fax:    +49 30 577 008 299
Email:  jinpu.wang@profitbricks.com
URL:    https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing director: Achim Weiss