linux-raid.vger.kernel.org archive mirror
* [BUG] md hang at schedule in  md_write_start
@ 2013-08-12 16:33 Jack Wang
  2013-08-13  4:31 ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jack Wang @ 2013-08-12 16:33 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: Jack Wang, Sebastian Riemer

[-- Attachment #1: Type: text/plain, Size: 16750 bytes --]

Hi Neil,


We've found an md hang in our tests; it's easy to reproduce with the
attached script (mdadm.sh).  The full hung-task traces are attached as
md_hang_mainline.

We've tried the 3.4 stable kernel and the latest mainline; the hang still
occurs.

It looks like the bdi_writeback_workfn flush races with md_stop.  We have
no idea how to fix it; could you kindly give us suggestions?

Best regards,
Jack

[-- Attachment #2: mdadm.sh --]
[-- Type: application/x-shellscript, Size: 162 bytes --]
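
The attached mdadm.sh is not rendered by the archive.  A minimal sketch of
the kind of reproducer described in this thread (illustrative only, not the
actual 162-byte attachment; the device names and the pre-created /dev/md1
array on /dev/sdb and /dev/sdc are assumptions) could look like:

    #!/bin/sh
    # Race buffered writes to the array against "mdadm --detail" and
    # "mdadm --stop", re-assembling and retrying until the hang triggers.
    while :; do
        mdadm --assemble --run /dev/md1 /dev/sdb /dev/sdc || exit 1
        dd if=/dev/zero of=/dev/md1 bs=1M count=512 &   # dirty pages stay in the page cache
        mdadm --detail /dev/md1 > /dev/null &           # may briefly hold the device open
        mdadm --stop /dev/md1                           # can open /dev/md1 before dd exits
        wait
    done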

[-- Attachment #3: md_hang_mainline --]
[-- Type: text/plain, Size: 16474 bytes --]


[  186.777410] 
[  241.951933] INFO: task kworker/u12:3:247 blocked for more than 120 seconds.
[  241.952001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.952075] kworker/u12:3   D 0000000000000000     0   247      2 0x00000000
[  241.952203] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  241.952319]  ffff88020d331418 0000000000000046 0000000000001000 ffff88020d330000
[  241.952512]  ffff88020d331fd8 ffff88020d330000 ffff88020d330010 ffff88020d330000
[  241.952701]  ffff88020d331fd8 ffff88020d330000 ffff88020c10b7e0 ffff8802158ddd20
[  241.952891] Call Trace:
[  241.952951]  [<ffffffff8173ca64>] schedule+0x24/0x70
[  241.953022]  [<ffffffffa01b244d>] md_write_start+0xad/0x1d0 [md_mod]
[  241.953083]  [<ffffffff8106b460>] ? wake_up_bit+0x40/0x40
[  241.953144]  [<ffffffffa0081b0f>] make_request+0x5f/0xe10 [raid1]
[  241.953204]  [<ffffffff81401274>] ? blk_throtl_bio+0x114/0x580
[  241.953264]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  241.953325]  [<ffffffff810a586e>] ? __lock_acquire+0x2be/0x780
[  241.953384]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  241.953451]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  241.953519]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  241.953587]  [<ffffffffa01b3b33>] md_make_request+0x183/0x340 [md_mod]
[  241.953655]  [<ffffffffa01b3a00>] ? md_make_request+0x50/0x340 [md_mod]
[  241.953716]  [<ffffffff8110c070>] ? mempool_alloc_slab+0x10/0x20
[  241.953774]  [<ffffffff8110c1cb>] ? mempool_alloc+0x5b/0x170
[  241.953834]  [<ffffffff813e7242>] generic_make_request+0xc2/0x100
[  241.953893]  [<ffffffff813e72f6>] submit_bio+0x76/0x160
[  241.954392]  [<ffffffff8119f1ec>] ? bio_alloc_bioset+0x9c/0x1c0
[  241.954451]  [<ffffffff81199c10>] _submit_bh+0x140/0x200
[  241.954510]  [<ffffffff81199cdb>] submit_bh+0xb/0x10
[  241.954568]  [<ffffffff8119c47f>] __block_write_full_page+0x1cf/0x320
[  241.954629]  [<ffffffff8110a746>] ? find_get_pages_tag+0x116/0x1e0
[  241.954689]  [<ffffffff8119ba90>] ? block_invalidatepage+0x140/0x140
[  241.954748]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  241.954804]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  241.954862]  [<ffffffff8119c696>] block_write_full_page_endio+0xc6/0x100
[  241.954924]  [<ffffffff8119c6e0>] block_write_full_page+0x10/0x20
[  241.954983]  [<ffffffff811a1553>] blkdev_writepage+0x13/0x20
[  241.955041]  [<ffffffff81113815>] __writepage+0x15/0x40
[  241.955099]  [<ffffffff81114abd>] write_cache_pages+0x26d/0x540
[  241.955159]  [<ffffffff81113800>] ? set_page_dirty+0x60/0x60
[  241.955219]  [<ffffffff81114dd8>] generic_writepages+0x48/0x60
[  241.955278]  [<ffffffff81114e0e>] do_writepages+0x1e/0x40
[  241.955335]  [<ffffffff811910d4>] __writeback_single_inode+0x44/0x2b0
[  241.955395]  [<ffffffff81192086>] writeback_sb_inodes+0x376/0x570
[  241.955456]  [<ffffffff8173f066>] ? _raw_spin_unlock+0x26/0x40
[  241.955513]  [<ffffffff81192316>] __writeback_inodes_wb+0x96/0xc0
[  241.955571]  [<ffffffff811928f3>] wb_writeback+0x223/0x330
[  241.955630]  [<ffffffff81192b1a>] wb_do_writeback+0x11a/0x250
[  241.955688]  [<ffffffff81193180>] bdi_writeback_workfn+0x80/0x200
[  241.955748]  [<ffffffff810633c6>] process_one_work+0x1e6/0x5d0
[  241.955806]  [<ffffffff81063351>] ? process_one_work+0x171/0x5d0
[  241.955865]  [<ffffffff8106478e>] worker_thread+0x11e/0x3e0
[  241.955923]  [<ffffffff81064670>] ? manage_workers+0x2b0/0x2b0
[  241.955981]  [<ffffffff8106ad3e>] kthread+0xee/0x100
[  241.956040]  [<ffffffff8106ac50>] ? __init_kthread_worker+0x70/0x70
[  241.956100]  [<ffffffff817470ec>] ret_from_fork+0x7c/0xb0
[  241.956156]  [<ffffffff8106ac50>] ? __init_kthread_worker+0x70/0x70
[  241.956214] 3 locks held by kworker/u12:3/247:
[  241.956266]  #0:  (writeback){......}, at: [<ffffffff81063351>] process_one_work+0x171/0x5d0
[  241.956486]  #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81063351>] process_one_work+0x171/0x5d0
[  241.956706]  #2:  (&type->s_umount_key#21){......}, at: [<ffffffff8116b5ae>] grab_super_passive+0x3e/0x90
[  241.956975] INFO: task mdadm:2902 blocked for more than 120 seconds.
[  241.957030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.957138] mdadm           D 0000000000000000     0  2902   2885 0x00000004
[  241.957255]  ffff8802117f95e8 0000000000000046 0000000000001000 ffff8802117f8000
[  241.957443]  ffff8802117f9fd8 ffff8802117f8000 ffff8802117f8010 ffff8802117f8000
[  241.957632]  ffff8802117f9fd8 ffff8802117f8000 ffff88020c1dca80 ffff8802158ddd20
[  241.957819] Call Trace:
[  241.957876]  [<ffffffff8173ca64>] schedule+0x24/0x70
[  241.957941]  [<ffffffffa01b244d>] md_write_start+0xad/0x1d0 [md_mod]
[  241.958000]  [<ffffffff8106b460>] ? wake_up_bit+0x40/0x40
[  241.958059]  [<ffffffffa0081b0f>] make_request+0x5f/0xe10 [raid1]
[  241.958119]  [<ffffffff81401274>] ? blk_throtl_bio+0x114/0x580
[  241.958179]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  241.958238]  [<ffffffff810a586e>] ? __lock_acquire+0x2be/0x780
[  241.958297]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  241.958365]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  241.958433]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  241.958501]  [<ffffffffa01b3b33>] md_make_request+0x183/0x340 [md_mod]
[  241.958568]  [<ffffffffa01b3a00>] ? md_make_request+0x50/0x340 [md_mod]
[  241.958627]  [<ffffffff8110c070>] ? mempool_alloc_slab+0x10/0x20
[  241.958685]  [<ffffffff8110c1cb>] ? mempool_alloc+0x5b/0x170
[  241.958743]  [<ffffffff813e7242>] generic_make_request+0xc2/0x100
[  241.958802]  [<ffffffff813e72f6>] submit_bio+0x76/0x160
[  241.958859]  [<ffffffff8119f1ec>] ? bio_alloc_bioset+0x9c/0x1c0
[  241.958920]  [<ffffffff81199c10>] _submit_bh+0x140/0x200
[  241.958978]  [<ffffffff81199cdb>] submit_bh+0xb/0x10
[  241.959036]  [<ffffffff8119c47f>] __block_write_full_page+0x1cf/0x320
[  241.959096]  [<ffffffff8110a746>] ? find_get_pages_tag+0x116/0x1e0
[  241.959157]  [<ffffffff8119ba90>] ? block_invalidatepage+0x140/0x140
[  241.959215]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  241.959272]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  241.959330]  [<ffffffff8119c696>] block_write_full_page_endio+0xc6/0x100
[  241.959391]  [<ffffffff8119c6e0>] block_write_full_page+0x10/0x20
[  241.959449]  [<ffffffff811a1553>] blkdev_writepage+0x13/0x20
[  241.959507]  [<ffffffff81113815>] __writepage+0x15/0x40
[  241.959566]  [<ffffffff81114abd>] write_cache_pages+0x26d/0x540
[  241.959625]  [<ffffffff8107f833>] ? update_sd_lb_stats+0x133/0x670
[  241.959685]  [<ffffffff81113800>] ? set_page_dirty+0x60/0x60
[  241.959745]  [<ffffffff81114dd8>] generic_writepages+0x48/0x60
[  241.959805]  [<ffffffff81114e0e>] do_writepages+0x1e/0x40
[  241.959864]  [<ffffffff81109ba1>] __filemap_fdatawrite_range+0x51/0x60
[  241.959925]  [<ffffffff81109e1a>] filemap_fdatawrite+0x1a/0x20
[  241.959985]  [<ffffffff81109e7d>] filemap_write_and_wait+0x5d/0x80
[  241.960044]  [<ffffffff811a187c>] __sync_blockdev+0x1c/0x40
[  241.960102]  [<ffffffff811a18ae>] sync_blockdev+0xe/0x10
[  241.960167]  [<ffffffffa01b0c34>] do_md_stop+0x74/0x4e0 [md_mod]
[  241.960235]  [<ffffffffa01b48d4>] md_ioctl+0x784/0x16a0 [md_mod]
[  241.960294]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  241.960356]  [<ffffffff8106e5b3>] ? hrtimer_try_to_cancel+0x43/0xf0
[  241.960416]  [<ffffffff813ef6c3>] __blkdev_driver_ioctl+0x23/0x30
[  241.960476]  [<ffffffff813efd7c>] blkdev_ioctl+0x21c/0x800
[  241.960533]  [<ffffffff811a07cd>] block_ioctl+0x3d/0x50
[  241.960592]  [<ffffffff8117a91c>] do_vfs_ioctl+0x9c/0x560
[  241.960649]  [<ffffffff8106e0b0>] ? update_rmtp+0x80/0x80
[  241.960709]  [<ffffffff8106f48f>] ? hrtimer_start_range_ns+0xf/0x20
[  241.960771]  [<ffffffff8117ae71>] SyS_ioctl+0x91/0xa0
[  241.960831]  [<ffffffff81416829>] ? lockdep_sys_exit_thunk+0x35/0x67
[  241.960897]  [<ffffffff81747192>] system_call_fastpath+0x16/0x1b
[  241.960954] 2 locks held by mdadm/2902:
[  241.961004]  #0:  (&mddev->reconfig_mutex){......}, at: [<ffffffffa01b423e>] md_ioctl+0xee/0x16a0 [md_mod]
[  241.961235]  #1:  (&mddev->open_mutex){......}, at: [<ffffffffa01b0c02>] do_md_stop+0x42/0x4e0 [md_mod]
[  361.888286] INFO: task kworker/u12:3:247 blocked for more than 120 seconds.
[  361.888389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.888499] kworker/u12:3   D 0000000000000000     0   247      2 0x00000000
[  361.888628] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  361.888742]  ffff88020d331418 0000000000000046 0000000000001000 ffff88020d330000
[  361.888932]  ffff88020d331fd8 ffff88020d330000 ffff88020d330010 ffff88020d330000
[  361.889121]  ffff88020d331fd8 ffff88020d330000 ffff88020c10b7e0 ffff8802158ddd20
[  361.889308] Call Trace:
[  361.889368]  [<ffffffff8173ca64>] schedule+0x24/0x70
[  361.889438]  [<ffffffffa01b244d>] md_write_start+0xad/0x1d0 [md_mod]
[  361.889499]  [<ffffffff8106b460>] ? wake_up_bit+0x40/0x40
[  361.889560]  [<ffffffffa0081b0f>] make_request+0x5f/0xe10 [raid1]
[  361.889620]  [<ffffffff81401274>] ? blk_throtl_bio+0x114/0x580
[  361.889681]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  361.889741]  [<ffffffff810a586e>] ? __lock_acquire+0x2be/0x780
[  361.889802]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  361.889870]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  361.889937]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  361.890005]  [<ffffffffa01b3b33>] md_make_request+0x183/0x340 [md_mod]
[  361.890072]  [<ffffffffa01b3a00>] ? md_make_request+0x50/0x340 [md_mod]
[  361.890133]  [<ffffffff8110c070>] ? mempool_alloc_slab+0x10/0x20
[  361.890191]  [<ffffffff8110c1cb>] ? mempool_alloc+0x5b/0x170
[  361.890251]  [<ffffffff813e7242>] generic_make_request+0xc2/0x100
[  361.890310]  [<ffffffff813e72f6>] submit_bio+0x76/0x160
[  361.890369]  [<ffffffff8119f1ec>] ? bio_alloc_bioset+0x9c/0x1c0
[  361.890428]  [<ffffffff81199c10>] _submit_bh+0x140/0x200
[  361.890486]  [<ffffffff81199cdb>] submit_bh+0xb/0x10
[  361.890545]  [<ffffffff8119c47f>] __block_write_full_page+0x1cf/0x320
[  361.890606]  [<ffffffff8110a746>] ? find_get_pages_tag+0x116/0x1e0
[  361.890666]  [<ffffffff8119ba90>] ? block_invalidatepage+0x140/0x140
[  361.890724]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  361.890781]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  361.890839]  [<ffffffff8119c696>] block_write_full_page_endio+0xc6/0x100
[  361.890899]  [<ffffffff8119c6e0>] block_write_full_page+0x10/0x20
[  361.890958]  [<ffffffff811a1553>] blkdev_writepage+0x13/0x20
[  361.891017]  [<ffffffff81113815>] __writepage+0x15/0x40
[  361.891076]  [<ffffffff81114abd>] write_cache_pages+0x26d/0x540
[  361.891135]  [<ffffffff81113800>] ? set_page_dirty+0x60/0x60
[  361.891195]  [<ffffffff81114dd8>] generic_writepages+0x48/0x60
[  361.891255]  [<ffffffff81114e0e>] do_writepages+0x1e/0x40
[  361.891312]  [<ffffffff811910d4>] __writeback_single_inode+0x44/0x2b0
[  361.891371]  [<ffffffff81192086>] writeback_sb_inodes+0x376/0x570
[  361.891431]  [<ffffffff8173f066>] ? _raw_spin_unlock+0x26/0x40
[  361.891490]  [<ffffffff81192316>] __writeback_inodes_wb+0x96/0xc0
[  361.891548]  [<ffffffff811928f3>] wb_writeback+0x223/0x330
[  361.891606]  [<ffffffff81192b1a>] wb_do_writeback+0x11a/0x250
[  361.891665]  [<ffffffff81193180>] bdi_writeback_workfn+0x80/0x200
[  361.891725]  [<ffffffff810633c6>] process_one_work+0x1e6/0x5d0
[  361.891784]  [<ffffffff81063351>] ? process_one_work+0x171/0x5d0
[  361.891843]  [<ffffffff8106478e>] worker_thread+0x11e/0x3e0
[  361.891902]  [<ffffffff81064670>] ? manage_workers+0x2b0/0x2b0
[  361.891959]  [<ffffffff8106ad3e>] kthread+0xee/0x100
[  361.892017]  [<ffffffff8106ac50>] ? __init_kthread_worker+0x70/0x70
[  361.892078]  [<ffffffff817470ec>] ret_from_fork+0x7c/0xb0
[  361.892135]  [<ffffffff8106ac50>] ? __init_kthread_worker+0x70/0x70
[  361.892193] 3 locks held by kworker/u12:3/247:
[  361.892244]  #0:  (writeback){......}, at: [<ffffffff81063351>] process_one_work+0x171/0x5d0
[  361.892464]  #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81063351>] process_one_work+0x171/0x5d0
[  361.892687]  #2:  (&type->s_umount_key#21){......}, at: [<ffffffff8116b5ae>] grab_super_passive+0x3e/0x90
[  361.892956] INFO: task mdadm:2902 blocked for more than 120 seconds.
[  361.893011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.893119] mdadm           D 0000000000000000     0  2902   2885 0x00000004
[  361.893236]  ffff8802117f95e8 0000000000000046 0000000000001000 ffff8802117f8000
[  361.893423]  ffff8802117f9fd8 ffff8802117f8000 ffff8802117f8010 ffff8802117f8000
[  361.893611]  ffff8802117f9fd8 ffff8802117f8000 ffff88020c1dca80 ffff8802158ddd20
[  361.894239] Call Trace:
[  361.894294]  [<ffffffff8173ca64>] schedule+0x24/0x70
[  361.894360]  [<ffffffffa01b244d>] md_write_start+0xad/0x1d0 [md_mod]
[  361.894419]  [<ffffffff8106b460>] ? wake_up_bit+0x40/0x40
[  361.894478]  [<ffffffffa0081b0f>] make_request+0x5f/0xe10 [raid1]
[  361.894536]  [<ffffffff81401274>] ? blk_throtl_bio+0x114/0x580
[  361.894596]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  361.894655]  [<ffffffff810a586e>] ? __lock_acquire+0x2be/0x780
[  361.894714]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  361.894781]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  361.894849]  [<ffffffffa01b3af1>] ? md_make_request+0x141/0x340 [md_mod]
[  361.894917]  [<ffffffffa01b3b33>] md_make_request+0x183/0x340 [md_mod]
[  361.894984]  [<ffffffffa01b3a00>] ? md_make_request+0x50/0x340 [md_mod]
[  361.895043]  [<ffffffff8110c070>] ? mempool_alloc_slab+0x10/0x20
[  361.895101]  [<ffffffff8110c1cb>] ? mempool_alloc+0x5b/0x170
[  361.895161]  [<ffffffff813e7242>] generic_make_request+0xc2/0x100
[  361.895220]  [<ffffffff813e72f6>] submit_bio+0x76/0x160
[  361.895277]  [<ffffffff8119f1ec>] ? bio_alloc_bioset+0x9c/0x1c0
[  361.895337]  [<ffffffff81199c10>] _submit_bh+0x140/0x200
[  361.895395]  [<ffffffff81199cdb>] submit_bh+0xb/0x10
[  361.895453]  [<ffffffff8119c47f>] __block_write_full_page+0x1cf/0x320
[  361.895513]  [<ffffffff8110a746>] ? find_get_pages_tag+0x116/0x1e0
[  361.895573]  [<ffffffff8119ba90>] ? block_invalidatepage+0x140/0x140
[  361.895632]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  361.895688]  [<ffffffff811a0660>] ? I_BDEV+0x10/0x10
[  361.895746]  [<ffffffff8119c696>] block_write_full_page_endio+0xc6/0x100
[  361.895808]  [<ffffffff8119c6e0>] block_write_full_page+0x10/0x20
[  361.895866]  [<ffffffff811a1553>] blkdev_writepage+0x13/0x20
[  361.895924]  [<ffffffff81113815>] __writepage+0x15/0x40
[  361.895981]  [<ffffffff81114abd>] write_cache_pages+0x26d/0x540
[  361.896041]  [<ffffffff8107f833>] ? update_sd_lb_stats+0x133/0x670
[  361.896100]  [<ffffffff81113800>] ? set_page_dirty+0x60/0x60
[  361.896159]  [<ffffffff81114dd8>] generic_writepages+0x48/0x60
[  361.896218]  [<ffffffff81114e0e>] do_writepages+0x1e/0x40
[  361.896278]  [<ffffffff81109ba1>] __filemap_fdatawrite_range+0x51/0x60
[  361.896338]  [<ffffffff81109e1a>] filemap_fdatawrite+0x1a/0x20
[  361.896397]  [<ffffffff81109e7d>] filemap_write_and_wait+0x5d/0x80
[  361.896456]  [<ffffffff811a187c>] __sync_blockdev+0x1c/0x40
[  361.896515]  [<ffffffff811a18ae>] sync_blockdev+0xe/0x10
[  361.896580]  [<ffffffffa01b0c34>] do_md_stop+0x74/0x4e0 [md_mod]
[  361.896647]  [<ffffffffa01b48d4>] md_ioctl+0x784/0x16a0 [md_mod]
[  361.896707]  [<ffffffff8107daf5>] ? sched_clock_cpu+0xc5/0x100
[  361.896767]  [<ffffffff8106e5b3>] ? hrtimer_try_to_cancel+0x43/0xf0
[  361.896828]  [<ffffffff813ef6c3>] __blkdev_driver_ioctl+0x23/0x30
[  361.896886]  [<ffffffff813efd7c>] blkdev_ioctl+0x21c/0x800
[  361.896943]  [<ffffffff811a07cd>] block_ioctl+0x3d/0x50
[  361.897001]  [<ffffffff8117a91c>] do_vfs_ioctl+0x9c/0x560
[  361.897059]  [<ffffffff8106e0b0>] ? update_rmtp+0x80/0x80
[  361.897116]  [<ffffffff8106f48f>] ? hrtimer_start_range_ns+0xf/0x20
[  361.897175]  [<ffffffff8117ae71>] SyS_ioctl+0x91/0xa0
[  361.897233]  [<ffffffff81416829>] ? lockdep_sys_exit_thunk+0x35/0x67
[  361.897293]  [<ffffffff81747192>] system_call_fastpath+0x16/0x1b
[  361.897350] 2 locks held by mdadm/2902:
[  361.897401]  #0:  (&mddev->reconfig_mutex){......}, at: [<ffffffffa01b423e>] md_ioctl+0xee/0x16a0 [md_mod]
[  361.897631]  #1:  (&mddev->open_mutex){......}, at: [<ffffffffa01b0c02>] do_md_stop+0x42/0x4e0 [md_mod]
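
The traces above come from the hung-task watchdog after its 120-second
timeout.  When the hang is reproduced, the same information can be dumped
immediately; a small sketch, assuming magic SysRq and /proc/<pid>/stack are
enabled in the kernel configuration:

    # dump every task stuck in uninterruptible (D) sleep to the kernel log
    echo 1 > /proc/sys/kernel/sysrq
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 200

    # or inspect the kernel stack of a single blocked task
    # (2902 is the pid of the stuck mdadm process in the trace above)
    cat /proc/2902/stack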



* Re: [BUG] md hang at schedule in  md_write_start
  2013-08-12 16:33 [BUG] md hang at schedule in md_write_start Jack Wang
@ 2013-08-13  4:31 ` NeilBrown
  2013-08-13  7:42   ` Jack Wang
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-08-13  4:31 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-raid, Jack Wang, Sebastian Riemer

[-- Attachment #1: Type: text/plain, Size: 3636 bytes --]

On Mon, 12 Aug 2013 18:33:49 +0200 Jack Wang <jinpu.wang@profitbricks.com>
wrote:

> Hi Neil,
> 
> 
> We've found an md hang in our tests; it's easy to reproduce with the
> attached script.
> 
> We've tried the 3.4 stable kernel and the latest mainline; the hang still
> occurs.
> 
> It looks like the bdi_writeback_workfn flush races with md_stop.  We have
> no idea how to fix it; could you kindly give us suggestions?
> 
> Best regards,
> Jack

Thanks for the report.  I can see how that deadlock could happen.

Can you please try this patch and confirm that it fixes it.
I'm not really happy with this approach but nothing better occurs to me yet.

NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a57b0fa..c66af69 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5144,7 +5144,7 @@ int md_run(struct mddev *mddev)
 	
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	
-	if (mddev->flags)
+	if (mddev->flags & MD_UPDATE_SB_FLAGS)
 		md_update_sb(mddev, 0);
 
 	md_new_event(mddev);
@@ -5289,7 +5289,7 @@ static void __md_stop_writes(struct mddev *mddev)
 	md_super_wait(mddev);
 
 	if (mddev->ro == 0 &&
-	    (!mddev->in_sync || mddev->flags)) {
+	    (!mddev->in_sync || (mddev->flags & MD_UPDATE_SB_FLAGS))) {
 		/* mark array as shutdown cleanly */
 		mddev->in_sync = 1;
 		md_update_sb(mddev, 1);
@@ -5337,8 +5337,11 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 		err = -EBUSY;
 		goto out;
 	}
-	if (bdev)
+	if (bdev) {
+		set_bit(MD_FINAL_FLUSH, &mddev->flags);
 		sync_blockdev(bdev);
+		clear_bit(MD_FINAL_FLUSH, &mddev->flags);
+	}
 	if (mddev->pers) {
 		__md_stop_writes(mddev);
 
@@ -5373,13 +5376,16 @@ static int do_md_stop(struct mddev * mddev, int mode,
 		mutex_unlock(&mddev->open_mutex);
 		return -EBUSY;
 	}
-	if (bdev)
+	if (bdev) {
 		/* It is possible IO was issued on some other
 		 * open file which was closed before we took ->open_mutex.
 		 * As that was not the last close __blkdev_put will not
 		 * have called sync_blockdev, so we must.
 		 */
+		set_bit(MD_FINAL_FLUSH, &mddev->flags);
 		sync_blockdev(bdev);
+		clear_bit(MD_FINAL_FLUSH, &mddev->flags);
+	}
 
 	if (mddev->pers) {
 		if (mddev->ro)
@@ -7814,7 +7820,7 @@ void md_check_recovery(struct mddev *mddev)
 				sysfs_notify_dirent_safe(mddev->sysfs_state);
 		}
 
-		if (mddev->flags)
+		if (mddev->flags & MD_UPDATE_SB_FLAGS)
 			md_update_sb(mddev, 0);
 
 		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
@@ -7904,7 +7910,10 @@ void md_check_recovery(struct mddev *mddev)
 					sysfs_notify_dirent_safe(mddev->sysfs_action);
 		}
 		mddev_unlock(mddev);
-	}
+	} else if (test_bit(MD_FINAL_FLUSH, &mddev->flags) &&
+		   mddev->in_sync == 0 &&
+		   (mddev->flags & MD_UPDATE_SB_FLAGS))
+		md_update_sb(mddev, 0);
 }
 
 void md_reap_sync_thread(struct mddev *mddev)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 77924d3..e1e003a 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -209,7 +209,12 @@ struct mddev {
 #define MD_CHANGE_DEVS	0	/* Some device status has changed */
 #define MD_CHANGE_CLEAN 1	/* transition to or from 'clean' */
 #define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
+#define MD_UPDATE_SB_FLAGS (1 | 2 | 4)	/* If these are set, md_update_sb needed */
 #define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
+#define MD_FINAL_FLUSH	4	/* md_check_recovery is permitted to call
+				 * md_update_sb() to switch to 'active'
+				 * without taking reconfig_mutex
+				 */
 
 	int				suspended;
 	atomic_t			active_io;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [BUG] md hang at schedule in  md_write_start
  2013-08-13  4:31 ` NeilBrown
@ 2013-08-13  7:42   ` Jack Wang
  2013-08-14  0:44     ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jack Wang @ 2013-08-13  7:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Jack Wang, Sebastian Riemer

On 08/13/2013 06:31 AM, NeilBrown wrote:
> On Mon, 12 Aug 2013 18:33:49 +0200 Jack Wang <jinpu.wang@profitbricks.com>
> wrote:
> 
>> Hi Neil,
>>
>>
>> We've found an md hang in our tests; it's easy to reproduce with the
>> attached script.
>>
>> We've tried the 3.4 stable kernel and the latest mainline; the hang still
>> occurs.
>>
>> It looks like the bdi_writeback_workfn flush races with md_stop.  We have
>> no idea how to fix it; could you kindly give us suggestions?
>>
>> Best regards,
>> Jack
> 
> Thanks for the report.  I can see how that deadlock could happen.
> 
> Can you please try this patch and confirm that it fixes it.
> I'm not really happy with this approach but nothing better occurs to me yet.
> 
> NeilBrown
> 

Hi Neil,

Thanks for the quick fix.  I tested it on 3.4 stable and mainline, and it
works now.  Could you give a bit more description of the bug and the fix?

Regards,
Jack



* Re: [BUG] md hang at schedule in  md_write_start
  2013-08-13  7:42   ` Jack Wang
@ 2013-08-14  0:44     ` NeilBrown
  2013-08-14  8:09       ` Jack Wang
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-08-14  0:44 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-raid, Jack Wang, Sebastian Riemer

[-- Attachment #1: Type: text/plain, Size: 8524 bytes --]

On Tue, 13 Aug 2013 09:42:53 +0200 Jack Wang <jinpu.wang@profitbricks.com>
wrote:

> On 08/13/2013 06:31 AM, NeilBrown wrote:
> > On Mon, 12 Aug 2013 18:33:49 +0200 Jack Wang <jinpu.wang@profitbricks.com>
> > wrote:
> > 
> >> Hi Neil,
> >>
> >>
> >> We've found an md hang in our tests; it's easy to reproduce with the
> >> attached script.
> >>
> >> We've tried the 3.4 stable kernel and the latest mainline; the hang still
> >> occurs.
> >>
> >> It looks like the bdi_writeback_workfn flush races with md_stop.  We have
> >> no idea how to fix it; could you kindly give us suggestions?
> >>
> >> Best regards,
> >> Jack
> > 
> > Thanks for the report.  I can see how that deadlock could happen.
> > 
> > Can you please try this patch and confirm that it fixes it.
> > I'm not really happy with this approach but nothing better occurs to me yet.
> > 
> > NeilBrown
> > 
> 
> Hi Neil,
> 
>> Thanks for the quick fix.  I tested it on 3.4 stable and mainline, and it
>> works now.  Could you give a bit more description of the bug and the fix?
>
Thanks for testing.

The problem:
 If you open a block device (e.g. /dev/md0) and write to it the writes will
 be buffered in the page cache until an 'fsync' or similar.
 When the last open file descriptor on the block device is closed, that
 triggers a flush even if there was no fsync.
 So if you
    dd > /dev/md0
    mdadm --stop /dev/md0
 The 'close' that happens when dd exits will flush the cache.  So when mdadm
 opens /dev/md0 the cache will be empty.  This is the normal situation.

 However if "mdadm --stop /dev/md0" opens /dev/md0 before 'dd' exits, then
 nothing will trigger the flush and that causes problems as I'll get to in a
 minute.
 Normally if this happened, mdadm would call the STOP_ARRAY ioctl which would
 notice that there is an extra open (from dd) and would abort.
 However "mdadm -S" retries a few times if it has confirmed that the array isn't
 mounted.  Eventually it opens just before 'dd' closes.  The presence of the
 "mdadm -D" might affect this - it might hold a lock that "mdadm -S" waits a
 little while for.

 Anyway by the time that "mdadm --stop" has called STOP_ARRAY on the open
 file descriptor and got to do_md_stop() it is holding ->reconfig_mutex
 (because md_ioctl() calls mddev_lock()).
 While holding this mutex it calls sync_blockdev() to ensure the page cache
 is flushed.  This is where the problem occurs.
 If the array is currently marked 'clean' and there are dirty pages in the page
 cache, md_write_start() will request that the superblock be marked 'dirty'.
 This is handled by md_check_recovery() which is called by the array
 management thread.  However it will only update the superblock if it can get
 ->reconfig_mutex.

 So the "mdadm --stop" thread is holding ->reconfig_mutex and waiting for
 dirty data to be flushed.  The flush thread is waiting for the superblock
 to be updated by the array management thread.  The array management thread
 won't update the superblock until it can get ->reconfig_mutex.
 i.e. a deadlock.

 One way to "fix" it would be to call md_allow_write() in do_md_stop() before
 calling sync_blockdev().  This would remove the deadlock, but would often
 modify the superblock unnecessarily.

 It would be nice if I could check beforehand if sync_blockdev() will actually
 write anything and then call md_allow_write() if it would.  But I don't
 think that is possible.

 So the approach I took in the patch I gave you was to set a flag in
 do_md_stop to tell md_check_recovery() that it was ok to update the
 superblock without holding a lock, because the lock is already held.
 I don't really like that though.  It feels like it should be racy.

 I could call sync_blockdev() *before* taking the ->reconfig_mutex but that
 would be racy as another process could theoretically write after the
 sync_blockdev, and close before do_md_stop() checks for other opens....

 However maybe I could make use of ->open_mutex.  This guards opening and
 destroying of the array, which are the issue here.

 Before the mddev_lock() in md_ioctl() I could (in the STOP_ARRAY case)
    lock ->open_mutex
    check that mddev->openers is 1 - abort if not
    set a flag
    release ->open_mutex
    call sync_blockdev.

 Then in md_open()
    after getting ->open_mutex, clear the flag.

 Then in do_md_stop()
    after getting ->open_mutex, if the flag is set, abort with EBUSY.

 This would ensure that the page cache is not dirty when do_md_stop decides
 to stop the array by flushing it early and making sure no-one else can open
 it.

 I think I like this approach better.

 Could you retry the following patch instead?

Thanks
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a57b0fa..296aac1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5144,7 +5144,7 @@ int md_run(struct mddev *mddev)
 	
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	
-	if (mddev->flags)
+	if (mddev->flags & MD_UPDATE_SB_FLAGS)
 		md_update_sb(mddev, 0);
 
 	md_new_event(mddev);
@@ -5289,7 +5289,7 @@ static void __md_stop_writes(struct mddev *mddev)
 	md_super_wait(mddev);
 
 	if (mddev->ro == 0 &&
-	    (!mddev->in_sync || mddev->flags)) {
+	    (!mddev->in_sync || (mddev->flags & MD_UPDATE_SB_FLAGS))) {
 		/* mark array as shutdown cleanly */
 		mddev->in_sync = 1;
 		md_update_sb(mddev, 1);
@@ -5337,8 +5337,14 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 		err = -EBUSY;
 		goto out;
 	}
-	if (bdev)
-		sync_blockdev(bdev);
+	if (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags)) {
+		/* Someone opened the device since we flushed it
+		 * so page cache could be dirty and it is too late
+		 * to flush.  So abort
+		 */
+		mutex_unlock(&mddev->open_mutex);
+		return -EBUSY;
+	}
 	if (mddev->pers) {
 		__md_stop_writes(mddev);
 
@@ -5373,14 +5379,14 @@ static int do_md_stop(struct mddev * mddev, int mode,
 		mutex_unlock(&mddev->open_mutex);
 		return -EBUSY;
 	}
-	if (bdev)
-		/* It is possible IO was issued on some other
-		 * open file which was closed before we took ->open_mutex.
-		 * As that was not the last close __blkdev_put will not
-		 * have called sync_blockdev, so we must.
+	if (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags)) {
+		/* Someone opened the device since we flushed it
+		 * so page cache could be dirty and it is too late
+		 * to flush.  So abort
 		 */
-		sync_blockdev(bdev);
-
+		mutex_unlock(&mddev->open_mutex);
+		return -EBUSY;
+	}
 	if (mddev->pers) {
 		if (mddev->ro)
 			set_disk_ro(disk, 0);
@@ -6417,6 +6423,20 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 						 !test_bit(MD_RECOVERY_NEEDED,
 							   &mddev->flags),
 						 msecs_to_jiffies(5000));
+	if (cmd == STOP_ARRAY || cmd == STOP_ARRAY_RO) {
+		/* Need to flush page cache, and ensure no-one else opens
+		 * and writes
+		 */
+		mutex_lock(&mddev->open_mutex);
+		if (atomic_read(&mddev->openers) > 1) {
+			mutex_unlock(&mddev->open_mutex);
+			err = -EBUSY;
+			goto abort;
+		}
+		set_bit(MD_STILL_CLOSED, &mddev->flags);
+		mutex_unlock(&mddev->open_mutex);
+		sync_blockdev(bdev);
+	}
 	err = mddev_lock(mddev);
 	if (err) {
 		printk(KERN_INFO 
@@ -6670,6 +6690,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 
 	err = 0;
 	atomic_inc(&mddev->openers);
+	clear_bit(MD_STILL_CLOSED, &mddev->flags);
 	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
@@ -7814,7 +7835,7 @@ void md_check_recovery(struct mddev *mddev)
 				sysfs_notify_dirent_safe(mddev->sysfs_state);
 		}
 
-		if (mddev->flags)
+		if (mddev->flags & MD_UPDATE_SB_FLAGS)
 			md_update_sb(mddev, 0);
 
 		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 77924d3..8c3c6cf 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -209,7 +209,11 @@ struct mddev {
 #define MD_CHANGE_DEVS	0	/* Some device status has changed */
 #define MD_CHANGE_CLEAN 1	/* transition to or from 'clean' */
 #define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
+#define MD_UPDATE_SB_FLAGS (1 | 2 | 4)	/* If these are set, md_update_sb needed */
 #define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
+#define MD_STILL_CLOSED	4	/* If we, then array has not been opened since
+				 * md_ioctl checked on it.
+				 */
 
 	int				suspended;
 	atomic_t			active_io;



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [BUG] md hang at schedule in  md_write_start
  2013-08-14  0:44     ` NeilBrown
@ 2013-08-14  8:09       ` Jack Wang
  2013-09-10 11:09         ` Jack Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Jack Wang @ 2013-08-14  8:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Jack Wang, Sebastian Riemer

On 08/14/2013 02:44 AM, NeilBrown wrote:
> On Tue, 13 Aug 2013 09:42:53 +0200 Jack Wang <jinpu.wang@profitbricks.com>
> wrote:
> 
>> On 08/13/2013 06:31 AM, NeilBrown wrote:
>>> On Mon, 12 Aug 2013 18:33:49 +0200 Jack Wang <jinpu.wang@profitbricks.com>
>>> wrote:
>>>
>>>> Hi Neil,
>>>>
>>>>
>>>> We've found an md hang in our tests; it's easy to reproduce with the
>>>> attached script.
>>>>
>>>> We've tried the 3.4 stable kernel and the latest mainline; the hang still
>>>> occurs.
>>>>
>>>> It looks like the bdi_writeback_workfn flush races with md_stop.  We have
>>>> no idea how to fix it; could you kindly give us suggestions?
>>>>
>>>> Best regards,
>>>> Jack
>>>
>>> Thanks for the report.  I can see how that deadlock could happen.
>>>
>>> Can you please try this patch and confirm that it fixes it.
>>> I'm not really happy with this approach but nothing better occurs to me yet.
>>>
>>> NeilBrown
>>>
>>
>> Hi Neil,
>>
>> Thanks for the quick fix.  I tested it on 3.4 stable and mainline, and it
>> works now.  Could you give a bit more description of the bug and the fix?
>>
> Thanks for testing.
> 
> The problem:
>  If you open a block device (e.g. /dev/md0) and write to it the writes will
>  be buffered in the page cache until an 'fsync' or similar.
>  When the last open file descriptor on the block device is closed, that
>  triggers a flush even if there was no fsync.
>  So if you
>     dd > /dev/md0
>     mdadm --stop /dev/md0
>  The 'close' that happens when dd exits will flush the cache.  So when mdadm
>  opens /dev/md0 the cache will be empty.  This is the normal situation.
> 
>  However if "mdadm --stop /dev/md0" opens /dev/md0 before 'dd' exits, then
>  nothing will trigger the flush and that causes problems as I'll get to in a
>  minute.
>  Normally if this happened, mdadm would call the STOP_ARRAY ioctl which would
>  notice that there is an extra open (from dd) and would abort.
>  However "mdadm -S" retries a few times if it has confirmed that the array isn't
>  mounted.  Eventually it opens just before 'dd' closes.  The presence of the
>  "mdadm -D" might affect this - it might hold a lock that "mdadm -S" waits a
>  little while for.
> 
>  Anyway by the time that "mdadm --stop" has called STOP_ARRAY on the open
>  file descriptor and got to do_md_stop() it is holding ->reconfig_mutex
>  (because md_ioctl() calls mddev_lock()).
>  While holding this mutex it calls sync_blockdev() to ensure the page cache
>  is flushed.  This is where the problem occurs.
>  If the array is currently marked 'clean' and there are dirty pages in the page
>  cache, md_write_start() will request that the superblock be marked 'dirty'.
>  This is handled by md_check_recovery() which is called by the array
>  management thread.  However it will only update the superblock if it can get
>  ->reconfig_mutex.
> 
>  So the "mdadm --stop" thread is holding ->reconfig_mutex and waiting for
>  dirty data to be flushed.  The flush thread is waiting for the superblock
>  to be updated by the array management thread.  The array management thread
>  won't update the superblock until it can get ->reconfig_mutex.
>  i.e. a deadlock.
> 
>  One way to "fix" it would be to call md_allow_write() in do_md_stop() before
>  calling sync_blockdev().  This would remove the deadlock, but would often
>  modify the superblock unnecessarily.
> 
>  It would be nice if I could check beforehand if sync_blockdev() will actually
>  write anything and then call md_allow_write() if it would.  But I don't
>  think that is possible.
> 
>  So the approach I took in the patch I gave you was to set a flag in
>  do_md_stop to tell md_check_recovery() that it was ok to update the
>  superblock without holding a lock, because the lock is already held.
>  I don't really like that though.  It feels like it should be racy.
> 
>  I could call sync_blockdev() *before* taking the ->reconfig_mutex but that
>  would be racy as another process could theoretically write after the
>  sync_blockdev, and close before do_md_stop() checks for other opens....
> 
>  However maybe I could make use of ->open_mutex.  This guards opening and
>  destroying of the array, which are the issue here.
> 
>  Before the mddev_lock() in md_ioctl() I could (in the STOP_ARRAY case)
>     lock ->open_mutex
>     check that mddev->openers is 1 - abort if not
>     set a flag
>     release ->open_mutex
>     call sync_blockdev.
> 
>  Then in md_open()
>     after getting ->open_mutex, clear the flag.
> 
>  Then in do_md_stop()
>     after getting ->open_mutex, if the flag is set, abort with EBUSY.
> 
>  This would ensure that the page cache is not dirty when do_md_stop decides
>  to stop the array by flushing it early and making sure no-one else can open
>  it.
> 
>  I think I like this approach better.
> 
>  Could you retry the following patch instead?
> 
> Thanks
> NeilBrown

Thanks, Neil, for the informative description :) It is really helpful.
I tried your new patch; it also works as expected.

Best regards,
Jack




* Re: [BUG] md hang at schedule in  md_write_start
  2013-08-14  8:09       ` Jack Wang
@ 2013-09-10 11:09         ` Jack Wang
  2013-09-10 23:54           ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jack Wang @ 2013-09-10 11:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Jack Wang, Sebastian Riemer, stable

snip

Hi Neil,

I noticed you sent out a pull request for the md updates, which includes
the fix for this bug.

I think we should include the fix in the stable trees, at least from 3.4
onwards.  What do you think?

md: avoid deadlock when dirty buffers during md_stop.

> http://git.neil.brown.name/?p=md.git;a=commit;h=260fa034ef7a4ff8b73068b48ac497edd5217491
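
Until the commit reaches a stable release, a 3.4-based tree could carry it
directly; a rough sketch (the remote URL is an assumption, and the backport
may need small fixups on older kernels):

    git remote add neil-md git://neil.brown.name/md   # or whichever mirror you track
    git fetch neil-md
    git cherry-pick -x 260fa034ef7a4ff8b73068b48ac497edd5217491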

BTW: It would be great if you could add my Reported-by and Tested-by :)

Best regards,
Jack


* Re: [BUG] md hang at schedule in  md_write_start
  2013-09-10 11:09         ` Jack Wang
@ 2013-09-10 23:54           ` NeilBrown
  2013-09-11  7:40             ` Jack Wang
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-09-10 23:54 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-raid, Jack Wang, Sebastian Riemer, stable

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On Tue, 10 Sep 2013 13:09:05 +0200 Jack Wang <jinpu.wang@profitbricks.com>
wrote:

> snip
> 
> Hi Neil,
> 
> I noticed you sent out a pull request for the md updates, which includes
> the fix for this bug.
> 
> I think we should include the fix in the stable trees, at least from 3.4
> onwards.  What do you think?

I don't think it is a situation that is at all likely to occur in normal usage,
so it doesn't seem justified for -stable.

Do you disagree?  Did you ever experience the deadlock in normal usage or
only in artificial situations?

> 
> md: avoid deadlock when dirty buffers during md_stop.
> 
> > http://git.neil.brown.name/?p=md.git;a=commit;h=260fa034ef7a4ff8b73068b48ac497edd5217491
> 
> BTW: It would be great if you could add my Reported-by and Tested-by :)

Sorry I forgot those.  Too late to add them now.

NeilBrown



> 
> Best regards,
> Jack
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [BUG] md hang at schedule in  md_write_start
  2013-09-10 23:54           ` NeilBrown
@ 2013-09-11  7:40             ` Jack Wang
  2013-09-11 22:59               ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jack Wang @ 2013-09-11  7:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jack Wang, linux-raid, Sebastian Riemer, stable

On 09/11/2013 01:54 AM, NeilBrown wrote:
> On Tue, 10 Sep 2013 13:09:05 +0200 Jack Wang <jinpu.wang@profitbricks.com>
> wrote:
> 
>> snip
>>
>> Hi Neil,
>>
>> I noticed you sent out a pull request for the md updates, which includes
>> the fix for this bug.
>>
>> I think we should include the fix in the stable trees, at least from 3.4
>> onwards.  What do you think?
> 
> I don't think it is a situation that is at all likely to occur in normal usage,
> so it doesn't seem justified for -stable.
> 
> Do you disagree?  Did you ever experience the deadlock in normal usage or
> only in artificial situations?

Yes, we do see this BUG in our production environment, so I think it's
good to include it in the stable tree.


> 
>>
>> md: avoid deadlock when dirty buffers during md_stop.
>>
>>> http://git.neil.brown.name/?p=md.git;a=commit;h=260fa034ef7a4ff8b73068b48ac497edd5217491
>>
>> BTW: It would be great if you could add my Reported-by and Tested-by :)
> 
> Sorry I forgot those.  Too late to add them now.
> 
Fine, Thanks all the time:)

Regards,
Jack


> NeilBrown
> 
> 
> 
>>
>> Best regards,
>> Jack
>>
> 



* Re: [BUG] md hang at schedule in  md_write_start
  2013-09-11  7:40             ` Jack Wang
@ 2013-09-11 22:59               ` NeilBrown
  2013-09-12  7:55                 ` Jack Wang
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-09-11 22:59 UTC (permalink / raw)
  To: Jack Wang; +Cc: Jack Wang, linux-raid, Sebastian Riemer, stable

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

On Wed, 11 Sep 2013 09:40:08 +0200 Jack Wang <xjtuwjp@gmail.com> wrote:

> On 09/11/2013 01:54 AM, NeilBrown wrote:
> > On Tue, 10 Sep 2013 13:09:05 +0200 Jack Wang <jinpu.wang@profitbricks.com>
> > wrote:
> > 
> >> snip
> >>
> >> Hi Neil,
> >>
> >> I noticed you sent out a pull request for the md updates, which includes
> >> the fix for this bug.
> >>
> >> I think we should include the fix in the stable trees, at least from 3.4
> >> onwards.  What do you think?
> > 
> > I don't think it is a situation that is at all likely to occur in normal usage,
> > so it doesn't seem justified for -stable.
> > 
> > Do you disagree?  Did you ever experience the deadlock in normal usage or
> > only in artificial situations?
> 
> Yes, we do see this BUG in our production environment, so I think it's
> good to include it in the stable tree.
> 

I was hoping you would explain how....

Maybe I'm misunderstanding, but as I see it the deadlock can only occur if
you run "mdadm --stop" while some other process has the block device open
and is writing to it.  That seems like a dumb thing to do and my suggestion
would be to not do it.
Is there a good reason why you try to stop the array while it is being
written to?
Would it make sense for the process to open the block device with O_EXCL?
This would encourage exclusive access, and would also prevent the deadlock
from happening.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [BUG] md hang at schedule in  md_write_start
  2013-09-11 22:59               ` NeilBrown
@ 2013-09-12  7:55                 ` Jack Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2013-09-12  7:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jack Wang, linux-raid, Sebastian Riemer, stable

On 09/12/2013 12:59 AM, NeilBrown wrote:
> On Wed, 11 Sep 2013 09:40:08 +0200 Jack Wang <xjtuwjp@gmail.com> wrote:
> 
>> On 09/11/2013 01:54 AM, NeilBrown wrote:
>>> On Tue, 10 Sep 2013 13:09:05 +0200 Jack Wang <jinpu.wang@profitbricks.com>
>>> wrote:
>>>
>>>> snip
>>>>
>>>> Hi Neil,
>>>>
>>>> I noticed you sent out a pull request for the md updates, which includes
>>>> the fix for this bug.
>>>>
>>>> I think we should include the fix in the stable trees, at least from 3.4
>>>> onwards.  What do you think?
>>>
>>> I don't think it is a situation that is at all likely to occur in normal usage,
>>> so it doesn't seem justified for -stable.
>>>
>>> Do you disagree?  Did you ever experience the deadlock in normal usage or
>>> only in artificial situations?
>>
>> Yes, we do see this BUG in our production environment, so I think it's
>> good to include it in the stable tree.
>>
> 
> I was hoping you would explain how....
> 
> Maybe I'm misunderstanding, but as I see it the deadlock can only occur if
> you run "mdadm --stop" while some other process has the block device open
> and is writing to it.  That seems like a dumb thing to do and my suggestion
> would be to not do it.
> Is there a good reason why you try to stop the array while it is being
> written to?
> Would it make sense for the process to open the block device with O_EXCL?
> This would encourage exclusive access, and would also prevent the deadlock
> from happening.
> 
> NeilBrown
> 
Thanks, Neil, for the suggestion.
I will look into the code, which was developed by other colleagues, and we
will fix it if that turns out to be the case.

Regards,
Jack


Thread overview: 10+ messages
2013-08-12 16:33 [BUG] md hang at schedule in md_write_start Jack Wang
2013-08-13  4:31 ` NeilBrown
2013-08-13  7:42   ` Jack Wang
2013-08-14  0:44     ` NeilBrown
2013-08-14  8:09       ` Jack Wang
2013-09-10 11:09         ` Jack Wang
2013-09-10 23:54           ` NeilBrown
2013-09-11  7:40             ` Jack Wang
2013-09-11 22:59               ` NeilBrown
2013-09-12  7:55                 ` Jack Wang
