linux-raid.vger.kernel.org archive mirror
From: Xiao Ni <xni@redhat.com>
To: NeilBrown <neilb@suse.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
Date: Sun, 8 Oct 2017 21:21:29 -0400 (EDT)
Message-ID: <1345780738.18087591.1507512089744.JavaMail.zimbra@redhat.com>
In-Reply-To: <874lrc28x8.fsf@notabene.neil.brown.name>




----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Friday, October 6, 2017 12:32:19 PM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
> 
> On Fri, Oct 06 2017, Xiao Ni wrote:
> 
> > On 10/05/2017 01:17 PM, NeilBrown wrote:
> >> On Thu, Sep 14 2017, Xiao Ni wrote:
> >>
> >>>> What do
> >>>>   cat /proc/8987/stack
> >>>>   cat /proc/8983/stack
> >>>>   cat /proc/8966/stack
> >>>>   cat /proc/8381/stack
> >>>>
> >>>> show??
> >> ...
> >>
> >>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
> >>> lockdep_assert_held(&mddev->reconfig_mutex)?
> >>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> >>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
> >>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
> >>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
> >>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
> >>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
> >>> [<ffffffff81260457>] __vfs_write+0x37/0x170
> >>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
> >>> [<ffffffff81263015>] SyS_write+0x55/0xc0
> >>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
> >>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>> [jbd2/md0-8]
> >>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> >>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
> >>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
> >>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
> >>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> >>> [<ffffffff81376675>] submit_bio+0x75/0x150
> >>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
> >>> [<ffffffff8129e683>] submit_bh+0x13/0x20
> >>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
> >>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> >>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> >>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
> >>> [<ffffffff810c73f9>] kthread+0x109/0x140
> >>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >> Thanks for this (and sorry it took so long to get to it).
> >> It looks like
> >>
> >> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
> >> md_write_start()")
> >>
> >> is badly broken.  I wonder how it ever passed testing.
> >>
> >> In md_write_start() it changed the wait_event() call to
> >>
> >> 	wait_event(mddev->sb_wait,
> >> 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
> >> 		   !mddev->suspended);
> >>
> >>
> >> That should be
> >>
> >> 	wait_event(mddev->sb_wait,
> >> 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
> >> 		   mddev->suspended);
> > Hi Neil
> >
> > Do we want write bios to be handled when mddev->suspended is 1? After
> > this change, write bios can be handled when mddev->suspended is 1.
> 
> This is OK.
> New write bios will not get past md_handle_request().
> A write bio that did get past md_handle_request() is still allowed
> through md_write_start().  The mddev_suspend() call won't complete until
> that write bio has finished.

Hi Neil

Thanks for the explanation. I took some time to read the emails about
patch cc27b0c78, which introduced this. It's similar to the problem I
encountered. But in that case mddev_suspend() is called from level_store(),
so adding the check of mddev->suspended in md_write_start() can fix the
problem "reshape raid5 -> raid6 atop bcache deadlocks at start on
md_attr_store / raid5_make_request".

suspend_lo_store() does not call mddev_suspend() under
mddev->reconfig_mutex, so there is still a possible race, as you said in
your first analysis.
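
For reference, the two wait conditions from the discussion can be compared
in a small userspace sketch. This is not kernel code: struct fake_mddev and
the helper names are made up for illustration, and the struct only models
the two inputs that the wait_event() condition reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two inputs of the wait condition. */
struct fake_mddev {
	bool sb_change_pending;	/* models test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) */
	bool suspended;		/* models mddev->suspended != 0 */
};

/* Condition introduced by cc27b0c78c79: (!A && !B). */
static bool may_proceed_old(const struct fake_mddev *m)
{
	return !m->sb_change_pending && !m->suspended;
}

/* Condition proposed in this thread: (!A || B). */
static bool may_proceed_new(const struct fake_mddev *m)
{
	return !m->sb_change_pending || m->suspended;
}

/* The state seen in the hang: suspended == 0, MD_SB_CHANGE_PENDING set. */
static void check_hang_state(void)
{
	struct fake_mddev hung = { .sb_change_pending = true, .suspended = false };

	/* Both conditions are false here, so the writer waits either way. */
	assert(!may_proceed_old(&hung));
	assert(!may_proceed_new(&hung));
}
```

The two conditions only differ when mddev->suspended is set, so in the
state I see (suspended 0, MD_SB_CHANGE_PENDING set) the writer keeps
waiting with either one.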
> 
> >
> > When the hang happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
> > is set, so the patch can't fix this problem. I tried the patch and the
> > problem still exists.
> >
> 
> 
> I need to see all the stack traces.

I've added the call trace as an attachment.

> 
> 
> > [ 7710.589274] mddev suspend : 0
> > [ 7710.592228] mddev ro : 0
> > [ 7710.594746] mddev insync : 0
> > [ 7710.597620] mddev SB CHANGE PENDING is set
> > [ 7710.601698] mddev SB CHANGE CLEAN is set
> > [ 7710.605601] mddev->persistent : 1
> > [ 7710.608905] mddev->external : 0
> > [ 7710.612030] conf quiesce : 2
> >
> > raid5 is still spinning.
> >
> > Hmm, I have a question. Why can't raid5d call md_check_recovery() when
> > MD_SB_CHANGE_PENDING is set?
> 
> When MD_SB_CHANGE_PENDING is not set, there is no need to call
> md_check_recovery().  It wouldn't hurt except that it would be a waste of
> time.

I'm confused. If we want to call md_check_recovery() when
MD_SB_CHANGE_PENDING is set, the check should be:

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
                        break;
                handled += batch_size;
 
-               if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
+               if (mddev->sb_flags & (1 << MD_SB_CHANGE_PENDING)) {
                        spin_unlock_irq(&conf->device_lock);
                        md_check_recovery(mddev);
                        spin_lock_irq(&conf->device_lock);

Right?
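
To spell out what the two masks test, here is a small userspace sketch.
The bit numbers in the enum are assumptions for illustration only (the
real definitions live in the md driver headers), and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers assumed for illustration; not copied from the kernel. */
enum fake_sb_flag {
	FAKE_SB_CHANGE_DEVS,
	FAKE_SB_CHANGE_CLEAN,
	FAKE_SB_CHANGE_PENDING,
};

/* Current raid5d test: true if any flag OTHER than CHANGE_PENDING is set. */
static bool other_flags_set(unsigned long sb_flags)
{
	return (sb_flags & ~(1UL << FAKE_SB_CHANGE_PENDING)) != 0;
}

/* Test from the diff above: true only if CHANGE_PENDING itself is set. */
static bool pending_set(unsigned long sb_flags)
{
	return (sb_flags & (1UL << FAKE_SB_CHANGE_PENDING)) != 0;
}

static void check_masks(void)
{
	unsigned long only_pending = 1UL << FAKE_SB_CHANGE_PENDING;
	unsigned long pending_and_clean =
		only_pending | (1UL << FAKE_SB_CHANGE_CLEAN);

	/* Only PENDING set: the current test is false, the changed one true. */
	assert(!other_flags_set(only_pending));
	assert(pending_set(only_pending));

	/* PENDING and CLEAN both set, as in the debug output quoted earlier. */
	assert(other_flags_set(pending_and_clean));
	assert(pending_set(pending_and_clean));
}
```

So the existing check skips md_check_recovery() only when no flag is set
or when MD_SB_CHANGE_PENDING is the only flag set.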

Regards
Xiao
> 
> NeilBrown
> 
> 
> >
> >                  if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
> >                          spin_unlock_irq(&conf->device_lock);
> >                          md_check_recovery(mddev);
> >                          spin_lock_irq(&conf->device_lock);
> >                  }
> >
> > Best Regards
> > Xiao
> >
> >
> >>
> >> i.e. it was (!A && !B), it should be (!A || B) !!!!!
> >>
> >> Could you please make that change and try again.
> > Hi Neil
> >
> > I tried the patch and it doesn't work.
> >>
> >> Thanks,
> >> NeilBrown
> 

[-- Attachment #2: calltrace --]
[-- Type: application/octet-stream, Size: 20356 bytes --]

Oct  5 22:46:18 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct  5 22:46:18 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:18 localhost kernel: kworker/u8:3    D    0  2453      2 0x00000080
Oct  5 22:46:18 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct  5 22:46:18 localhost kernel: Call Trace:
Oct  5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:18 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:46:18 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:18 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:18 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:18 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct  5 22:46:18 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct  5 22:46:18 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct  5 22:46:18 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct  5 22:46:18 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct  5 22:46:18 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct  5 22:46:18 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct  5 22:46:18 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:46:18 localhost kernel: __writeback_single_inode+0x45/0x320
Oct  5 22:46:18 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct  5 22:46:18 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct  5 22:46:18 localhost kernel: wb_writeback+0x276/0x310
Oct  5 22:46:18 localhost kernel: wb_workfn+0x19c/0x3b0
Oct  5 22:46:18 localhost kernel: process_one_work+0x149/0x360
Oct  5 22:46:18 localhost kernel: worker_thread+0x4d/0x3c0
Oct  5 22:46:18 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:18 localhost kernel: ? rescuer_thread+0x380/0x380
Oct  5 22:46:18 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:18 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:46:18 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:18 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct  5 22:46:18 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:18 localhost kernel: jbd2/md0-8      D    0  3740      2 0x00000080
Oct  5 22:46:18 localhost kernel: Call Trace:
Oct  5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:18 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? find_next_bit+0xb/0x10
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:19 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct  5 22:46:19 localhost kernel: submit_bh_wbc+0x140/0x170
Oct  5 22:46:19 localhost kernel: submit_bh+0x13/0x20
Oct  5 22:46:19 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct  5 22:46:19 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:46:19 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct  5 22:46:19 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct  5 22:46:19 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct  5 22:46:19 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct  5 22:46:19 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct  5 22:46:19 localhost kernel: ? dequeue_entity+0xed/0x460
Oct  5 22:46:19 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct  5 22:46:19 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct  5 22:46:19 localhost kernel: ? __switch_to+0x229/0x440
Oct  5 22:46:19 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct  5 22:46:19 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct  5 22:46:19 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: ext4lazyinit    D    0  3743      2 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:46:19 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:19 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct  5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:19 localhost kernel: next_bio+0x38/0x40
Oct  5 22:46:19 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct  5 22:46:19 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct  5 22:46:19 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: md0_reshape     D    0  3751      2 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task mdadm:3758 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: mdadm           D    0  3758      1 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: raid5_quiesce+0x274/0x2b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: suspend_lo_store+0x82/0xe0 [md_mod]
Oct  5 22:46:19 localhost kernel: md_attr_store+0x80/0xc0 [md_mod]
Oct  5 22:46:19 localhost kernel: sysfs_kf_write+0x3a/0x50
Oct  5 22:46:19 localhost kernel: kernfs_fop_write+0xff/0x180
Oct  5 22:46:19 localhost kernel: __vfs_write+0x37/0x170
Oct  5 22:46:19 localhost kernel: ? selinux_file_permission+0xe5/0x120
Oct  5 22:46:19 localhost kernel: ? security_file_permission+0x3b/0xc0
Oct  5 22:46:19 localhost kernel: vfs_write+0xb2/0x1b0
Oct  5 22:46:19 localhost kernel: ? syscall_trace_enter+0x1d0/0x2b0
Oct  5 22:46:19 localhost kernel: SyS_write+0x55/0xc0
Oct  5 22:46:19 localhost kernel: do_syscall_64+0x67/0x150
Oct  5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct  5 22:46:19 localhost kernel: RIP: 0033:0x7ff6fa57c840
Oct  5 22:46:19 localhost kernel: RSP: 002b:00007ffe2fe145b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct  5 22:46:19 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff6fa57c840
Oct  5 22:46:19 localhost kernel: RDX: 0000000000000001 RSI: 00007ffe2fe14660 RDI: 0000000000000003
Oct  5 22:46:19 localhost kernel: RBP: 00007ffe2fe14660 R08: 00007ffe2fe14660 R09: 000000000000001d
Oct  5 22:46:19 localhost kernel: R10: 000000000000000a R11: 0000000000000246 R12: 00000000004699a6
Oct  5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 0000000001a1bd00 R15: 0000000001a1bd00
Oct  5 22:46:19 localhost kernel: INFO: task dd:3761 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: dd              D    0  3761   2291 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: io_schedule+0x16/0x40
Oct  5 22:46:19 localhost kernel: __lock_page+0x10e/0x160
Oct  5 22:46:19 localhost kernel: ? page_cache_tree_insert+0xf0/0xf0
Oct  5 22:46:19 localhost kernel: mpage_prepare_extent_to_map+0x290/0x310 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_writepages+0x467/0xe80 [ext4]
Oct  5 22:46:19 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:46:19 localhost kernel: __filemap_fdatawrite_range+0xc6/0x100
Oct  5 22:46:19 localhost kernel: filemap_flush+0x1c/0x20
Oct  5 22:46:19 localhost kernel: ext4_alloc_da_blocks+0x2c/0x70 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_release_file+0x79/0xc0 [ext4]
Oct  5 22:46:19 localhost kernel: __fput+0xe7/0x210
Oct  5 22:46:19 localhost kernel: ____fput+0xe/0x10
Oct  5 22:46:19 localhost kernel: task_work_run+0x83/0xb0
Oct  5 22:46:19 localhost kernel: exit_to_usermode_loop+0x6c/0xa8
Oct  5 22:46:19 localhost kernel: do_syscall_64+0x13a/0x150
Oct  5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct  5 22:46:19 localhost kernel: RIP: 0033:0x7f09c0d15e90
Oct  5 22:46:19 localhost kernel: RSP: 002b:00007ffd681b4f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
Oct  5 22:46:19 localhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f09c0d15e90
Oct  5 22:46:19 localhost kernel: RDX: 0000000000100000 RSI: 0000000000000000 RDI: 0000000000000001
Oct  5 22:46:19 localhost kernel: RBP: 00000000000003e8 R08: ffffffffffffffff R09: 0000000000102003
Oct  5 22:46:19 localhost kernel: R10: 00007ffd681b4c60 R11: 0000000000000246 R12: 00000000000003e8
Oct  5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 00007ffd681b634b R15: 00007ffd681b51d0
Oct  5 22:48:21 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct  5 22:48:21 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:21 localhost kernel: kworker/u8:3    D    0  2453      2 0x00000080
Oct  5 22:48:21 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct  5 22:48:21 localhost kernel: Call Trace:
Oct  5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:21 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:48:21 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:21 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct  5 22:48:21 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct  5 22:48:21 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct  5 22:48:21 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct  5 22:48:21 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct  5 22:48:21 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct  5 22:48:21 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct  5 22:48:21 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:48:21 localhost kernel: __writeback_single_inode+0x45/0x320
Oct  5 22:48:21 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct  5 22:48:21 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct  5 22:48:21 localhost kernel: wb_writeback+0x276/0x310
Oct  5 22:48:21 localhost kernel: wb_workfn+0x19c/0x3b0
Oct  5 22:48:21 localhost kernel: process_one_work+0x149/0x360
Oct  5 22:48:21 localhost kernel: worker_thread+0x4d/0x3c0
Oct  5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:21 localhost kernel: ? rescuer_thread+0x380/0x380
Oct  5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:21 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct  5 22:48:21 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:21 localhost kernel: jbd2/md0-8      D    0  3740      2 0x00000080
Oct  5 22:48:21 localhost kernel: Call Trace:
Oct  5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:21 localhost kernel: ? find_next_bit+0xb/0x10
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:21 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct  5 22:48:21 localhost kernel: submit_bh_wbc+0x140/0x170
Oct  5 22:48:21 localhost kernel: submit_bh+0x13/0x20
Oct  5 22:48:21 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct  5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:48:21 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct  5 22:48:21 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct  5 22:48:21 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct  5 22:48:21 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct  5 22:48:21 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct  5 22:48:21 localhost kernel: ? dequeue_entity+0xed/0x460
Oct  5 22:48:21 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct  5 22:48:21 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct  5 22:48:21 localhost kernel: ? __switch_to+0x229/0x440
Oct  5 22:48:21 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct  5 22:48:21 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct  5 22:48:21 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:21 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct  5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:21 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct  5 22:48:22 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:22 localhost kernel: ext4lazyinit    D    0  3743      2 0x00000080
Oct  5 22:48:22 localhost kernel: Call Trace:
Oct  5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:22 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:22 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:48:22 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:22 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct  5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:48:22 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:22 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:22 localhost kernel: next_bio+0x38/0x40
Oct  5 22:48:22 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct  5 22:48:22 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct  5 22:48:22 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct  5 22:48:22 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct  5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:22 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct  5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:22 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct  5 22:48:22 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:22 localhost kernel: md0_reshape     D    0  3751      2 0x00000080
Oct  5 22:48:22 localhost kernel: Call Trace:
Oct  5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:22 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct  5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:22 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct  5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
