From: Xiao Ni <xni@redhat.com>
To: NeilBrown <neilb@suse.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
Date: Sun, 8 Oct 2017 21:21:29 -0400 (EDT)
Message-ID: <1345780738.18087591.1507512089744.JavaMail.zimbra@redhat.com>
In-Reply-To: <874lrc28x8.fsf@notabene.neil.brown.name>

[-- Attachment #1: Type: text/plain, Size: 5911 bytes --]



----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Friday, October 6, 2017 12:32:19 PM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
> 
> On Fri, Oct 06 2017, Xiao Ni wrote:
> 
> > On 10/05/2017 01:17 PM, NeilBrown wrote:
> >> On Thu, Sep 14 2017, Xiao Ni wrote:
> >>
> >>>> What do
> >>>>   cat /proc/8987/stack
> >>>>   cat /proc/8983/stack
> >>>>   cat /proc/8966/stack
> >>>>   cat /proc/8381/stack
> >>>>
> >>>> show??
> >> ...
> >>
> >>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
> >>> lockdep_assert_held(&mddev->reconfig_mutex)?
> >>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> >>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
> >>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
> >>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
> >>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
> >>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
> >>> [<ffffffff81260457>] __vfs_write+0x37/0x170
> >>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
> >>> [<ffffffff81263015>] SyS_write+0x55/0xc0
> >>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
> >>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>> [jbd2/md0-8]
> >>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> >>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
> >>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
> >>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
> >>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> >>> [<ffffffff81376675>] submit_bio+0x75/0x150
> >>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
> >>> [<ffffffff8129e683>] submit_bh+0x13/0x20
> >>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
> >>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> >>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> >>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
> >>> [<ffffffff810c73f9>] kthread+0x109/0x140
> >>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >> Thanks for this (and sorry it took so long to get to it).
> >> It looks like
> >>
> >> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
> >> md_write_start()")
> >>
> >> is badly broken.  I wonder how it ever passed testing.
> >>
> >> In md_write_start() it changed the wait_event() call to
> >>
> >> 	wait_event(mddev->sb_wait,
> >> 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
> >> 		   !mddev->suspended);
> >>
> >>
> >> That should be
> >>
> >> 	wait_event(mddev->sb_wait,
> >> 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
> >> 		   mddev->suspended);
> > Hi Neil
> >
> > Do we want write bios to be handled when mddev->suspended is 1? After
> > this change, write bios can be handled when mddev->suspended is 1.
> 
> This is OK.
> New write bios will not get past md_handle_request().
> A write bio that did get past md_handle_request() is still allowed
> through md_write_start().  The mddev_suspend() call won't complete until
> that write bio has finished.

Hi Neil

Thanks for the explanation. I took some time to read the emails about the
patch cc27b0c78 which introduced this. It is similar to the problem I
encountered. But in that case mddev_suspend() is called from level_store(),
so adding the check of mddev->suspended to md_write_start() can fix the problem
"reshape raid5 -> raid6 atop bcache deadlocks at start on md_attr_store /
raid5_make_request".

In suspend_lo_store(), mddev_suspend() is not called under
mddev->reconfig_mutex, so there is still a possible race, as you said in
your first analysis.
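
To compare the two wait_event() conditions discussed above, here is a
minimal userspace sketch. It is not kernel code: the function names are
hypothetical, and the two booleans stand in for
test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) and mddev->suspended.
wait_event() sleeps until its condition becomes true, so "may proceed"
means the writer is woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as introduced by commit cc27b0c78c79: (!A && !B). */
static bool may_proceed_and(bool sb_change_pending, bool suspended)
{
	return !sb_change_pending && !suspended;
}

/* Condition Neil proposes instead: (!A || B). */
static bool may_proceed_or(bool sb_change_pending, bool suspended)
{
	return !sb_change_pending || suspended;
}

/* Walk all four states and record where the two conditions differ. */
static void check_wait_conditions(void)
{
	/* pending=0, suspended=0: both let the writer through. */
	assert(may_proceed_and(false, false) && may_proceed_or(false, false));
	/* pending=0, suspended=1: the AND form blocks, the OR form passes. */
	assert(!may_proceed_and(false, true) && may_proceed_or(false, true));
	/* pending=1, suspended=1: the AND form blocks, the OR form passes. */
	assert(!may_proceed_and(true, true) && may_proceed_or(true, true));
	/* pending=1, suspended=0: both forms keep the writer waiting. */
	assert(!may_proceed_and(true, false) && !may_proceed_or(true, false));
}
```

The last case is the state reported in this thread (suspended is 0 and
MD_SB_CHANGE_PENDING is set). In this sketch both conditions evaluate to
false there, so swapping one for the other does not by itself wake the
writer.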
> 
> >
> > When the hang happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
> > is set, so the patch can't fix this problem. I tried the patch and the
> > problem still exists.
> >
> 
> 
> I need to see all the stack traces.

I've attached the call trace.

> 
> 
> > [ 7710.589274] mddev suspend : 0
> > [ 7710.592228] mddev ro : 0
> > [ 7710.594746] mddev insync : 0
> > [ 7710.597620] mddev SB CHANGE PENDING is set
> > [ 7710.601698] mddev SB CHANGE CLEAN is set
> > [ 7710.605601] mddev->persistent : 1
> > [ 7710.608905] mddev->external : 0
> > [ 7710.612030] conf quiesce : 2
> >
> > raid5 is still spinning.
> >
> > Hmm, I have a question. Why can't md_check_recovery be called when
> > MD_SB_CHANGE_PENDING is set in raid5d?
> 
> When MD_SB_CHANGE_PENDING is not set, there is no need to call
> md_check_recovery().  It wouldn't hurt except that it would be a waste of
> time.

I'm confused. If we want to call md_check_recovery() when MD_SB_CHANGE_PENDING
is set, it should be:

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
                        break;
                handled += batch_size;
 
-               if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
+               if (mddev->sb_flags & (1 << MD_SB_CHANGE_PENDING)) {
                        spin_unlock_irq(&conf->device_lock);
                        md_check_recovery(mddev);
                        spin_lock_irq(&conf->device_lock);

Right?
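
To make the difference between the two masks concrete, here is a small
userspace sketch. It is only an illustration: the enum and function names
are hypothetical, and the bit numbers are an assumption based on my
reading of enum mddev_sb_flags in drivers/md/md.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the order of enum mddev_sb_flags (hypothetical names). */
enum sb_flag_bits {
	SB_CHANGE_DEVS,    /* bit 0 */
	SB_CHANGE_CLEAN,   /* bit 1 */
	SB_CHANGE_PENDING, /* bit 2 */
	SB_NEED_REWRITE,   /* bit 3 */
};

/* Current test in raid5d(): true if any flag OTHER than CHANGE_PENDING is set. */
static bool other_flag_set(unsigned long sb_flags)
{
	return (sb_flags & ~(1UL << SB_CHANGE_PENDING)) != 0;
}

/* Test in the diff above: true only if CHANGE_PENDING itself is set. */
static bool pending_set(unsigned long sb_flags)
{
	return (sb_flags & (1UL << SB_CHANGE_PENDING)) != 0;
}

/* Evaluate both tests for a few flag combinations. */
static void check_masks(void)
{
	unsigned long pending_only = 1UL << SB_CHANGE_PENDING;
	unsigned long clean_only = 1UL << SB_CHANGE_CLEAN;
	unsigned long both = pending_only | clean_only;

	/* Only CHANGE_PENDING set: the current test is false. */
	assert(!other_flag_set(pending_only) && pending_set(pending_only));
	/* Only CHANGE_CLEAN set: the current test is true. */
	assert(other_flag_set(clean_only) && !pending_set(clean_only));
	/* Both set, as in the debug output quoted earlier: both tests are true. */
	assert(other_flag_set(both) && pending_set(both));
	/* No flags set: neither test fires. */
	assert(!other_flag_set(0) && !pending_set(0));
}
```

So the two tests only disagree when CHANGE_PENDING is the single flag set,
or when some other flag is set without CHANGE_PENDING.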

Regards
Xiao
> 
> NeilBrown
> 
> 
> >
> >                  if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
> >                          spin_unlock_irq(&conf->device_lock);
> >                          md_check_recovery(mddev);
> >                          spin_lock_irq(&conf->device_lock);
> >                  }
> >
> > Best Regards
> > Xiao
> >
> >
> >>
> >> i.e. it was (!A && !B), it should be (!A || B) !!!!!
> >>
> >> Could you please make that change and try again.
> > Hi Neil
> >
> > I tried the patch and it doesn't work.
> >>
> >> Thanks,
> >> NeilBrown
> 

[-- Attachment #2: calltrace --]
[-- Type: application/octet-stream, Size: 20356 bytes --]

Oct  5 22:46:18 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct  5 22:46:18 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:18 localhost kernel: kworker/u8:3    D    0  2453      2 0x00000080
Oct  5 22:46:18 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct  5 22:46:18 localhost kernel: Call Trace:
Oct  5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:18 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:46:18 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:18 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:18 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:18 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct  5 22:46:18 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct  5 22:46:18 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct  5 22:46:18 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct  5 22:46:18 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct  5 22:46:18 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct  5 22:46:18 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct  5 22:46:18 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:46:18 localhost kernel: __writeback_single_inode+0x45/0x320
Oct  5 22:46:18 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct  5 22:46:18 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct  5 22:46:18 localhost kernel: wb_writeback+0x276/0x310
Oct  5 22:46:18 localhost kernel: wb_workfn+0x19c/0x3b0
Oct  5 22:46:18 localhost kernel: process_one_work+0x149/0x360
Oct  5 22:46:18 localhost kernel: worker_thread+0x4d/0x3c0
Oct  5 22:46:18 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:18 localhost kernel: ? rescuer_thread+0x380/0x380
Oct  5 22:46:18 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:18 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:46:18 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:18 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct  5 22:46:18 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:18 localhost kernel: jbd2/md0-8      D    0  3740      2 0x00000080
Oct  5 22:46:18 localhost kernel: Call Trace:
Oct  5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:18 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? find_next_bit+0xb/0x10
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:19 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct  5 22:46:19 localhost kernel: submit_bh_wbc+0x140/0x170
Oct  5 22:46:19 localhost kernel: submit_bh+0x13/0x20
Oct  5 22:46:19 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct  5 22:46:19 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:46:19 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct  5 22:46:19 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct  5 22:46:19 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct  5 22:46:19 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct  5 22:46:19 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct  5 22:46:19 localhost kernel: ? dequeue_entity+0xed/0x460
Oct  5 22:46:19 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct  5 22:46:19 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct  5 22:46:19 localhost kernel: ? __switch_to+0x229/0x440
Oct  5 22:46:19 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct  5 22:46:19 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct  5 22:46:19 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: ext4lazyinit    D    0  3743      2 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:46:19 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:46:19 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct  5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:46:19 localhost kernel: next_bio+0x38/0x40
Oct  5 22:46:19 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct  5 22:46:19 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct  5 22:46:19 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: md0_reshape     D    0  3751      2 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct  5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct  5 22:46:19 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct  5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:46:19 localhost kernel: INFO: task mdadm:3758 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: mdadm           D    0  3758      1 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: raid5_quiesce+0x274/0x2b0 [raid456]
Oct  5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:46:19 localhost kernel: suspend_lo_store+0x82/0xe0 [md_mod]
Oct  5 22:46:19 localhost kernel: md_attr_store+0x80/0xc0 [md_mod]
Oct  5 22:46:19 localhost kernel: sysfs_kf_write+0x3a/0x50
Oct  5 22:46:19 localhost kernel: kernfs_fop_write+0xff/0x180
Oct  5 22:46:19 localhost kernel: __vfs_write+0x37/0x170
Oct  5 22:46:19 localhost kernel: ? selinux_file_permission+0xe5/0x120
Oct  5 22:46:19 localhost kernel: ? security_file_permission+0x3b/0xc0
Oct  5 22:46:19 localhost kernel: vfs_write+0xb2/0x1b0
Oct  5 22:46:19 localhost kernel: ? syscall_trace_enter+0x1d0/0x2b0
Oct  5 22:46:19 localhost kernel: SyS_write+0x55/0xc0
Oct  5 22:46:19 localhost kernel: do_syscall_64+0x67/0x150
Oct  5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct  5 22:46:19 localhost kernel: RIP: 0033:0x7ff6fa57c840
Oct  5 22:46:19 localhost kernel: RSP: 002b:00007ffe2fe145b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct  5 22:46:19 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff6fa57c840
Oct  5 22:46:19 localhost kernel: RDX: 0000000000000001 RSI: 00007ffe2fe14660 RDI: 0000000000000003
Oct  5 22:46:19 localhost kernel: RBP: 00007ffe2fe14660 R08: 00007ffe2fe14660 R09: 000000000000001d
Oct  5 22:46:19 localhost kernel: R10: 000000000000000a R11: 0000000000000246 R12: 00000000004699a6
Oct  5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 0000000001a1bd00 R15: 0000000001a1bd00
Oct  5 22:46:19 localhost kernel: INFO: task dd:3761 blocked for more than 120 seconds.
Oct  5 22:46:19 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:46:19 localhost kernel: dd              D    0  3761   2291 0x00000080
Oct  5 22:46:19 localhost kernel: Call Trace:
Oct  5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct  5 22:46:19 localhost kernel: io_schedule+0x16/0x40
Oct  5 22:46:19 localhost kernel: __lock_page+0x10e/0x160
Oct  5 22:46:19 localhost kernel: ? page_cache_tree_insert+0xf0/0xf0
Oct  5 22:46:19 localhost kernel: mpage_prepare_extent_to_map+0x290/0x310 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_writepages+0x467/0xe80 [ext4]
Oct  5 22:46:19 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:46:19 localhost kernel: __filemap_fdatawrite_range+0xc6/0x100
Oct  5 22:46:19 localhost kernel: filemap_flush+0x1c/0x20
Oct  5 22:46:19 localhost kernel: ext4_alloc_da_blocks+0x2c/0x70 [ext4]
Oct  5 22:46:19 localhost kernel: ext4_release_file+0x79/0xc0 [ext4]
Oct  5 22:46:19 localhost kernel: __fput+0xe7/0x210
Oct  5 22:46:19 localhost kernel: ____fput+0xe/0x10
Oct  5 22:46:19 localhost kernel: task_work_run+0x83/0xb0
Oct  5 22:46:19 localhost kernel: exit_to_usermode_loop+0x6c/0xa8
Oct  5 22:46:19 localhost kernel: do_syscall_64+0x13a/0x150
Oct  5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct  5 22:46:19 localhost kernel: RIP: 0033:0x7f09c0d15e90
Oct  5 22:46:19 localhost kernel: RSP: 002b:00007ffd681b4f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
Oct  5 22:46:19 localhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f09c0d15e90
Oct  5 22:46:19 localhost kernel: RDX: 0000000000100000 RSI: 0000000000000000 RDI: 0000000000000001
Oct  5 22:46:19 localhost kernel: RBP: 00000000000003e8 R08: ffffffffffffffff R09: 0000000000102003
Oct  5 22:46:19 localhost kernel: R10: 00007ffd681b4c60 R11: 0000000000000246 R12: 00000000000003e8
Oct  5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 00007ffd681b634b R15: 00007ffd681b51d0
Oct  5 22:48:21 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct  5 22:48:21 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:21 localhost kernel: kworker/u8:3    D    0  2453      2 0x00000080
Oct  5 22:48:21 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct  5 22:48:21 localhost kernel: Call Trace:
Oct  5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:21 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:48:21 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:21 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct  5 22:48:21 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct  5 22:48:21 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct  5 22:48:21 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct  5 22:48:21 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct  5 22:48:21 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct  5 22:48:21 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct  5 22:48:21 localhost kernel: do_writepages+0x1c/0x70
Oct  5 22:48:21 localhost kernel: __writeback_single_inode+0x45/0x320
Oct  5 22:48:21 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct  5 22:48:21 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct  5 22:48:21 localhost kernel: wb_writeback+0x276/0x310
Oct  5 22:48:21 localhost kernel: wb_workfn+0x19c/0x3b0
Oct  5 22:48:21 localhost kernel: process_one_work+0x149/0x360
Oct  5 22:48:21 localhost kernel: worker_thread+0x4d/0x3c0
Oct  5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:21 localhost kernel: ? rescuer_thread+0x380/0x380
Oct  5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:21 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct  5 22:48:21 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:21 localhost kernel: jbd2/md0-8      D    0  3740      2 0x00000080
Oct  5 22:48:21 localhost kernel: Call Trace:
Oct  5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:21 localhost kernel: ? find_next_bit+0xb/0x10
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:21 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct  5 22:48:21 localhost kernel: submit_bh_wbc+0x140/0x170
Oct  5 22:48:21 localhost kernel: submit_bh+0x13/0x20
Oct  5 22:48:21 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct  5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct  5 22:48:21 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct  5 22:48:21 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct  5 22:48:21 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct  5 22:48:21 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct  5 22:48:21 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct  5 22:48:21 localhost kernel: ? dequeue_entity+0xed/0x460
Oct  5 22:48:21 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct  5 22:48:21 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct  5 22:48:21 localhost kernel: ? __switch_to+0x229/0x440
Oct  5 22:48:21 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct  5 22:48:21 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct  5 22:48:21 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct  5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:21 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct  5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct  5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:21 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct  5 22:48:22 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:22 localhost kernel: ext4lazyinit    D    0  3743      2 0x00000080
Oct  5 22:48:22 localhost kernel: Call Trace:
Oct  5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:22 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct  5 22:48:22 localhost kernel: ? bio_split+0x5d/0x90
Oct  5 22:48:22 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct  5 22:48:22 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct  5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct  5 22:48:22 localhost kernel: generic_make_request+0x117/0x2f0
Oct  5 22:48:22 localhost kernel: submit_bio+0x75/0x150
Oct  5 22:48:22 localhost kernel: next_bio+0x38/0x40
Oct  5 22:48:22 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct  5 22:48:22 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct  5 22:48:22 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct  5 22:48:22 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct  5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:22 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct  5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
Oct  5 22:48:22 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct  5 22:48:22 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #1
Oct  5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 22:48:22 localhost kernel: md0_reshape     D    0  3751      2 0x00000080
Oct  5 22:48:22 localhost kernel: Call Trace:
Oct  5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct  5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct  5 22:48:22 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct  5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct  5 22:48:22 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct  5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct  5 22:48:22 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct  5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct  5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
