Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.com>
To: Xiao Ni <xni@redhat.com>, linux-raid <linux-raid@vger.kernel.org>
Cc: shli@kernel.org
Subject: Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared
Date: Mon, 04 Sep 2017 12:16:50 +1000	[thread overview]
Message-ID: <877exfdx7x.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <546311999.4473128.1504339295016.JavaMail.zimbra@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 7293 bytes --]

On Sat, Sep 02 2017, Xiao Ni wrote:

> Hi Neil and Shaohua
>
> I'm trying to reproduce the problem the problem "raid5_finish_reshape is stuck at revalidate_disk".
> But there is a new stuck. I tried with 4.13.0-rc5 and latest mdadm. The steps are:
>
> #!/bin/bash
> num=0
> while [ 1 ]
> do
> 	echo "*************************$num"
> 	mdadm -Ss
> 	mdadm --create --run /dev/md0 --level 5 --metadata 1.2 --raid-devices 5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 --spare-devices 1 /dev/loop5 --chunk 512
> 	mdadm --wait /dev/md0
> 	mkfs -t ext4 /dev/md0
> 	mount -t ext4 /dev/md0 /mnt/md_test
> 	dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=100
> 	mdadm --add /dev/md0 /dev/loop6
> 	mdadm --grow --raid-devices 6 /dev/md0
> 	dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
> 	mdadm --wait /dev/md0
> 	((num++))
> 	umount /dev/md0
> done
>
> The calltrace messages are:
> Sep  1 13:57:25 localhost kernel: INFO: task kworker/u8:4:21401 blocked for more than 120 seconds.
> Sep  1 13:57:25 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #2
> Sep  1 13:57:25 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  1 13:57:25 localhost kernel: kworker/u8:4    D    0 21401      2 0x00000080
> Sep  1 13:57:25 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
> Sep  1 13:57:25 localhost kernel: Call Trace:
> Sep  1 13:57:25 localhost kernel: __schedule+0x28d/0x890
> Sep  1 13:57:25 localhost kernel: schedule+0x36/0x80
> Sep  1 13:57:25 localhost kernel: md_write_start+0xf0/0x220 [md_mod]
> Sep  1 13:57:25 localhost kernel: ? remove_wait_queue+0x60/0x60
> Sep  1 13:57:25 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
> Sep  1 13:57:25 localhost kernel: ? bio_split+0x5d/0x90
> Sep  1 13:57:25 localhost kernel: ? blk_queue_split+0xd2/0x630
> Sep  1 13:57:25 localhost kernel: ? remove_wait_queue+0x60/0x60
> Sep  1 13:57:25 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
> ....
>
> All the processes send bio to md are stuck at md_write_start.
>         wait_event(mddev->sb_wait,
>                    !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);
>
>
> Sep  1 13:57:26 localhost kernel: INFO: task md0_reshape:23605 blocked for more than 120 seconds.
> Sep  1 13:57:26 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #2
> Sep  1 13:57:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  1 13:57:26 localhost kernel: md0_reshape     D    0 23605      2 0x00000080
> Sep  1 13:57:26 localhost kernel: Call Trace:
> Sep  1 13:57:26 localhost kernel: __schedule+0x28d/0x890
> Sep  1 13:57:26 localhost kernel: schedule+0x36/0x80
> Sep  1 13:57:26 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
> Sep  1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60
> Sep  1 13:57:26 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
> Sep  1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60
> Sep  1 13:57:26 localhost kernel: md_thread+0x132/0x180 [md_mod]
> Sep  1 13:57:26 localhost kernel: kthread+0x109/0x140
> Sep  1 13:57:26 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
> Sep  1 13:57:26 localhost kernel: ? kthread_park+0x60/0x60
> Sep  1 13:57:26 localhost kernel: ? do_syscall_64+0x67/0x150
> Sep  1 13:57:26 localhost kernel: ret_from_fork+0x25/0x30
>
> It stucks at:
>         /* Allow raid5_quiesce to complete */
>         wait_event(conf->wait_for_overlap, conf->quiesce != 2);
>
> Sep  1 13:57:26 localhost kernel: INFO: task mdadm:23613 blocked for more than 120 seconds.
> Sep  1 13:57:26 localhost kernel:      Tainted: G           OE   4.13.0-rc5 #2
> Sep  1 13:57:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  1 13:57:26 localhost kernel: mdadm           D    0 23613      1 0x00000080
> Sep  1 13:57:26 localhost kernel: Call Trace:
> Sep  1 13:57:26 localhost kernel: __schedule+0x28d/0x890
> Sep  1 13:57:26 localhost kernel: schedule+0x36/0x80
> Sep  1 13:57:26 localhost kernel: raid5_quiesce+0x274/0x2b0 [raid456]
> Sep  1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60
> Sep  1 13:57:26 localhost kernel: suspend_lo_store+0x82/0xe0 [md_mod]
> Sep  1 13:57:26 localhost kernel: md_attr_store+0x80/0xc0 [md_mod]
> Sep  1 13:57:26 localhost kernel: sysfs_kf_write+0x3a/0x50
> Sep  1 13:57:26 localhost kernel: kernfs_fop_write+0xff/0x180
> Sep  1 13:57:26 localhost kernel: __vfs_write+0x37/0x170
> Sep  1 13:57:26 localhost kernel: ? selinux_file_permission+0xe5/0x120
> Sep  1 13:57:26 localhost kernel: ? security_file_permission+0x3b/0xc0
> Sep  1 13:57:26 localhost kernel: vfs_write+0xb2/0x1b0
> Sep  1 13:57:26 localhost kernel: ? syscall_trace_enter+0x1d0/0x2b0
> Sep  1 13:57:26 localhost kernel: SyS_write+0x55/0xc0
> Sep  1 13:57:26 localhost kernel: do_syscall_64+0x67/0x150
>
> It's stuck at:
> conf->quiesce = 2;
> wait_event_cmd(conf->wait_for_quiescent,
>    atomic_read(&conf->active_stripes) == 0 &&
>    atomic_read(&conf->active_aligned_reads) == 0,
>    unlock_all_device_hash_locks_irq(conf),
>    lock_all_device_hash_locks_irq(conf));
> conf->quiesce = 1;
>
> [root@dell-pr1700-02 ~]# cat /proc/mdstat 
> Personalities : [raid6] [raid5] [raid4] 
> md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
>       2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  reshape =  0.5% (2560/509952) finish=162985.0min speed=0K/sec
>       
> unused devices: <none>
>
>
> I can reproduce this. So I add some logs by printk to check MD_SB_CHANGE_PENDING of mddev->flags
> and mddev->suspended.
> mddev->suspend : 0
> mddev->flags   : SP CHANGE PENDING is set
> conf->quiesce  : 2
>
> The I echo active > /sys/block/md0/md/array_state, reshape can start and finish successfully.
>
> I notice there are some fixes for raid5 stuck problem. Not sure whether it's introduced by those
> patches. 
>
> Best Regards
> Xiao

Thanks for the detailed report.
All the processes that you have mentioned are waiting for
raid5_quiesce() to complete.
raid5_quiesce() must be waiting for either active_stripes or
active_aligned_reads to reach zero.  Most likely active_stripes.

That suggests that either there is a stripe that is active and isn't
being handled for some reason, or the accounting of active stripes has
problems.

The fact that setting the array to 'active' removes the hang is
interesting.  That would cause the threads blocked in md_write_start to
continue, but I cannot see how that would affect active_stripes as
threads waiting for md_write_start() never hold a stripe.  Doing that
could cause a stripe to be used, then released, which might trigger a
wakeup that had been lost.

It would be useful to see if ->active_stripes is 0 when the deadlock
happens.
One way to do that is to add a printk into raid5_status().  Then
whenever you "cat /proc/mdstat" the printk will trigger and tell you the
value of ->active_stripes.

I haven't managed to reproduce this yet, or I would track it down my
self.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

next prev parent reply	other threads:[~2017-09-04  2:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <221835411.4473056.1504338574607.JavaMail.zimbra@redhat.com>
2017-09-02  8:01 ` Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared Xiao Ni
2017-09-04  2:16   ` NeilBrown [this message]
2017-09-04  2:45     ` Xiao Ni
2017-09-04  3:52       ` Xiao Ni
2017-09-04  5:34         ` NeilBrown
2017-09-04  7:36           ` Xiao Ni
2017-09-05  1:36             ` NeilBrown
2017-09-05  2:15               ` Xiao Ni
2017-09-07  1:37                 ` Xiao Ni
2017-09-07  5:37                   ` NeilBrown
2017-09-11  0:14                     ` Xiao Ni
2017-09-11  3:36                       ` NeilBrown
2017-09-11  5:03                         ` Xiao Ni
2017-09-30  9:44               ` Xiao Ni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877exfdx7x.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).