public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Hou Tao <houtao1@huawei.com>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>
Subject: Re: umount XFS hung when stopping the xfsaild kthread
Date: Wed, 6 Sep 2017 07:13:47 -0400	[thread overview]
Message-ID: <20170906111347.GB54570@bfoster.bfoster> (raw)
In-Reply-To: <80da6cde-c8aa-d23d-9705-7d3c11507dec@huawei.com>

On Wed, Sep 06, 2017 at 06:59:26PM +0800, Hou Tao wrote:
> Hi Brian,
> 
> On 2017/9/6 2:27, Brian Foster wrote:
> > On Tue, Sep 05, 2017 at 09:48:45PM +0800, Hou Tao wrote:
> >> Hi all,
> >>
> >> We recently encounter a XFS umount hang problem. As we can see the following
> >> stacks, the umount process was trying to stop the xfsaild kthread and waiting
> >> for the exit of the xfsaild thread, and the xfsaild thread was waiting for
> >> wake-up.
> >>
> >> [<ffffffff810a604a>] kthread_stop+0x4a/0xe0
> >> [<ffffffffa0680317>] xfs_trans_ail_destroy+0x17/0x30 [xfs]
> >> [<ffffffffa067569e>] xfs_log_unmount+0x1e/0x60 [xfs]
> >> [<ffffffffa066ac15>] xfs_unmountfs+0xd5/0x190 [xfs]
> >> [<ffffffffa066da62>] xfs_fs_put_super+0x32/0x90 [xfs]
> >> [<ffffffff811ebad6>] generic_shutdown_super+0x56/0xe0
> >> [<ffffffff811ebf27>] kill_block_super+0x27/0x70
> >> [<ffffffff811ec269>] deactivate_locked_super+0x49/0x60
> >> [<ffffffff811ec866>] deactivate_super+0x46/0x60
> >> [<ffffffff81209995>] mntput_no_expire+0xc5/0x120
> >> [<ffffffff8120aacf>] SyS_umount+0x9f/0x3c0
> >> [<ffffffff81652a09>] system_call_fastpath+0x16/0x1b
> >> [<ffffffffffffffff>] 0xffffffffffffffff
> >>
> >> [<ffffffffa067faa7>] xfsaild+0x537/0x5e0 [xfs]
> >> [<ffffffff810a5ddf>] kthread+0xcf/0xe0
> >> [<ffffffff81652958>] ret_from_fork+0x58/0x90
> >> [<ffffffffffffffff>] 0xffffffffffffffff
> >>
> >> The kernel version is RHEL7.3 and we are trying to reproduce it (not yet).
> >> I have check the related code and suspect the same problem may also exists in
> >> the mainline.
> >>
> >> The following is the possible sequences which may lead to the hang of umount:
> >>
> >> xfsaild: kthread_should_stop() // return false, so xfsaild continue
> >>
> >> umount: set_bit(KTHREAD_SHOULD_STOP, &kthread->flags) // by kthread_stop()
> >>
> >> umount: wake_up_process() // because xfsaild is still running, so 0 is returned
> >>
> >> xfsaild: __set_current_state()
> >> xfsaild: schedule() // Now, on one will wake it up
> >>
> > 
> > That seems plausible to me. kthread_stop() sets the SHOULD_STOP bit and
> > then wakes up the thread. On the other side, xfsaild() checks
> > SHOULD_STOP, sets the task state and sleeps. It seems like it should be
> > possible for this to hang if xfsaild() checks the should stop bit and
> > then kthread_stop() runs before we set the task state (as you've
> > outlined above).
> > 
> >> The solution I think is adding an extra kthread_should_stop() before
> >> invoking schedule(). Maybe a smp_mb() is needed too, because we needs to
> >> ensure the read of the stop flag happens after the write of the task status.
> >> Something likes the following patch:
> >>
> > 
> > I think the important bit is to check after we've set the task state,
> > yes? That way we know either we've seen the bit or kthread_stop() has
> > woken the task (which I think means the task remains runnable and will
> > be rescheduled such that it exits). If so, I'd probably move the check
> > up after the task state set and add a comment.
> Yes, that's what I tries to express (but failed to).
> Yes, moving the check up after the task state set is much clearer.
> 
> >> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> >> index 9056c0f..6313f67 100644
> >> --- a/fs/xfs/xfs_trans_ail.c
> >> +++ b/fs/xfs/xfs_trans_ail.c
> >> @@ -520,6 +520,11 @@ xfsaild(
> >>                 if (!xfs_ail_min(ailp) &&
> >>                     ailp->xa_target == ailp->xa_target_prev) {
> >>                         spin_unlock(&ailp->xa_lock);
> >> +
> >> +                       smp_mb();
> >> +                       if (kthread_should_stop())
> >> +                               break;
> >> +
> >>                         freezable_schedule();
> >>                         tout = 0;
> >>                         continue;
> >>
> >> Any suggestions ?
> >>
> > 
> > Could you try some hacks to verify this problem on an upstream kernel? I
> > think you should be able to add an artificial delay in xfsaild() before
> > we set the task state when the fs is unmounting and the AIL is empty
> > (ie., it looks like we're going to schedule out), then add a smaller
> > delay to xfs_trans_ail_destroy() to make sure we wait for xfsaild() to
> > settle, but run kthread_stop() between the should_stop check and setting
> > the task state. Then we could potentially confirm the problem and verify
> > the fix unwinds everything correctly. Hm?
> Thanks for your suggestion. As per your suggestion, we had reproduced the
> umount hang problem on both RHEL7 kernel and upstream kernel.
> 
> We just add a 50us delay before the kthread_should_stop() and __set_current_state()
> in xfsaild(), and add a 20us delay before kthread_stop() in xfs_trans_ail_destroy().
> After the hacks, we run an infinite loop: mount + do nothing + umount, and then the
> hang of umount will occur after ten minutes or less.
> 

Ok, thanks for testing. I just sent another mail that shows a larger
delay will reproduce the same problem in one test cycle (mount, touch a
file, umount).

Brian

> Regards,
> 
> Tao
> 
> > 
> > Brian
> > 
> >> Regards,
> >>
> >> Tao
> >>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > .
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-09-06 11:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-05 13:48 umount XFS hung when stopping the xfsaild kthread Hou Tao
2017-09-05 18:27 ` Brian Foster
2017-09-06 10:59   ` Hou Tao
2017-09-06 11:13     ` Brian Foster [this message]
2017-09-05 23:00 ` Dave Chinner
2017-09-06 11:11   ` Brian Foster
2017-09-06 11:47     ` Dave Chinner
2017-09-06 12:18       ` Brian Foster
     [not found]     ` <20170906224107.GL29261@wotan.suse.de>
2017-09-07 12:24       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170906111347.GB54570@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=houtao1@huawei.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox