From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>, chuck.lever@oracle.com
Cc: linux-nfs@vger.kernel.org, linux-nfs@stwm.de
Subject: Re: [PATCH 3/3] NFSD: Fix server reboot hang problem when callback workqueue is stuck
Date: Fri, 15 Dec 2023 14:54:59 -0500 [thread overview]
Message-ID: <c9e8608d0cb2562c645d1f31043fcbc90cf53d52.camel@kernel.org> (raw)
In-Reply-To: <1702667703-17978-4-git-send-email-dai.ngo@oracle.com>
On Fri, 2023-12-15 at 11:15 -0800, Dai Ngo wrote:
> If the callback workqueue is stuck, nfsd4_deleg_getattr_conflict will
> also stuck waiting for the callback request to be executed. This causes
> the client to hang waiting for the reply of the GETATTR and also causes
> the reboot of the NFS server to hang due to the pending NFS request.
>
> Fix by replacing wait_on_bit with wait_on_bit_timeout with 20 seconds
> time out.
>
> Reported-by: Wolfgang Walter <linux-nfs@stwm.de>
> Fixes: 6c41d9a9bd02 ("NFSD: handle GETATTR conflict with write delegation")
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> fs/nfsd/nfs4state.c | 6 +++++-
> fs/nfsd/state.h | 2 ++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 175f3e9f5822..0cc7d4953807 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2948,6 +2948,9 @@ void nfs4_cb_getattr(struct nfs4_cb_fattr *ncf)
> if (test_and_set_bit(CB_GETATTR_BUSY, &ncf->ncf_cb_flags))
> return;
>
> + /* set to proper status when nfsd4_cb_getattr_done runs */
> + ncf->ncf_cb_status = NFS4ERR_IO;
> +
> refcount_inc(&dp->dl_stid.sc_count);
> if (!nfsd4_run_cb(&ncf->ncf_getattr)) {
> refcount_dec(&dp->dl_stid.sc_count);
> @@ -8558,7 +8561,8 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode,
> nfs4_cb_getattr(&dp->dl_cb_fattr);
> spin_unlock(&ctx->flc_lock);
>
> - wait_on_bit(&ncf->ncf_cb_flags, CB_GETATTR_BUSY, TASK_INTERRUPTIBLE);
> + wait_on_bit_timeout(&ncf->ncf_cb_flags, CB_GETATTR_BUSY,
> + TASK_INTERRUPTIBLE, NFSD_CB_GETATTR_TIMEOUT);
The RPC won't necessarily have timed out at this point, and it looks
like ncf_cb_status won't have been set to anything (and is probably
still 0?).
Don't you need to check whether the wait timed out or was successful?
What happens now when this times out?
> if (ncf->ncf_cb_status) {
> status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
> if (status != nfserr_jukebox ||
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index f96eaa8e9413..94563a6813a6 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -135,6 +135,8 @@ struct nfs4_cb_fattr {
> /* bits for ncf_cb_flags */
> #define CB_GETATTR_BUSY 0
>
> +#define NFSD_CB_GETATTR_TIMEOUT msecs_to_jiffies(20000) /* 20 secs */
> +
Why 20s?
> /*
> * Represents a delegation stateid. The nfs4_client holds references to these
> * and they are put when it is being destroyed or when the delegation is
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2023-12-15 19:55 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-15 19:15 [PATCH 0/3] Bug fixes for NFSD callback Dai Ngo
2023-12-15 19:15 ` [PATCH 1/3] SUNRPC: remove printk when back channel request not found Dai Ngo
2023-12-15 19:37 ` Jeff Layton
2023-12-15 19:15 ` [PATCH 2/3] NFSD: restore delegation's sc_count if nfsd4_run_cb fails Dai Ngo
2023-12-15 19:42 ` Jeff Layton
2023-12-15 20:00 ` dai.ngo
2023-12-15 20:15 ` Jeff Layton
2023-12-15 20:22 ` dai.ngo
2023-12-15 19:15 ` [PATCH 3/3] NFSD: Fix server reboot hang problem when callback workqueue is stuck Dai Ngo
2023-12-15 19:54 ` Chuck Lever
2023-12-15 20:40 ` dai.ngo
2023-12-15 21:41 ` Chuck Lever
2023-12-15 21:55 ` dai.ngo
2023-12-16 1:21 ` Chuck Lever
2023-12-16 3:18 ` dai.ngo
2023-12-16 3:57 ` Chuck Lever
2023-12-16 22:44 ` dai.ngo
2023-12-18 16:02 ` Chuck Lever
2023-12-18 18:17 ` dai.ngo
2023-12-18 19:10 ` Chuck Lever
2023-12-18 20:27 ` dai.ngo
2023-12-15 19:54 ` Jeff Layton [this message]
2023-12-15 20:18 ` dai.ngo
2023-12-15 20:25 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c9e8608d0cb2562c645d1f31043fcbc90cf53d52.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=linux-nfs@stwm.de \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox