From: Jeff Layton <jlayton@kernel.org>
To: dai.ngo@oracle.com, Chuck Lever III <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Helen Chao <helen.chao@oracle.com>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>
Subject: Re: [PATCH] SUNRPC: remove the maximum number of retries in call_bind_status
Date: Thu, 06 Apr 2023 13:33:30 -0400 [thread overview]
Message-ID: <64c4e5c4e61962fd828bcbef79db1df6466a875d.camel@kernel.org> (raw)
In-Reply-To: <ed870a33-0809-3cfd-2d5a-b97409568b97@oracle.com>
On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> > On 3/8/23 10:50 AM, Chuck Lever III wrote:
> > >
> > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > >
> > > > Currently call_bind_status places a hard limit of 3 to the number of
> > > > retries on EACCES error. This limit was done to accommodate the
> > > > behavior
> > > > of a buggy server that keeps returning garbage when the NLM daemon is
> > > > killed on the NFS server. However this change causes problem for other
> > > > servers that take a little longer than 9 seconds for the port mapper to
> > > > become ready when the NFS server is restarted.
> > > >
> > > > This patch removes this hard coded limit and let the RPC handles
> > > > the retry according to whether the export is soft or hard mounted.
> > > >
> > > > To avoid the hang with buggy server, the client can use soft mount for
> > > > the export.
> > > >
> > > > Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
> > > > Reported-by: Helen Chao <helen.chao@oracle.com>
> > > > Tested-by: Helen Chao <helen.chao@oracle.com>
> > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > Helen is the royal queen of ^C ;-)
> > >
> > > Did you try ^C on a mount while it waits for a rebind?
> >
> > She uses a test script that restarts the NFS server while NLM lock test
> > is running. The failure is random, sometimes it fails and sometimes it
> > passes depending on when the LOCK/UNLOCK requests come in so I think
> > it's hard to time it to do the ^C, but I will ask.
>
> We did the test with ^C and here is what we found.
>
> For synchronous RPC task the signal was delivered to the RPC task and
> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>
> For asynchronous RPC task the process that invokes the RPC task to send
> the request detected the signal in rpc_wait_for_completion_task and exits
> with -ERESTARTSYS. However the async RPC was allowed to continue to run
> to completion. So if the async RPC task was retrying an operation and
> the NFS server was down, it will retry forever if this is a hard mount
> or until the NFS server comes back up.
>
> The question for the list is should we propagate the signal to the async
> task via rpc_signal_task to stop its execution or just leave it alone as is.
>
>
That is a good question.
I like the patch overall, as it gets rid of a special one-off retry
counter, but I too share some concerns about retrying indefinitely when
an server goes missing.
>
Propagating a signal seems like the right thing to do. Looks like
rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
It sounds pretty straightforward otherwise.
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2023-04-06 17:33 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-08 18:45 [PATCH] SUNRPC: remove the maximum number of retries in call_bind_status Dai Ngo
2023-03-08 18:50 ` Chuck Lever III
2023-03-08 19:03 ` dai.ngo
2023-03-14 16:19 ` dai.ngo
2023-03-16 13:38 ` Fwd: " Chuck Lever III
2023-03-27 16:05 ` dai.ngo
2023-04-06 17:33 ` Jeff Layton [this message]
2023-04-06 18:10 ` Jeff Layton
2023-04-06 19:36 ` dai.ngo
2023-04-06 19:43 ` Chuck Lever III
2023-04-06 19:59 ` Jeff Layton
2023-04-06 20:58 ` dai.ngo
2023-04-06 21:51 ` Jeff Layton
2023-04-06 20:58 ` dai.ngo
-- strict thread matches above, loose matches on Subject: below --
2023-04-18 20:19 Dai Ngo
2023-04-19 10:06 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64c4e5c4e61962fd828bcbef79db1df6466a875d.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=helen.chao@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox