From: Calum Mackay <calum.mackay@oracle.com>
To: trondmy@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] lockd: don't use interval-based rebinding over TCP
Date: Thu, 12 Nov 2020 21:04:07 +0000 [thread overview]
Message-ID: <c9ae2eec-6db9-d4de-fb29-e4dcc6be2dac@oracle.com> (raw)
In-Reply-To: <20201028201627.23625-1-calum.mackay@oracle.com>
[-- Attachment #1.1: Type: text/plain, Size: 3663 bytes --]
hi Anna, Trond,
was there any comment on this, please?
thanks,
calum.
On 28/10/2020 8:16 pm, Calum Mackay wrote:
> NLM uses an interval-based rebinding, i.e. it clears the transport's
> binding under certain conditions if more than 60 seconds have elapsed
> since the connection was last bound.
>
> This rebinding is not necessary for an autobind RPC client over a
> connection-oriented protocol like TCP.
>
> It can also cause problems: it is possible for nlm_bind_host() to clear
> XPRT_BOUND whilst a connection worker is in the middle of trying to
> reconnect, after it had already been checked in xprt_connect().
>
> When the connection worker notices that XPRT_BOUND has been cleared
> under it, in xs_tcp_finish_connecting(), that results in:
>
> xs_tcp_setup_socket: connect returned unhandled error -107
>
> Worse, it's possible that the two can get into lockstep, resulting in
> the same behaviour repeated indefinitely, with the above error every
> 300 seconds, without ever recovering, and the connection never being
> established. This has been seen in practice, with a large number of NLM
> client tasks, following a server restart.
>
> The existing callers of nlm_bind_host & nlm_rebind_host should not need
> to force the rebind, for TCP, so restrict the interval-based rebinding
> to UDP only.
>
> For TCP, we will still rebind when needed, e.g. on timeout, and connection
> error (including closure), since connection-related errors on an existing
> connection, ECONNREFUSED when trying to connect, and rpc_check_timeout(),
> already unconditionally clear XPRT_BOUND.
>
> To avoid having to add the fix, and explanation, to both nlm_bind_host()
> and nlm_rebind_host(), remove the duplicate code from the former, and
> have it call the latter.
>
> Drop the dprintk, which adds no value over a trace.
>
> Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
> ---
> fs/lockd/host.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/fs/lockd/host.c b/fs/lockd/host.c
> index 0afb6d59bad0..771c289f6df7 100644
> --- a/fs/lockd/host.c
> +++ b/fs/lockd/host.c
> @@ -439,12 +439,7 @@ nlm_bind_host(struct nlm_host *host)
> * RPC rebind is required
> */
> if ((clnt = host->h_rpcclnt) != NULL) {
> - if (time_after_eq(jiffies, host->h_nextrebind)) {
> - rpc_force_rebind(clnt);
> - host->h_nextrebind = jiffies + NLM_HOST_REBIND;
> - dprintk("lockd: next rebind in %lu jiffies\n",
> - host->h_nextrebind - jiffies);
> - }
> + nlm_rebind_host(host);
> } else {
> unsigned long increment = nlmsvc_timeout;
> struct rpc_timeout timeparms = {
> @@ -494,13 +489,20 @@ nlm_bind_host(struct nlm_host *host)
> return clnt;
> }
>
> -/*
> - * Force a portmap lookup of the remote lockd port
> +/**
> + * nlm_rebind_host - If needed, force a portmap lookup of the peer's lockd port
> + * @host: NLM host handle for peer
> + *
> + * This is not needed when using a connection-oriented protocol, such as TCP.
> + * The existing autobind mechanism is sufficient to force a rebind when
> + * required, e.g. on connection state transitions.
> */
> void
> nlm_rebind_host(struct nlm_host *host)
> {
> - dprintk("lockd: rebind host %s\n", host->h_name);
> + if (host->h_proto != IPPROTO_UDP)
> + return;
> +
> if (host->h_rpcclnt && time_after_eq(jiffies, host->h_nextrebind)) {
> rpc_force_rebind(host->h_rpcclnt);
> host->h_nextrebind = jiffies + NLM_HOST_REBIND;
>
--
Calum Mackay
Linux Kernel Engineering
Oracle Linux and Virtualisation
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
prev parent reply other threads:[~2020-11-12 21:06 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-28 20:16 [PATCH] lockd: don't use interval-based rebinding over TCP Calum Mackay
2020-11-12 21:04 ` Calum Mackay [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c9ae2eec-6db9-d4de-fb29-e4dcc6be2dac@oracle.com \
--to=calum.mackay@oracle.com \
--cc=anna.schumaker@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox