public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Calum Mackay <calum.mackay@oracle.com>
To: trondmy@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] lockd: don't use interval-based rebinding over TCP
Date: Thu, 12 Nov 2020 21:04:07 +0000	[thread overview]
Message-ID: <c9ae2eec-6db9-d4de-fb29-e4dcc6be2dac@oracle.com> (raw)
In-Reply-To: <20201028201627.23625-1-calum.mackay@oracle.com>


[-- Attachment #1.1: Type: text/plain, Size: 3663 bytes --]

hi Anna, Trond,

was there any comment on this, please?

thanks,
calum.

On 28/10/2020 8:16 pm, Calum Mackay wrote:
> NLM uses an interval-based rebinding, i.e. it clears the transport's
> binding under certain conditions if more than 60 seconds have elapsed
> since the connection was last bound.
> 
> This rebinding is not necessary for an autobind RPC client over a
> connection-oriented protocol like TCP.
> 
> It can also cause problems: it is possible for nlm_bind_host() to clear
> XPRT_BOUND whilst a connection worker is in the middle of trying to
> reconnect, after it had already been checked in xprt_connect().
> 
> When the connection worker notices that XPRT_BOUND has been cleared
> under it, in xs_tcp_finish_connecting(), that results in:
> 
> 	xs_tcp_setup_socket: connect returned unhandled error -107
> 
> Worse, it's possible that the two can get into lockstep, resulting in
> the same behaviour repeated indefinitely, with the above error every
> 300 seconds, without ever recovering, and the connection never being
> established. This has been seen in practice, with a large number of NLM
> client tasks, following a server restart.
> 
> The existing callers of nlm_bind_host & nlm_rebind_host should not need
> to force the rebind, for TCP, so restrict the interval-based rebinding
> to UDP only.
> 
> For TCP, we will still rebind when needed, e.g. on timeout, and connection
> error (including closure), since connection-related errors on an existing
> connection, ECONNREFUSED when trying to connect, and rpc_check_timeout(),
> already unconditionally clear XPRT_BOUND.
> 
> To avoid having to add the fix, and explanation, to both nlm_bind_host()
> and nlm_rebind_host(), remove the duplicate code from the former, and
> have it call the latter.
> 
> Drop the dprintk, which adds no value over a trace.
> 
> Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
> ---
>   fs/lockd/host.c | 20 +++++++++++---------
>   1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/lockd/host.c b/fs/lockd/host.c
> index 0afb6d59bad0..771c289f6df7 100644
> --- a/fs/lockd/host.c
> +++ b/fs/lockd/host.c
> @@ -439,12 +439,7 @@ nlm_bind_host(struct nlm_host *host)
>   	 * RPC rebind is required
>   	 */
>   	if ((clnt = host->h_rpcclnt) != NULL) {
> -		if (time_after_eq(jiffies, host->h_nextrebind)) {
> -			rpc_force_rebind(clnt);
> -			host->h_nextrebind = jiffies + NLM_HOST_REBIND;
> -			dprintk("lockd: next rebind in %lu jiffies\n",
> -					host->h_nextrebind - jiffies);
> -		}
> +		nlm_rebind_host(host);
>   	} else {
>   		unsigned long increment = nlmsvc_timeout;
>   		struct rpc_timeout timeparms = {
> @@ -494,13 +489,20 @@ nlm_bind_host(struct nlm_host *host)
>   	return clnt;
>   }
>   
> -/*
> - * Force a portmap lookup of the remote lockd port
> +/**
> + * nlm_rebind_host - If needed, force a portmap lookup of the peer's lockd port
> + * @host: NLM host handle for peer
> + *
> + * This is not needed when using a connection-oriented protocol, such as TCP.
> + * The existing autobind mechanism is sufficient to force a rebind when
> + * required, e.g. on connection state transitions.
>    */
>   void
>   nlm_rebind_host(struct nlm_host *host)
>   {
> -	dprintk("lockd: rebind host %s\n", host->h_name);
> +	if (host->h_proto != IPPROTO_UDP)
> +		return;
> +
>   	if (host->h_rpcclnt && time_after_eq(jiffies, host->h_nextrebind)) {
>   		rpc_force_rebind(host->h_rpcclnt);
>   		host->h_nextrebind = jiffies + NLM_HOST_REBIND;
> 

-- 
Calum Mackay
Linux Kernel Engineering
Oracle Linux and Virtualisation

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

      reply	other threads:[~2020-11-12 21:06 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-28 20:16 [PATCH] lockd: don't use interval-based rebinding over TCP Calum Mackay
2020-11-12 21:04 ` Calum Mackay [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9ae2eec-6db9-d4de-fb29-e4dcc6be2dac@oracle.com \
    --to=calum.mackay@oracle.com \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox