Linux NFS development
 help / color / mirror / Atom feed
From: dai.ngo@oracle.com
To: Trond Myklebust <trondmy@hammerspace.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"jlayton@kernel.org" <jlayton@kernel.org>
Subject: Re: [PATCH] SUNRPC: increase max timeout for rebind to handle NFS server restart
Date: Mon, 17 Apr 2023 16:14:11 -0700	[thread overview]
Message-ID: <00fe5d5a-43b8-0f0e-2afe-ae68367f822d@oracle.com> (raw)
In-Reply-To: <d0a5cf8b8c8dbdbf83658b9456afb798af5d7941.camel@hammerspace.com>


On 4/17/23 2:53 PM, Trond Myklebust wrote:
> On Mon, 2023-04-17 at 14:51 -0700, dai.ngo@oracle.com wrote:
>> On 4/17/23 2:48 PM, Trond Myklebust wrote:
>>> On Mon, 2023-04-17 at 21:41 +0000, Chuck Lever III wrote:
>>>>> On Apr 17, 2023, at 4:49 PM, Trond Myklebust
>>>>> <trondmy@hammerspace.com> wrote:
>>>>>
>>>>> On Fri, 2023-04-07 at 20:30 -0700, Dai Ngo wrote:
>>>>>> Currently call_bind_status places a hard limit of 9 seconds
>>>>>> for
>>>>>> retries
>>>>>> on EACCES error. This limit was done to prevent the RPC
>>>>>> request
>>>>>> from
>>>>>> being retried forever if the remote server has problem and
>>>>>> never
>>>>>> comes
>>>>>> up
>>>>>>
>>>>>> However this 9 seconds timeout is too short, comparing to
>>>>>> other
>>>>>> RPC
>>>>>> timeouts which are generally in minutes. This causes
>>>>>> intermittent
>>>>>> failure
>>>>>> with EIO on the client side when there are lots of NLM
>>>>>> activity
>>>>>> and
>>>>>> the
>>>>>> NFS server is restarted.
>>>>>>
>>>>>> Instead of removing the max timeout for retry and relying on
>>>>>> the
>>>>>> RPC
>>>>>> timeout mechanism to handle the retry, which can lead to the
>>>>>> RPC
>>>>>> being
>>>>>> retried forever if the remote NLM service fails to come up.
>>>>>> This
>>>>>> patch
>>>>>> simply increases the max timeout of call_bind_status from 9
>>>>>> to 90
>>>>>> seconds
>>>>>> which should allow enough time for NLM to register after a
>>>>>> restart,
>>>>>> and
>>>>>> not retrying forever if there is real problem with the remote
>>>>>> system.
>>>>>>
>>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock
>>>>>> requests")
>>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>> ---
>>>>>>    include/linux/sunrpc/clnt.h  | 3 +++
>>>>>>    include/linux/sunrpc/sched.h | 4 ++--
>>>>>>    net/sunrpc/clnt.c            | 2 +-
>>>>>>    net/sunrpc/sched.c           | 3 ++-
>>>>>>    4 files changed, 8 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/clnt.h
>>>>>> b/include/linux/sunrpc/clnt.h
>>>>>> index 770ef2cb5775..81afc5ea2665 100644
>>>>>> --- a/include/linux/sunrpc/clnt.h
>>>>>> +++ b/include/linux/sunrpc/clnt.h
>>>>>> @@ -162,6 +162,9 @@ struct rpc_add_xprt_test {
>>>>>>    #define RPC_CLNT_CREATE_REUSEPORT      (1UL << 11)
>>>>>>    #define RPC_CLNT_CREATE_CONNECTED      (1UL << 12)
>>>>>>    
>>>>>> +#define        RPC_CLNT_REBIND_DELAY           3
>>>>>> +#define        RPC_CLNT_REBIND_MAX_TIMEOUT     90
>>>>>> +
>>>>>>    struct rpc_clnt *rpc_create(struct rpc_create_args *args);
>>>>>>    struct rpc_clnt        *rpc_bind_new_program(struct
>>>>>> rpc_clnt *,
>>>>>>                                   const struct rpc_program *,
>>>>>> u32);
>>>>>> diff --git a/include/linux/sunrpc/sched.h
>>>>>> b/include/linux/sunrpc/sched.h
>>>>>> index b8ca3ecaf8d7..e9dc142f10bb 100644
>>>>>> --- a/include/linux/sunrpc/sched.h
>>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>>> @@ -90,8 +90,8 @@ struct rpc_task {
>>>>>>    #endif
>>>>>>           unsigned char           tk_priority : 2,/* Task
>>>>>> priority
>>>>>> */
>>>>>>                                   tk_garb_retry : 2,
>>>>>> -                               tk_cred_retry : 2,
>>>>>> -                               tk_rebind_retry : 2;
>>>>>> +                               tk_cred_retry : 2;
>>>>>> +       unsigned char           tk_rebind_retry;
>>>>>>    };
>>>>>>    
>>>>>>    typedef void                   (*rpc_action)(struct
>>>>>> rpc_task *);
>>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>>>> index fd7e1c630493..222578af6b01 100644
>>>>>> --- a/net/sunrpc/clnt.c
>>>>>> +++ b/net/sunrpc/clnt.c
>>>>>> @@ -2053,7 +2053,7 @@ call_bind_status(struct rpc_task *task)
>>>>>>                   if (task->tk_rebind_retry == 0)
>>>>>>                           break;
>>>>>>                   task->tk_rebind_retry--;
>>>>>> -               rpc_delay(task, 3*HZ);
>>>>>> +               rpc_delay(task, RPC_CLNT_REBIND_DELAY * HZ);
>>>>>>                   goto retry_timeout;
>>>>>>           case -ENOBUFS:
>>>>>>                   rpc_delay(task, HZ >> 2);
>>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>>> index be587a308e05..5c18a35752aa 100644
>>>>>> --- a/net/sunrpc/sched.c
>>>>>> +++ b/net/sunrpc/sched.c
>>>>>> @@ -817,7 +817,8 @@ rpc_init_task_statistics(struct rpc_task
>>>>>> *task)
>>>>>>           /* Initialize retry counters */
>>>>>>           task->tk_garb_retry = 2;
>>>>>>           task->tk_cred_retry = 2;
>>>>>> -       task->tk_rebind_retry = 2;
>>>>>> +       task->tk_rebind_retry = RPC_CLNT_REBIND_MAX_TIMEOUT /
>>>>>> +
>>>>>> RPC_CLNT_REBIND_DELAY;
>>>>> Why not just implement an exponential back off? If the server
>>>>> is
>>>>> slow
>>>>> to come up, then pounding the rpcbind service every 3 seconds
>>>>> isn't
>>>>> going to help.
>>>> Exponential backoff has the disadvantage that once we've gotten
>>>> to the longer retry times, it's easy for a client to wait quite
>>>> some time before it notices the rebind service is available.
>>>>
>>>> But I have to wonder if retrying every 3 seconds is useful if
>>>> the request is going over TCP.
>>>>
>>> The problem isn't reliability: this is handling a case where we
>>> _are_
>>> getting a reply from the server, just not one we wanted. EACCES
>>> here
>>> means that the rpcbind server didn't return a valid entry for the
>>> service we requested, presumably because the service hasn't been
>>> registered yet.
>> That's correct.
>>
>>> So yes, an exponential back off is appropriate here.
>> I think Chuck's point is still valid. It makes the client a little
>> more
>> responsive; does not have to wait that long, and the overhead of a
>> RPC
>> request every 3 seconds is not that significant.
> It is when you do it 30 times before giving up.

This is true for a dead NFS server, we save 25 RPC calls in a period
of ~90 seconds.

In the case of a head failover and the NFS service on the takeover
head can take up to 15 seconds to restart to pick up the new exports
from the fail head, then the client can potentially wait up to ~20
seconds, after the NFS server is ready, to resume normal operation.
IMHO, it's a 'long' time and I'd prioritize speed over saving some
overhead.

Having said that, I will make the change that Trond suggested if
Chuck and Jeff don't have any objection.

-Dai

>

  reply	other threads:[~2023-04-17 23:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-08  3:30 [PATCH] SUNRPC: increase max timeout for rebind to handle NFS server restart Dai Ngo
2023-04-12 17:50 ` Jeff Layton
2023-04-12 17:55   ` Chuck Lever III
2023-04-12 18:09     ` dai.ngo
2023-04-17 20:49 ` Trond Myklebust
2023-04-17 21:41   ` Chuck Lever III
2023-04-17 21:48     ` Trond Myklebust
2023-04-17 21:51       ` dai.ngo
2023-04-17 21:53         ` Trond Myklebust
2023-04-17 23:14           ` dai.ngo [this message]
2023-04-18  0:23             ` Trond Myklebust
2023-04-18  1:04               ` dai.ngo
2023-04-18 16:02                 ` Trond Myklebust
2023-04-18 17:40                   ` dai.ngo
2023-04-17 21:53       ` Chuck Lever III
2023-04-17 22:10         ` Trond Myklebust
2023-04-18  0:06           ` Chuck Lever III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=00fe5d5a-43b8-0f0e-2afe-ae68367f822d@oracle.com \
    --to=dai.ngo@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox