Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Chuck Lever <chuck.lever@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.
Date: Wed, 03 Feb 2010 17:34:06 -0500	[thread overview]
Message-ID: <4B69F9DE.10604@oracle.com> (raw)
In-Reply-To: <20100204082354.0bf3b7e5@notabene.brown>

On 02/03/2010 04:23 PM, Neil Brown wrote:
> On Wed, 03 Feb 2010 10:43:04 -0500
> Chuck Lever<chuck.lever@oracle.com>  wrote:
>
>> On 02/03/2010 01:31 AM, NeilBrown wrote:
>>> If we drop a request in the sunrpc layer, either due kmalloc failure,
>>> or due to a cache miss when we could not queue the request for later
>>> replay, then close the connection to encourage the client to retry sooner.
>>
>> I studied connection dropping behavior a few years back, and decided
>> that dropping the connection on a retransmit is nearly always
>> counterproductive.  Any other pending requests on a connection that is
>> dropped must also be retransmitted, which means one retransmit suddenly
>> turns into many.  And then you get into issues of idempotency and all
>> the extra traffic and the long delays and the risk of reconnecting on a
>> different port so that XID replay is undetectable...
>
> You make some good points there, thanks.
>
>>
>> I don't think dropping the connection will cause the client to
>> retransmit sooner.  Clients I have encountered will reconnect and
>> retransmit only after their retransmit timeout fires, never sooner.
>>
>
> I thought I had noticed the Linux client resending immediately, but it would
> have been a while ago, and I could easily be remembering wrongly.
>
> My reasoning was that if the connection is closed then the client can *know*
> that they won't get a response to any outstanding requests, rather than
> having to use the timeout heuristic.  How the client uses that information I
> don't know, but at least they would have it.

So, I seem to remember expecting that behavior, and then being 
disappointed when it didn't work that way :-)

It's also the case that there is some pathological behavior around 
reconnect in some of the older 2.6 NFS clients.  Trond would probably 
remember the details there.

In any event, I think you would do well to make some direct observations 
with several different vintages of Linux NFS clients, just to be sure 
this works as you expect it to with reasonable clients.

I noticed that connection drops are especially onerous when a server is 
under load, and that's exactly when drops seem to occur most often. It's 
one of those corner cases that's nearly impossible to test well.

>> Unfortunately NFSv4 requires a connection drop before a retransmit, but
>> NFSv3 does not.  NFSv4 servers are rather supposed to try very hard not
>> to drop requests.
>>
>> How often do you expect this kind of recovery to be necessary?  Would it
>> be possible to drop only for NFSv4 connections?
>>
>
> With the improved handling of large requests I would expect this kind of
> recovery would be very rarely needed.
>
> Yes, it would be quite easy to only drop connections on which we have seen an
> NFSv4 request... and maybe also connections on which we have not successfully
> handled any request yet(?).
> What if, instead of closing the connection, we set a flag so that it would be
> closed as soon as it had been idle for 1 second,  thus flushing any other
> pending requests???   That probably doesn't help - there would easily be real
> cases where other threads of activity keep the connection busy, while the
> thread waiting for the lost request still needs a full time-out.
>
> I would be happy with the v4-only version.

v4 would almost be required to work this way, I think.  I wouldn't 
object to a v4-only implementation for now.

I'm sure there are cases where v3 will have to drop the connection... 
i'd just like to ensure that's it's absolutely a last resort.

-- 
chuck[dot]lever[at]oracle[dot]com

next prev parent reply	other threads:[~2010-02-03 22:35 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-03  6:31 [PATCH 0/9] Cache deferal improvements - try++ NeilBrown
     [not found] ` <20100203060657.12945.27293.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-03  6:31   ` [PATCH 1/9] sunrpc: don't keep expired entries in the auth caches NeilBrown
     [not found]     ` <20100203063130.12945.29226.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-15  0:58       ` J. Bruce Fields
2010-02-03  6:31   ` [PATCH 5/9] nfsd/idmap: drop special request deferal in favour of improved default NeilBrown
2010-02-03  6:31   ` [PATCH 9/9] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
2010-02-03  6:31   ` [PATCH 7/9] nfsd: factor out hash functions for export caches NeilBrown
     [not found]     ` <20100203063131.12945.38791.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-17 19:35       ` J. Bruce Fields
2010-02-03  6:31   ` [PATCH 4/9] sunrpc/cache: allow threads to block while waiting for cache update NeilBrown
2010-04-15 15:55     ` J. Bruce Fields
2010-02-03  6:31   ` [PATCH 2/9] sunrpc/cache: factor out cache_is_expired NeilBrown
     [not found]     ` <20100203063131.12945.65321.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-15  0:58       ` J. Bruce Fields
2010-02-03  6:31   ` [PATCH 8/9] svcauth_gss: replace a trivial 'switch' with an 'if' NeilBrown
2010-02-03  6:31   ` [PATCH 3/9] sunrpc: never return expired entries in sunrpc_cache_lookup NeilBrown
     [not found]     ` <20100203063131.12945.97779.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-17 21:33       ` J. Bruce Fields
2010-03-24  1:22         ` Neil Brown
2010-03-30 15:11           ` J. Bruce Fields
2010-02-03  6:31   ` [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost NeilBrown
2010-02-03 15:43     ` Chuck Lever
2010-02-03 21:23       ` Neil Brown
2010-02-03 22:20         ` Trond Myklebust
2010-02-03 22:34           ` J. Bruce Fields
2010-02-03 22:40           ` Chuck Lever
2010-02-03 23:13             ` Trond Myklebust
2010-02-04  0:06               ` Chuck Lever
2010-02-04  0:24                 ` Trond Myklebust
2010-02-03 22:34         ` Chuck Lever [this message]
2010-02-03 19:43   ` [PATCH 0/9] Cache deferal improvements - try++ J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B69F9DE.10604@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox