From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Sunrpc transport reconnection...
Date: Mon, 29 Mar 2010 18:25:16 -0400
Message-ID: <4BB128CC.7060707@oracle.com>
In-Reply-To: <1269898987.15895.95.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

On 03/29/2010 05:43 PM, Trond Myklebust wrote:
> Having looked more carefully at the code...
>
> Is there any reason to keep the xprt->connect_timeout? As far as I can
> see, it would appear to be completely redundant.
>
> For UDP there is no disconnect/reconnect. The socket is set up once and
> for all, so no need for connect_timeout.

I vaguely recall at least one case where UDP connect can hang.  The 
connect_timeout in that case is short, on the order of 5 seconds.

> In the case of a TCP reconnection, the xprt->reestablish_timeout will do
> exponential back-off to prevent reconnecting unnecessarily. The
> XPRT_CONNECTING lock then prevents anyone from trying to interfere with
> that connect request until it gets a reply, or the TCP layer decides
> that the socket has timed out. Again, it appears that
> xprt->connect_timeout is redundant.

It probably is redundant when the host is unreachable.  We already have 
the retransmit timeout, and the socket connect call is always 
nonblocking because we aren't supposed to sleep.
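
For illustration only, the exponential back-off described for
reestablish_timeout might be sketched in user space roughly as below.
This is not the sunrpc code: the function names and the 3-second and
300-second bounds are hypothetical, chosen just to show the doubling
and the cap.

```c
#include <assert.h>

/* Hypothetical bounds in seconds; not taken from the sunrpc source. */
#define REEST_INIT_SECS 3U
#define REEST_MAX_SECS  300U

/* Return the delay to wait before the next reconnect attempt. */
static unsigned int next_reestablish_timeout(unsigned int current)
{
	assert(REEST_INIT_SECS <= REEST_MAX_SECS);

	/* First failure: start from the initial delay. */
	if (current == 0)
		return REEST_INIT_SECS;

	/* Clamp before doubling so the multiplication cannot overflow. */
	if (current >= REEST_MAX_SECS / 2)
		return REEST_MAX_SECS;

	return current * 2;
}

/* Delay in effect after a given number of consecutive failures. */
static unsigned int delay_after_failures(unsigned int failures)
{
	unsigned int delay = 0;
	unsigned int i;

	for (i = 0; i < failures; i++)
		delay = next_reestablish_timeout(delay);
	return delay;
}
```

With these assumed bounds the delay doubles from 3 seconds on each
consecutive failure and stays at the 300-second cap once it is reached.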

If the server refuses our connection, the connect_timeout timer appears 
to be discarded, which is probably the wrong thing to do.

> RDMA reconnection appears to follow the TCP model. Once again, there is
> exponential back-off, enforced by XPRT_CONNECTING.
>
> So why do we have xprt->connect_timeout? What is it enforcing?

It may be a vestige of the previous connect implementation, where 
call_connect actually could have slept in the RPC scheduler.

We eventually want to invoke call_timeout if the connection never 
completes.  It could be because the server keeps refusing or resetting 
our connection, or because there is no response from the server.  The 
retransmit timeout is probably appropriate for both of those cases.

-- 
chuck[dot]lever[at]oracle[dot]com
