From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Mi Jinlong <mijinlong@cn.fujitsu.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	NFSv3 list <linux-nfs@vger.kernel.org>
Subject: Re: [RFC] SUNRPC connect timeout case network request delay
Date: Fri, 12 Mar 2010 08:35:10 -0500
Message-ID: <1268400910.3156.6.camel@localhost.localdomain>
In-Reply-To: <4B9A0F8C.5030900@cn.fujitsu.com>

On Fri, 2010-03-12 at 17:55 +0800, Mi Jinlong wrote: 
> Hi,
> 
>   Thanks for your reply.
> 
> Chuck Lever wrote:
> > On 03/08/2010 04:59 AM, Mi Jinlong wrote:
> >> Hi Chuck,
> >>
> >>   Thanks for your reply.
> >>
> >> Chuck Lever wrote:
> >>> On 03/04/2010 05:12 AM, Mi Jinlong wrote:
> >>>> Hi,
> >>>> Step4: [22:42:16] Write data to file
> >>>>          [22:42:16] Write data success
> >>>> Step5: [22:42:16] Unlock file
> >>>>          [22:46:30] Unlock file success.
> >>>> Step6: [22:46:30] Close file /mnt/nfs/file
> >>>>          [22:46:30] Close fiel /mnt/nfs/file success
> >>>>
> >>>> The problem is at step 5: unlocking the file takes 4 minutes, much
> >>>> longer than expected.
> >>>> When tracing the kernel, I find that SUNRPC's call_connect times out
> >>>> many times,
> >>>> and each timeout is 1 minute.
> >>>
> >>> The kernel's TCP reconnect logic will retry until it succeeds, without
> >>> letting the upper level make progress.  For some reason, it is having
> >>> difficulty reconnecting with your server.
> >>>
> >>>> I think it's a kernel problem, but I don't know why. Can someone
> >>>> help me?
> >>>
> >>> # sudo rpcdebug -m rpc -s xprt trans
> >>
> >>   After running this command, I got some messages that I think are important.
> >>
> >>   RPC:       xs_connect delayed xprt  for 3 seconds
> >>   ...
> >>   RPC:       xs_connect delayed xprt  for 6 seconds
> >>   ...
> >>   RPC:       xs_connect delayed xprt  for 12 seconds
> >>   ...
> >>   RPC:       xs_connect delayed xprt  for 24 seconds
> >>   ...
> >>   ...
> >>   RPC:       xs_connect delayed xprt  for 300 seconds
> >>
> >> These messages are printed in xs_connect, and the delay doubles each
> >> time.
> >> IMO, once data has been transferred over a socket, the socket should
> >> be released.
> >> But judging from the messages above, the socket isn't released.
> >> Is that wrong, or is there some other reason?
> > 
> > The code is trying to connect, but the ->connect call isn't working
> > somehow.  The code backs off by doubling the timeout each time, so that
> > the connect attempts don't overload the server.
> > 
> > This tells us that the code is attempting to connect, but not why the
> > connect attempt is failing.
> 
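
For illustration, the doubling backoff described above can be sketched in
user-space C. This is only a sketch, not the kernel's xs_connect code: the
names INIT_REEST_TO, MAX_REEST_TO and next_reconnect_delay() are hypothetical,
the 3-second starting value is taken from the debug output quoted above, and
treating 300 seconds as a ceiling is an assumption inferred from that output.

```c
#include <assert.h>

/* Hypothetical constants, chosen to match the quoted debug output. */
#define INIT_REEST_TO 3U    /* first reconnect delay, in seconds */
#define MAX_REEST_TO  300U  /* assumed ceiling, in seconds */

/*
 * Return the delay to use before the next connect attempt:
 * double the current delay, then clamp it to [INIT_REEST_TO, MAX_REEST_TO].
 */
static unsigned int next_reconnect_delay(unsigned int cur)
{
	unsigned int next;

	/* Callers are expected to pass a value this function returned. */
	assert(cur <= MAX_REEST_TO);

	next = cur * 2;
	if (next < INIT_REEST_TO)
		next = INIT_REEST_TO;
	if (next > MAX_REEST_TO)
		next = MAX_REEST_TO;
	return next;
}
```

Starting from 3 and calling next_reconnect_delay() repeatedly gives
3, 6, 12, 24, 48, 96, 192, 300, 300, ... seconds.
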
>   While reading the kernel code, I found a problem in the function xs_tcp_close.
>     ....
>     static void xs_tcp_close(struct rpc_xprt *xprt)
>     {
>             if (test_and_clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state))
>                     xs_close(xprt);
>             else
>                     xs_tcp_shutdown(xprt);
>     }
>      ...
>   When a task calls xs_tcp_close to close the xprt's socket, in most cases it
>   only calls xs_tcp_shutdown, which uses the next layer's close function to
>   close the socket connection.
>   But after the connection is closed, the socket still exists, so it may
>   be reused. Is that a problem? I think the socket should be released after
>   xs_tcp_shutdown.

No it shouldn't. The whole point of the current code is to allow the RPC
client to _reuse_ the same port without having to wait for a TIME_WAIT.

The reason why we want to do that is because a lot of servers key their
duplicate reply caches on the port number. See
    http://www.connectathon.org/talks96/werme1.html
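
To illustrate why the port matters, here is a minimal sketch of a duplicate
reply cache lookup, assuming a server that keys its entries on client address,
source port and XID. The struct and function names below are hypothetical and
are not taken from any real server implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DRC_SLOTS 8

/* One cached request, identified by (address, port, XID). */
struct drc_entry {
	bool     in_use;
	uint32_t client_addr;   /* IPv4 address, host byte order */
	uint16_t client_port;   /* client's source port */
	uint32_t xid;           /* RPC transaction ID */
};

struct drc {
	struct drc_entry slot[DRC_SLOTS];
	size_t next;            /* round-robin replacement index */
};

/* Record a request so that a later retransmission can be recognised. */
static void drc_insert(struct drc *c, uint32_t addr, uint16_t port,
		       uint32_t xid)
{
	struct drc_entry *e;

	assert(c != NULL);
	e = &c->slot[c->next];
	e->in_use = true;
	e->client_addr = addr;
	e->client_port = port;
	e->xid = xid;
	c->next = (c->next + 1) % DRC_SLOTS;
}

/* Return true only if address, port and XID all match a cached entry. */
static bool drc_lookup(const struct drc *c, uint32_t addr, uint16_t port,
		       uint32_t xid)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < DRC_SLOTS; i++) {
		const struct drc_entry *e = &c->slot[i];

		if (e->in_use && e->client_addr == addr &&
		    e->client_port == port && e->xid == xid)
			return true;
	}
	return false;
}
```

With such a cache, a request retransmitted from the same source port is found,
while the same XID arriving from a new port misses the cache, so the server
would treat it as a new request.
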

Trond

