From: Chuck Lever <chuck.lever@oracle.com>
To: Mi Jinlong <mijinlong@cn.fujitsu.com>
Cc: "Trond.Myklebust" <trond.myklebust@fys.uio.no>,
"J. Bruce Fields" <bfields@fieldses.org>,
NFSv3 list <linux-nfs@vger.kernel.org>
Subject: Re: [RFC] SUNRPC connect timeout case network request delay
Date: Mon, 08 Mar 2010 10:40:32 -0500
Message-ID: <4B951A70.7080201@oracle.com>
In-Reply-To: <4B94CA73.90600@cn.fujitsu.com>
On 03/08/2010 04:59 AM, Mi Jinlong wrote:
> Hi chuck,
>
> Thanks for your reply.
>
> Chuck Lever wrote:
>> On 03/04/2010 05:12 AM, Mi Jinlong wrote:
>>> Hi,
>>> Step4: [22:42:16] Write data to file
>>> [22:42:16] Write data success
>>> Step5: [22:42:16] Unlock file
>>> [22:46:30] Unlock file success.
>>> Step6: [22:46:30] Close file /mnt/nfs/file
>>> [22:46:30] Close fiel /mnt/nfs/file success
>>>
>>> The problem is at step 5: unlocking the file takes 4 minutes, much
>>> longer than expected.
>>> When tracing the kernel, I find that the SUNRPC call_connect times
>>> out many times; each timeout is 1 minute.
>>
>> The kernel's TCP reconnect logic will retry until it succeeds, without
>> letting the upper level make progress. For some reason, it is having
>> difficulty reconnecting with your server.
>>
>>> I think it's a kernel problem, but I don't know why. Can someone
>>> help me?
>>
>> # sudo rpcdebug -m rpc -s xprt trans
>
> After running this command, I got some messages that I think are
> important.
>
> RPC: xs_connect delayed xprt for 3 seconds
> ...
> RPC: xs_connect delayed xprt for 6 seconds
> ...
> RPC: xs_connect delayed xprt for 12 seconds
> ...
> RPC: xs_connect delayed xprt for 24 seconds
> ...
> ...
> RPC: xs_connect delayed xprt for 300 seconds
>
> This message is printed in xs_connect, and the delay doubles each time.
> IMO, once the data has been transferred over a socket, the socket
> should be released.
> But judging from the messages above, the socket isn't being released.
> Is that wrong, or is there some other reason?
The code is trying to connect, but the ->connect call isn't working
somehow. The code backs off by doubling the timeout each time, so that
the connect attempts don't overload the server.

This tells us that the code is attempting to connect, but not why the
connect attempt is failing.
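
To make the backoff concrete, here is a minimal user-space sketch of a
doubling delay with a cap. It is not the actual xs_connect code. The
function and constant names are hypothetical, and the 3-second start
and 300-second ceiling are simply read off the debug output quoted
above.

```c
#include <stdio.h>

/* Hypothetical constants, chosen to match the delays in the quoted log. */
#define INIT_DELAY_SECS 3u
#define MAX_DELAY_SECS  300u

/*
 * Return the delay to apply after one more failed connect attempt.
 * A current delay of 0 means no attempt has failed yet.
 */
unsigned int next_reconnect_delay(unsigned int cur)
{
	if (cur == 0)
		return INIT_DELAY_SECS;
	if (cur >= MAX_DELAY_SECS)
		return MAX_DELAY_SECS;

	cur *= 2;			/* back off by doubling */
	if (cur > MAX_DELAY_SECS)
		cur = MAX_DELAY_SECS;	/* never exceed the ceiling */
	return cur;
}

/* Print the delay that follows each of the first 'attempts' failures. */
void print_schedule(unsigned int attempts)
{
	unsigned int delay = 0;
	unsigned int i;

	for (i = 0; i < attempts; i++) {
		delay = next_reconnect_delay(delay);
		printf("attempt %u failed, delaying %u seconds\n",
		       i + 1, delay);
	}
}
```

Calling print_schedule(8) steps through 3, 6, 12, 24, 48, 96, 192 and
then 300 seconds. Like the debug messages, the sketch only shows how
long each retry is deferred. It says nothing about why the connect
attempts fail in the first place.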
> In the latest kernel, this bug was fixed by the patch
> "NFS/RPC: fix problems with reestablish_timeout and related code."
> But I'm not sure about this.
-- 
chuck[dot]lever[at]oracle[dot]com