From: Mi Jinlong <mijinlong@cn.fujitsu.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NFSv3 list <linux-nfs@vger.kernel.org>,
"J. Bruce Fields" <bfields@fieldses.org>,
"Trond.Myklebust" <trond.myklebust@fys.uio.no>,
"Batsakis, Alexandros" <Alexandros.Batsakis@netapp.com>
Subject: Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
Date: Wed, 14 Apr 2010 18:30:23 +0800
Message-ID: <4BC5993F.2040401@cn.fujitsu.com>
In-Reply-To: <4BC48150.6020405@oracle.com>
Chuck Lever wrote:
> On 04/13/2010 06:25 AM, Mi Jinlong wrote:
>> Hi Chuck,
>>
>> Sorry for replying to your message so late.
>>
>> Chuck Lever wrote:
>>> Hi Mi-
>>>
>>> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>>>> If a network partition or some other problem forces a reconnect, the
>>>> reconnect cannot succeed immediately once the environment recovers,
>>>> but sometimes the client needs to reconnect promptly.
>>>>
>>>> This patch provides a proc file
>>>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
>>>> disable the reconnect delay (reestablish_timeout) when using NFS.
>>>>
>>>> It's only useful for NFS.
>>>
>>> There's a good reason for the connection re-establishment delay, and
>>> only very few instances where you'd want to disable it.  A sysctl is
>>> the wrong place for this, as it would disable the reconnect delay
>>> across the board, instead of for just those occasions when it is
>>> actually necessary to connect immediately.
>>
>> Yes, I agree with you.
>>
>>>
>>> I assume that because the grace period has a time limit, you would
>>> want the client to reconnect at all costs?  I think that this is
>>> actually when a client should take care not to spuriously reconnect:
>>> during a server reboot, a server may be sluggish or not completely
>>> ready to accept client requests.  It's not a time when a client
>>> should be showering a server with connection attempts.
>>>
>>> The reconnect delay is an exponential backoff that starts at 3
>>> seconds, so if the server is really ready to accept connections, the
>>> actual connection delay ought to be quick.
>>>
>>> We're already considering shortening the maximum amount of time the
>>> client can wait before trying a reconnect.  And, it might possibly be
>>> that the network layer itself is interfering with the backoff logic
>>> that is already built into the RPC client.  (If true, that would be
>>> the real bug in this case).  I'm not interested in a workaround when
>>> we really should fix any underlying issues to make this work
>>> correctly.
>>>
>>> Perhaps the RPC client needs to distinguish between connection
>>> refusal (where a lengthening exponential backoff between connection
>>> attempts makes sense) and no server response (where we want the
>>> client's network layer to keep sending SYN requests so that it can
>>> reconnect as soon as possible).
>>
>> From reading the kernel code and testing, I find there are three
>> cases:
>>
>> A. Network partition:
>>    Because the client can't communicate with the server's rpcbind,
>>    there is no effect in this case.
>>
>> B. The server's nfs service is stopped:
>>    The client calls xprt_connect to connect, but gets err (111:
>>    Connection refused).
>>
>> C. The server's nfs service is stopped, and the NIC is taken down
>>    (ifdown) after about 60s:
>>    At first, while the NIC is up, xprt_connect gets err (111:
>>    Connection refused), as in case B.
>>
>>    After the NIC is down, xprt_connect gets err (113: No route to
>>    host).
>>
>> When a connect fails, the sunrpc level only sees an ETIMEDOUT or
>> EAGAIN error, and it calls xprt_connect again to reconnect.
>> If we make the network layer keep sending SYN requests, more requests
>> will be delayed on the request queue, and reestablish_timeout will
>> still be increased.
>>
>> Can we distinguish these refusals at the sunrpc level rather than at
>> the xprt level?

What do you think of what I showed yesterday?

>> If we can do that, the problem will be solved easily.
>>
>> [NOTE]
>> The testing process:
>>
>>        client                  server
>>   1.   mount nfs (OK)
>>   2.   df (OK)
>>   3.                           nfs stop
>>   4.   df (hang)
>>
>> I got the messages through rpcdebug.
>
> We have a matrix of cases. "soft" v. "hard" RPCs, ECONNREFUSED v. no
> response, connection previously closed by server disconnect v. client
> idle timeout.
"Connection previously closed by server disconnect v. client idle
timeout"?
Could you explain that in a bit more detail? It may be useful for me.
Thanks.
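
To make sure the matrix is read correctly, here is a small sketch that
simply enumerates it. It is written in Python only for illustration; the
three axis names (RPC_KIND, FAILURE, PRIOR_CLOSE) and the function
case_matrix() are hypothetical labels for the three pairs named in the
quoted paragraph, not identifiers from the kernel.

```python
from itertools import product

# The three axes named in the quoted paragraph. The tuple names are
# illustrative labels only, not kernel identifiers.
RPC_KIND = ("soft", "hard")
FAILURE = ("ECONNREFUSED", "no response")
PRIOR_CLOSE = ("server disconnect", "client idle timeout")


def case_matrix():
    """Return every (rpc kind, failure, prior close) combination."""
    return list(product(RPC_KIND, FAILURE, PRIOR_CLOSE))
```

If the three axes are independent, that gives 2 x 2 x 2 = 8 combinations
to test, for example ("soft", "ECONNREFUSED", "server disconnect").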
>
> I've found at least one major bug in this logic, and that is that the
> 60 second transport connect timer is clobbered in the ECONNREFUSED
> case, so soft RPCs never time out if the server refuses a connection,
> for example.  I handed all of this off to Trond.
Really?
I mounted the NFS file system with the soft option (-o soft), and then
ran "df" to look at the mount information after the server's nfs
service was stopped.
"df" returned error -5 (Input/output error); maybe an RPC timeout is
what caused df to return?
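
As a rough illustration of the distinction discussed above (connection
refusal versus no server response), here is a minimal sketch in Python
rather than kernel C. It is not the sunrpc implementation. The function
name next_reconnect_delay() and the constants INIT_DELAY and MAX_DELAY
are hypothetical; the 3 second starting value comes from this thread,
while the 300 second cap is only an assumption.

```python
import errno

INIT_DELAY = 3.0    # seconds; starting backoff mentioned in this thread
MAX_DELAY = 300.0   # seconds; assumed cap, NOT taken from this thread


def next_reconnect_delay(err, current_delay):
    """Return how long to wait (seconds) before the next connect attempt.

    err is the errno from the failed connect; current_delay is the delay
    used before that attempt (0 if there was none).
    """
    if err == errno.ECONNREFUSED:
        # The server answered and refused: back off exponentially.
        if current_delay <= 0:
            return INIT_DELAY
        return min(current_delay * 2, MAX_DELAY)
    # No answer at all (e.g. EHOSTUNREACH, ETIMEDOUT): add no extra
    # delay, so the network layer's own SYN retries decide the timing.
    return 0.0
```

With this policy, case B (111: Connection refused) would wait 3s, 6s,
12s and so on, while the second half of case C (113: No route to host)
would drop back to no added delay.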
thanks,
Mi Jinlong