public inbox for linux-nfs@vger.kernel.org
* [PATCH] NFS: add a sysctl for disable the reconnect delay
@ 2010-03-18 10:11 Mi Jinlong
  2010-03-18 15:41 ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Mi Jinlong @ 2010-03-18 10:11 UTC
  To: NFSv3 list, J. Bruce Fields, Chuck Lever, Trond.Myklebust

If a network partition or some other failure forces a reconnect, the
reconnect cannot succeed immediately once the environment recovers, but
sometimes the client needs to reconnect promptly.

This patch provides a proc file (/proc/sys/fs/nfs/nfs_disable_reconnect_delay)
that lets the client disable the reconnect delay (reestablish_timeout) when
using NFS.

It only affects NFS.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
---
 fs/nfs/client.c             |    3 +++
 fs/nfs/sysctl.c             |    8 ++++++++
 include/linux/nfs_fs.h      |    6 ++++++
 include/linux/sunrpc/clnt.h |    1 +
 include/linux/sunrpc/xprt.h |    3 ++-
 net/sunrpc/clnt.c           |    2 ++
 net/sunrpc/xprtsock.c       |    2 +-
 7 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 8d25ccb..e878724 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -55,6 +55,8 @@ static LIST_HEAD(nfs_client_list);
 static LIST_HEAD(nfs_volume_list);
 static DECLARE_WAIT_QUEUE_HEAD(nfs_client_active_wq);
 
+int nfs_disable_reconnect_delay = 0;
+
 /*
  * RPC cruft for NFS
  */
@@ -607,6 +609,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp,
 		.program	= &nfs_program,
 		.version	= clp->rpc_ops->version,
 		.authflavor	= flavor,
+		.no_recon_delay	= nfs_disable_reconnect_delay,
 	};
 
 	if (discrtry)
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index b62481d..6c04479 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -58,6 +58,14 @@ static ctl_table nfs_cb_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "nfs_disable_reconnect_delay",
+		.data		= &nfs_disable_reconnect_delay,
+		.maxlen		= sizeof(nfs_disable_reconnect_delay),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index f6b9024..e031496 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -390,6 +390,12 @@ static inline struct rpc_cred *nfs_file_cred(struct file *file)
 }
 
 /*
+ * linux/fs/nfs/client.c
+ */
+
+extern int nfs_disable_reconnect_delay;
+
+/*
  * linux/fs/nfs/xattr.c
  */
 #ifdef CONFIG_NFS_V3_ACL
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 5bd17f6..f73eae1 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -115,6 +115,7 @@ struct rpc_create_args {
 	rpc_authflavor_t	authflavor;
 	unsigned long		flags;
 	char			*client_name;
+	int			no_recon_delay;  /* no delay when reconnect */
 };
 
 /* Values for "flags" field */
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 1175d58..a177348 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -153,7 +153,8 @@ struct rpc_xprt {
 	unsigned int		max_reqs;	/* total slots */
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
-				resvport   : 1; /* use a reserved port */
+				resvport   : 1, /* use a reserved port */
+				no_recon_delay: 1; /* no delay when reconnect */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index df1039f..7a90d1a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -316,6 +316,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 	if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
 		xprt->resvport = 0;
 
+	xprt->no_recon_delay = !!args->no_recon_delay;
+
 	clnt = rpc_new_client(args, xprt);
 	if (IS_ERR(clnt))
 		return clnt;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 24c9605..52f2367 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2089,7 +2089,7 @@ static void xs_connect(struct rpc_task *task)
 	if (xprt_test_and_set_connecting(xprt))
 		return;
 
-	if (transport->sock != NULL) {
+	if (!xprt->no_recon_delay && transport->sock != NULL) {
 		dprintk("RPC:       xs_connect delayed xprt %p for %lu "
 				"seconds\n",
 				xprt, xprt->reestablish_timeout / HZ);
-- 
1.6.4




* Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
  2010-03-18 10:11 [PATCH] NFS: add a sysctl for disable the reconnect delay Mi Jinlong
@ 2010-03-18 15:41 ` Chuck Lever
  2010-04-13 10:25   ` Mi Jinlong
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2010-03-18 15:41 UTC
  To: Mi Jinlong
  Cc: NFSv3 list, J. Bruce Fields, Trond.Myklebust,
	Batsakis, Alexandros

Hi Mi-

On 03/18/2010 06:11 AM, Mi Jinlong wrote:
> If a network partition or some other failure forces a reconnect, the
> reconnect cannot succeed immediately once the environment recovers, but
> sometimes the client needs to reconnect promptly.
>
> This patch provides a proc file
> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
> disable the reconnect delay (reestablish_timeout) when using NFS.
>
> It only affects NFS.

There's a good reason for the connection re-establishment delay, and 
only very few instances where you'd want to disable it.  A sysctl is the 
wrong place for this, as it would disable the reconnect delay across the 
board, instead of for just those occasions when it is actually necessary 
to connect immediately.
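
To make the scope concern concrete, here is a minimal user-space model of how
the patch wires the knob through. It is only a sketch: struct xprt_model,
create_xprt() and needs_delay() are hypothetical stand-ins, not kernel
functions. It mirrors the patch in one respect: the global is copied into the
transport when the transport is created, so it applies to every reconnect on
that transport, whatever caused the reconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the global that the patch exposes through sysctl. */
static int nfs_disable_reconnect_delay;

/* Hypothetical, simplified transport: only the fields this sketch needs. */
struct xprt_model {
	bool has_old_socket;	/* models transport->sock != NULL */
	bool no_recon_delay;	/* copied from the global at creation time */
};

/* Models rpc_create(): the flag is sampled once, when the transport is made. */
static struct xprt_model create_xprt(void)
{
	struct xprt_model x = {
		.has_old_socket = false,
		.no_recon_delay = !!nfs_disable_reconnect_delay,
	};
	return x;
}

/* Models the patched test in xs_connect(): delay only when reconnecting. */
static bool needs_delay(const struct xprt_model *x)
{
	assert(x != NULL);
	return !x->no_recon_delay && x->has_old_socket;
}
```

In this model, a transport created while the knob is set skips the delay on
all of its reconnects, and clearing the knob later does not change transports
that already exist.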

I assume that because the grace period has a time limit, you would want 
the client to reconnect at all costs?  I think that this is actually 
when a client should take care not to spuriously reconnect: during a 
server reboot, a server may be sluggish or not completely ready to 
accept client requests.  It's not a time when a client should be 
showering a server with connection attempts.

The reconnect delay is an exponential backoff that starts at 3 seconds, 
so if the server is really ready to accept connections, the actual 
connection delay ought to be quick.
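
As a rough illustration of that backoff, the sketch below doubles the delay
after each failed attempt. The 3-second start comes from the paragraph above;
the 300-second cap and the helper name next_reconnect_delay() are assumptions
made for the example, not values taken from this thread.

```c
#include <assert.h>

#define INIT_DELAY_SECS	3U	/* starting delay, per the text above */
#define MAX_DELAY_SECS	300U	/* assumed cap, for illustration only */

/* Return the delay to use after one more failed connection attempt. */
static unsigned int next_reconnect_delay(unsigned int cur)
{
	assert(cur <= MAX_DELAY_SECS);
	if (cur == 0)
		return INIT_DELAY_SECS;		/* first failure */
	if (cur > MAX_DELAY_SECS / 2)
		return MAX_DELAY_SECS;		/* doubling would exceed the cap */
	return cur * 2;
}
```

Starting from zero, the sequence runs 3, 6, 12, 24 and so on, so a server that
comes back quickly is retried within a few seconds.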

We're already considering shortening the maximum amount of time the 
client can wait before trying a reconnect.  And, it might possibly be 
that the network layer itself is interfering with the backoff logic that 
is already built into the RPC client.  (If true, that would be the real 
bug in this case).  I'm not interested in a workaround when we really 
should fix any underlying issues to make this work correctly.

Perhaps the RPC client needs to distinguish between connection refusal 
(where a lengthening exponential backoff between connection attempts 
makes sense) and no server response (where we want the client's network 
layer to keep sending SYN requests so that it can reconnect as soon as 
possible).

The second scenario might disable the reconnect timer so that only one 
->connect() call would be outstanding until the network layer tells us 
it's given up on SYN retries.
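
One way to picture that distinction is a small policy function keyed on the
connect error. This is only a sketch of the idea: the enum and
reconnect_policy() are hypothetical names, and treating ETIMEDOUT,
EHOSTUNREACH and ENETUNREACH as "no server response" is an assumption made
for the example.

```c
#include <assert.h>
#include <errno.h>

enum reconnect_action {
	RECONNECT_BACKOFF,	/* server refused: lengthen the delay */
	RECONNECT_KEEP_TRYING,	/* no response: leave one connect outstanding */
	RECONNECT_FAIL,		/* anything else: report the error */
};

/* Choose a reconnect strategy from a positive errno value. */
static enum reconnect_action reconnect_policy(int err)
{
	assert(err >= 0);
	switch (err) {
	case ECONNREFUSED:
		return RECONNECT_BACKOFF;
	case ETIMEDOUT:
	case EHOSTUNREACH:
	case ENETUNREACH:
		return RECONNECT_KEEP_TRYING;
	default:
		return RECONNECT_FAIL;
	}
}
```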

> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
> ---
>   fs/nfs/client.c             |    3 +++
>   fs/nfs/sysctl.c             |    8 ++++++++
>   include/linux/nfs_fs.h      |    6 ++++++
>   include/linux/sunrpc/clnt.h |    1 +
>   include/linux/sunrpc/xprt.h |    3 ++-
>   net/sunrpc/clnt.c           |    2 ++
>   net/sunrpc/xprtsock.c       |    2 +-
>   7 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 8d25ccb..e878724 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -55,6 +55,8 @@ static LIST_HEAD(nfs_client_list);
>   static LIST_HEAD(nfs_volume_list);
>   static DECLARE_WAIT_QUEUE_HEAD(nfs_client_active_wq);
>
> +int nfs_disable_reconnect_delay = 0;
> +
>   /*
>    * RPC cruft for NFS
>    */
> @@ -607,6 +609,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp,
>   		.program	= &nfs_program,
>   		.version	= clp->rpc_ops->version,
>   		.authflavor	= flavor,
> +		.no_recon_delay	= nfs_disable_reconnect_delay,
>   	};
>
>   	if (discrtry)
> diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
> index b62481d..6c04479 100644
> --- a/fs/nfs/sysctl.c
> +++ b/fs/nfs/sysctl.c
> @@ -58,6 +58,14 @@ static ctl_table nfs_cb_sysctls[] = {
>   		.mode		= 0644,
>   		.proc_handler	= &proc_dointvec,
>   	},
> +	{
> +		.ctl_name	= CTL_UNNUMBERED,
> +		.procname	= "nfs_disable_reconnect_delay",
> +		.data		= &nfs_disable_reconnect_delay,
> +		.maxlen		= sizeof(nfs_disable_reconnect_delay),
> +		.mode		= 0644,
> +		.proc_handler	= &proc_dointvec,
> +	},
>   	{ .ctl_name = 0 }
>   };
>
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index f6b9024..e031496 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -390,6 +390,12 @@ static inline struct rpc_cred *nfs_file_cred(struct file *file)
>   }
>
>   /*
> + * linux/fs/nfs/client.c
> + */
> +
> +extern int nfs_disable_reconnect_delay;
> +
> +/*
>    * linux/fs/nfs/xattr.c
>    */
>   #ifdef CONFIG_NFS_V3_ACL
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index 5bd17f6..f73eae1 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -115,6 +115,7 @@ struct rpc_create_args {
>   	rpc_authflavor_t	authflavor;
>   	unsigned long		flags;
>   	char			*client_name;
> +	int			no_recon_delay;  /* no delay when reconnect */
>   };
>
>   /* Values for "flags" field */
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index 1175d58..a177348 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -153,7 +153,8 @@ struct rpc_xprt {
>   	unsigned int		max_reqs;	/* total slots */
>   	unsigned long		state;		/* transport state */
>   	unsigned char		shutdown   : 1,	/* being shut down */
> -				resvport   : 1; /* use a reserved port */
> +				resvport   : 1, /* use a reserved port */
> +				no_recon_delay: 1; /* no delay when reconnect */
>   	unsigned int		bind_index;	/* bind function index */
>
>   	/*
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index df1039f..7a90d1a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -316,6 +316,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
>   	if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
>   		xprt->resvport = 0;
>
> +	xprt->no_recon_delay = !!args->no_recon_delay;
> +
>   	clnt = rpc_new_client(args, xprt);
>   	if (IS_ERR(clnt))
>   		return clnt;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 24c9605..52f2367 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2089,7 +2089,7 @@ static void xs_connect(struct rpc_task *task)
>   	if (xprt_test_and_set_connecting(xprt))
>   		return;
>
> -	if (transport->sock != NULL) {
> +	if (!xprt->no_recon_delay && transport->sock != NULL) {
>   		dprintk("RPC:       xs_connect delayed xprt %p for %lu "
>   				"seconds\n",
>   				xprt, xprt->reestablish_timeout / HZ);


-- 
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
  2010-03-18 15:41 ` Chuck Lever
@ 2010-04-13 10:25   ` Mi Jinlong
  2010-04-13 14:36     ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Mi Jinlong @ 2010-04-13 10:25 UTC
  To: Chuck Lever
  Cc: NFSv3 list, J. Bruce Fields, Trond.Myklebust,
	Batsakis, Alexandros

Hi Chuck,

  Sorry for replying to your message so late.

Chuck Lever wrote:
> Hi Mi-
>
> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>> If a network partition or some other failure forces a reconnect, the
>> reconnect cannot succeed immediately once the environment recovers, but
>> sometimes the client needs to reconnect promptly.
>>
>> This patch provides a proc file
>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
>> disable the reconnect delay (reestablish_timeout) when using NFS.
>>
>> It only affects NFS.
>
> There's a good reason for the connection re-establishment delay, and
> only very few instances where you'd want to disable it.  A sysctl is the
> wrong place for this, as it would disable the reconnect delay across the
> board, instead of for just those occasions when it is actually necessary
> to connect immediately.

  Yes, I agree with you.

>
> I assume that because the grace period has a time limit, you would want
> the client to reconnect at all costs?  I think that this is actually
> when a client should take care not to spuriously reconnect: during a
> server reboot, a server may be sluggish or not completely ready to
> accept client requests.  It's not a time when a client should be
> showering a server with connection attempts.
>
> The reconnect delay is an exponential backoff that starts at 3 seconds,
> so if the server is really ready to accept connections, the actual
> connection delay ought to be quick.
>
> We're already considering shortening the maximum amount of time the
> client can wait before trying a reconnect.  And, it might possibly be
> that the network layer itself is interfering with the backoff logic that
> is already built into the RPC client.  (If true, that would be the real
> bug in this case).  I'm not interested in a workaround when we really
> should fix any underlying issues to make this work correctly.
>
> Perhaps the RPC client needs to distinguish between connection refusal
> (where a lengthening exponential backoff between connection attempts
> makes sense) and no server response (where we want the client's network
> layer to keep sending SYN requests so that it can reconnect as soon as
> possible).

  From reading the kernel code and testing, I found three cases:

  A. Network partition:
     The client can't communicate with the server's rpcbind,
     so there is no influence.

  B. The server's NFS service is stopped:
     The client calls xprt_connect to connect, but gets
     err(111: Connection refused).

  C. The server's NFS service is stopped, and the NIC is taken down
     (ifdown) after about 60s:
     At first, while the NIC is up, xprt_connect gets
     err(111: Connection refused), as in B.

     After the NIC is down, xprt_connect gets err(113: No route to host).
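
  For reference, the numeric codes in the cases above correspond to the
  symbolic errno names below. This is only a small sketch; the helper name
  connect_errno_name() is hypothetical, and the numeric values in the comments
  hold on most Linux architectures but not all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Map the connect errors seen in cases B and C to their symbolic names. */
static const char *connect_errno_name(int err)
{
	assert(err > 0);
	switch (err) {
	case ECONNREFUSED:	/* 111 on most Linux architectures */
		return "ECONNREFUSED";
	case EHOSTUNREACH:	/* 113 on most Linux architectures */
		return "EHOSTUNREACH";
	case ETIMEDOUT:
		return "ETIMEDOUT";
	case EAGAIN:
		return "EAGAIN";
	default:
		return NULL;	/* not one of the errors discussed here */
	}
}
```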

 When connecting fails, the sunrpc level only gets an ETIMEDOUT or EAGAIN
 error, and it calls xprt_connect again to reconnect.
 If we make the network layer keep sending SYN requests, more requests
 will be delayed in the request queue, and the reestablish_timeout will
 also increase.

 Can we distinguish those refusals at the sunrpc level, rather than at
 the xprt level?
 If we can do that, the problem will be solved easily.

 [NOTE]
   The testing process:
         client                    server
   1.   mount nfs (OK)
   2.     df (OK)
   3.                             nfs stop
   4.     df (hang)

  I got these messages through rpcdebug.

>
> The second scenario might disable the reconnect timer so that only one
> ->connect() call would be outstanding until the network layer tells us
> it's given up on SYN retries.

  I think that's a good idea, but implementing it may be a lot of work.

thanks,
Mi Jinlong



* Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
  2010-04-13 10:25   ` Mi Jinlong
@ 2010-04-13 14:36     ` Chuck Lever
  2010-04-14 10:30       ` Mi Jinlong
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2010-04-13 14:36 UTC
  To: Mi Jinlong
  Cc: NFSv3 list, J. Bruce Fields, Trond.Myklebust,
	Batsakis, Alexandros

On 04/13/2010 06:25 AM, Mi Jinlong wrote:
> Hi Chuck,
>
>    Sorry for replying to your message so late.
>
> Chuck Lever wrote:
>> Hi Mi-
>>
>> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>>> If a network partition or some other failure forces a reconnect, the
>>> reconnect cannot succeed immediately once the environment recovers, but
>>> sometimes the client needs to reconnect promptly.
>>>
>>> This patch provides a proc file
>>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
>>> disable the reconnect delay (reestablish_timeout) when using NFS.
>>>
>>> It only affects NFS.
>>
>> There's a good reason for the connection re-establishment delay, and
>> only very few instances where you'd want to disable it.  A sysctl is the
>> wrong place for this, as it would disable the reconnect delay across the
>> board, instead of for just those occasions when it is actually necessary
>> to connect immediately.
>
>    Yes, I agree with you.
>
>>
>> I assume that because the grace period has a time limit, you would want
>> the client to reconnect at all costs?  I think that this is actually
>> when a client should take care not to spuriously reconnect: during a
>> server reboot, a server may be sluggish or not completely ready to
>> accept client requests.  It's not a time when a client should be
>> showering a server with connection attempts.
>>
>> The reconnect delay is an exponential backoff that starts at 3 seconds,
>> so if the server is really ready to accept connections, the actual
>> connection delay ought to be quick.
>>
>> We're already considering shortening the maximum amount of time the
>> client can wait before trying a reconnect.  And, it might possibly be
>> that the network layer itself is interfering with the backoff logic that
>> is already built into the RPC client.  (If true, that would be the real
>> bug in this case).  I'm not interested in a workaround when we really
>> should fix any underlying issues to make this work correctly.
>>
>> Perhaps the RPC client needs to distinguish between connection refusal
>> (where a lengthening exponential backoff between connection attempts
>> makes sense) and no server response (where we want the client's network
>> layer to keep sending SYN requests so that it can reconnect as soon as
>> possible).
>
>   From reading the kernel code and testing, I found three cases:
>
>   A. Network partition:
>      The client can't communicate with the server's rpcbind,
>      so there is no influence.
>
>   B. The server's NFS service is stopped:
>      The client calls xprt_connect to connect, but gets
>      err(111: Connection refused).
>
>   C. The server's NFS service is stopped, and the NIC is taken down
>      (ifdown) after about 60s:
>      At first, while the NIC is up, xprt_connect gets
>      err(111: Connection refused), as in B.
>
>      After the NIC is down, xprt_connect gets err(113: No route to host).
>
>  When connecting fails, the sunrpc level only gets an ETIMEDOUT or EAGAIN
>  error, and it calls xprt_connect again to reconnect.
>  If we make the network layer keep sending SYN requests, more requests
>  will be delayed in the request queue, and the reestablish_timeout will
>  also increase.
>
>  Can we distinguish those refusals at the sunrpc level, rather than at
>  the xprt level?
>  If we can do that, the problem will be solved easily.
>
>  [NOTE]
>    The testing process:
>          client                    server
>    1.   mount nfs (OK)
>    2.     df (OK)
>    3.                             nfs stop
>    4.     df (hang)
>
>   I got these messages through rpcdebug.

We have a matrix of cases.  "soft" v. "hard" RPCs, ECONNREFUSED v. no
response, connection previously closed by server disconnect v. client
idle timeout.

I've found at least one major bug in this logic, and that is that the 60
second transport connect timer is clobbered in the ECONNREFUSED case, so
soft RPCs never time out if the server refuses a connection, for
example.  I handed all of this off to Trond.
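
The effect can be pictured with a toy simulation. This is not the kernel
logic, only a sketch under stated assumptions: soft_rpc_times_out() is a
hypothetical helper, the retry interval is fixed rather than exponential, and
"clobbered" is modelled as restarting the timeout on every refusal.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simulate a soft RPC whose server refuses every connection attempt.
 * Time advances by retry_secs per attempt. If reset_on_refusal is true,
 * each refusal restarts the timeout (the modelled bug); otherwise the
 * original deadline is kept. Returns true if the RPC times out within
 * horizon_secs of simulated time.
 */
static bool soft_rpc_times_out(bool reset_on_refusal, unsigned int retry_secs,
			       unsigned int timeout_secs,
			       unsigned int horizon_secs)
{
	unsigned int now = 0;
	unsigned int deadline = timeout_secs;

	assert(retry_secs > 0);
	while (now < horizon_secs) {
		now += retry_secs;	/* the next attempt is refused */
		if (now >= deadline)
			return true;	/* the caller would see a timeout */
		if (reset_on_refusal)
			deadline = now + timeout_secs;	/* timer clobbered */
	}
	return false;
}
```

With a 3-second retry interval and a 60-second timeout, the model times out
when the deadline is kept, and never does within the simulated hour when each
refusal restarts the timer.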

>> The second scenario might disable the reconnect timer so that only one
>> ->connect() call would be outstanding until the network layer tells us
>> it's given up on SYN retries.
>
>    I think that's a good idea, but implementing it may be a lot of work.
>
> thanks,
> Mi Jinlong
>


-- 
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
  2010-04-13 14:36     ` Chuck Lever
@ 2010-04-14 10:30       ` Mi Jinlong
  2010-04-14 20:43         ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Mi Jinlong @ 2010-04-14 10:30 UTC
  To: Chuck Lever
  Cc: NFSv3 list, J. Bruce Fields, Trond.Myklebust,
	Batsakis, Alexandros



Chuck Lever wrote:
> On 04/13/2010 06:25 AM, Mi Jinlong wrote:
>> Hi Chuck,
>>
>>    Sorry for replying to your message so late.
>>
>> Chuck Lever wrote:
>>> Hi Mi-
>>>
>>> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>>>> If a network partition or some other failure forces a reconnect, the
>>>> reconnect cannot succeed immediately once the environment recovers, but
>>>> sometimes the client needs to reconnect promptly.
>>>>
>>>> This patch provides a proc file
>>>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
>>>> disable the reconnect delay (reestablish_timeout) when using NFS.
>>>>
>>>> It only affects NFS.
>>>
>>> There's a good reason for the connection re-establishment delay, and
>>> only very few instances where you'd want to disable it.  A sysctl is the
>>> wrong place for this, as it would disable the reconnect delay across the
>>> board, instead of for just those occasions when it is actually necessary
>>> to connect immediately.
>>
>>    Yes, I agree with you.
>>
>>>
>>> I assume that because the grace period has a time limit, you would want
>>> the client to reconnect at all costs?  I think that this is actually
>>> when a client should take care not to spuriously reconnect: during a
>>> server reboot, a server may be sluggish or not completely ready to
>>> accept client requests.  It's not a time when a client should be
>>> showering a server with connection attempts.
>>>
>>> The reconnect delay is an exponential backoff that starts at 3 seconds,
>>> so if the server is really ready to accept connections, the actual
>>> connection delay ought to be quick.
>>>
>>> We're already considering shortening the maximum amount of time the
>>> client can wait before trying a reconnect.  And, it might possibly be
>>> that the network layer itself is interfering with the backoff logic that
>>> is already built into the RPC client.  (If true, that would be the real
>>> bug in this case).  I'm not interested in a workaround when we really
>>> should fix any underlying issues to make this work correctly.
>>>
>>> Perhaps the RPC client needs to distinguish between connection refusal
>>> (where a lengthening exponential backoff between connection attempts
>>> makes sense) and no server response (where we want the client's network
>>> layer to keep sending SYN requests so that it can reconnect as soon as
>>> possible).
>>
>>   From reading the kernel code and testing, I found three cases:
>>
>>   A. Network partition:
>>      The client can't communicate with the server's rpcbind,
>>      so there is no influence.
>>
>>   B. The server's NFS service is stopped:
>>      The client calls xprt_connect to connect, but gets
>>      err(111: Connection refused).
>>
>>   C. The server's NFS service is stopped, and the NIC is taken down
>>      (ifdown) after about 60s:
>>      At first, while the NIC is up, xprt_connect gets
>>      err(111: Connection refused), as in B.
>>
>>      After the NIC is down, xprt_connect gets err(113: No route to host).
>>
>>  When connecting fails, the sunrpc level only gets an ETIMEDOUT or EAGAIN
>>  error, and it calls xprt_connect again to reconnect.
>>  If we make the network layer keep sending SYN requests, more requests
>>  will be delayed in the request queue, and the reestablish_timeout will
>>  also increase.
>>
>>  Can we distinguish those refusals at the sunrpc level, rather than at
>>  the xprt level?

   What do you think of what I showed yesterday?

>>  If we can do that, the problem will be solved easily.
>>
>>  [NOTE]
>>    The testing process:
>>          client                    server
>>    1.   mount nfs (OK)
>>    2.     df (OK)
>>    3.                             nfs stop
>>    4.     df (hang)
>>
>>   I got these messages through rpcdebug.
>
> We have a matrix of cases.  "soft" v. "hard" RPCs, ECONNREFUSED v. no
> response, connection previously closed by server disconnect v. client
> idle timeout.

  "Connection previously closed by server disconnect v. client idle timeout"?
  Could you explain that in a bit more detail? It may be useful for me. Thanks.

>
> I've found at least one major bug in this logic, and that is that the 60
> second transport connect timer is clobbered in the ECONNREFUSED case, so
> soft RPCs never time out if the server refuses a connection, for
> example.  I handed all of this off to Trond.

  Really?
  I mounted the NFS filesystem with the soft option (-o soft), and then
  ran the "df" command to see the mount information after the server's
  NFS service had stopped.
  "df" returned error -5 (Input/output error); maybe it is the RPC
  timeout that makes df return?

thanks,
Mi Jinlong



* Re: [PATCH] NFS: add a sysctl for disable the reconnect delay
  2010-04-14 10:30       ` Mi Jinlong
@ 2010-04-14 20:43         ` Chuck Lever
  0 siblings, 0 replies; 6+ messages in thread
From: Chuck Lever @ 2010-04-14 20:43 UTC
  To: Mi Jinlong
  Cc: NFSv3 list, J. Bruce Fields, Trond.Myklebust,
	Batsakis, Alexandros

On Apr 14, 2010, at 6:30 AM, Mi Jinlong wrote:
> Chuck Lever wrote:
>> On 04/13/2010 06:25 AM, Mi Jinlong wrote:
>>> Hi Chuck,
>>>
>>>   Sorry for replying to your message so late.
>>>
>>> Chuck Lever wrote:
>>>> Hi Mi-
>>>>
>>>> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>>>>> If a network partition or some other failure forces a reconnect, the
>>>>> reconnect cannot succeed immediately once the environment recovers, but
>>>>> sometimes the client needs to reconnect promptly.
>>>>>
>>>>> This patch provides a proc file
>>>>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that lets the client
>>>>> disable the reconnect delay (reestablish_timeout) when using NFS.
>>>>>
>>>>> It only affects NFS.
>>>>
>>>> There's a good reason for the connection re-establishment delay, and
>>>> only very few instances where you'd want to disable it.  A sysctl is the
>>>> wrong place for this, as it would disable the reconnect delay across the
>>>> board, instead of for just those occasions when it is actually necessary
>>>> to connect immediately.
>>>
>>>   Yes, I agree with you.
>>>
>>>>
>>>> I assume that because the grace period has a time limit, you would want
>>>> the client to reconnect at all costs?  I think that this is actually
>>>> when a client should take care not to spuriously reconnect: during a
>>>> server reboot, a server may be sluggish or not completely ready to
>>>> accept client requests.  It's not a time when a client should be
>>>> showering a server with connection attempts.
>>>>
>>>> The reconnect delay is an exponential backoff that starts at 3 seconds,
>>>> so if the server is really ready to accept connections, the actual
>>>> connection delay ought to be quick.
>>>>
>>>> We're already considering shortening the maximum amount of time the
>>>> client can wait before trying a reconnect.  And, it might possibly be
>>>> that the network layer itself is interfering with the backoff logic that
>>>> is already built into the RPC client.  (If true, that would be the real
>>>> bug in this case).  I'm not interested in a workaround when we really
>>>> should fix any underlying issues to make this work correctly.
>>>>
>>>> Perhaps the RPC client needs to distinguish between connection refusal
>>>> (where a lengthening exponential backoff between connection attempts
>>>> makes sense) and no server response (where we want the client's network
>>>> layer to keep sending SYN requests so that it can reconnect as soon as
>>>> possible).
>>>
>>>   From reading the kernel code and testing, I found three cases:
>>>
>>>   A. Network partition:
>>>      The client can't communicate with the server's rpcbind,
>>>      so there is no influence.
>>>
>>>   B. The server's NFS service is stopped:
>>>      The client calls xprt_connect to connect, but gets
>>>      err(111: Connection refused).
>>>
>>>   C. The server's NFS service is stopped, and the NIC is taken down
>>>      (ifdown) after about 60s:
>>>      At first, while the NIC is up, xprt_connect gets
>>>      err(111: Connection refused), as in B.
>>>
>>>      After the NIC is down, xprt_connect gets err(113: No route to host).
>>>
>>>  When connecting fails, the sunrpc level only gets an ETIMEDOUT or EAGAIN
>>>  error, and it calls xprt_connect again to reconnect.
>>>  If we make the network layer keep sending SYN requests, more requests
>>>  will be delayed in the request queue, and the reestablish_timeout will
>>>  also increase.
>>>
>>>  Can we distinguish those refusals at the sunrpc level, rather than at
>>>  the xprt level?
>
>   What do you think of what I showed yesterday?

In xprtsock.c, these reconnection errors are distinguished.  In the
generic sunrpc client (xprt.c) they are not -- when an RPC transmission
is sent, xprtsock.c returns ENOTCONN for any connection error.  Trond
made this change after 2.6.18.  The differences matter in how the client
re-establishes the connection, and that logic is all in xprtsock.c.  So,
the RPC client already makes this distinction, but the logic may have
bugs.
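
A simplified model of that layering, as a sketch only: struct transport_model
and both helpers are hypothetical names, not the functions in xprt.c or
xprtsock.c. The point it illustrates is that the generic layer sees a single
error while the transport still knows the underlying cause.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport state: remembers why the last connect failed. */
struct transport_model {
	int last_connect_err;	/* 0 if connected, else a positive errno */
};

/* What the generic layer is told: any connection failure looks the same. */
static int generic_send_status(const struct transport_model *t)
{
	assert(t != NULL);
	return t->last_connect_err ? -ENOTCONN : 0;
}

/* What the transport itself can still tell apart when it reconnects. */
static int transport_was_refused(const struct transport_model *t)
{
	assert(t != NULL);
	return t->last_connect_err == ECONNREFUSED;
}
```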

>>>  If we can do that, the problem will be solved easily.
>>>
>>>  [NOTE]
>>>    The testing process:
>>>          client                    server
>>>    1.   mount nfs (OK)
>>>    2.     df (OK)
>>>    3.                             nfs stop
>>>    4.     df (hang)
>>>
>>>   I got these messages through rpcdebug.
>>
>> We have a matrix of cases.  "soft" v. "hard" RPCs, ECONNREFUSED v. no
>> response, connection previously closed by server disconnect v. client
>> idle timeout.
>
>  "Connection previously closed by server disconnect v. client idle timeout"?
>  Could you explain that in a bit more detail? It may be useful for me. Thanks.

If the server closed the connection, the client should use the
re-establish timeout to delay the reconnection in order to prevent a
hard loop of client connection retries.  If the client idled the
connection out, then the client should reconnect immediately.
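
That rule can be written down as a small decision function. The sketch below
is illustrative only; enum close_origin and reconnect_wait() are hypothetical
names, and the real transport code tracks this state differently.

```c
#include <assert.h>

enum close_origin {
	CLOSED_BY_SERVER,	/* the server disconnected us */
	CLOSED_BY_CLIENT_IDLE,	/* the client dropped an idle connection */
};

/*
 * Seconds to wait before reconnecting. reestablish_secs is the current
 * re-establish timeout for this transport.
 */
static unsigned int reconnect_wait(enum close_origin origin,
				   unsigned int reestablish_secs)
{
	switch (origin) {
	case CLOSED_BY_SERVER:
		return reestablish_secs;	/* avoid a hard retry loop */
	case CLOSED_BY_CLIENT_IDLE:
		return 0;			/* reconnect immediately */
	}
	assert(!"unknown close origin");
	return reestablish_secs;
}
```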

>> I've found at least one major bug in this logic, and that is that the 60
>> second transport connect timer is clobbered in the ECONNREFUSED case, so
>> soft RPCs never time out if the server refuses a connection, for
>> example.  I handed all of this off to Trond.
>
>  Really?
>  I mounted the NFS filesystem with the soft option (-o soft), and then
>  ran the "df" command to see the mount information after the server's
>  NFS service had stopped.
>  "df" returned error -5 (Input/output error); maybe it is the RPC
>  timeout that makes df return?

RPC timeouts generally cause an EIO.  However, if the server continues
to refuse a connection, the timeout never occurs.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



