[PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

* [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
@ 2013-04-24 20:55 Dave Chiluk
  2013-04-24 21:11 ` J. Bruce Fields
  2013-04-24 21:28 ` Myklebust, Trond
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chiluk @ 2013-04-24 20:55 UTC
  To: Trond.Myklebust, bfields, linux-nfs, linux-kernel

Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.

Additionally this alleviates an interoperability problem with the AIX NFSv4
Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
cause a linux client to hang for 15 seconds.

Signed-off-by: Dave Chiluk <chiluk@canonical.com>
---
 fs/nfs/nfs4proc.c            |   12 ++++++++++++
 include/linux/sunrpc/sched.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0ad025e..37dad27 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4006,6 +4006,18 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 #endif /* CONFIG_NFS_V4_1 */
 		case -NFS4ERR_DELAY:
 			nfs_inc_server_stats(server, NFSIOS_DELAY);
+			/* Do an exponential backoff of retries from
+			 * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX. */
+			task->tk_timeout = NFS4_POLL_RETRY_MIN <<
+					(task->tk_delays*2);
+			if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
+				rpc_delay(task, NFS4_POLL_RETRY_MAX);
+			else {
+				task->tk_delays++;
+				rpc_delay(task, task->tk_timeout);
+			}
+			task->tk_status = 0;
+			return -EAGAIN;
 		case -NFS4ERR_GRACE:
 			rpc_delay(task, NFS4_POLL_RETRY_MAX);
 			task->tk_status = 0;
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 84ca436..60f82bf 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -62,6 +62,7 @@ struct rpc_task {
 	void *			tk_calldata;
 
 	unsigned long		tk_timeout;	/* timeout for rpc_sleep() */
+	unsigned short		tk_delays;	/* number of times task delayed */
 	unsigned long		tk_runstate;	/* Task run status */
 	struct workqueue_struct	*tk_workqueue;	/* Normally rpciod, but could
 						 * be any workqueue
-- 
1.7.9.5


* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-24 20:55 [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY Dave Chiluk
@ 2013-04-24 21:11 ` J. Bruce Fields
  2013-04-24 21:28 ` Myklebust, Trond
  1 sibling, 0 replies; 21+ messages in thread
From: J. Bruce Fields @ 2013-04-24 21:11 UTC
  To: Dave Chiluk; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

On Wed, Apr 24, 2013 at 03:55:49PM -0500, Dave Chiluk wrote:
> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> 
> Additionally this alleviates an interoperability problem with the AIX NFSv4
> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> cause a linux client to hang for 15 seconds.
> 
> Signed-off-by: Dave Chiluk <chiluk@canonical.com>
> ---
>  fs/nfs/nfs4proc.c            |   12 ++++++++++++
>  include/linux/sunrpc/sched.h |    1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 0ad025e..37dad27 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4006,6 +4006,18 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
>  #endif /* CONFIG_NFS_V4_1 */
>  		case -NFS4ERR_DELAY:
>  			nfs_inc_server_stats(server, NFSIOS_DELAY);
> +			/* Do an exponential backoff of retries from
> +			 * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX. */
> +			task->tk_timeout = NFS4_POLL_RETRY_MIN <<
> +					(task->tk_delays*2);
> +			if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
> +				rpc_delay(task, NFS4_POLL_RETRY_MAX);
> +			else {
> +				task->tk_delays++;
> +				rpc_delay(task, task->tk_timeout);
> +			}
> +			task->tk_status = 0;
> +			return -EAGAIN;

Just as a matter of style, could you stick this in a helper something
like the existing nfs4_delay?:

		case -NFS4ERR_DELAY:
			nfs_inc_server_stats(server, NFSIOS_DELAY);
			nfs4_async_delay(task);
			task->tk_status = 0;
			return -EAGAIN;
		...

--b.

>  		case -NFS4ERR_GRACE:
>  			rpc_delay(task, NFS4_POLL_RETRY_MAX);
>  			task->tk_status = 0;
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index 84ca436..60f82bf 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -62,6 +62,7 @@ struct rpc_task {
>  	void *			tk_calldata;
>  
>  	unsigned long		tk_timeout;	/* timeout for rpc_sleep() */
> +	unsigned short		tk_delays;	/* number of times task delayed */
>  	unsigned long		tk_runstate;	/* Task run status */
>  	struct workqueue_struct	*tk_workqueue;	/* Normally rpciod, but could
>  						 * be any workqueue
> -- 
> 1.7.9.5
> 

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-24 20:55 [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY Dave Chiluk
  2013-04-24 21:11 ` J. Bruce Fields
@ 2013-04-24 21:28 ` Myklebust, Trond
  2013-04-24 21:54   ` Dave Chiluk
  1 sibling, 1 reply; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-24 21:28 UTC
  To: Dave Chiluk
  Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> 
> Additionally this alleviates an interoperability problem with the AIX NFSv4
> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> cause a linux client to hang for 15 seconds.

Hi Dave,

The AIX server is not being motivated by any requirements in the NFSv4
spec here, so I fail to see the reason why the behaviour that you
describe can justify changing the client. It is not at all obvious to me
that we should be retrying aggressively when NFSv4 servers return
NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
the existing 15 seconds?

The motivation for doing it in the case of OPEN, SETATTR, etc is
clearer: those operations may require the server to recall a delegation,
in which case aggressive retries are in order since delegation recalls
are usually fast.
The motivation in the case of LOCK is less clear, but it is basically
down to the fact that NFSv4 has a polling model for doing blocking
locks.
In all other cases, why should we be treating NFS4ERR_DELAY any
differently from how we treat NFS3ERR_JUKEBOX in NFSv3?

Note that if we do decide that changing the client is the right thing,
then I don't want the patch to add new fields to struct rpc_task. That's
the wrong layer for storing NFSv4 client specific data.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-24 21:28 ` Myklebust, Trond
@ 2013-04-24 21:54   ` Dave Chiluk
  2013-04-24 22:35     ` Myklebust, Trond
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chiluk @ 2013-04-24 21:54 UTC
  To: Myklebust, Trond
  Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
>> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
>> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
>>
>> Additionally this alleviates an interoperability problem with the AIX NFSv4
>> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
>> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
>> cause a linux client to hang for 15 seconds.
> 
> Hi Dave,
> 
> The AIX server is not being motivated by any requirements in the NFSv4
> spec here, so I fail to see the reason why the behaviour that you
> describe can justify changing the client. It is not at all obvious to me
> that we should be retrying aggressively when NFSv4 servers return
> NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
> the existing 15 seconds?

I agree with you that AIX is at fault, and that the preferable situation
for the linux client would be for AIX to not return NFS4ERR_DELAY in
this use case.  I have attached a simple program that exacerbates
the problem on the AIX server.  I have already had a conference call
with AIX NFS development about this issue, where I vehemently tried to
convince them to fix their server.  Unfortunately as I don't have much
reputation in the NFS community, I was unable to convince them to do the
right thing.  I would be more than happy to set up another call, if
someone higher up in the linux NFS hierarchy would be willing to
participate.

That being said, I think implementing an exponential backoff is an
improvement in the client regardless of what AIX is doing.  If a server
needs only 2 seconds to process a request for which NFS4ERR_DELAY was
returned, this algorithm would get the client back and running after
only 2.1 seconds of elapsed time.  Whereas the current dumb algorithm
would simply wait 15 seconds.  This is the reason that I implemented
this change.

> The motivation for doing it in the case of OPEN, SETATTR, etc is
> clearer: those operations may require the server to recall a delegation,
> in which case aggressive retries are in order since delegation recalls
> are usually fast.
> The motivation in the case of LOCK is less clear, but it is basically
> down to the fact that NFSv4 has a polling model for doing blocking
> locks.

> In all other cases, why should we be treating NFS4ERR_DELAY any
> differently from how we treat NFS3ERR_JUKEBOX in NFSv3?
> 
> Note that if we do decide that changing the client is the right thing,
> then I don't want the patch to add new fields to struct rpc_task. That's
> the wrong layer for storing NFSv4 client specific data.

This is something that I was concerned about as well, but I could not
find another persistent way to do this.  I am open to suggestions of
which structures would be more acceptable.

Thanks,
Dave.

[-- Attachment #2: open-close.c --]
[-- Type: text/x-csrc, Size: 410 bytes --]

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#define FILENAME "testfile"

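int main(void)
{
	int	fd = open( FILENAME, O_RDWR );

	/* open() returns -1 on failure; 0 is a valid descriptor. */
	if( fd < 0 )
	{
		perror( "Failed to open `" FILENAME "'" );
		return 1;
	}

	/* Lock, then close straight away: per this thread, a CLOSE that
	 * follows the lock this closely is what the AIX server answers
	 * with NFS4ERR_DELAY. */
	printf( "flock() returned %d. Now calling close() ...\n",
		flock( fd, LOCK_EX|LOCK_NB ) );
	close( fd );
	return 0;
}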

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-24 21:54   ` Dave Chiluk
@ 2013-04-24 22:35     ` Myklebust, Trond
  2013-04-25 12:19       ` David Wysochanski
  0 siblings, 1 reply; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-24 22:35 UTC
  To: Dave Chiluk
  Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> >>
> >> Additionally this alleviates an interoperability problem with the AIX NFSv4
> >> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> >> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> >> cause a linux client to hang for 15 seconds.
> > 
> > Hi Dave,
> > 
> > The AIX server is not being motivated by any requirements in the NFSv4
> > spec here, so I fail to see the reason why the behaviour that you
> > describe can justify changing the client. It is not at all obvious to me
> > that we should be retrying aggressively when NFSv4 servers return
> > NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
> > the existing 15 seconds?
> 
> I agree with you that AIX is at fault, and that the preferable situation
> for the linux client would be for AIX to not return NFS4ERR_DELAY in
> this use case.  I have attached a simple program that exacerbates
> the problem on the AIX server.  I have already had a conference call
> with AIX NFS development about this issue, where I vehemently tried to
> convince them to fix their server.  Unfortunately as I don't have much
> reputation in the NFS community, I was unable to convince them to do the
> right thing.  I would be more than happy to set up another call, if
> someone higher up in the linux NFS hierarchy would be willing to
> participate.

I'd think that if they have customers that want to use Linux clients,
then those customers are likely to have more influence. This is entirely
a consequence of _their_ design decisions, quite frankly, since
returning NFS4ERR_DELAY in the above situation is downright silly. The
server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
the exact same state manipulations anyway...

> That being said, I think implementing an exponential backoff is an
> improvement in the client regardless of what AIX is doing.  If a server
> needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> returned, this algorithm would get the client back and running after
> only 2.1 seconds of elapsed time.  Whereas the current dumb algorithm
> would simply wait 15 seconds.  This is the reason that I implemented
> this change.

Right, but my point above is that _in_general_ if we don't know why the
server is returning NFS4ERR_DELAY, then how can we attach any retry
numbers at all? HSM systems, for instance, have very different latencies
than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
first place.

> > The motivation for doing it in the case of OPEN, SETATTR, etc is
> > clearer: those operations may require the server to recall a delegation,
> > in which case aggressive retries are in order since delegation recalls
> > are usually fast.
> > The motivation in the case of LOCK is less clear, but it is basically
> > down to the fact that NFSv4 has a polling model for doing blocking
> > locks.
> 
> > In all other cases, why should we be treating NFS4ERR_DELAY any
> > differently from how we treat NFS3ERR_JUKEBOX in NFSv3?
> > 
> > Note that if we do decide that changing the client is the right thing,
> > then I don't want the patch to add new fields to struct rpc_task. That's
> > the wrong layer for storing NFSv4 client specific data.
> 
> This is something that I was concerned about as well, but I could not
> find another persistent way to do this.  I am open to suggestions of
> which structures would be more acceptable.

We could change nfs4_async_handle_error() to take a struct
nfs4_exception, just like nfs4_handle_exception() does; at some point we
can use that to unify the two.
Just store the timeout somewhere in the nfs4_closedata.


* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-24 22:35     ` Myklebust, Trond
@ 2013-04-25 12:19       ` David Wysochanski
  2013-04-25 13:19         ` Myklebust, Trond
  2013-04-25 13:29         ` bfields
  0 siblings, 2 replies; 21+ messages in thread
From: David Wysochanski @ 2013-04-25 12:19 UTC
  To: Myklebust, Trond
  Cc: Dave Chiluk, bfields@fieldses.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > >>
> > >> Additionally this alleviates an interoperability problem with the AIX NFSv4
> > >> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> > >> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> > >> cause a linux client to hang for 15 seconds.
> > > 
> > > Hi Dave,
> > > 
> > > The AIX server is not being motivated by any requirements in the NFSv4
> > > spec here, so I fail to see the reason why the behaviour that you
> > > describe can justify changing the client. It is not at all obvious to me
> > > that we should be retrying aggressively when NFSv4 servers return
> > > NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
> > > the existing 15 seconds?
> > 
> > I agree with you that AIX is at fault, and that the preferable situation
> > for the linux client would be for AIX to not return NFS4ERR_DELAY in
> > this use case.  I have attached a simple program that exacerbates
> > the problem on the AIX server.  I have already had a conference call
> > with AIX NFS development about this issue, where I vehemently tried to
> > convince them to fix their server.  Unfortunately as I don't have much
> > reputation in the NFS community, I was unable to convince them to do the
> > right thing.  I would be more than happy to set up another call, if
> > someone higher up in the linux NFS hierarchy would be willing to
> > participate.
> 
> I'd think that if they have customers that want to use Linux clients,
> then those customers are likely to have more influence. This is entirely
> a consequence of _their_ design decisions, quite frankly, since
> returning NFS4ERR_DELAY in the above situation is downright silly. The
> server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> the exact same state manipulations anyway...
> 
> > That being said, I think implementing an exponential backoff is an
> > improvement in the client regardless of what AIX is doing.  If a server
> > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > returned, this algorithm would get the client back and running after
> > only 2.1 seconds of elapsed time.  Whereas the current dumb algorithm
> > would simply wait 15 seconds.  This is the reason that I implemented
> > this change.
> 
> Right, but my point above is that _in_general_ if we don't know why the
> server is returning NFS4ERR_DELAY, then how can we attach any retry
> numbers at all? HSM systems, for instance, have very different latencies
> than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> first place.
> 

Agreed we can't know why the server is returning NFS4ERR_DELAY so it's
hard to pick a retry number.  Can you explain the rationale for the
current 15 seconds delay?  Was it just for simplicity or something else?

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 12:19       ` David Wysochanski
@ 2013-04-25 13:19         ` Myklebust, Trond
  2013-04-25 13:29         ` bfields
  1 sibling, 0 replies; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-25 13:19 UTC
  To: dwysocha@redhat.com
  Cc: Dave Chiluk, bfields@fieldses.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 08:19 -0400, David Wysochanski wrote:
> On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > > >>
> > > >> Additionally this alleviates an interoperability problem with the AIX NFSv4
> > > >> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> > > >> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> > > >> cause a linux client to hang for 15 seconds.
> > > > 
> > > > Hi Dave,
> > > > 
> > > > The AIX server is not being motivated by any requirements in the NFSv4
> > > > spec here, so I fail to see the reason why the behaviour that you
> > > > describe can justify changing the client. It is not at all obvious to me
> > > > that we should be retrying aggressively when NFSv4 servers return
> > > > NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
> > > > the existing 15 seconds?
> > > 
> > > I agree with you that AIX is at fault, and that the preferable situation
> > > for the linux client would be for AIX to not return NFS4ERR_DELAY in
> > > this use case.  I have attached a simple program that exacerbates
> > > the problem on the AIX server.  I have already had a conference call
> > > with AIX NFS development about this issue, where I vehemently tried to
> > > convince them to fix their server.  Unfortunately as I don't have much
> > > reputation in the NFS community, I was unable to convince them to do the
> > > right thing.  I would be more than happy to set up another call, if
> > > someone higher up in the linux NFS hierarchy would be willing to
> > > participate.
> > 
> > I'd think that if they have customers that want to use Linux clients,
> > then those customers are likely to have more influence. This is entirely
> > a consequence of _their_ design decisions, quite frankly, since
> > returning NFS4ERR_DELAY in the above situation is downright silly. The
> > server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> > it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> > the exact same state manipulations anyway...
> > 
> > > That being said, I think implementing an exponential backoff is an
> > > improvement in the client regardless of what AIX is doing.  If a server
> > > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > > returned, this algorithm would get the client back and running after
> > > only 2.1 seconds of elapsed time.  Whereas the current dumb algorithm
> > > would simply wait 15 seconds.  This is the reason that I implemented
> > > this change.
> > 
> > Right, but my point above is that _in_general_ if we don't know why the
> > server is returning NFS4ERR_DELAY, then how can we attach any retry
> > numbers at all? HSM systems, for instance, have very different latencies
> > than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> > first place.
> > 
> 
> Agreed we can't know why the server is returning NFS4ERR_DELAY so it's
> hard to pick a retry number.  Can you explain the rationale for the
> current 15 seconds delay?  Was it just for simplicity or something else?
> 

Our expectation for NFS4ERR_DELAY events that are not listed in
RFC3530/RFC5661 is that it should be rare, but is expected on average to
last significantly longer than an RPC round-trip between the server and
client.
The other constraint was that we needed a number which is shorter than
the lease period so that we don't have to keep sending RENEWs. 

The 2 main cases we thought we'd have to deal with were:

- HSM systems fetching data from a tape backup or something similar
- Idmappers needing to refill their cache from LDAP/NIS/...

We did not expect servers to be using NFS4ERR_DELAY as a generic tool
for avoiding mutexes. That sounds like a great business opportunity for
the network switch vendors, but a poor one for everyone else...

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 12:19       ` David Wysochanski
  2013-04-25 13:19         ` Myklebust, Trond
@ 2013-04-25 13:29         ` bfields
  2013-04-25 13:30           ` Myklebust, Trond
  1 sibling, 1 reply; 21+ messages in thread
From: bfields @ 2013-04-25 13:29 UTC
  To: David Wysochanski
  Cc: Myklebust, Trond, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 08:19:34AM -0400, David Wysochanski wrote:
> On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > > >>
> > > >> Additionally this alleviates an interoperability problem with the AIX NFSv4
> > > >> Server.  The AIX server frequently (2 out of 3) returns NFS4ERR_DELAY, on a
> > > >> close when it happens in close proximity to a RELEASE_LOCKOWNER.  This would
> > > >> cause a linux client to hang for 15 seconds.
> > > > 
> > > > Hi Dave,
> > > > 
> > > > The AIX server is not being motivated by any requirements in the NFSv4
> > > > spec here, so I fail to see the reason why the behaviour that you
> > > > describe can justify changing the client. It is not at all obvious to me
> > > > that we should be retrying aggressively when NFSv4 servers return
> > > > NFS4ERR_DELAY. What makes 1/10sec more correct in these situations than
> > > > the existing 15 seconds?
> > > 
> > > I agree with you that AIX is at fault, and that the preferable situation
> > > for the linux client would be for AIX to not return NFS4ERR_DELAY in
> > > this use case.  I have attached a simple program that exacerbates
> > > the problem on the AIX server.  I have already had a conference call
> > > with AIX NFS development about this issue, where I vehemently tried to
> > > convince them to fix their server.  Unfortunately as I don't have much
> > > reputation in the NFS community, I was unable to convince them to do the
> > > right thing.  I would be more than happy to set up another call, if
> > > someone higher up in the linux NFS hierarchy would be willing to
> > > participate.
> > 
> > I'd think that if they have customers that want to use Linux clients,
> > then those customers are likely to have more influence. This is entirely
> > a consequence of _their_ design decisions, quite frankly, since
> > returning NFS4ERR_DELAY in the above situation is downright silly. The
> > server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> > it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> > the exact same state manipulations anyway...
> > 
> > > That being said, I think implementing an exponential backoff is an
> > > improvement in the client regardless of what AIX is doing.  If a server
> > > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > > returned, this algorithm would get the client back and running after
> > > only 2.1 seconds of elapsed time.  Whereas the current dumb algorithm
> > > would simply wait 15 seconds.  This is the reason that I implemented
> > > this change.
> > 
> > Right, but my point above is that _in_general_ if we don't know why the
> > server is returning NFS4ERR_DELAY, then how can we attach any retry
> > numbers at all? HSM systems, for instance, have very different latencies
> > than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> > first place.
> > 
> 
> Agreed we can't know why the server is returning NFS4ERR_DELAY so it's
> hard to pick a retry number.  Can you explain the rationale for the
> current 15 seconds delay?  Was it just for simplicity or something else?

As I understand it the original idea was that cold data really could
take multiple seconds or minutes to retrieve (because e.g. a tape
library might need to go load the right tape and rewind to the right
spot...).  Is that sort of system really used much these days?

My position is that we simply have no idea what order of magnitude the
delay should even be.  And that in such a situation exponential backoff such
as implemented in the synchronous case seems the reasonable default as
it guarantees at worst doubling the delay while still bounding the
long-term average frequency of retries.

--b.

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 13:29         ` bfields
@ 2013-04-25 13:30           ` Myklebust, Trond
  2013-04-25 13:49             ` bfields
  0 siblings, 1 reply; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-25 13:30 UTC
  To: bfields@fieldses.org
  Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:

> My position is that we simply have no idea what order of magnitude the
> delay should even be.  And that in such a situation exponential backoff such
> as implemented in the synchronous case seems the reasonable default as
> it guarantees at worst doubling the delay while still bounding the
> long-term average frequency of retries.

So we start with a 15 second delay, and then go to 60 seconds?

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 13:30           ` Myklebust, Trond
@ 2013-04-25 13:49             ` bfields
  2013-04-25 14:10               ` Myklebust, Trond
  2013-04-25 14:51               ` Chuck Lever
  0 siblings, 2 replies; 21+ messages in thread
From: bfields @ 2013-04-25 13:49 UTC
  To: Myklebust, Trond
  Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
> 
> > My position is that we simply have no idea what order of magnitude the
> > delay should even be.  And that in such a situation exponential backoff such
> > as implemented in the synchronous case seems the reasonable default as
> > it guarantees at worst doubling the delay while still bounding the
> > long-term average frequency of retries.
> 
> So we start with a 15 second delay, and then go to 60 seconds?

I agree that a server should normally be doing the wait on its own if
the wait would be on the order of an rpc round trip.

So I'd be inclined to start with a delay that was an order of magnitude
or two more than a round trip.

And I'd expect NFS isn't common on networks with 1-second latencies.

So the 1/10 second we're using in the synchronous case sounds closer to
the right ballpark to me.

--b.

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 13:49             ` bfields
@ 2013-04-25 14:10               ` Myklebust, Trond
  2013-04-25 15:28                 ` [PATCH] NFSv4: Use exponential backoff delay for Ni Matt W. Benjamin
  2013-04-25 18:19                 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields
  2013-04-25 14:51               ` Chuck Lever
  1 sibling, 2 replies; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-25 14:10 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> > On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
> > 
> > > My position is that we simply have no idea what order of magnitude even
> > > delay should be.  And that in such a situation exponential backoff such
> > > as implemented in the synchronous case seems the reasonable default as
> > > it guarantees at worst doubling the delay while still bounding the
> > > long-term average frequency of retries.
> > 
> > So we start with a 15 second delay, and then go to 60 seconds?
> 
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
> 
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
> 
> And I'd expect NFS isn't common on networks with 1-second latencies.
> 
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

OK, then. Now all I need is actual motivation for changing the existing
code other than handwaving arguments about "polling is better than flat
waits".
What actual use cases are impacting us now, other than the AIX design
decision to force CLOSE to retry at least once before succeeding?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for Ni
  2013-04-25 14:10               ` Myklebust, Trond
@ 2013-04-25 15:28                 ` Matt W. Benjamin
  2013-04-25 15:42                     ` Myklebust, Trond
  2013-04-25 18:19                 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields
  1 sibling, 1 reply; 21+ messages in thread
From: Matt W. Benjamin @ 2013-04-25 15:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Wysochanski, Dave Chiluk, linux-nfs, linux-kernel, bfields

Hi,

Just to clarify, the IBM delay behavior is not legal?

Matt

----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:

> 
> OK, then. Now all I need is actual motivation for changing the
> existing
> code other than handwaving arguments about "polling is better than
> flat
> waits".
> What actual use cases are impacting us now, other than the AIX design
> decision to force CLOSE to retry at least once before succeeding?
> 


-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] NFSv4: Use exponential backoff delay for Ni
@ 2013-04-25 15:42                     ` Myklebust, Trond
  0 siblings, 0 replies; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-25 15:42 UTC (permalink / raw)
  To: Matt W. Benjamin
  Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, bfields@fieldses.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1169 bytes --]

It's legal, but dumb...

> -----Original Message-----
> From: Matt W. Benjamin [mailto:matt@linuxbox.com]
> Sent: Thursday, April 25, 2013 11:28 AM
> To: Myklebust, Trond
> Cc: David Wysochanski; Dave Chiluk; linux-nfs@vger.kernel.org; linux-
> kernel@vger.kernel.org; bfields@fieldses.org
> Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for Ni
> 
> Hi,
> 
> Just to clarify, the IBM delay behavior is not legal?
> 
> Matt
> 
> ----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> 
> >
> > OK, then. Now all I need is actual motivation for changing the
> > existing code other than handwaving arguments about "polling is better
> > than flat waits".
> > What actual use cases are impacting us now, other than the AIX design
> > decision to force CLOSE to retry at least once before succeeding?
> >
> 
> 
> --
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
> 
> http://linuxbox.com
> 
> tel.  734-761-4689
> fax.  734-769-8938
> cel.  734-216-5309

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 14:10               ` Myklebust, Trond
  2013-04-25 15:28                 ` [PATCH] NFSv4: Use exponential backoff delay for Ni Matt W. Benjamin
@ 2013-04-25 18:19                 ` bfields
  2013-04-25 18:40                   ` Chuck Lever
  1 sibling, 1 reply; 21+ messages in thread
From: bfields @ 2013-04-25 18:19 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
> > On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> > > On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
> > > 
> > > > My position is that we simply have no idea what order of magnitude even
> > > > delay should be.  And that in such a situation exponential backoff such
> > > > as implemented in the synchronous case seems the reasonable default as
> > > > it guarantees at worst doubling the delay while still bounding the
> > > > long-term average frequency of retries.
> > > 
> > > So we start with a 15 second delay, and then go to 60 seconds?
> > 
> > I agree that a server should normally be doing the wait on its own if
> > the wait would be on the order of an rpc round trip.
> > 
> > So I'd be inclined to start with a delay that was an order of magnitude
> > or two more than a round trip.
> > 
> > And I'd expect NFS isn't common on networks with 1-second latencies.
> > 
> > So the 1/10 second we're using in the synchronous case sounds closer to
> > the right ballpark to me.
> 
> OK, then. Now all I need is actual motivation for changing the existing
> code other than handwaving arguments about "polling is better than flat
> waits".
> What actual use cases are impacting us now, other than the AIX design
> decision to force CLOSE to retry at least once before succeeding?

Nah, I've got nothing, and I agree that the AIX problem is their bug.

Just for fun I re-checked the Linux server cases.  As far as I
can tell they are:

	- delegations: returned immediately on detection of any
	  conflict.  The current behavior in the sync case looks
	  reasonable to me.
	- allocation failures: not really sure it's the best error, but
	  it seems to be all the protocol offers.  We probably don't
	  care much what the client does in this case.
	- some rare cases that would probably indicate bugs (e.g.,
	  attempting to destroy a client while other rpc's from that
	  client are running.)  Again we don't care what the client does
	  here.
	- the 4.1 slot-inuse case.

We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
ENOMEM) to delay.  I thought I remembered one of those being used by
some HFS system, but can't actually find an example now.  A quick grep
doesn't show anything interesting.

--b.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 18:19                 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields
@ 2013-04-25 18:40                   ` Chuck Lever
  2013-04-25 18:46                     ` bfields
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Lever @ 2013-04-25 18:40 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org


On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:

> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
>>>> 
>>>>> My position is that we simply have no idea what order of magnitude even
>>>>> delay should be.  And that in such a situation exponential backoff such
>>>>> as implemented in the synchronous case seems the reasonable default as
>>>>> it guarantees at worst doubling the delay while still bounding the
>>>>> long-term average frequency of retries.
>>>> 
>>>> So we start with a 15 second delay, and then go to 60 seconds?
>>> 
>>> I agree that a server should normally be doing the wait on its own if
>>> the wait would be on the order of an rpc round trip.
>>> 
>>> So I'd be inclined to start with a delay that was an order of magnitude
>>> or two more than a round trip.
>>> 
>>> And I'd expect NFS isn't common on networks with 1-second latencies.
>>> 
>>> So the 1/10 second we're using in the synchronous case sounds closer to
>>> the right ballpark to me.
>> 
>> OK, then. Now all I need is actual motivation for changing the existing
>> code other than handwaving arguments about "polling is better than flat
>> waits".
>> What actual use cases are impacting us now, other than the AIX design
>> decision to force CLOSE to retry at least once before succeeding?
> 
> Nah, I've got nothing, and I agree that the AIX problem is their bug.
> 
> Just for fun I re-checked the Linux server cases.  As far as I
> can tell they are:
> 
> 	- delegations: returned immediately on detection of any
> 	  conflict.  The current behavior in the sync case looks
> 	  reasonable to me.
> 	- allocation failures: not really sure it's the best error, but
> 	  it seems to be all the protocol offers.  We probably don't
> 	  care much what the client does in this case.
> 	- some rare cases that would probably indicate bugs (e.g.,
> 	  attempting to destroy a client while other rpc's from that
> 	  client are running.)  Again we don't care what the client does
> 	  here.
> 	- the 4.1 slot-inuse case.
> 
> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
> ENOMEM) to delay.  I thought I remembered one of those being used by
> some HFS system, but can't actually find an example now.  A quick grep
> doesn't show anything interesting.

It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 18:40                   ` Chuck Lever
@ 2013-04-25 18:46                     ` bfields
  2013-04-25 18:51                       ` Chuck Lever
  2013-04-25 18:52                       ` Myklebust, Trond
  0 siblings, 2 replies; 21+ messages in thread
From: bfields @ 2013-04-25 18:46 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
> 
> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:
> 
> > On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
> >> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
> >>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> >>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
> >>>> 
> >>>>> My position is that we simply have no idea what order of magnitude even
> >>>>> delay should be.  And that in such a situation exponential backoff such
> >>>>> as implemented in the synchronous case seems the reasonable default as
> >>>>> it guarantees at worst doubling the delay while still bounding the
> >>>>> long-term average frequency of retries.
> >>>> 
> >>>> So we start with a 15 second delay, and then go to 60 seconds?
> >>> 
> >>> I agree that a server should normally be doing the wait on its own if
> >>> the wait would be on the order of an rpc round trip.
> >>> 
> >>> So I'd be inclined to start with a delay that was an order of magnitude
> >>> or two more than a round trip.
> >>> 
> >>> And I'd expect NFS isn't common on networks with 1-second latencies.
> >>> 
> >>> So the 1/10 second we're using in the synchronous case sounds closer to
> >>> the right ballpark to me.
> >> 
> >> OK, then. Now all I need is actual motivation for changing the existing
> >> code other than handwaving arguments about "polling is better than flat
> >> waits".
> >> What actual use cases are impacting us now, other than the AIX design
> >> decision to force CLOSE to retry at least once before succeeding?
> > 
> > Nah, I've got nothing, and I agree that the AIX problem is their bug.
> > 
> > Just for fun I re-checked the Linux server cases.  As far as I
> > can tell they are:
> > 
> > 	- delegations: returned immediately on detection of any
> > 	  conflict.  The current behavior in the sync case looks
> > 	  reasonable to me.
> > 	- allocation failures: not really sure it's the best error, but
> > 	  it seems to be all the protocol offers.  We probably don't
> > 	  care much what the client does in this case.
> > 	- some rare cases that would probably indicate bugs (e.g.,
> > 	  attempting to destroy a client while other rpc's from that
> > 	  client are running.)  Again we don't care what the client does
> > 	  here.
> > 	- the 4.1 slot-inuse case.
> > 
> > We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
> > ENOMEM) to delay.  I thought I remembered one of those being used by
> > some HFS system, but can't actually find an example now.  A quick grep
> > doesn't show anything interesting.
> 
> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.

I thought they'd decided they'll be forced to find a different way to do
that?

(The issue being that it only works if you're using 4.1, and if the
session state itself isn't part of the state to be transferred.
Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
is seqid-modifying.)

--b.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 18:46                     ` bfields
@ 2013-04-25 18:51                       ` Chuck Lever
  2013-04-25 18:57                         ` bfields
  2013-04-25 18:52                       ` Myklebust, Trond
  1 sibling, 1 reply; 21+ messages in thread
From: Chuck Lever @ 2013-04-25 18:51 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org


On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:

> On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
>> 
>> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:
>> 
>>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
>>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
>>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
>>>>>> 
>>>>>>> My position is that we simply have no idea what order of magnitude even
>>>>>>> delay should be.  And that in such a situation exponential backoff such
>>>>>>> as implemented in the synchronous case seems the reasonable default as
>>>>>>> it guarantees at worst doubling the delay while still bounding the
>>>>>>> long-term average frequency of retries.
>>>>>> 
>>>>>> So we start with a 15 second delay, and then go to 60 seconds?
>>>>> 
>>>>> I agree that a server should normally be doing the wait on its own if
>>>>> the wait would be on the order of an rpc round trip.
>>>>> 
>>>>> So I'd be inclined to start with a delay that was an order of magnitude
>>>>> or two more than a round trip.
>>>>> 
>>>>> And I'd expect NFS isn't common on networks with 1-second latencies.
>>>>> 
>>>>> So the 1/10 second we're using in the synchronous case sounds closer to
>>>>> the right ballpark to me.
>>>> 
>>>> OK, then. Now all I need is actual motivation for changing the existing
>>>> code other than handwaving arguments about "polling is better than flat
>>>> waits".
>>>> What actual use cases are impacting us now, other than the AIX design
>>>> decision to force CLOSE to retry at least once before succeeding?
>>> 
>>> Nah, I've got nothing, and I agree that the AIX problem is their bug.
>>> 
>>> Just for fun I re-checked the Linux server cases.  As far as I
>>> can tell they are:
>>> 
>>> 	- delegations: returned immediately on detection of any
>>> 	  conflict.  The current behavior in the sync case looks
>>> 	  reasonable to me.
>>> 	- allocation failures: not really sure it's the best error, but
>>> 	  it seems to be all the protocol offers.  We probably don't
>>> 	  care much what the client does in this case.
>>> 	- some rare cases that would probably indicate bugs (e.g.,
>>> 	  attempting to destroy a client while other rpc's from that
>>> 	  client are running.)  Again we don't care what the client does
>>> 	  here.
>>> 	- the 4.1 slot-inuse case.
>>> 
>>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
>>> ENOMEM) to delay.  I thought I remembered one of those being used by
>>> some HFS system, but can't actually find an example now.  A quick grep
>>> doesn't show anything interesting.
>> 
>> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
> 
> I thought they'd decided they'll be forced to find a different way to do
> that?
> 
> (The issue being that it only works if you're using 4.1, and if the
> session state itself isn't part of the state to be transferred.
> Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> is seqid-modifying.)

The answer is not to return NFS4ERR_DELAY on seqid-modifying operations.

The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example.

Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 18:51                       ` Chuck Lever
@ 2013-04-25 18:57                         ` bfields
  0 siblings, 0 replies; 21+ messages in thread
From: bfields @ 2013-04-25 18:57 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 02:51:20PM -0400, Chuck Lever wrote:
> 
> On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:
> 
> > On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
> >> 
> >> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:
> >> 
> >>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
> >>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
> >>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> >>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
> >>>>>> 
> >>>>>>> My position is that we simply have no idea what order of magnitude even
> >>>>>>> delay should be.  And that in such a situation exponential backoff such
> >>>>>>> as implemented in the synchronous case seems the reasonable default as
> >>>>>>> it guarantees at worst doubling the delay while still bounding the
> >>>>>>> long-term average frequency of retries.
> >>>>>> 
> >>>>>> So we start with a 15 second delay, and then go to 60 seconds?
> >>>>> 
> >>>>> I agree that a server should normally be doing the wait on its own if
> >>>>> the wait would be on the order of an rpc round trip.
> >>>>> 
> >>>>> So I'd be inclined to start with a delay that was an order of magnitude
> >>>>> or two more than a round trip.
> >>>>> 
> >>>>> And I'd expect NFS isn't common on networks with 1-second latencies.
> >>>>> 
> >>>>> So the 1/10 second we're using in the synchronous case sounds closer to
> >>>>> the right ballpark to me.
> >>>> 
> >>>> OK, then. Now all I need is actual motivation for changing the existing
> >>>> code other than handwaving arguments about "polling is better than flat
> >>>> waits".
> >>>> What actual use cases are impacting us now, other than the AIX design
> >>>> decision to force CLOSE to retry at least once before succeeding?
> >>> 
> >>> Nah, I've got nothing, and I agree that the AIX problem is their bug.
> >>> 
> >>> Just for fun I re-checked the Linux server cases.  As far as I
> >>> can tell they are:
> >>> 
> >>> 	- delegations: returned immediately on detection of any
> >>> 	  conflict.  The current behavior in the sync case looks
> >>> 	  reasonable to me.
> >>> 	- allocation failures: not really sure it's the best error, but
> >>> 	  it seems to be all the protocol offers.  We probably don't
> >>> 	  care much what the client does in this case.
> >>> 	- some rare cases that would probably indicate bugs (e.g.,
> >>> 	  attempting to destroy a client while other rpc's from that
> >>> 	  client are running.)  Again we don't care what the client does
> >>> 	  here.
> >>> 	- the 4.1 slot-inuse case.
> >>> 
> >>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
> >>> ENOMEM) to delay.  I thought I remembered one of those being used by
> >>> some HFS system, but can't actually find an example now.  A quick grep
> >>> doesn't show anything interesting.
> >> 
> >> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
> > 
> > I thought they'd decided they'll be forced to find a different way to do
> > that?
> > 
> > (The issue being that it only works if you're using 4.1, and if the
> > session state itself isn't part of the state to be transferred.
> > Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> > is seqid-modifying.)
> 
> The answer is not to return NFS4ERR_DELAY on seqid-modifying operations.
> 
> The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example.
> 
> Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations.

Oh, right, I'd forgotten that approach....

--b.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 18:46                     ` bfields
  2013-04-25 18:51                       ` Chuck Lever
@ 2013-04-25 18:52                       ` Myklebust, Trond
  1 sibling, 0 replies; 21+ messages in thread
From: Myklebust, Trond @ 2013-04-25 18:52 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: Chuck Lever, Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org


On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:

> On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
>> 
>> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote:
>> 
>>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
>>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
>>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
>>>>>> 
>>>>>>> My position is that we simply have no idea what order of magnitude even
>>>>>>> delay should be.  And that in such a situation exponential backoff such
>>>>>>> as implemented in the synchronous case seems the reasonable default as
>>>>>>> it guarantees at worst doubling the delay while still bounding the
>>>>>>> long-term average frequency of retries.
>>>>>> 
>>>>>> So we start with a 15 second delay, and then go to 60 seconds?
>>>>> 
>>>>> I agree that a server should normally be doing the wait on its own if
>>>>> the wait would be on the order of an rpc round trip.
>>>>> 
>>>>> So I'd be inclined to start with a delay that was an order of magnitude
>>>>> or two more than a round trip.
>>>>> 
>>>>> And I'd expect NFS isn't common on networks with 1-second latencies.
>>>>> 
>>>>> So the 1/10 second we're using in the synchronous case sounds closer to
>>>>> the right ballpark to me.
>>>> 
>>>> OK, then. Now all I need is actual motivation for changing the existing
>>>> code other than handwaving arguments about "polling is better than flat
>>>> waits".
>>>> What actual use cases are impacting us now, other than the AIX design
>>>> decision to force CLOSE to retry at least once before succeeding?
>>> 
>>> Nah, I've got nothing, and I agree that the AIX problem is their bug.
>>> 
>>> Just for fun I re-checked the Linux server cases.  As far as I
>>> can tell they are:
>>> 
>>> 	- delegations: returned immediately on detection of any
>>> 	  conflict.  The current behavior in the sync case looks
>>> 	  reasonable to me.
>>> 	- allocation failures: not really sure it's the best error, but
>>> 	  it seems to be all the protocol offers.  We probably don't
>>> 	  care much what the client does in this case.
>>> 	- some rare cases that would probably indicate bugs (e.g.,
>>> 	  attempting to destroy a client while other rpc's from that
>>> 	  client are running.)  Again we don't care what the client does
>>> 	  here.
>>> 	- the 4.1 slot-inuse case.
>>> 
>>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
>>> ENOMEM) to delay.  I thought I remembered one of those being used by
>>> some HFS system, but can't actually find an example now.  A quick grep
>>> doesn't show anything interesting.
>> 
>> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
> 
> I thought they'd decided they'll be forced to find a different way to do
> that?
> 
> (The issue being that it only works if you're using 4.1, and if the
> session state itself isn't part of the state to be transferred.
> Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> is seqid-modifying.)

Either way, migration is not a performance-critical path that needs response times of 1 second or less on those NFS4ERR_DELAY replies.

Trond


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY
  2013-04-25 13:49             ` bfields
  2013-04-25 14:10               ` Myklebust, Trond
@ 2013-04-25 14:51               ` Chuck Lever
  1 sibling, 0 replies; 21+ messages in thread
From: Chuck Lever @ 2013-04-25 14:51 UTC (permalink / raw)
  To: bfields
  Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org


On Apr 25, 2013, at 9:49 AM, bfields@fieldses.org wrote:

> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
>> 
>>> My position is that we simply have no idea what order of magnitude even
>>> delay should be.  And that in such a situation exponential backoff such
>>> as implemented in the synchronous case seems the reasonable default as
>>> it guarantees at worst doubling the delay while still bounding the
>>> long-term average frequency of retries.
>> 
>> So we start with a 15 second delay, and then go to 60 seconds?
> 
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
> 
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
> 
> And I'd expect NFS isn't common on networks with 1-second latencies.
> 
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

The RPC layer already keeps RPC round trip statistics, so the client doesn't have to guess with a "one size fits all" number.

I'm all for keeping client recovery time short.  But after following this argument, I think 10xRTT is crazy short.  Aggressive retransmits can lead to data corruption, and RTT on a fast server is going to be on the order of a millisecond.  And what about RDMA, where RTT is about 20usecs? 

A better answer might be to start at one second, then back off exponentially to the minimum of 0.25x the lease time and 0.25x the RPC retransmit timeout.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-04-25 18:57 UTC | newest]

Thread overview: 21+ messages
2013-04-24 20:55 [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY Dave Chiluk
2013-04-24 21:11 ` J. Bruce Fields
2013-04-24 21:28 ` Myklebust, Trond
2013-04-24 21:54   ` Dave Chiluk
2013-04-24 22:35     ` Myklebust, Trond
2013-04-25 12:19       ` David Wysochanski
2013-04-25 13:19         ` Myklebust, Trond
2013-04-25 13:29         ` bfields
2013-04-25 13:30           ` Myklebust, Trond
2013-04-25 13:49             ` bfields
2013-04-25 14:10               ` Myklebust, Trond
2013-04-25 15:28                 ` [PATCH] NFSv4: Use exponential backoff delay for Ni Matt W. Benjamin
2013-04-25 15:42                   ` Myklebust, Trond
2013-04-25 15:42                     ` Myklebust, Trond
2013-04-25 18:19                 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields
2013-04-25 18:40                   ` Chuck Lever
2013-04-25 18:46                     ` bfields
2013-04-25 18:51                       ` Chuck Lever
2013-04-25 18:57                         ` bfields
2013-04-25 18:52                       ` Myklebust, Trond
2013-04-25 14:51               ` Chuck Lever
