* [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Dave Chiluk @ 2013-04-24 20:55 UTC
To: Trond.Myklebust, bfields, linux-nfs, linux-kernel

Changing the retry to start at NFS4_POLL_RETRY_MIN and grow exponentially
to NFS4_POLL_RETRY_MAX allows these error conditions to be handled faster.

Additionally, this alleviates an interoperability problem with the AIX
NFSv4 server. The AIX server frequently (2 times out of 3) returns
NFS4ERR_DELAY on a CLOSE that arrives in close proximity to a
RELEASE_LOCKOWNER. This causes a Linux client to hang for 15 seconds.

Signed-off-by: Dave Chiluk <chiluk@canonical.com>
---
 fs/nfs/nfs4proc.c            |   12 ++++++++++++
 include/linux/sunrpc/sched.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0ad025e..37dad27 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4006,6 +4006,18 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 #endif /* CONFIG_NFS_V4_1 */
 	case -NFS4ERR_DELAY:
 		nfs_inc_server_stats(server, NFSIOS_DELAY);
+		/* Do an exponential backoff of retries from
+		 * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX.
+		 */
+		task->tk_timeout = NFS4_POLL_RETRY_MIN <<
+			(task->tk_delays*2);
+		if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
+			rpc_delay(task, NFS4_POLL_RETRY_MAX);
+		else {
+			task->tk_delays++;
+			rpc_delay(task, task->tk_timeout);
+		}
+		task->tk_status = 0;
+		return -EAGAIN;
 	case -NFS4ERR_GRACE:
 		rpc_delay(task, NFS4_POLL_RETRY_MAX);
 		task->tk_status = 0;
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 84ca436..60f82bf 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -62,6 +62,7 @@ struct rpc_task {
 	void *			tk_calldata;
 
 	unsigned long		tk_timeout;	/* timeout for rpc_sleep() */
+	unsigned short		tk_delays;	/* number of times task delayed */
 	unsigned long		tk_runstate;	/* Task run status */
 	struct workqueue_struct *tk_workqueue;	/* Normally rpciod, but could
 						 * be any workqueue
-- 
1.7.9.5
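The patch's arithmetic can be checked in user space. In the sketch below the function name is hypothetical, and the constants are assumed to correspond to NFS4_POLL_RETRY_MIN (HZ/10, i.e. 100 ms) and NFS4_POLL_RETRY_MAX (15*HZ, i.e. 15 s), expressed in milliseconds rather than jiffies. Because the shift is by `tk_delays*2`, each delay is four times the previous one rather than twice, and once the next step would exceed the maximum the counter stops advancing, so every further retry waits the maximum.

```c
/* Assumed equivalents of the kernel constants, in milliseconds. */
#define RETRY_MIN_MS 100UL    /* NFS4_POLL_RETRY_MIN, assumed HZ/10 */
#define RETRY_MAX_MS 15000UL  /* NFS4_POLL_RETRY_MAX, assumed 15*HZ */

/*
 * Hypothetical mirror of the patch's NFS4ERR_DELAY arithmetic.
 * *delays plays the role of the proposed tk_delays field.
 * The n-th delay is MIN << (2 * n): a factor of 4 per retry.
 * Once that would exceed MAX, MAX is used and the counter is frozen,
 * so the shift never exceeds 8 and cannot overflow.
 */
static unsigned long patch_next_delay_ms(unsigned short *delays)
{
	unsigned long t = RETRY_MIN_MS << (*delays * 2);

	if (t > RETRY_MAX_MS)
		return RETRY_MAX_MS;
	(*delays)++;
	return t;
}
```

Starting from a zero counter, successive calls return 100, 400, 1600 and 6400 ms, and then 15000 ms from the fifth retry onwards.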
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: J. Bruce Fields @ 2013-04-24 21:11 UTC
To: Dave Chiluk
Cc: Trond.Myklebust, linux-nfs, linux-kernel

On Wed, Apr 24, 2013 at 03:55:49PM -0500, Dave Chiluk wrote:
[...]
>  	case -NFS4ERR_DELAY:
>  		nfs_inc_server_stats(server, NFSIOS_DELAY);
> +		/* Do an exponential backoff of retries from
> +		 * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX.
> +		 */
> +		task->tk_timeout = NFS4_POLL_RETRY_MIN <<
> +			(task->tk_delays*2);
> +		if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
> +			rpc_delay(task, NFS4_POLL_RETRY_MAX);
> +		else {
> +			task->tk_delays++;
> +			rpc_delay(task, task->tk_timeout);
> +		}
> +		task->tk_status = 0;
> +		return -EAGAIN;

Just as a matter of style, could you stick this in a helper, something
like the existing nfs4_delay()?:

	case -NFS4ERR_DELAY:
		nfs_inc_server_stats(server, NFSIOS_DELAY);
		nfs4_async_delay(task);
		task->tk_status = 0;
		return -EAGAIN;

...

--b.
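A user-space model of the helper shape Bruce suggests might look like the following. Everything here is a hypothetical stand-in: `struct task_model` plays the role of `struct rpc_task`, `sleep_ms` records what would have been passed to `rpc_delay()`, and `NFS4ERR_DELAY_SKETCH` uses 10008, the value of NFS4ERR_DELAY in RFC 3530. The point is only that the `switch` arm shrinks to one call while the backoff policy lives in a single function.

```c
#include <errno.h>

#define RETRY_MIN_MS 100UL          /* assumed: NFS4_POLL_RETRY_MIN */
#define RETRY_MAX_MS 15000UL        /* assumed: NFS4_POLL_RETRY_MAX */
#define NFS4ERR_DELAY_SKETCH 10008  /* NFS4ERR_DELAY in RFC 3530 */

/* Hypothetical stand-in for struct rpc_task. */
struct task_model {
	int status;
	unsigned short delays;   /* role of the proposed tk_delays */
	unsigned long sleep_ms;  /* what rpc_delay() would have been given */
};

/* Hypothetical helper in the spirit of the suggested nfs4_async_delay():
 * the same x4-per-retry schedule as the posted patch, capped at MAX. */
static void async_delay_model(struct task_model *task)
{
	unsigned long t = RETRY_MIN_MS << (task->delays * 2);

	if (t > RETRY_MAX_MS)
		t = RETRY_MAX_MS;   /* capped: stop advancing the counter */
	else
		task->delays++;
	task->sleep_ms = t;
}

/* Error dispatch reduced to the shape sketched in the reply. */
static int handle_error_model(struct task_model *task, int err)
{
	switch (err) {
	case -NFS4ERR_DELAY_SKETCH:
		async_delay_model(task);
		task->status = 0;
		return -EAGAIN;
	default:
		return err;
	}
}
```

Any later change to the backoff policy then touches only `async_delay_model()`, not the error switch.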
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Myklebust, Trond @ 2013-04-24 21:28 UTC
To: Dave Chiluk
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> Changing the retry to start at NFS4_POLL_RETRY_MIN and grow exponentially
> to NFS4_POLL_RETRY_MAX allows these error conditions to be handled faster.
>
> Additionally, this alleviates an interoperability problem with the AIX
> NFSv4 server. The AIX server frequently (2 times out of 3) returns
> NFS4ERR_DELAY on a CLOSE that arrives in close proximity to a
> RELEASE_LOCKOWNER. This causes a Linux client to hang for 15 seconds.

Hi Dave,

The AIX server is not being motivated by any requirements in the NFSv4
spec here, so I fail to see why the behaviour that you describe can
justify changing the client. It is not at all obvious to me that we
should be retrying aggressively when NFSv4 servers return NFS4ERR_DELAY.
What makes 1/10sec more correct in these situations than the existing
15 seconds?

The motivation for doing it in the case of OPEN, SETATTR, etc. is
clearer: those operations may require the server to recall a delegation,
in which case aggressive retries are in order since delegation recalls
are usually fast.
The motivation in the case of LOCK is less clear, but it basically comes
down to the fact that NFSv4 has a polling model for doing blocking locks.
In all other cases, why should we be treating NFS4ERR_DELAY any
differently from how we treat NFS3ERR_JUKEBOX in NFSv3?

Note that if we do decide that changing the client is the right thing,
then I don't want the patch to add new fields to struct rpc_task. That's
the wrong layer for storing NFSv4 client specific data.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
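The distinction drawn here can be written down as a per-operation policy. The sketch below is hypothetical and does not reproduce the client's actual code paths: the enum and function names are illustrative, and the two delays are assumed to be the 1/10 s and 15 s values discussed in this thread. Under such a policy a CLOSE lands in the flat-wait bucket, which is exactly the 15-second stall reported for the AIX server.

```c
#define RETRY_MIN_MS 100UL    /* assumed: NFS4_POLL_RETRY_MIN */
#define RETRY_MAX_MS 15000UL  /* assumed: NFS4_POLL_RETRY_MAX */

/* Hypothetical operation classes, for illustration only. */
enum op_kind { OP_OPEN, OP_SETATTR, OP_LOCK, OP_CLOSE, OP_READ, OP_OTHER };

/*
 * First wait after NFS4ERR_DELAY under the policy described above:
 * short polls for operations that may be waiting on a (usually fast)
 * delegation recall, or that poll for blocking locks; a flat,
 * NFS3ERR_JUKEBOX-style wait for everything else.
 */
static unsigned long first_delay_ms(enum op_kind op)
{
	switch (op) {
	case OP_OPEN:
	case OP_SETATTR:
	case OP_LOCK:
		return RETRY_MIN_MS;
	default:
		return RETRY_MAX_MS;
	}
}
```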
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Dave Chiluk @ 2013-04-24 21:54 UTC
To: Myklebust, Trond
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
[...]
> Hi Dave,
>
> The AIX server is not being motivated by any requirements in the NFSv4
> spec here, so I fail to see why the behaviour that you describe can
> justify changing the client. It is not at all obvious to me that we
> should be retrying aggressively when NFSv4 servers return NFS4ERR_DELAY.
> What makes 1/10sec more correct in these situations than the existing
> 15 seconds?

I agree with you that AIX is at fault, and that the preferable situation
for the Linux client would be for AIX not to return NFS4ERR_DELAY in
this use case. I have attached a simple program that exacerbates the
problem on the AIX server. I have already had a conference call with AIX
NFS development about this issue, where I vehemently tried to convince
them to fix their server. Unfortunately, as I don't have much reputation
in the NFS community, I was unable to convince them to do the right
thing. I would be more than happy to set up another call, if someone
higher up in the Linux NFS hierarchy would be willing to participate.

That being said, I think implementing an exponential backoff is an
improvement in the client regardless of what AIX is doing. If a server
needs only 2 seconds to process a request for which NFS4ERR_DELAY was
returned, this algorithm would get the client back and running after
only 2.1 seconds of elapsed time, whereas the current dumb algorithm
would simply wait 15 seconds. This is the reason that I implemented
this change.

> The motivation for doing it in the case of OPEN, SETATTR, etc. is
> clearer: those operations may require the server to recall a delegation,
> in which case aggressive retries are in order since delegation recalls
> are usually fast.
> The motivation in the case of LOCK is less clear, but it basically comes
> down to the fact that NFSv4 has a polling model for doing blocking locks.
> In all other cases, why should we be treating NFS4ERR_DELAY any
> differently from how we treat NFS3ERR_JUKEBOX in NFSv3?
>
> Note that if we do decide that changing the client is the right thing,
> then I don't want the patch to add new fields to struct rpc_task. That's
> the wrong layer for storing NFSv4 client specific data.

This is something that I was concerned about as well, but I could not
find another persistent place to keep this state. I am open to
suggestions as to which structures would be more acceptable.

Thanks,
Dave.

[-- Attachment #2: open-close.c --]
[-- Type: text/x-csrc --]

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#define FILENAME "testfile"

int main(void)
{
	/* FILENAME must already exist (no O_CREAT); run this from a
	 * directory on the NFSv4 mount of the server under test. */
	int fd = open(FILENAME, O_RDWR);

	/* open() returns -1 on failure; 0 is a valid descriptor. */
	if (fd < 0) {
		perror("Failed to open `" FILENAME "'");
		return 1;
	}

	/* Take a non-blocking exclusive lock and close right away, so
	 * the CLOSE closely follows the release of the lock state. */
	printf("flock() returned %d. Now calling close() ...\n",
	       flock(fd, LOCK_EX | LOCK_NB));
	close(fd);
	return 0;
}
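The 2-second example can be checked with a small model. It assumes the same 100 ms minimum and 15 s maximum (in milliseconds), that the server answers NFS4ERR_DELAY until it has been busy for `busy_ms`, and that round-trip time is negligible; the function names are hypothetical. Under these assumptions the patch retries at 0.1, 0.5 and 2.1 seconds of elapsed time, whereas a flat 15-second wait first retries at 15 seconds.

```c
#define RETRY_MIN_MS 100UL    /* assumed: NFS4_POLL_RETRY_MIN */
#define RETRY_MAX_MS 15000UL  /* assumed: NFS4_POLL_RETRY_MAX */

/*
 * Model: the first request goes out at t = 0 and keeps failing with
 * NFS4ERR_DELAY until busy_ms has elapsed; RPC round trips are ignored.
 * Returns the elapsed time of the first retry sent at or after busy_ms
 * and stores the number of retries in *retries.
 */
static unsigned long patch_done_ms(unsigned long busy_ms, unsigned int *retries)
{
	unsigned long elapsed = 0;
	unsigned int step = 0;  /* role of tk_delays; never exceeds 4 */

	*retries = 0;
	while (elapsed < busy_ms) {
		unsigned long d = RETRY_MIN_MS << (2 * step);

		if (d > RETRY_MAX_MS)
			d = RETRY_MAX_MS;  /* capped: counter stops advancing */
		else
			step++;
		elapsed += d;
		(*retries)++;
	}
	return elapsed;
}

/* The existing behaviour: a flat RETRY_MAX_MS wait before every retry. */
static unsigned long flat_done_ms(unsigned long busy_ms, unsigned int *retries)
{
	unsigned long elapsed = 0;

	*retries = 0;
	while (elapsed < busy_ms) {
		elapsed += RETRY_MAX_MS;
		(*retries)++;
	}
	return elapsed;
}
```

For a 2-second stall the model gives 2100 ms after three retries versus 15000 ms after one. For a 20-second stall it gives 23500 ms after five retries versus 30000 ms after two, so the faster recovery is paid for with extra retries.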
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Myklebust, Trond @ 2013-04-24 22:35 UTC
To: Dave Chiluk
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
[...]
> I agree with you that AIX is at fault, and that the preferable situation
> for the Linux client would be for AIX not to return NFS4ERR_DELAY in
> this use case. I have attached a simple program that exacerbates the
> problem on the AIX server. I have already had a conference call with AIX
> NFS development about this issue, where I vehemently tried to convince
> them to fix their server. Unfortunately, as I don't have much reputation
> in the NFS community, I was unable to convince them to do the right
> thing. I would be more than happy to set up another call, if someone
> higher up in the Linux NFS hierarchy would be willing to participate.

I'd think that if they have customers that want to use Linux clients,
then those customers are likely to have more influence. This is entirely
a consequence of _their_ design decisions, quite frankly, since
returning NFS4ERR_DELAY in the above situation is downright silly. The
server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
the exact same state manipulations anyway...

> That being said, I think implementing an exponential backoff is an
> improvement in the client regardless of what AIX is doing. If a server
> needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> returned, this algorithm would get the client back and running after
> only 2.1 seconds of elapsed time, whereas the current dumb algorithm
> would simply wait 15 seconds. This is the reason that I implemented
> this change.

Right, but my point above is that _in_general_ if we don't know why the
server is returning NFS4ERR_DELAY, then how can we attach any retry
numbers at all? HSM systems, for instance, have very different latencies
than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
first place.

[...]
> > Note that if we do decide that changing the client is the right thing,
> > then I don't want the patch to add new fields to struct rpc_task. That's
> > the wrong layer for storing NFSv4 client specific data.
>
> This is something that I was concerned about as well, but I could not
> find another persistent place to keep this state. I am open to
> suggestions as to which structures would be more acceptable.

We could change nfs4_async_handle_error() to take a struct
nfs4_exception, just like nfs4_handle_exception() does; at some point we
can use that to unify the two. Just store the timeout somewhere in the
nfs4_closedata.
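One way to read this suggestion is that the backoff state belongs to the per-call NFSv4 data rather than to the generic RPC task. The sketch below is hypothetical: `struct close_call` stands in for something like nfs4_closedata or nfs4_exception, and the "start at the minimum, double, clamp to the maximum" rule is an assumption modelled on the synchronous nfs4_delay() path mentioned earlier in the thread, not a copy of it.

```c
#define RETRY_MIN_MS 100L    /* assumed: NFS4_POLL_RETRY_MIN */
#define RETRY_MAX_MS 15000L  /* assumed: NFS4_POLL_RETRY_MAX */

/* Hypothetical per-call data; only the retry timeout is modelled.
 * Nothing is added to the generic RPC task structure. */
struct close_call {
	long timeout_ms;  /* 0 until the first NFS4ERR_DELAY */
};

/* Return the wait before the next retry and advance the per-call
 * timeout: start at the minimum, double each time, never exceed the
 * maximum. timeout_ms itself never grows beyond 2 * RETRY_MAX_MS. */
static long close_call_next_delay(struct close_call *call)
{
	long delay = call->timeout_ms;

	if (delay <= 0)
		delay = RETRY_MIN_MS;
	if (delay > RETRY_MAX_MS)
		delay = RETRY_MAX_MS;
	call->timeout_ms = delay * 2;
	return delay;
}
```

Because the state lives in the call's own data, each new CLOSE starts again at 100 ms instead of inheriting a counter from an unrelated earlier delay.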
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: David Wysochanski @ 2013-04-25 12:19 UTC
To: Myklebust, Trond
Cc: Dave Chiluk, bfields@fieldses.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
[...]
> Right, but my point above is that _in_general_ if we don't know why the
> server is returning NFS4ERR_DELAY, then how can we attach any retry
> numbers at all? HSM systems, for instance, have very different latencies
> than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> first place.

Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
hard to pick a retry number. Can you explain the rationale for the
current 15-second delay? Was it just for simplicity or something else?
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Myklebust, Trond @ 2013-04-25 13:19 UTC
To: dwysocha@redhat.com
Cc: Dave Chiluk, bfields@fieldses.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 08:19 -0400, David Wysochanski wrote:
[...]
> Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
> hard to pick a retry number. Can you explain the rationale for the
> current 15-second delay? Was it just for simplicity or something else?

Our expectation for NFS4ERR_DELAY events that are not listed in
RFC3530/RFC5661 is that they should be rare, but are expected on average
to last significantly longer than an RPC round-trip between the server
and client.

The other constraint was that we needed a number which is shorter than
the lease period so that we don't have to keep sending RENEWs.

The 2 main cases we thought we'd have to deal with were:
 - HSM systems fetching data from a tape backup or something similar
 - Idmappers needing to refill their cache from LDAP/NIS/...

We did not expect servers to be using NFS4ERR_DELAY as a generic tool
for avoiding mutexes. That sounds like a great business opportunity for
the network switch vendors, but a poor one for everyone else...
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: bfields @ 2013-04-25 13:29 UTC
To: David Wysochanski
Cc: Myklebust, Trond, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 08:19:34AM -0400, David Wysochanski wrote:
[...]
> Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
> hard to pick a retry number. Can you explain the rationale for the
> current 15-second delay? Was it just for simplicity or something else?

As I understand it, the original idea was that cold data really could
take multiple seconds or minutes to retrieve (because e.g. a tape
library might need to load the right tape and rewind to the right
spot...). Is that sort of system really used much these days?

My position is that we simply have no idea what order of magnitude the
delay should even be. And that in such a situation exponential backoff,
such as is implemented in the synchronous case, seems the reasonable
default, as it guarantees at worst doubling the delay while still
bounding the long-term average frequency of retries.

--b.
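The "at worst doubling" argument can be illustrated numerically. Assume a doubling schedule that starts at 100 ms and is capped at 15 s (hypothetical names, milliseconds, round-trip time ignored). With doubling, each wait equals the time already spent plus the initial delay, so as long as the cap has not been reached, the first retry after a server that is busy for T ms is sent before 2T + 100 ms. Once the cap applies, the overshoot is bounded by the cap instead, and the retry rate settles at one per 15 s.

```c
#define RETRY_MIN_MS 100UL    /* assumed initial delay */
#define RETRY_MAX_MS 15000UL  /* assumed cap */

/*
 * Doubling backoff: waits of MIN, 2*MIN, 4*MIN, ... capped at MAX.
 * The first request is sent at t = 0 and fails with NFS4ERR_DELAY until
 * busy_ms has elapsed. Returns the elapsed time of the first retry sent
 * at or after busy_ms; *retries receives the number of retries.
 */
static unsigned long doubling_done_ms(unsigned long busy_ms,
				      unsigned int *retries)
{
	unsigned long elapsed = 0, delay = RETRY_MIN_MS;

	*retries = 0;
	while (elapsed < busy_ms) {
		elapsed += delay;
		(*retries)++;
		delay = (delay * 2 > RETRY_MAX_MS) ? RETRY_MAX_MS : delay * 2;
	}
	return elapsed;
}
```

For a 2-second stall this gives 3100 ms after five retries, slower than the x4 schedule in the posted patch for that particular case, but with a tighter worst-case overshoot: under 2T + 100 ms here, compared with under 4T + 100 ms for a x4 schedule before the cap.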
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Myklebust, Trond @ 2013-04-25 13:30 UTC
To: bfields@fieldses.org
Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:

> My position is that we simply have no idea what order of magnitude the
> delay should even be. And that in such a situation exponential backoff,
> such as is implemented in the synchronous case, seems the reasonable
> default, as it guarantees at worst doubling the delay while still
> bounding the long-term average frequency of retries.

So we start with a 15 second delay, and then go to 60 seconds?
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: bfields @ 2013-04-25 13:49 UTC
To: Myklebust, Trond
Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote:
>
> > My position is that we simply have no idea what order of magnitude the
> > delay should even be. And that in such a situation exponential backoff,
> > such as is implemented in the synchronous case, seems the reasonable
> > default, as it guarantees at worst doubling the delay while still
> > bounding the long-term average frequency of retries.
>
> So we start with a 15 second delay, and then go to 60 seconds?

I agree that a server should normally be doing the wait on its own if
the wait would be on the order of an rpc round trip.

So I'd be inclined to start with a delay that was an order of magnitude
or two more than a round trip.

And I'd expect NFS isn't common on networks with 1-second latencies.

So the 1/10 second we're using in the synchronous case sounds closer to
the right ballpark to me.

--b.
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

From: Myklebust, Trond @ 2013-04-25 14:10 UTC
To: bfields@fieldses.org
Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote:
> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
[...]
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
>
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
>
> And I'd expect NFS isn't common on networks with 1-second latencies.
>
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

OK, then. Now all I need is actual motivation for changing the existing
code other than handwaving arguments about "polling is better than flat
waits".
What actual use cases are impacting us now, other than the AIX design
decision to force CLOSE to retry at least once before succeeding?
* Re: [PATCH] NFSv4: Use exponential backoff delay for Ni 2013-04-25 14:10 ` Myklebust, Trond @ 2013-04-25 15:28 ` Matt W. Benjamin 2013-04-25 15:42 ` Myklebust, Trond 2013-04-25 18:19 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields 1 sibling, 1 reply; 21+ messages in thread From: Matt W. Benjamin @ 2013-04-25 15:28 UTC (permalink / raw) To: Trond Myklebust Cc: David Wysochanski, Dave Chiluk, linux-nfs, linux-kernel, bfields Hi, Just to clarify, the IBM delay behavior is not legal? Matt ----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote: > > OK, then. Now all I need is actual motivation for changing the > existing > code other than handwaving arguments about "polling is better than > flat > waits". > What actual use cases are impacting us now, other than the AIX design > decision to force CLOSE to retry at least once before succeeding? > -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] NFSv4: Use exponential backoff delay for Ni @ 2013-04-25 15:42 ` Myklebust, Trond 0 siblings, 0 replies; 21+ messages in thread From: Myklebust, Trond @ 2013-04-25 15:42 UTC (permalink / raw) To: Matt W. Benjamin Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, bfields@fieldses.org It's legal, but dumb... > -----Original Message----- > From: Matt W. Benjamin [mailto:matt@linuxbox.com] > Sent: Thursday, April 25, 2013 11:28 AM > To: Myklebust, Trond > Cc: David Wysochanski; Dave Chiluk; linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org; bfields@fieldses.org > Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for Ni > > Hi, > > Just to clarify, the IBM delay behavior is not legal? > > Matt > > ----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote: > > > > > OK, then. Now all I need is actual motivation for changing the > > existing code other than handwaving arguments about "polling is better > > than flat waits". > > What actual use cases are impacting us now, other than the AIX design > > decision to force CLOSE to retry at least once before succeeding? > > > > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 14:10 ` Myklebust, Trond 2013-04-25 15:28 ` [PATCH] NFSv4: Use exponential backoff delay for Ni Matt W. Benjamin @ 2013-04-25 18:19 ` bfields 2013-04-25 18:40 ` Chuck Lever 1 sibling, 1 reply; 21+ messages in thread From: bfields @ 2013-04-25 18:19 UTC (permalink / raw) To: Myklebust, Trond Cc: David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: > On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: > > On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: > > > On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: > > > > > > > My position is that we simply have no idea what order of magnitude even > > > > delay should be. And that in such a situation exponential backoff such > > > > as implemented in the synchronous case seems the reasonable default as > > > > it guarantees at worst doubling the delay while still bounding the > > > > long-term average frequency of retries. > > > > > > So we start with a 15 second delay, and then go to 60 seconds? > > > > I agree that a server should normally be doing the wait on its own if > > the wait would be on the order of an rpc round trip. > > > > So I'd be inclined to start with a delay that was an order of magnitude > > or two more than a round trip. > > > > And I'd expect NFS isn't common on networks with 1-second latencies. > > > > So the 1/10 second we're using in the synchronous case sounds closer to > > the right ballpark to me. > > OK, then. Now all I need is actual motivation for changing the existing > code other than handwaving arguments about "polling is better than flat > waits". > What actual use cases are impacting us now, other than the AIX design > decision to force CLOSE to retry at least once before succeeding? Nah, I've got nothing, and I agree that the AIX problem is their bug.
Just for fun I re-checked the Linux server cases. As far as I can tell they are:

- delegations: returned immediately on detection of any conflict. The current behavior in the sync case looks reasonable to me.
- allocation failures: not really sure it's the best error, but it seems to be all the protocol offers. We probably don't care much what the client does in this case.
- some rare cases that would probably indicate bugs (e.g., attempting to destroy a client while other RPCs from that client are running). Again we don't care what the client does here.
- the 4.1 slot-inuse case.

We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, ENOMEM) to delay. I thought I remembered one of those being used by some HFS system, but can't actually find an example now. A quick grep doesn't show anything interesting. --b. ^ permalink raw reply [flat|nested] 21+ messages in thread
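bfields mentions that the Linux server maps four local errors to NFS4ERR_DELAY by default. Below is a minimal sketch of such a mapping, assuming negative errno values as input. `transient_errno_to_nfs4()` is hypothetical and is not nfsd's actual translation table. The status values follow the NFSv4 numbering in RFC 7530 (NFS4ERR_DELAY is 10008), and the NFS4ERR_SERVERFAULT fallback is only a placeholder for every other error.

```c
#include <assert.h>
#include <errno.h>

/* NFSv4 status codes as numbered in RFC 7530. */
#define NFS4_OK             0
#define NFS4ERR_SERVERFAULT 10006
#define NFS4ERR_DELAY       10008

/* Hypothetical helper: translate a negative local errno into an NFSv4
 * status. Only the four transient errors named in the thread map to
 * NFS4ERR_DELAY; everything else falls back to a placeholder. */
static int transient_errno_to_nfs4(int err)
{
	assert(err <= 0);
	if (err == 0)
		return NFS4_OK;
	/* On Linux EWOULDBLOCK == EAGAIN, so a switch with both labels
	 * would not compile; a chain of comparisons is safe either way. */
	if (err == -ETIMEDOUT || err == -EAGAIN || err == -EWOULDBLOCK ||
	    err == -ENOMEM)
		return NFS4ERR_DELAY;
	return NFS4ERR_SERVERFAULT;
}
```

The comparison chain is a deliberate choice. A `switch` listing both EAGAIN and EWOULDBLOCK would be rejected as a duplicate case on systems where the two constants share a value.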
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 18:19 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields @ 2013-04-25 18:40 ` Chuck Lever 2013-04-25 18:46 ` bfields 0 siblings, 1 reply; 21+ messages in thread From: Chuck Lever @ 2013-04-25 18:40 UTC (permalink / raw) To: bfields@fieldses.org Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: >> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: >>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: >>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: >>>> >>>>> My position is that we simply have no idea what order of magnitude even >>>>> delay should be. And that in such a situation exponential backoff such >>>>> as implemented in the synchronous case seems the reasonable default as >>>>> it guarantees at worst doubling the delay while still bounding the >>>>> long-term average frequency of retries. >>>> >>>> So we start with a 15 second delay, and then go to 60 seconds? >>> >>> I agree that a server should normally be doing the wait on its own if >>> the wait would be on the order of an rpc round trip. >>> >>> So I'd be inclined to start with a delay that was an order of magnitude >>> or two more than a round trip. >>> >>> And I'd expect NFS isn't common on networks with 1-second latencies. >>> >>> So the 1/10 second we're using in the synchronous case sounds closer to >>> the right ballpark to me. >> >> OK, then. Now all I need is actual motivation for changing the existing >> code other than handwaving arguments about "polling is better than flat >> waits". >> What actual use cases are impacting us now, other than the AIX design >> decision to force CLOSE to retry at least once before succeeding? 
> > Nah, I've got nothing, and I agree that the AIX problem is there bug. > > Just for fun I looked at re-checked the Linux server cases. As far as I > can tell they are: > > - delegations: returned immediately on detection of any > conflict. The current behavior in the sync case looks > reasonable to me. > - allocation failures: not really sure it's the best error, but > it seems to be all the protocol offers. We probably don't > care much what the client does in this case. > - some rare cases that would probably indicate bugs (e.g., > attempting to destroy a client while other rpc's from that > client are running.) Again we don't care what the client does > here. > - the 4.1 slot-inuse case. > > We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, > ENOMEM) to delay. I thought I remembered one of those being used by > some HFS system, but can't actually find an example now. A quick grep > doesn't show anything interesting. It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 18:40 ` Chuck Lever @ 2013-04-25 18:46 ` bfields 2013-04-25 18:51 ` Chuck Lever 2013-04-25 18:52 ` Myklebust, Trond 0 siblings, 2 replies; 21+ messages in thread From: bfields @ 2013-04-25 18:46 UTC (permalink / raw) To: Chuck Lever Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote: > > On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > > > On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: > >> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: > >>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: > >>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: > >>>> > >>>>> My position is that we simply have no idea what order of magnitude even > >>>>> delay should be. And that in such a situation exponential backoff such > >>>>> as implemented in the synchronous case seems the reasonable default as > >>>>> it guarantees at worst doubling the delay while still bounding the > >>>>> long-term average frequency of retries. > >>>> > >>>> So we start with a 15 second delay, and then go to 60 seconds? > >>> > >>> I agree that a server should normally be doing the wait on its own if > >>> the wait would be on the order of an rpc round trip. > >>> > >>> So I'd be inclined to start with a delay that was an order of magnitude > >>> or two more than a round trip. > >>> > >>> And I'd expect NFS isn't common on networks with 1-second latencies. > >>> > >>> So the 1/10 second we're using in the synchronous case sounds closer to > >>> the right ballpark to me. > >> > >> OK, then. Now all I need is actual motivation for changing the existing > >> code other than handwaving arguments about "polling is better than flat > >> waits". 
> >> What actual use cases are impacting us now, other than the AIX design > >> decision to force CLOSE to retry at least once before succeeding? > > > > Nah, I've got nothing, and I agree that the AIX problem is there bug. > > > > Just for fun I looked at re-checked the Linux server cases. As far as I > > can tell they are: > > > > - delegations: returned immediately on detection of any > > conflict. The current behavior in the sync case looks > > reasonable to me. > > - allocation failures: not really sure it's the best error, but > > it seems to be all the protocol offers. We probably don't > > care much what the client does in this case. > > - some rare cases that would probably indicate bugs (e.g., > > attempting to destroy a client while other rpc's from that > > client are running.) Again we don't care what the client does > > here. > > - the 4.1 slot-inuse case. > > > > We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, > > ENOMEM) to delay. I thought I remembered one of those being used by > > some HFS system, but can't actually find an example now. A quick grep > > doesn't show anything interesting. > > It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server. I thought they'd decided they'll be forced to find a different way to do that? (The issue being that it only works if you're using 4.1, and if the session state itself isn't part of the state to be transferred. Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY is seqid-modifying.) --b. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 18:46 ` bfields @ 2013-04-25 18:51 ` Chuck Lever 2013-04-25 18:57 ` bfields 2013-04-25 18:52 ` Myklebust, Trond 1 sibling, 1 reply; 21+ messages in thread From: Chuck Lever @ 2013-04-25 18:51 UTC (permalink / raw) To: bfields@fieldses.org Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote: >> >> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: >> >>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: >>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: >>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: >>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: >>>>>> >>>>>>> My position is that we simply have no idea what order of magnitude even >>>>>>> delay should be. And that in such a situation exponential backoff such >>>>>>> as implemented in the synchronous case seems the reasonable default as >>>>>>> it guarantees at worst doubling the delay while still bounding the >>>>>>> long-term average frequency of retries. >>>>>> >>>>>> So we start with a 15 second delay, and then go to 60 seconds? >>>>> >>>>> I agree that a server should normally be doing the wait on its own if >>>>> the wait would be on the order of an rpc round trip. >>>>> >>>>> So I'd be inclined to start with a delay that was an order of magnitude >>>>> or two more than a round trip. >>>>> >>>>> And I'd expect NFS isn't common on networks with 1-second latencies. >>>>> >>>>> So the 1/10 second we're using in the synchronous case sounds closer to >>>>> the right ballpark to me. >>>> >>>> OK, then. 
Now all I need is actual motivation for changing the existing >>>> code other than handwaving arguments about "polling is better than flat >>>> waits". >>>> What actual use cases are impacting us now, other than the AIX design >>>> decision to force CLOSE to retry at least once before succeeding? >>> >>> Nah, I've got nothing, and I agree that the AIX problem is there bug. >>> >>> Just for fun I looked at re-checked the Linux server cases. As far as I >>> can tell they are: >>> >>> - delegations: returned immediately on detection of any >>> conflict. The current behavior in the sync case looks >>> reasonable to me. >>> - allocation failures: not really sure it's the best error, but >>> it seems to be all the protocol offers. We probably don't >>> care much what the client does in this case. >>> - some rare cases that would probably indicate bugs (e.g., >>> attempting to destroy a client while other rpc's from that >>> client are running.) Again we don't care what the client does >>> here. >>> - the 4.1 slot-inuse case. >>> >>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, >>> ENOMEM) to delay. I thought I remembered one of those being used by >>> some HFS system, but can't actually find an example now. A quick grep >>> doesn't show anything interesting. >> >> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server. > > I thought they'd decided they'll be forced to find a different way to do > that? > > (The issue being that it only works if you're using 4.1, and if the > session state itself isn't part of the state to be transferred. > Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY > is seqid-modifying.) The answer is not to return NFS4ERR_DELAY on seqid-modifying operations. 
The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example. Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 21+ messages in thread
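The exchange between bfields and Chuck turns on where in an NFSv4.0 COMPOUND the server places NFS4ERR_DELAY. The toy model below uses hypothetical names and a drastically simplified state owner. It assumes two NFSv4.0 rules. First, COMPOUND processing stops at the first failing operation. Second, a seqid-modifying operation (OPEN, CLOSE, and LOCK here) advances the owner's sequence id even when it fails with NFS4ERR_DELAY. Under those assumptions, delaying the seqid-modifying operation itself still changes state, while delaying the leading PUTFH leaves the sequence id untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFS4_OK       0
#define NFS4ERR_DELAY 10008

/* Hypothetical, heavily simplified model; not taken from any server. */
enum op { OP_PUTFH, OP_GETATTR, OP_OPEN, OP_CLOSE, OP_LOCK };

enum freeze_policy {
	DELAY_ON_SEQID_OP,	/* reject the seqid-modifying op itself */
	DELAY_ON_FIRST_OP	/* reject the leading PUTFH instead */
};

static bool seqid_modifying(enum op op)
{
	return op == OP_OPEN || op == OP_CLOSE || op == OP_LOCK;
}

static bool compound_has_seqid_op(const enum op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (seqid_modifying(ops[i]))
			return true;
	return false;
}

/* Run ops in order (ops[0] assumed to be PUTFH), stopping at the first
 * error as COMPOUND does. *seqid is the owner's sequence id; it advances
 * after any seqid-modifying op, even one that fails with DELAY. */
static int run_compound(const enum op *ops, size_t n, bool frozen,
			enum freeze_policy policy, unsigned int *seqid)
{
	bool block = frozen && compound_has_seqid_op(ops, n);

	for (size_t i = 0; i < n; i++) {
		if (block && policy == DELAY_ON_FIRST_OP && i == 0)
			return NFS4ERR_DELAY;	/* nothing ran; seqid untouched */
		if (seqid_modifying(ops[i])) {
			(*seqid)++;		/* consumed whatever the outcome */
			if (block && policy == DELAY_ON_SEQID_OP)
				return NFS4ERR_DELAY;
		}
	}
	return NFS4_OK;
}
```

Consider a frozen owner at seqid 5 that sends `{ OP_PUTFH, OP_CLOSE }`. With `DELAY_ON_SEQID_OP`, the compound returns NFS4ERR_DELAY and leaves the seqid at 6. With `DELAY_ON_FIRST_OP`, it returns the same error and the seqid stays at 5.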
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 18:51 ` Chuck Lever @ 2013-04-25 18:57 ` bfields 0 siblings, 0 replies; 21+ messages in thread From: bfields @ 2013-04-25 18:57 UTC (permalink / raw) To: Chuck Lever Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Apr 25, 2013 at 02:51:20PM -0400, Chuck Lever wrote: > > On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > > > On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote: > >> > >> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > >> > >>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: > >>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: > >>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: > >>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: > >>>>>> > >>>>>>> My position is that we simply have no idea what order of magnitude even > >>>>>>> delay should be. And that in such a situation exponential backoff such > >>>>>>> as implemented in the synchronous case seems the reasonable default as > >>>>>>> it guarantees at worst doubling the delay while still bounding the > >>>>>>> long-term average frequency of retries. > >>>>>> > >>>>>> So we start with a 15 second delay, and then go to 60 seconds? > >>>>> > >>>>> I agree that a server should normally be doing the wait on its own if > >>>>> the wait would be on the order of an rpc round trip. > >>>>> > >>>>> So I'd be inclined to start with a delay that was an order of magnitude > >>>>> or two more than a round trip. > >>>>> > >>>>> And I'd expect NFS isn't common on networks with 1-second latencies. > >>>>> > >>>>> So the 1/10 second we're using in the synchronous case sounds closer to > >>>>> the right ballpark to me. > >>>> > >>>> OK, then. 
Now all I need is actual motivation for changing the existing > >>>> code other than handwaving arguments about "polling is better than flat > >>>> waits". > >>>> What actual use cases are impacting us now, other than the AIX design > >>>> decision to force CLOSE to retry at least once before succeeding? > >>> > >>> Nah, I've got nothing, and I agree that the AIX problem is there bug. > >>> > >>> Just for fun I looked at re-checked the Linux server cases. As far as I > >>> can tell they are: > >>> > >>> - delegations: returned immediately on detection of any > >>> conflict. The current behavior in the sync case looks > >>> reasonable to me. > >>> - allocation failures: not really sure it's the best error, but > >>> it seems to be all the protocol offers. We probably don't > >>> care much what the client does in this case. > >>> - some rare cases that would probably indicate bugs (e.g., > >>> attempting to destroy a client while other rpc's from that > >>> client are running.) Again we don't care what the client does > >>> here. > >>> - the 4.1 slot-inuse case. > >>> > >>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, > >>> ENOMEM) to delay. I thought I remembered one of those being used by > >>> some HFS system, but can't actually find an example now. A quick grep > >>> doesn't show anything interesting. > >> > >> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server. > > > > I thought they'd decided they'll be forced to find a different way to do > > that? > > > > (The issue being that it only works if you're using 4.1, and if the > > session state itself isn't part of the state to be transferred. > > Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY > > is seqid-modifying.) 
> > The answer is not to return NFS4ERR_DELAY on seqid-modifying operations. > > The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example. > > Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations. Oh, right, I'd forgotten that approach.... --b. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 18:46 ` bfields 2013-04-25 18:51 ` Chuck Lever @ 2013-04-25 18:52 ` Myklebust, Trond 1 sibling, 0 replies; 21+ messages in thread From: Myklebust, Trond @ 2013-04-25 18:52 UTC (permalink / raw) To: bfields@fieldses.org Cc: Chuck Lever, Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Apr 25, 2013, at 2:46 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: > On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote: >> >> On Apr 25, 2013, at 2:19 PM, "bfields@fieldses.org" <bfields@fieldses.org> wrote: >> >>> On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote: >>>> On Thu, 2013-04-25 at 09:49 -0400, bfields@fieldses.org wrote: >>>>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: >>>>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: >>>>>> >>>>>>> My position is that we simply have no idea what order of magnitude even >>>>>>> delay should be. And that in such a situation exponential backoff such >>>>>>> as implemented in the synchronous case seems the reasonable default as >>>>>>> it guarantees at worst doubling the delay while still bounding the >>>>>>> long-term average frequency of retries. >>>>>> >>>>>> So we start with a 15 second delay, and then go to 60 seconds? >>>>> >>>>> I agree that a server should normally be doing the wait on its own if >>>>> the wait would be on the order of an rpc round trip. >>>>> >>>>> So I'd be inclined to start with a delay that was an order of magnitude >>>>> or two more than a round trip. >>>>> >>>>> And I'd expect NFS isn't common on networks with 1-second latencies. >>>>> >>>>> So the 1/10 second we're using in the synchronous case sounds closer to >>>>> the right ballpark to me. >>>> >>>> OK, then. 
Now all I need is actual motivation for changing the existing >>>> code other than handwaving arguments about "polling is better than flat >>>> waits". >>>> What actual use cases are impacting us now, other than the AIX design >>>> decision to force CLOSE to retry at least once before succeeding? >>> >>> Nah, I've got nothing, and I agree that the AIX problem is there bug. >>> >>> Just for fun I looked at re-checked the Linux server cases. As far as I >>> can tell they are: >>> >>> - delegations: returned immediately on detection of any >>> conflict. The current behavior in the sync case looks >>> reasonable to me. >>> - allocation failures: not really sure it's the best error, but >>> it seems to be all the protocol offers. We probably don't >>> care much what the client does in this case. >>> - some rare cases that would probably indicate bugs (e.g., >>> attempting to destroy a client while other rpc's from that >>> client are running.) Again we don't care what the client does >>> here. >>> - the 4.1 slot-inuse case. >>> >>> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK, >>> ENOMEM) to delay. I thought I remembered one of those being used by >>> some HFS system, but can't actually find an example now. A quick grep >>> doesn't show anything interesting. >> >> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server. > > I thought they'd decided they'll be forced to find a different way to do > that? > > (The issue being that it only works if you're using 4.1, and if the > session state itself isn't part of the state to be transferred. > Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY > is seqid-modifying.) Either way, migration is not a performance-critical path that needs response times of 1 second or less on those NFS4ERR_DELAY replies.
Trond ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY 2013-04-25 13:49 ` bfields 2013-04-25 14:10 ` Myklebust, Trond @ 2013-04-25 14:51 ` Chuck Lever 1 sibling, 0 replies; 21+ messages in thread From: Chuck Lever @ 2013-04-25 14:51 UTC (permalink / raw) To: bfields Cc: Myklebust, Trond, David Wysochanski, Dave Chiluk, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Apr 25, 2013, at 9:49 AM, bfields@fieldses.org wrote: > On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote: >> On Thu, 2013-04-25 at 09:29 -0400, bfields@fieldses.org wrote: >> >>> My position is that we simply have no idea what order of magnitude even >>> delay should be. And that in such a situation exponential backoff such >>> as implemented in the synchronous case seems the reasonable default as >>> it guarantees at worst doubling the delay while still bounding the >>> long-term average frequency of retries. >> >> So we start with a 15 second delay, and then go to 60 seconds? > > I agree that a server should normally be doing the wait on its own if > the wait would be on the order of an rpc round trip. > > So I'd be inclined to start with a delay that was an order of magnitude > or two more than a round trip. > > And I'd expect NFS isn't common on networks with 1-second latencies. > > So the 1/10 second we're using in the synchronous case sounds closer to > the right ballpark to me. The RPC layer already keeps RPC round trip statistics, so the client doesn't have to guess with a "one size fits all" number. I'm all for keeping client recovery time short. But after following this argument, I think 10xRTT is crazy short. Aggressive retransmits can lead to data corruption, and RTT on a fast server is going to be on the order of a millisecond. And what about RDMA, where RTT is about 20usecs? A better answer might be to start at one second then exponentially back off to the minimum of 0.25x the lease time and 0.25x the RPC retransmit time out. 
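Chuck's alternative schedule starts at one second, doubles on each retry, and caps at the smaller of a quarter of the lease time and a quarter of the RPC retransmit timeout. It can be sketched as below. The function names, the millisecond units, and the sample values in the note are assumptions for illustration. The sketch also assumes the cap applies even when it falls below the one-second starting value. For comparison, the 10xRTT figure he objects to works out to 10 ms for a 1 ms round trip and 200 usecs for the ~20 usec RDMA case.

```c
#include <assert.h>

/* Assumed unit: milliseconds. One-second starting delay, per the proposal. */
#define DELAY_START_MS 1000L

static long min_ms(long a, long b)
{
	return a < b ? a : b;
}

/* Ceiling: the smaller of 0.25x the lease time and 0.25x the RPC
 * retransmit timeout. */
static long delay_cap_ms(long lease_ms, long retrans_timeout_ms)
{
	assert(lease_ms > 0 && retrans_timeout_ms > 0);
	return min_ms(lease_ms / 4, retrans_timeout_ms / 4);
}

/* Delay before retry number `attempt` (0-based): one second doubled per
 * earlier NFS4ERR_DELAY, never above the ceiling (even when the ceiling
 * is itself below one second). */
static long delay_for_attempt_ms(unsigned int attempt, long lease_ms,
				 long retrans_timeout_ms)
{
	long cap = delay_cap_ms(lease_ms, retrans_timeout_ms);
	long delay = DELAY_START_MS;

	for (unsigned int i = 0; i < attempt && delay < cap; i++)
		delay *= 2;
	return min_ms(delay, cap);
}
```

Assume a 90 s lease and a 60 s retransmit timeout. The cap is then min(22.5 s, 15 s) = 15 s, so the delays run 1, 2, 4, and 8 s, then 15 s for every later retry.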
-- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 21+ messages in thread
Thread overview: 21+ messages
2013-04-24 20:55 [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY Dave Chiluk
2013-04-24 21:11 ` J. Bruce Fields
2013-04-24 21:28 ` Myklebust, Trond
2013-04-24 21:54 ` Dave Chiluk
2013-04-24 22:35 ` Myklebust, Trond
2013-04-25 12:19 ` David Wysochanski
2013-04-25 13:19 ` Myklebust, Trond
2013-04-25 13:29 ` bfields
2013-04-25 13:30 ` Myklebust, Trond
2013-04-25 13:49 ` bfields
2013-04-25 14:10 ` Myklebust, Trond
2013-04-25 15:28 ` [PATCH] NFSv4: Use exponential backoff delay for Ni Matt W. Benjamin
2013-04-25 15:42 ` Myklebust, Trond
2013-04-25 15:42 ` Myklebust, Trond
2013-04-25 18:19 ` [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY bfields
2013-04-25 18:40 ` Chuck Lever
2013-04-25 18:46 ` bfields
2013-04-25 18:51 ` Chuck Lever
2013-04-25 18:57 ` bfields
2013-04-25 18:52 ` Myklebust, Trond
2013-04-25 14:51 ` Chuck Lever