* NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
@ 2010-10-20  7:17 Neil Brown
  2010-10-20 14:29 ` Chuck Lever
  2010-10-20 17:55 ` Jeff Layton
  0 siblings, 2 replies; 20+ messages in thread
From: Neil Brown @ 2010-10-20 7:17 UTC (permalink / raw)
To: Linux NFS Mailing List

If I don't have any network configured (except loop-back) and try an NFSv3
mount, it fails quickly:

....
mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
mount.nfs: Network is unreachable

If I try the same thing with an NFSv4 mount, it times out before it fails,
making a much longer delay.

This is because mount.nfs doesn't do a portmap lookup but just leaves
everything to the kernel.  The kernel does an 'rpc_ping()' which sets
RPC_TASK_SOFTCONN, so at least it doesn't retry after the timeout.  But given
that we have a clear error, we shouldn't time out at all.

Unfortunately I cannot see an easy way to fix this.

The place where ENETUNREACH is handled is in xs_tcp_setup_socket.  The
comment there says "Retry with the same socket after a delay".  The "delay"
bit is correct; the "retry" isn't.

It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
was set.  However we cannot see the task at this point - in fact it seems
that there could be a queue of tasks waiting on this connection, and I guess
some could be soft and some not.

So: any suggestions on how to get an ENETUNREACH (or ECONNREFUSED or similar)
to fail immediately when RPC_TASK_SOFTCONN is set?

This affects people who upgrade from openSUSE 11.2 (which didn't support v4
mounts) to openSUSE 11.3 (which defaults to v4), who use network-manager
(which configures networks late), and who have NFS mounts in /etc/fstab with
either explicit IP addresses or host names that can be resolved without the
network.  This config will work because, when the network comes up,
network-manager will re-run the 'init.d/nfs' script.  However, since 11.3
there is an unpleasant pause before boot completes.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread
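The error in question is available synchronously at the socket layer, which
is why waiting out an RPC timeout feels unnecessary.  A minimal user-space
sketch (not part of the thread; 192.0.2.1 and port 2049 are placeholder
values for an NFS server) shows connect() reporting ENETUNREACH immediately
on a loopback-only host:

/*
 * Sketch: on a host with only loopback configured, connect() fails at once
 * with ENETUNREACH because there is no route - no timeout is involved.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in sin;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(2049);			/* NFS port, placeholder */
	inet_pton(AF_INET, "192.0.2.1", &sin.sin_addr);	/* placeholder server */

	if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		printf("connect failed immediately: %s\n", strerror(errno));
	else
		printf("connect succeeded\n");

	close(fd);
	return 0;
}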
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown @ 2010-10-20 14:29 ` Chuck Lever 2010-10-20 21:29 ` Neil Brown 2010-10-20 17:55 ` Jeff Layton 1 sibling, 1 reply; 20+ messages in thread From: Chuck Lever @ 2010-10-20 14:29 UTC (permalink / raw) To: Neil Brown; +Cc: Linux NFS Mailing List On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: > > > If I don't have any network configured (except loop-back), and try an NFSv3 > mount, then it fails quickly: > > > .... > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > mount.nfs: Network is unreachable > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > making a much longer delay. > > This is because mount.nfs doesn't do a portmap lookup but just leaves > everything to the kernel. > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > So at least it doesn't retry after the timeout. But given that we have a > clear error, we shouldn't timeout at all. > > Unfortunately I cannot see an easy way to fix this. > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > says "Retry with the same socket after a delay". The "delay" bit is correct, > the "retry" isn't. > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > was set. However we cannot see the task at this point - in fact it seems > that there could be a queue of tasks waiting on this connection. I guess > some could be soft, and some not. ??? > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > fail immediately when RPC_TASK_SOFTCONN is set ??? ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). > This affects people who upgrade from openSUSE11.2 (which didn't support v4 > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager > (which configures networks late) and have NFS mounts in /etc/fstab with > either explicit IP addresses or host names that can be resolved without the > network. > This config will work because when the network comes up, network-manager will > re-run the 'init.d/nfs' script. However since 11.3 there is an unpleasant > pause before boot completes. -- chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 14:29 ` Chuck Lever @ 2010-10-20 21:29 ` Neil Brown 2010-10-21 0:56 ` Neil Brown 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2010-10-20 21:29 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Wed, 20 Oct 2010 10:29:05 -0400 Chuck Lever <chuck.lever@ORACLE.COM> wrote: > > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > mount, then it fails quickly: > > > > > > .... > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > mount.nfs: Network is unreachable > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > making a much longer delay. > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > everything to the kernel. > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > So at least it doesn't retry after the timeout. But given that we have a > > clear error, we shouldn't timeout at all. > > > > Unfortunately I cannot see an easy way to fix this. > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > the "retry" isn't. > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > was set. However we cannot see the task at this point - in fact it seems > > that there could be a queue of tasks waiting on this connection. I guess > > some could be soft, and some not. ??? > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. > > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). There is certainly code that seems to treat ENETUNREACH differently if RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case I am testing. e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a failure in the SOFTCONN case. I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so it never does the rpcbind step. So maybe we need to handle ENETUNREACH in call_connect_status as well as call_bind_status ?? Maybe something like that ... The placement of rpc_delay seems a little of to me, but follows call_bind_status, so it could be correct. ?? I haven't thought how EHOSTUNREACH fits into this... presumably it should fail-quickly when SOFTCONN (which Jeff suggests it does) and should retry for not SOFTCONN (which I haven't checked). 
NeilBrown

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index fa55490..539885e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
 	}
 
 	switch (status) {
+	case -ENETUNREACH:
+	case -ECONNRESET:
+	case -ECONNREFUSED:
+		if (!RPC_IS_SOFTCONN(task))
+			rpc_delay(task, 5*HZ);
+		/* fall through */
 	/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fe9306b..0743994 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETUNREACH:
-		/* retry with existing socket, after a delay */
+		/* allow upper layers to choose between failure and retry */
+		goto out;
 	case 0:
 	case -EINPROGRESS:
 	case -EALREADY:

^ permalink raw reply related	[flat|nested] 20+ messages in thread
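For reference, the call_bind_status handling that this patch borrows from
looks roughly like the sketch below - a paraphrase built from the behaviour
described in this thread (non-SOFTCONN tasks wait 5 seconds and retry,
SOFTCONN tasks fail straight away), not the verbatim kernel source:

	case -ECONNREFUSED:		/* connection problems */
	case -EHOSTUNREACH:
	case -ENETUNREACH:
		if (!RPC_IS_SOFTCONN(task)) {
			rpc_delay(task, 5*HZ);		/* the 5 seconds Neil copied */
			task->tk_action = call_timeout;	/* retry the bind later */
			return;
		}
		rpc_exit(task, task->tk_status);	/* SOFTCONN: fail immediately */
		break;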
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 21:29 ` Neil Brown @ 2010-10-21 0:56 ` Neil Brown 2010-10-21 12:09 ` Jeff Layton 2010-10-21 14:10 ` Chuck Lever 0 siblings, 2 replies; 20+ messages in thread From: Neil Brown @ 2010-10-21 0:56 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Thu, 21 Oct 2010 08:29:38 +1100 Neil Brown <neilb@suse.de> wrote: > On Wed, 20 Oct 2010 10:29:05 -0400 > Chuck Lever <chuck.lever@ORACLE.COM> wrote: > > > > > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: > > > > > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > > mount, then it fails quickly: > > > > > > > > > .... > > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > > mount.nfs: Network is unreachable > > > > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > > making a much longer delay. > > > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > > everything to the kernel. > > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > > So at least it doesn't retry after the timeout. But given that we have a > > > clear error, we shouldn't timeout at all. > > > > > > Unfortunately I cannot see an easy way to fix this. > > > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > > the "retry" isn't. > > > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > > was set. However we cannot see the task at this point - in fact it seems > > > that there could be a queue of tasks waiting on this connection. I guess > > > some could be soft, and some not. ??? > > > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > > ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. > > > > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). > > There is certainly code that seems to treat ENETUNREACH differently if > RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case > I am testing. > e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a > failure in the SOFTCONN case. > I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so > it never does the rpcbind step. > So maybe we need to handle ENETUNREACH in call_connect_status as well as > call_bind_status ?? > > Maybe something like that ... The placement of rpc_delay seems a little of > to me, but follows call_bind_status, so it could be correct. > I did a bit of testing of the patch that I sent and it isn't quite write - the ENETUNREACH doesn't propagate all the way up to call_connect_status. This patch fixes that. With it, the rpc_ping fails nicely, but when a reconnect is tried on an already-mounted filesystem it doesn't fail but rather retries every 5 seconds. This is what I wanted to happen. However I'm not at all sure that "5 seconds" is correct. I copied it from call_bind_status, but it seems a bit short. Maybe the number in call_bind_status is a bit low??? Here is my current patch - which is more a starting point for discussion than a concrete proposal. 
Thanks,
NeilBrown

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index fa55490..539885e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
 	}
 
 	switch (status) {
+	case -ENETUNREACH:
+	case -ECONNRESET:
+	case -ECONNREFUSED:
+		if (!RPC_IS_SOFTCONN(task))
+			rpc_delay(task, 5*HZ);
+		/* fall through */
 	/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 970fb00..27673d9 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task)
 	}
 
 	switch (task->tk_status) {
+	case -ENETUNREACH:
+	case -ECONNREFUSED:
+	case -ECONNRESET:
+		break;
 	case -EAGAIN:
 		dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid);
 		break;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fe9306b..0743994 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETUNREACH:
-		/* retry with existing socket, after a delay */
+		/* allow upper layers to choose between failure and retry */
+		goto out;
 	case 0:
 	case -EINPROGRESS:
 	case -EALREADY:

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 0:56 ` Neil Brown @ 2010-10-21 12:09 ` Jeff Layton 2010-10-21 13:52 ` Chuck Lever 2010-10-21 14:10 ` Chuck Lever 1 sibling, 1 reply; 20+ messages in thread From: Jeff Layton @ 2010-10-21 12:09 UTC (permalink / raw) To: Neil Brown; +Cc: Chuck Lever, Linux NFS Mailing List On Thu, 21 Oct 2010 11:56:22 +1100 Neil Brown <neilb@suse.de> wrote: > On Thu, 21 Oct 2010 08:29:38 +1100 > Neil Brown <neilb@suse.de> wrote: > > > On Wed, 20 Oct 2010 10:29:05 -0400 > > Chuck Lever <chuck.lever@ORACLE.COM> wrote: > > > > > > > > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: > > > > > > > > > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > > > mount, then it fails quickly: > > > > > > > > > > > > .... > > > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > > > mount.nfs: Network is unreachable > > > > > > > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > > > making a much longer delay. > > > > > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > > > everything to the kernel. > > > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > > > So at least it doesn't retry after the timeout. But given that we have a > > > > clear error, we shouldn't timeout at all. > > > > > > > > Unfortunately I cannot see an easy way to fix this. > > > > > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > > > the "retry" isn't. > > > > > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > > > was set. However we cannot see the task at this point - in fact it seems > > > > that there could be a queue of tasks waiting on this connection. I guess > > > > some could be soft, and some not. ??? > > > > > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > > > > ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. > > > > > > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). > > > > There is certainly code that seems to treat ENETUNREACH differently if > > RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case > > I am testing. > > e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a > > failure in the SOFTCONN case. > > I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so > > it never does the rpcbind step. > > So maybe we need to handle ENETUNREACH in call_connect_status as well as > > call_bind_status ?? > > > > Maybe something like that ... The placement of rpc_delay seems a little of > > to me, but follows call_bind_status, so it could be correct. > > > > I did a bit of testing of the patch that I sent and it isn't quite write - > the ENETUNREACH doesn't propagate all the way up to call_connect_status. > This patch fixes that. > > With it, the rpc_ping fails nicely, but when a reconnect is tried on an > already-mounted filesystem it doesn't fail but rather retries every 5 seconds. > This is what I wanted to happen. > > However I'm not at all sure that "5 seconds" is correct. 
I copied it from > call_bind_status, but it seems a bit short. Maybe the number in > call_bind_status is a bit low??? > > Here is my current patch - which is more a starting point for discussion than > a concrete proposal. > > Thanks, > NeilBrown > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index fa55490..539885e 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task) > } > > switch (status) { > + case -ENETUNREACH: > + case -ECONNRESET: > + case -ECONNREFUSED: > + if (!RPC_IS_SOFTCONN(task)) > + rpc_delay(task, 5*HZ); > + /* fall through */ Maybe instead of the above, we should do something like: if (RPC_IS_SOFTCONN(task)) { rpc_exit(task, status); return; } ...IOW, if this is a SOFTCONN task, return connect errors immediately. If it's not a SOFTCONN task, treat it as we would a timeout? That'll would probably fix the -ENETUNREACH case, but I'm not sure what to do about the cases that rely on xs_error_report. It seems a little suspicious that those errors all get turned into -EAGAIN. > /* if soft mounted, test if we've timed out */ > case -ETIMEDOUT: > task->tk_action = call_timeout; > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 970fb00..27673d9 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task) > } > > switch (task->tk_status) { > + case -ENETUNREACH: > + case -ECONNREFUSED: > + case -ECONNRESET: > + break; > case -EAGAIN: > dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid); > break; > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index fe9306b..0743994 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt, > case -ECONNREFUSED: > case -ECONNRESET: > case -ENETUNREACH: > - /* retry with existing socket, after a delay */ > + /* allow upper layers to choose between failure and retry */ > + goto out; > case 0: > case -EINPROGRESS: > case -EALREADY: > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
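Folding Jeff's suggestion into the call_connect_status hunk from Neil's patch
would give something like the sketch below.  This is untested and purely
illustrative; the error list is taken from the patch above:

	switch (status) {
	case -ENETUNREACH:
	case -ECONNRESET:
	case -ECONNREFUSED:
		if (RPC_IS_SOFTCONN(task)) {
			/* clear connection error: fail the task right away */
			rpc_exit(task, status);
			return;
		}
		/* otherwise back off and treat it like a timeout */
		rpc_delay(task, 5*HZ);
		/* fall through */
	/* if soft mounted, test if we've timed out */
	case -ETIMEDOUT:
		task->tk_action = call_timeout;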
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 12:09 ` Jeff Layton @ 2010-10-21 13:52 ` Chuck Lever 0 siblings, 0 replies; 20+ messages in thread From: Chuck Lever @ 2010-10-21 13:52 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, Linux NFS Mailing List On Oct 21, 2010, at 8:09 AM, Jeff Layton wrote: > On Thu, 21 Oct 2010 11:56:22 +1100 > Neil Brown <neilb@suse.de> wrote: > >> On Thu, 21 Oct 2010 08:29:38 +1100 >> Neil Brown <neilb@suse.de> wrote: >> >>> On Wed, 20 Oct 2010 10:29:05 -0400 >>> Chuck Lever <chuck.lever@ORACLE.COM> wrote: >>> >>>> >>>> On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: >>>> >>>>> >>>>> >>>>> If I don't have any network configured (except loop-back), and try an NFSv3 >>>>> mount, then it fails quickly: >>>>> >>>>> >>>>> .... >>>>> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable >>>>> mount.nfs: Network is unreachable >>>>> >>>>> >>>>> If I try the same thing with a NFSv4 mount, it times out before it fails, >>>>> making a much longer delay. >>>>> >>>>> This is because mount.nfs doesn't do a portmap lookup but just leaves >>>>> everything to the kernel. >>>>> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. >>>>> So at least it doesn't retry after the timeout. But given that we have a >>>>> clear error, we shouldn't timeout at all. >>>>> >>>>> Unfortunately I cannot see an easy way to fix this. >>>>> >>>>> The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there >>>>> says "Retry with the same socket after a delay". The "delay" bit is correct, >>>>> the "retry" isn't. >>>>> >>>>> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN >>>>> was set. However we cannot see the task at this point - in fact it seems >>>>> that there could be a queue of tasks waiting on this connection. I guess >>>>> some could be soft, and some not. ??? >>>>> >>>>> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to >>>>> fail immediately when RPC_TASK_SOFTCONN is set ??? >>>> >>>> ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. >>>> >>>> I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). >>> >>> There is certainly code that seems to treat ENETUNREACH differently if >>> RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case >>> I am testing. >>> e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a >>> failure in the SOFTCONN case. >>> I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so >>> it never does the rpcbind step. >>> So maybe we need to handle ENETUNREACH in call_connect_status as well as >>> call_bind_status ?? >>> >>> Maybe something like that ... The placement of rpc_delay seems a little of >>> to me, but follows call_bind_status, so it could be correct. >>> >> >> I did a bit of testing of the patch that I sent and it isn't quite write - >> the ENETUNREACH doesn't propagate all the way up to call_connect_status. >> This patch fixes that. >> >> With it, the rpc_ping fails nicely, but when a reconnect is tried on an >> already-mounted filesystem it doesn't fail but rather retries every 5 seconds. >> This is what I wanted to happen. >> >> However I'm not at all sure that "5 seconds" is correct. I copied it from >> call_bind_status, but it seems a bit short. 
Maybe the number in >> call_bind_status is a bit low??? >> >> Here is my current patch - which is more a starting point for discussion than >> a concrete proposal. >> >> Thanks, >> NeilBrown >> >> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >> index fa55490..539885e 100644 >> --- a/net/sunrpc/clnt.c >> +++ b/net/sunrpc/clnt.c >> @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task) >> } >> >> switch (status) { >> + case -ENETUNREACH: >> + case -ECONNRESET: >> + case -ECONNREFUSED: >> + if (!RPC_IS_SOFTCONN(task)) >> + rpc_delay(task, 5*HZ); >> + /* fall through */ > > Maybe instead of the above, we should do something like: > > if (RPC_IS_SOFTCONN(task)) { > rpc_exit(task, status); > return; > } > > ...IOW, if this is a SOFTCONN task, return connect errors immediately. > If it's not a SOFTCONN task, treat it as we would a timeout? > > That'll would probably fix the -ENETUNREACH case, but I'm not sure what > to do about the cases that rely on xs_error_report. It seems a little > suspicious that those errors all get turned into -EAGAIN. I think the theory is that all of the transport connect problems should be handled by the lower layers (xprtsock and the kernel's socket code) during the call_transmit step. If something needs to be retried (say, a connection attempt) then the lower layer returns EAGAIN and the finite state machine will retry. One of the issues is how to deal with partially transmitted requests and GSS credentials. Some cases you have to re-encode the request before transmitting again. The finite state machine attempts to deal with that. > >> /* if soft mounted, test if we've timed out */ >> case -ETIMEDOUT: >> task->tk_action = call_timeout; >> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >> index 970fb00..27673d9 100644 >> --- a/net/sunrpc/xprt.c >> +++ b/net/sunrpc/xprt.c >> @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task) >> } >> >> switch (task->tk_status) { >> + case -ENETUNREACH: >> + case -ECONNREFUSED: >> + case -ECONNRESET: >> + break; >> case -EAGAIN: >> dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid); >> break; >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index fe9306b..0743994 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt, >> case -ECONNREFUSED: >> case -ECONNRESET: >> case -ENETUNREACH: >> - /* retry with existing socket, after a delay */ >> + /* allow upper layers to choose between failure and retry */ >> + goto out; >> case 0: >> case -EINPROGRESS: >> case -EALREADY: >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > -- > Jeff Layton <jlayton@redhat.com> -- chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 20+ messages in thread
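The "finite state machine" Chuck mentions is the loop that drives each RPC
task through its tk_action steps (call_bind, call_connect, call_transmit,
call_status, ...).  A heavily simplified sketch of that loop - queueing,
callbacks and locking omitted, so this is not the real __rpc_execute() -
shows why a lower layer returning EAGAIN simply causes the task to come back
around for another attempt:

	static void rpc_state_machine_sketch(struct rpc_task *task)
	{
		while (task->tk_action != NULL) {
			void (*do_action)(struct rpc_task *) = task->tk_action;

			/* may change tk_action, exit the task, or queue it to sleep */
			do_action(task);

			if (RPC_IS_QUEUED(task))
				break;	/* resumed later, e.g. by xprt_wake_pending_tasks() */
		}
	}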
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 0:56 ` Neil Brown 2010-10-21 12:09 ` Jeff Layton @ 2010-10-21 14:10 ` Chuck Lever 1 sibling, 0 replies; 20+ messages in thread From: Chuck Lever @ 2010-10-21 14:10 UTC (permalink / raw) To: Neil Brown; +Cc: Linux NFS Mailing List On Oct 20, 2010, at 8:56 PM, Neil Brown wrote: > On Thu, 21 Oct 2010 08:29:38 +1100 > Neil Brown <neilb@suse.de> wrote: > >> On Wed, 20 Oct 2010 10:29:05 -0400 >> Chuck Lever <chuck.lever@ORACLE.COM> wrote: >> >>> >>> On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: >>> >>>> >>>> >>>> If I don't have any network configured (except loop-back), and try an NFSv3 >>>> mount, then it fails quickly: >>>> >>>> >>>> .... >>>> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable >>>> mount.nfs: Network is unreachable >>>> >>>> >>>> If I try the same thing with a NFSv4 mount, it times out before it fails, >>>> making a much longer delay. >>>> >>>> This is because mount.nfs doesn't do a portmap lookup but just leaves >>>> everything to the kernel. >>>> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. >>>> So at least it doesn't retry after the timeout. But given that we have a >>>> clear error, we shouldn't timeout at all. >>>> >>>> Unfortunately I cannot see an easy way to fix this. >>>> >>>> The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there >>>> says "Retry with the same socket after a delay". The "delay" bit is correct, >>>> the "retry" isn't. >>>> >>>> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN >>>> was set. However we cannot see the task at this point - in fact it seems >>>> that there could be a queue of tasks waiting on this connection. I guess >>>> some could be soft, and some not. ??? >>>> >>>> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to >>>> fail immediately when RPC_TASK_SOFTCONN is set ??? >>> >>> ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. >>> >>> I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). >> >> There is certainly code that seems to treat ENETUNREACH differently if >> RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case >> I am testing. >> e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a >> failure in the SOFTCONN case. >> I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so >> it never does the rpcbind step. >> So maybe we need to handle ENETUNREACH in call_connect_status as well as >> call_bind_status ?? >> >> Maybe something like that ... The placement of rpc_delay seems a little of >> to me, but follows call_bind_status, so it could be correct. >> > > I did a bit of testing of the patch that I sent and it isn't quite write - > the ENETUNREACH doesn't propagate all the way up to call_connect_status. > This patch fixes that. > > With it, the rpc_ping fails nicely, but when a reconnect is tried on an > already-mounted filesystem it doesn't fail but rather retries every 5 seconds. > This is what I wanted to happen. > > However I'm not at all sure that "5 seconds" is correct. I copied it from > call_bind_status, but it seems a bit short. Maybe the number in > call_bind_status is a bit low??? I think that number is arbitrary. 
Five seconds might be too often to try on a server that is trying to come up, I agree. > Here is my current patch - which is more a starting point for discussion than > a concrete proposal. Sure. I'm sorry I can't be more helpful here, it's been a while. An important thing to remember is that the call_bind step is different than the call_connect step: call_bind can actually perform a whole other RPC under the covers. And: rpcbind doesn't do an initial NULL RPC, it just does the RPCBIND request. So the three places where this SOFTCONN thingie seems to matter is a) during the initial RPC ping, b) during an rpcbind request, and c) when the server is probing the callback parameters the client gave it via SETCLIENTID. We should be sure that all three places are working as expected; ie. the check-in criteria should include more than just "mount". Also, TCP is not the only connection-oriented transport we support here, so we should be careful that we're not doing anything that will break RDMA. > Thanks, > NeilBrown > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index fa55490..539885e 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task) > } > > switch (status) { > + case -ENETUNREACH: > + case -ECONNRESET: > + case -ECONNREFUSED: > + if (!RPC_IS_SOFTCONN(task)) > + rpc_delay(task, 5*HZ); > + /* fall through */ > /* if soft mounted, test if we've timed out */ > case -ETIMEDOUT: > task->tk_action = call_timeout; > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 970fb00..27673d9 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task) > } > > switch (task->tk_status) { > + case -ENETUNREACH: > + case -ECONNREFUSED: > + case -ECONNRESET: > + break; I'm not sure why you are including ECONN* here (and above). My impression was that case was working as expected. Can you explain the (non-NETUNREACH) transport connection-related problem you see? Theoretically, the architecture of the connection retry logic means that all three of these should be mapped to EAGAIN by the transports (xprtsock.c, in this specific case). > case -EAGAIN: > dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid); > break; > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index fe9306b..0743994 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt, > case -ECONNREFUSED: > case -ECONNRESET: > case -ENETUNREACH: > - /* retry with existing socket, after a delay */ > + /* allow upper layers to choose between failure and retry */ > + goto out; > case 0: > case -EINPROGRESS: > case -EALREADY: > -- chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown 2010-10-20 14:29 ` Chuck Lever @ 2010-10-20 17:55 ` Jeff Layton 2010-10-20 19:16 ` Jeff Layton 1 sibling, 1 reply; 20+ messages in thread From: Jeff Layton @ 2010-10-20 17:55 UTC (permalink / raw) To: Neil Brown; +Cc: Linux NFS Mailing List On Wed, 20 Oct 2010 18:17:01 +1100 Neil Brown <neilb@suse.de> wrote: > > > If I don't have any network configured (except loop-back), and try an NFSv3 > mount, then it fails quickly: > > > .... > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > mount.nfs: Network is unreachable > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > making a much longer delay. > > This is because mount.nfs doesn't do a portmap lookup but just leaves > everything to the kernel. > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > So at least it doesn't retry after the timeout. But given that we have a > clear error, we shouldn't timeout at all. > > Unfortunately I cannot see an easy way to fix this. > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > says "Retry with the same socket after a delay". The "delay" bit is correct, > the "retry" isn't. > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > was set. However we cannot see the task at this point - in fact it seems > that there could be a queue of tasks waiting on this connection. I guess > some could be soft, and some not. ??? > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > This affects people who upgrade from openSUSE11.2 (which didn't support v4 > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager > (which configures networks late) and have NFS mounts in /etc/fstab with > either explicit IP addresses or host names that can be resolved without the > network. > This config will work because when the network comes up, network-manager will > re-run the 'init.d/nfs' script. However since 11.3 there is an unpleasant > pause before boot completes. > Took me a few tries to get an ENETUNREACH error but I see the same hang you do. For the record I was able to get one by not configuring an IPv6 addr on the box and attempting to mount an IPv6 address. Interestingly while I was trying to reproduce it, I ended up reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to which I didn't have a route. That error returns quickly from the kernel. Maybe we can solve this simply by treating ENETUNREACH the same as EHOSTUNREACH in this situation? I'm not quite sure exactly how to make that happen, but it seems like reasonable behavior. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 17:55 ` Jeff Layton @ 2010-10-20 19:16 ` Jeff Layton 2010-10-20 20:40 ` Neil Brown 0 siblings, 1 reply; 20+ messages in thread From: Jeff Layton @ 2010-10-20 19:16 UTC (permalink / raw) To: Neil Brown, chuck.lever; +Cc: Linux NFS Mailing List On Wed, 20 Oct 2010 13:55:25 -0400 Jeff Layton <jlayton@redhat.com> wrote: > On Wed, 20 Oct 2010 18:17:01 +1100 > Neil Brown <neilb@suse.de> wrote: > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > mount, then it fails quickly: > > > > > > .... > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > mount.nfs: Network is unreachable > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > making a much longer delay. > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > everything to the kernel. > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > So at least it doesn't retry after the timeout. But given that we have a > > clear error, we shouldn't timeout at all. > > > > Unfortunately I cannot see an easy way to fix this. > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > the "retry" isn't. > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > was set. However we cannot see the task at this point - in fact it seems > > that there could be a queue of tasks waiting on this connection. I guess > > some could be soft, and some not. ??? > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > > > > This affects people who upgrade from openSUSE11.2 (which didn't support v4 > > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager > > (which configures networks late) and have NFS mounts in /etc/fstab with > > either explicit IP addresses or host names that can be resolved without the > > network. > > This config will work because when the network comes up, network-manager will > > re-run the 'init.d/nfs' script. However since 11.3 there is an unpleasant > > pause before boot completes. > > > > Took me a few tries to get an ENETUNREACH error but I see the same hang > you do. For the record I was able to get one by not configuring an IPv6 > addr on the box and attempting to mount an IPv6 address. > > Interestingly while I was trying to reproduce it, I ended up > reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to > which I didn't have a route. That error returns quickly from the > kernel. Maybe we can solve this simply by treating ENETUNREACH the same > as EHOSTUNREACH in this situation? > > I'm not quite sure exactly how to make that happen, but it seems like > reasonable behavior. > Sigh, nothing's ever easy in the RPC layer. Please bear with my scatterbrained analysis... There's a bit of difference at the socket layer between those two cases. xs_tcp_finish_connecting calls this to connect the socket: kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); ...in the ENETUNREACH case, this returns immediately with the error. In EHOSTUNREACH case, it returns EINPROGRESS and then the sk_error_report handles the rest. Fine...we can emulate the similar behavior, but... 
Then what happens is that xs_tcp_send_request gets called again to try
to resend the packet. In the EHOSTUNREACH case, that returns
EHOSTUNREACH which eventually causes an rpc_exit with that error. In
the ENETUNREACH case that returns EPIPE, which makes the state machine
move next to call_bind and the whole thing starts over again.

I'm still not sure what the right approach is here. The fact that
attempting to send on the socket in this case gives us an EPIPE makes
it tough to handle this case the same way as EHOSTUNREACH.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-20 19:16 ` Jeff Layton @ 2010-10-20 20:40 ` Neil Brown 2010-10-21 0:45 ` Jeff Layton 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2010-10-20 20:40 UTC (permalink / raw) To: Jeff Layton; +Cc: chuck.lever, Linux NFS Mailing List On Wed, 20 Oct 2010 15:16:57 -0400 Jeff Layton <jlayton@redhat.com> wrote: > On Wed, 20 Oct 2010 13:55:25 -0400 > Jeff Layton <jlayton@redhat.com> wrote: > > > On Wed, 20 Oct 2010 18:17:01 +1100 > > Neil Brown <neilb@suse.de> wrote: > > > > > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > > mount, then it fails quickly: > > > > > > > > > .... > > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > > mount.nfs: Network is unreachable > > > > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > > making a much longer delay. > > > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > > everything to the kernel. > > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > > So at least it doesn't retry after the timeout. But given that we have a > > > clear error, we shouldn't timeout at all. > > > > > > Unfortunately I cannot see an easy way to fix this. > > > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > > the "retry" isn't. > > > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > > was set. However we cannot see the task at this point - in fact it seems > > > that there could be a queue of tasks waiting on this connection. I guess > > > some could be soft, and some not. ??? > > > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > > > > > > > This affects people who upgrade from openSUSE11.2 (which didn't support v4 > > > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager > > > (which configures networks late) and have NFS mounts in /etc/fstab with > > > either explicit IP addresses or host names that can be resolved without the > > > network. > > > This config will work because when the network comes up, network-manager will > > > re-run the 'init.d/nfs' script. However since 11.3 there is an unpleasant > > > pause before boot completes. > > > > > > > Took me a few tries to get an ENETUNREACH error but I see the same hang > > you do. For the record I was able to get one by not configuring an IPv6 > > addr on the box and attempting to mount an IPv6 address. > > > > Interestingly while I was trying to reproduce it, I ended up > > reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to > > which I didn't have a route. That error returns quickly from the > > kernel. Maybe we can solve this simply by treating ENETUNREACH the same > > as EHOSTUNREACH in this situation? > > > > I'm not quite sure exactly how to make that happen, but it seems like > > reasonable behavior. > > > > Sigh, nothing's ever easy in the RPC layer. Please bear with my > scatterbrained analysis... > > There's a bit of difference at the socket layer between those two cases. 
> xs_tcp_finish_connecting calls this to connect the socket:
> 
> kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> 
> ...in the ENETUNREACH case, this returns immediately with the error. In
> EHOSTUNREACH case, it returns EINPROGRESS and then the sk_error_report
> handles the rest. Fine...we can emulate the similar behavior, but...

That all seems to make sense and helps complete the picture, however ...

> 
> Then what happens is that xs_tcp_send_request gets called again to try
> to resend the packet. In the EHOSTUNREACH case, that returns
> EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> the ENETUNREACH case that returns EPIPE, which makes the state machine
> move next to call_bind and the whole thing starts over again.

This confuses me.  Why would xs_tcp_send_request (aka ->send_request) get
called before the connect has succeeded?  Can you make sense of that?

Thanks,
NeilBrown

> 
> I'm still not sure what the right approach is here. The fact that
> attempting to send on the socket in this case gives us an EPIPE makes
> it tough to handle this case the same way as EHOSTUNREACH.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 20:40 ` Neil Brown
@ 2010-10-21  0:45 ` Jeff Layton
  2010-10-21  3:25 ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-21 0:45 UTC (permalink / raw)
To: Neil Brown; +Cc: chuck.lever, Linux NFS Mailing List

On Thu, 21 Oct 2010 07:40:28 +1100
Neil Brown <neilb@suse.de> wrote:

> > 
> > Then what happens is that xs_tcp_send_request gets called again to try
> > to resend the packet. In the EHOSTUNREACH case, that returns
> > EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> > the ENETUNREACH case that returns EPIPE, which makes the state machine
> > move next to call_bind and the whole thing starts over again.
> 
> This confuses me.  Why would xs_tcp_send_request (aka ->send_request) get
> called before the connect has succeeded?  Can you make sense of that?
> 

It confuses me too. I suspect that this may actually be a bug...

So EINPROGRESS makes the connect_worker task clear the connecting bit
and return. Eventually, the EHOSTUNREACH error is reported to
xs_error_report. That function does this:

    xprt_wake_pending_tasks(xprt, -EAGAIN);

The task that was waiting on the connect_worker is then woken up.
call_connect_status does this:

    if (status >= 0 || status == -EAGAIN) {
        clnt->cl_stats->netreconn++;
        task->tk_action = call_transmit;
        return;
    }

...and we end up in call_transmit without the socket being connected.

So I understand how this happened, but I don't really understand the
design of the connect mechanism well enough to know whether this is
by design or not.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 0:45 ` Jeff Layton @ 2010-10-21 3:25 ` Neil Brown 2010-10-21 14:05 ` Trond Myklebust 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2010-10-21 3:25 UTC (permalink / raw) To: Jeff Layton; +Cc: chuck.lever, Linux NFS Mailing List, Trond Myklebust On Wed, 20 Oct 2010 20:45:32 -0400 Jeff Layton <jlayton@redhat.com> wrote: > On Thu, 21 Oct 2010 07:40:28 +1100 > Neil Brown <neilb@suse.de> wrote: > > > > > > > Then what happens is that xs_tcp_send_request gets called again to try > > > to resend the packet. In the EHOSTUNREACH case, that returns > > > EHOSTUNREACH which eventually causes an rpc_exit with that error. In > > > the ENETUNREACH case that returns EPIPE, which makes the state machine > > > move next to call_bind and the whole thing starts over again. > > > > This confuses me. Why would xs_tcp_send_request (aka ->send_request) get > > called before the connect has succeeded? Can you make sense of that? > > > > It confuses me too. I suspect that this may actually be a bug... > > So EINPROGRESS makes the connect_worker task clear the connecting bit > and return. Eventually, the EHOSTUNREACH error is reported to > xs_error_report. That function does this: > > xprt_wake_pending_tasks(xprt, -EAGAIN); > > The task that was waiting on the connect_worker is then woken up. > call_connect_status does this: > > if (status >= 0 || status == -EAGAIN) { > clnt->cl_stats->netreconn++; > task->tk_action = call_transmit; > return; > } > > ...and we end up in call_transmit without the socket being connected. > > So I understand how this happened, but I don't really understand the > design of the connect mechanism well enough to know whether this is > by design or not. > Now that *is* interesting..... I thought that code in call_connect_status was hard to understand too, so I asked git who to blame it on. It said: commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Wed Mar 11 14:38:00 2009 -0400 SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending While we should definitely return socket errors to the task that is currently trying to send data, there is no need to propagate the same error to all the other tasks on xprt->pending. Doing so actually slows down recovery, since it causes more than one tasks to attempt socket recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> That commit not only adds the "status == -EAGAIN" test, but also introduced some of the xprtsock.c code that I suggested changing in a previous patch. So there seem to be some correlation between that commit and the present problem. I tried compiling the kernel just prior to that commit, and mount -t nfs4 unrouteable.ip.addres:/ /mnt took 3 seconds to time fail. I then stepped forward to that commit and the same command took 3 *minutes* to time out. So something isn't right there. Unfortunately I don't know what. Trond: can you comment on this - maybe explain the reasoning behind that commit better, and suggest how we can get ENOTCONN to fail SOFTCONN connections faster without undoing the things this patch tried to achieve? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 3:25 ` Neil Brown @ 2010-10-21 14:05 ` Trond Myklebust 2010-10-21 14:31 ` Chuck Lever 0 siblings, 1 reply; 20+ messages in thread From: Trond Myklebust @ 2010-10-21 14:05 UTC (permalink / raw) To: Neil Brown; +Cc: Jeff Layton, chuck.lever, Linux NFS Mailing List On Thu, 2010-10-21 at 14:25 +1100, Neil Brown wrote: > On Wed, 20 Oct 2010 20:45:32 -0400 > Jeff Layton <jlayton@redhat.com> wrote: > > > On Thu, 21 Oct 2010 07:40:28 +1100 > > Neil Brown <neilb@suse.de> wrote: > > > > > > > > > > Then what happens is that xs_tcp_send_request gets called again to try > > > > to resend the packet. In the EHOSTUNREACH case, that returns > > > > EHOSTUNREACH which eventually causes an rpc_exit with that error. In > > > > the ENETUNREACH case that returns EPIPE, which makes the state machine > > > > move next to call_bind and the whole thing starts over again. > > > > > > This confuses me. Why would xs_tcp_send_request (aka ->send_request) get > > > called before the connect has succeeded? Can you make sense of that? > > > > > > > It confuses me too. I suspect that this may actually be a bug... > > > > So EINPROGRESS makes the connect_worker task clear the connecting bit > > and return. Eventually, the EHOSTUNREACH error is reported to > > xs_error_report. That function does this: > > > > xprt_wake_pending_tasks(xprt, -EAGAIN); > > > > The task that was waiting on the connect_worker is then woken up. > > call_connect_status does this: > > > > if (status >= 0 || status == -EAGAIN) { > > clnt->cl_stats->netreconn++; > > task->tk_action = call_transmit; > > return; > > } > > > > ...and we end up in call_transmit without the socket being connected. > > > > So I understand how this happened, but I don't really understand the > > design of the connect mechanism well enough to know whether this is > > by design or not. > > > > Now that *is* interesting..... > > I thought that code in call_connect_status was hard to understand too, so I > asked git who to blame it on. It said: > > commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b > Author: Trond Myklebust <Trond.Myklebust@netapp.com> > Date: Wed Mar 11 14:38:00 2009 -0400 > > SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending > > While we should definitely return socket errors to the task that is > currently trying to send data, there is no need to propagate the same error > to all the other tasks on xprt->pending. Doing so actually slows down > recovery, since it causes more than one tasks to attempt socket recovery. > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > > > > That commit not only adds the "status == -EAGAIN" test, but also introduced > some of the xprtsock.c code that I suggested changing in a previous patch. That is because ENOTCONN was trying to report the current state of the socket to a bunch of tasks on the 'pending' list. The problem is that because none of those tasks hold any form of lock, then by the time they get round to processing that ENOTCONN, the state of the socket can (and usually will) have changed. > So there seem to be some correlation between that commit and the present > problem. > > I tried compiling the kernel just prior to that commit, and > mount -t nfs4 unrouteable.ip.addres:/ /mnt > > took 3 seconds to time fail. > > I then stepped forward to that commit and the same command took 3 *minutes* to > time out. So something isn't right there. Unfortunately I don't know what. 
> 
> Trond: can you comment on this - maybe explain the reasoning behind that
> commit better, and suggest how we can get ENOTCONN to fail SOFTCONN
> connections faster without undoing the things this patch tried to achieve?

It seems to me that the problem is basically that RPC_IS_SOFTCONN is
poorly defined. IMO, the definition should really be that
'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
call_bind or a call_connect more than once (i.e. if they ever have to
loop back).

With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
being handled in call_transmit_status() instead of in call_status().

I furthermore don't see why ECONNRESET, ENOTCONN and EPIPE should be
treated any differently from ECONNREFUSED, EHOSTDOWN, EHOSTUNREACH and
ENETUNREACH.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 20+ messages in thread
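The split Trond is questioning looks roughly like the sketch below - a
paraphrase of the call_transmit_status() grouping as described in this
thread, not the exact source.  One set of connection errors gets the
RPC_IS_SOFTCONN fast-fail treatment, while the other is simply re-encoded
and retried, even though all of them mean the request never reached the
server:

	case -ECONNREFUSED:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
		if (RPC_IS_SOFTCONN(task)) {
			rpc_exit(task, task->tk_status);	/* fast failure */
			break;
		}
		/* fall through */
	case -ECONNRESET:
	case -ENOTCONN:
	case -EPIPE:
		rpc_task_force_reencode(task);	/* loop back and try again */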
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 14:05 ` Trond Myklebust @ 2010-10-21 14:31 ` Chuck Lever 2010-10-21 14:42 ` Trond Myklebust 0 siblings, 1 reply; 20+ messages in thread From: Chuck Lever @ 2010-10-21 14:31 UTC (permalink / raw) To: Trond Myklebust; +Cc: Neil Brown, Jeff Layton, Linux NFS Mailing List On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > On Thu, 2010-10-21 at 14:25 +1100, Neil Brown wrote: >> On Wed, 20 Oct 2010 20:45:32 -0400 >> Jeff Layton <jlayton@redhat.com> wrote: >> >>> On Thu, 21 Oct 2010 07:40:28 +1100 >>> Neil Brown <neilb@suse.de> wrote: >>> >>>>> >>>>> Then what happens is that xs_tcp_send_request gets called again to try >>>>> to resend the packet. In the EHOSTUNREACH case, that returns >>>>> EHOSTUNREACH which eventually causes an rpc_exit with that error. In >>>>> the ENETUNREACH case that returns EPIPE, which makes the state machine >>>>> move next to call_bind and the whole thing starts over again. >>>> >>>> This confuses me. Why would xs_tcp_send_request (aka ->send_request) get >>>> called before the connect has succeeded? Can you make sense of that? >>>> >>> >>> It confuses me too. I suspect that this may actually be a bug... >>> >>> So EINPROGRESS makes the connect_worker task clear the connecting bit >>> and return. Eventually, the EHOSTUNREACH error is reported to >>> xs_error_report. That function does this: >>> >>> xprt_wake_pending_tasks(xprt, -EAGAIN); >>> >>> The task that was waiting on the connect_worker is then woken up. >>> call_connect_status does this: >>> >>> if (status >= 0 || status == -EAGAIN) { >>> clnt->cl_stats->netreconn++; >>> task->tk_action = call_transmit; >>> return; >>> } >>> >>> ...and we end up in call_transmit without the socket being connected. >>> >>> So I understand how this happened, but I don't really understand the >>> design of the connect mechanism well enough to know whether this is >>> by design or not. >>> >> >> Now that *is* interesting..... >> >> I thought that code in call_connect_status was hard to understand too, so I >> asked git who to blame it on. It said: >> >> commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b >> Author: Trond Myklebust <Trond.Myklebust@netapp.com> >> Date: Wed Mar 11 14:38:00 2009 -0400 >> >> SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending >> >> While we should definitely return socket errors to the task that is >> currently trying to send data, there is no need to propagate the same error >> to all the other tasks on xprt->pending. Doing so actually slows down >> recovery, since it causes more than one tasks to attempt socket recovery. >> >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> >> >> >> >> That commit not only adds the "status == -EAGAIN" test, but also introduced >> some of the xprtsock.c code that I suggested changing in a previous patch. > > That is because ENOTCONN was trying to report the current state of the > socket to a bunch of tasks on the 'pending' list. The problem is that > because none of those tasks hold any form of lock, then by the time they > get round to processing that ENOTCONN, the state of the socket can (and > usually will) have changed. > >> So there seem to be some correlation between that commit and the present >> problem. >> >> I tried compiling the kernel just prior to that commit, and >> mount -t nfs4 unrouteable.ip.addres:/ /mnt >> >> took 3 seconds to time fail. >> >> I then stepped forward to that commit and the same command took 3 *minutes* to >> time out. 
So something isn't right there. Unfortunately I don't know what. >> >> Trond: can you comment on this - maybe explain the reasoning behind that >> commit better, and suggest how we can get ENOTCONN to fail SOFTCONN >> connections faster without undoing the things this patch tried to achieve? > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > poorly defined. IMO, the definition should really be that > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > call_bind or a call_connect more than once (i.e. if they ever have to > loop back). I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > being handled in call_transmit_status() instead of in call_status(). I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > I furthermore don't see why ECONNRESET, ENOTCONN and EPIPE should be > treated any differently from ECONNREFUSED, EHOSTDOWN, EHOSTUNREACH and > ENETUNREACH. I can get behind that. It looks like call_bind_status already combines all these error codes, so it makes some sense. -- chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 14:31 ` Chuck Lever @ 2010-10-21 14:42 ` Trond Myklebust 2010-10-21 19:40 ` Jeff Layton 0 siblings, 1 reply; 20+ messages in thread From: Trond Myklebust @ 2010-10-21 14:42 UTC (permalink / raw) To: Chuck Lever; +Cc: Neil Brown, Jeff Layton, Linux NFS Mailing List On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > poorly defined. IMO, the definition should really be that > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > call_bind or a call_connect more than once (i.e. if they ever have to > > loop back). > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? Yes. > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > being handled in call_transmit_status() instead of in call_status(). > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." call_transmit_status() is there basically in order to call xprt_end_transmit() in the cases where we want to free up the socket write lock (and then possibly sleep). Actual state error handling is supposed to occur in call_status(). Cheers Trond ^ permalink raw reply [flat|nested] 20+ messages in thread
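To make that division of labour concrete, the rough shape of call_transmit_status() at this point is the following (paraphrased from memory, not verbatim); apart from deciding when to call xprt_end_transmit() and release the socket write lock, the only real policy in it is the RPC_IS_SOFTCONN special case Trond is asking about:

  static void call_transmit_status(struct rpc_task *task)
  {
      task->tk_action = call_status;

      switch (task->tk_status) {
      case -EAGAIN:
          break;                      /* waiting for write space: keep the lock */
      default:
          xprt_end_transmit(task);    /* done with the socket: drop the write lock */
          break;
      case -ECONNREFUSED:
      case -EHOSTDOWN:
      case -EHOSTUNREACH:
      case -ENETUNREACH:
          /* the SOFTCONN special case currently sits here */
          if (RPC_IS_SOFTCONN(task)) {
              xprt_end_transmit(task);
              rpc_exit(task, task->tk_status);
              break;
          }
          /* fall through */
      case -ECONNRESET:
      case -ENOTCONN:
      case -EPIPE:
          break;                      /* keep the lock; call_status() decides what next */
      }
  }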
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 14:42 ` Trond Myklebust @ 2010-10-21 19:40 ` Jeff Layton 2010-10-21 19:47 ` Trond Myklebust 0 siblings, 1 reply; 20+ messages in thread From: Jeff Layton @ 2010-10-21 19:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List On Thu, 21 Oct 2010 10:42:04 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > > poorly defined. IMO, the definition should really be that > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > > call_bind or a call_connect more than once (i.e. if they ever have to > > > loop back). > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > > Yes. > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > > being handled in call_transmit_status() instead of in call_status(). > > > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > > call_transmit_status() is there basically in order to call > xprt_end_transmit() in the cases where we want to free up the socket > write lock (and then possibly sleep). > > Actual state error handling is supposed occur in call_status(). > This is terribly confusing. So the connect handling is done in call_transmit? It seems like it would make more sense to have the connect handling mostly done in call_connect and call_connect_status, and only allow the tasks to proceed to the call_transmit phase after the connect has succeeded. If that's not the case, then maybe some renaming of functions is in order so that the purpose of them is more clear? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
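For anyone else trying to keep the states straight, the nominal forward path of an RPC task is roughly the following (simplified; this is one reading of net/sunrpc/clnt.c, ignoring auth refresh retries and similar detours):

      call_start
        -> call_reserve / call_refresh / call_allocate   (slot, creds, buffers)
        -> call_bind        rpcbind/portmap lookup, if the port is not yet known
        -> call_connect     make sure the transport is connected
        -> call_transmit    encode and send; call_transmit_status decides whether
                            to release the socket write lock
        -> call_status      interpret transport errors; may jump back to call_bind
                            or call_connect (the loop a SOFTCONN task is meant to
                            avoid)
        -> call_decode      -> rpc_exit_task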
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 19:40 ` Jeff Layton @ 2010-10-21 19:47 ` Trond Myklebust 2010-10-21 20:08 ` Jeff Layton 0 siblings, 1 reply; 20+ messages in thread From: Trond Myklebust @ 2010-10-21 19:47 UTC (permalink / raw) To: Jeff Layton; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote: > On Thu, 21 Oct 2010 10:42:04 -0400 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > > > poorly defined. IMO, the definition should really be that > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > > > call_bind or a call_connect more than once (i.e. if they ever have to > > > > loop back). > > > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > > > > Yes. > > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > > > being handled in call_transmit_status() instead of in call_status(). > > > > > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > > > > call_transmit_status() is there basically in order to call > > xprt_end_transmit() in the cases where we want to free up the socket > > write lock (and then possibly sleep). > > > > Actual state error handling is supposed occur in call_status(). > > > > This is terribly confusing. So the connect handling is done in > call_transmit? No. Connection-related _error conditions_ that result from trying to send stuff through a socket that is not connected, are handled in call_status(). Usually, by sending the process back to call_connect(). > It seems like it would make more sense to have the connect handling > mostly done in call_connect and call_connect_status, and only allow the > tasks to proceed to the call_transmit phase after the connect has > succeeded. That is the case. However sockets can and do get closed by the _server_ in unpredictable ways. This is what may need to be handled after the task has passed the call_connect state. Trond ^ permalink raw reply [flat|nested] 20+ messages in thread
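Concretely, the "send the process back to call_connect()" handling is in call_status(). A rough paraphrase of the relevant arms (from memory, not verbatim, with timeout bookkeeping omitted) is below; the "task->tk_action = call_bind" style transitions are exactly the loop-back points where Trond's proposed RPC_IS_SOFTCONN definition would have to hook in:

  static void call_status(struct rpc_task *task)
  {
      struct rpc_clnt *clnt = task->tk_client;
      int status = task->tk_status;

      task->tk_status = 0;
      switch (status) {
      case -EHOSTDOWN:
      case -EHOSTUNREACH:
      case -ENETUNREACH:
          rpc_delay(task, 3 * HZ);        /* back off ... */
          /* fall through */
      case -ETIMEDOUT:
          task->tk_action = call_timeout; /* ... and treat it like a timeout */
          break;
      case -ECONNRESET:
      case -ECONNREFUSED:
          rpc_force_rebind(clnt);
          /* fall through */
      case -EPIPE:
      case -ENOTCONN:
          task->tk_action = call_bind;    /* rebind/reconnect and try again */
          break;
      case -EAGAIN:
          task->tk_action = call_transmit;
          break;
      default:
          rpc_exit(task, status);
      }
  }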
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 19:47 ` Trond Myklebust @ 2010-10-21 20:08 ` Jeff Layton 2010-10-21 20:18 ` Trond Myklebust 0 siblings, 1 reply; 20+ messages in thread From: Jeff Layton @ 2010-10-21 20:08 UTC (permalink / raw) To: Trond Myklebust; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List On Thu, 21 Oct 2010 15:47:24 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote: > > On Thu, 21 Oct 2010 10:42:04 -0400 > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > > > > poorly defined. IMO, the definition should really be that > > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > > > > call_bind or a call_connect more than once (i.e. if they ever have to > > > > > loop back). > > > > > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > > > > > > Yes. > > > > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > > > > being handled in call_transmit_status() instead of in call_status(). > > > > > > > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > > > > > > call_transmit_status() is there basically in order to call > > > xprt_end_transmit() in the cases where we want to free up the socket > > > write lock (and then possibly sleep). > > > > > > Actual state error handling is supposed occur in call_status(). > > > > > > > This is terribly confusing. So the connect handling is done in > > call_transmit? > > No. Connection-related _error conditions_ that result from trying to > send stuff through a socket that is not connected, are handled in > call_status(). Usually, by sending the process back to call_connect(). > Yep, I get that part... > > It seems like it would make more sense to have the connect handling > > mostly done in call_connect and call_connect_status, and only allow the > > tasks to proceed to the call_transmit phase after the connect has > > succeeded. > > That is the case. However sockets can and do get closed by the _server_ > in unpredictable ways. This is what may need to be handled after the > task has passed the call_connect state. > Right, the socket can change state at any time, but the code doesn't seem to do what you describe for initial connects. In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then xs_error_report is called with a socket error of EHOSTDOWN. That wakes up the tasks with a status of EAGAIN and call_connect_status sends the task to call_transmit even though the socket is still not connected. Having xs_error_report set the status of all tasks with EAGAIN seems wrong to me since we're essentially losing the error code that the socket layer sent up. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
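The "real error only shows up later" behaviour Jeff describes is ordinary non-blocking connect semantics, and it is easy to reproduce from user space. In the small standalone demo below (the address is only an example and assumed unreachable), the deferred error has to be read back out of the socket after the wakeup; whoever handles that wakeup either preserves it or loses it, which is exactly the complaint about waking everything with -EAGAIN. On a host with no route at all (Neil's case) connect() may even fail synchronously, which the else branch shows.

  #include <arpa/inet.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <netinet/in.h>
  #include <poll.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
      struct sockaddr_in sin = { .sin_family = AF_INET, .sin_port = htons(2049) };
      int fd = socket(AF_INET, SOCK_STREAM, 0);
      int rc;

      inet_pton(AF_INET, "192.0.2.1", &sin.sin_addr);  /* TEST-NET: assumed unreachable */
      fcntl(fd, F_SETFL, O_NONBLOCK);

      rc = connect(fd, (struct sockaddr *)&sin, sizeof(sin));
      if (rc < 0 && errno == EINPROGRESS) {
          struct pollfd pfd = { .fd = fd, .events = POLLOUT };
          int soerr = 0;
          socklen_t len = sizeof(soerr);

          poll(&pfd, 1, 10000);  /* wait for the connect attempt to resolve */
          getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len);
          /* soerr == 0 means the connect succeeded or is still pending */
          printf("deferred connect error: %s\n", strerror(soerr));
      } else {
          /* e.g. ENETUNREACH straight away when there is no route at all */
          printf("immediate result: %s\n", rc ? strerror(errno) : "connected");
      }
      close(fd);
      return 0;
  }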
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 20:08 ` Jeff Layton @ 2010-10-21 20:18 ` Trond Myklebust 2011-03-23 6:41 ` NeilBrown 0 siblings, 1 reply; 20+ messages in thread From: Trond Myklebust @ 2010-10-21 20:18 UTC (permalink / raw) To: Jeff Layton; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List On Thu, 2010-10-21 at 16:08 -0400, Jeff Layton wrote: > On Thu, 21 Oct 2010 15:47:24 -0400 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote: > > > On Thu, 21 Oct 2010 10:42:04 -0400 > > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > > > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > > > > > poorly defined. IMO, the definition should really be that > > > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > > > > > call_bind or a call_connect more than once (i.e. if they ever have to > > > > > > loop back). > > > > > > > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > > > > > > > > Yes. > > > > > > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > > > > > being handled in call_transmit_status() instead of in call_status(). > > > > > > > > > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > > > > > > > > call_transmit_status() is there basically in order to call > > > > xprt_end_transmit() in the cases where we want to free up the socket > > > > write lock (and then possibly sleep). > > > > > > > > Actual state error handling is supposed occur in call_status(). > > > > > > > > > > This is terribly confusing. So the connect handling is done in > > > call_transmit? > > > > No. Connection-related _error conditions_ that result from trying to > > send stuff through a socket that is not connected, are handled in > > call_status(). Usually, by sending the process back to call_connect(). > > > > Yep, I get that part... > > > > It seems like it would make more sense to have the connect handling > > > mostly done in call_connect and call_connect_status, and only allow the > > > tasks to proceed to the call_transmit phase after the connect has > > > succeeded. > > > > That is the case. However sockets can and do get closed by the _server_ > > in unpredictable ways. This is what may need to be handled after the > > task has passed the call_connect state. > > > > Right, the socket can change state at any time, but the code doesn't > seem to do what you describe for initial connects. > > In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then > xs_error_report is called with a socket error of EHOSTDOWN. That wakes > up the tasks with a status of EAGAIN and call_connect_status sends the > task to call_transmit even though the socket is still not connected. > > Having xs_error_report set the status of all tasks with EAGAIN seems > wrong to me since we're essentially losing the error code that the > socket layer sent up. No. 
The next task to try to access the socket should pick it up. See my previous comment about ensuring that we don't try to handle state-related errors without adequate protection (i.e. locks) to ensure that the state being reported by the error hasn't changed. Trond ^ permalink raw reply [flat|nested] 20+ messages in thread
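Spelling out what "the next task to access the socket should pick it up" means for this particular failure, combining Jeff's earlier trace with Trond's description (one reading, so treat it as a sketch):

      socket error (e.g. EHOSTUNREACH) arrives on the connecting socket
        -> xs_error_report(): xprt_wake_pending_tasks(xprt, -EAGAIN)
        -> call_connect_status(): sees -EAGAIN, sets tk_action = call_transmit
        -> call_transmit() -> xprt_transmit() -> xprt->ops->send_request()
             the actual send now fails with the current socket error
             (EHOSTUNREACH is returned as-is; ENETUNREACH comes back as EPIPE)
        -> call_status(): sees that error in tk_status and either delays and
             loops back through call_bind/call_connect, or fails the task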
* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts. 2010-10-21 20:18 ` Trond Myklebust @ 2011-03-23 6:41 ` NeilBrown 0 siblings, 0 replies; 20+ messages in thread From: NeilBrown @ 2011-03-23 6:41 UTC (permalink / raw) To: Trond Myklebust; +Cc: Jeff Layton, Chuck Lever, Linux NFS Mailing List Hi all, I wold like to resurrect this old thread from last October, as the issue is still a problem. i.e. when the NFSv4 mount client in the kernel gets ENETUNREACH it doesn't fail immediately but rather it times out and then fails. Note that it doesn't timeout, retry, then fail. Rather it gets the error status, waits the timeout (2 minutes), and then fails with no retry. The message below was the last in the thread. There-in Trond says something which sounds sensible and relevant, but unfortunately I completely fail to understand the full relevance of because I don't know the ins-and-outs of the rpc layer well enough. Trond (or anyone else): Could you possibly expand on this, maybe with some pointers into the code. I'm happy to try to come up with a fix myself, but I don't like my chances with my current level of understanding. Thanks, NeilBrown On Thu, 21 Oct 2010 16:18:18 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2010-10-21 at 16:08 -0400, Jeff Layton wrote: > > On Thu, 21 Oct 2010 15:47:24 -0400 > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote: > > > > On Thu, 21 Oct 2010 10:42:04 -0400 > > > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > > > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote: > > > > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote: > > > > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is > > > > > > > poorly defined. IMO, the definition should really be that > > > > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run > > > > > > > call_bind or a call_connect more than once (i.e. if they ever have to > > > > > > > loop back). > > > > > > > > > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on. Yes? > > > > > > > > > > Yes. > > > > > > > > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is > > > > > > > being handled in call_transmit_status() instead of in call_status(). > > > > > > > > > > > > I tried it in call_status(), but it didn't work as expected. My notes aren't specific about the problem. One thing is for sure, there seems to be some redundant error handling logic in both of those states. Unraveling it might be more than Neil bargained for, but would help "future generations." > > > > > > > > > > call_transmit_status() is there basically in order to call > > > > > xprt_end_transmit() in the cases where we want to free up the socket > > > > > write lock (and then possibly sleep). > > > > > > > > > > Actual state error handling is supposed occur in call_status(). > > > > > > > > > > > > > This is terribly confusing. So the connect handling is done in > > > > call_transmit? > > > > > > No. Connection-related _error conditions_ that result from trying to > > > send stuff through a socket that is not connected, are handled in > > > call_status(). Usually, by sending the process back to call_connect(). > > > > > > > Yep, I get that part... 
> > > > > > It seems like it would make more sense to have the connect handling > > > > mostly done in call_connect and call_connect_status, and only allow the > > > > tasks to proceed to the call_transmit phase after the connect has > > > > succeeded. > > > > > > That is the case. However sockets can and do get closed by the _server_ > > > in unpredictable ways. This is what may need to be handled after the > > > task has passed the call_connect state. > > > > > > > Right, the socket can change state at any time, but the code doesn't > > seem to do what you describe for initial connects. > > > > In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then > > xs_error_report is called with a socket error of EHOSTDOWN. That wakes > > up the tasks with a status of EAGAIN and call_connect_status sends the > > task to call_transmit even though the socket is still not connected. > > > > Having xs_error_report set the status of all tasks with EAGAIN seems > > wrong to me since we're essentially losing the error code that the > > socket layer sent up. > > No. The next task to try to access the socket should pick it up. > > See my previous comment about ensuring that we don't try to handle > state-related without adequate protection (i.e. locks) to ensure that > the state being reported by the error hasn't changed. > > Trond > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- ^ permalink raw reply [flat|nested] 20+ messages in thread
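To give Neil at least a starting point: the pieces discussed in this thread are call_connect_status(), call_transmit_status() and call_status() in net/sunrpc/clnt.c, and xs_error_report() plus the connect worker in net/sunrpc/xprtsock.c. One direction that has been hinted at, sketched below purely for illustration (untested; the connect_err field is made up, not an existing member), is to have xs_error_report() remember the real socket error on the transport so that a SOFTCONN-aware state routine can fail fast while everything else keeps the current behaviour. Trond's locking caveat still applies: whoever consumes the saved error has to cope with the socket state having changed since it was recorded.

  /* Sketch only.  xs_error_report(), xprt_from_sock() and
   * xprt_wake_pending_tasks() exist; xprt->connect_err is a hypothetical
   * field added for illustration. */
  static void xs_error_report(struct sock *sk)
  {
      struct rpc_xprt *xprt;

      read_lock(&sk->sk_callback_lock);
      xprt = xprt_from_sock(sk);
      if (xprt != NULL && sk->sk_err != 0) {
          xprt->connect_err = -sk->sk_err;   /* hypothetical: stash the real error */
          xprt_wake_pending_tasks(xprt, -EAGAIN);
      }
      read_unlock(&sk->sk_callback_lock);
  }

A SOFTCONN-aware call_connect_status() (or call_status()) could then check that saved error for RPC_IS_SOFTCONN tasks and rpc_exit() with it instead of pressing on to call_transmit.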
end of thread -- Thread overview: 20+ messages -- 2010-10-20 7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown 2010-10-20 14:29 ` Chuck Lever 2010-10-20 21:29 ` Neil Brown 2010-10-21 0:56 ` Neil Brown 2010-10-21 12:09 ` Jeff Layton 2010-10-21 13:52 ` Chuck Lever 2010-10-21 14:10 ` Chuck Lever 2010-10-20 17:55 ` Jeff Layton 2010-10-20 19:16 ` Jeff Layton 2010-10-20 20:40 ` Neil Brown 2010-10-21 0:45 ` Jeff Layton 2010-10-21 3:25 ` Neil Brown 2010-10-21 14:05 ` Trond Myklebust 2010-10-21 14:31 ` Chuck Lever 2010-10-21 14:42 ` Trond Myklebust 2010-10-21 19:40 ` Jeff Layton 2010-10-21 19:47 ` Trond Myklebust 2010-10-21 20:08 ` Jeff Layton 2010-10-21 20:18 ` Trond Myklebust 2011-03-23 6:41 ` NeilBrown