* [PATCH] sunrpc: cancel delayed connect working when conncet success @ 2010-08-18 9:49 Mi Jinlong 2010-08-30 19:18 ` Trond Myklebust 0 siblings, 1 reply; 4+ messages in thread From: Mi Jinlong @ 2010-08-18 9:49 UTC (permalink / raw) To: NFSv3 list, Trond.Myklebust, J. Bruce Fields, Chuck Lever, Jeff Layton As network partition or some other reason, when client connect success, maybe there is some delayed connect working in connect_work list. Aug 2 12:51:32 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 96 seconds Aug 2 12:51:32 TEST-M kernel: RPC: xs_error_report client ccc4c800... Aug 2 12:51:32 TEST-M kernel: RPC: error 111 ... snip ... Aug 2 12:53:08 TEST-M kernel: RPC: disconnected transport ccc4c800 Aug 2 12:53:08 TEST-M kernel: RPC: worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049) Aug 2 12:53:08 TEST-M kernel: RPC: ccc4c800 connect status 115 connected 0 sock state 2 Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = -11 Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 left of 136) Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c800 is not connected Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 192 seconds Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change client ccc4c800... Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 zapped 1 Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = 136 Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit complete Aug 2 12:53:08 TEST-M kernel: RPC: 229 xprt_prepare_transmit As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds" means a connecting work have be delayed at connect_worker list. "state 1 conn 0 dead 0 zapped 1" shows the connect have successed but a delayed work still alive at connect_worker list. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> --- net/sunrpc/xprtsock.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 49a62f0..823f1db 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk) transport->tcp_flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; + if (xprt_connecting(xprt) && + cancel_delayed_work(&transport->connect_worker)) + xprt_clear_connecting(xprt); + xprt_wake_pending_tasks(xprt, -EAGAIN); } spin_unlock_bh(&xprt->transport_lock); -- 1.7.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success 2010-08-18 9:49 [PATCH] sunrpc: cancel delayed connect working when conncet success Mi Jinlong @ 2010-08-30 19:18 ` Trond Myklebust 2010-08-31 0:47 ` Mi Jinlong 0 siblings, 1 reply; 4+ messages in thread From: Trond Myklebust @ 2010-08-30 19:18 UTC (permalink / raw) To: Mi Jinlong; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote: > As network partition or some other reason, when client connect > success, maybe there is some delayed connect working in connect_work list. > > Aug 2 12:51:32 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 96 seconds > Aug 2 12:51:32 TEST-M kernel: RPC: xs_error_report client ccc4c800... > Aug 2 12:51:32 TEST-M kernel: RPC: error 111 > ... snip ... > Aug 2 12:53:08 TEST-M kernel: RPC: disconnected transport ccc4c800 > Aug 2 12:53:08 TEST-M kernel: RPC: worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049) > Aug 2 12:53:08 TEST-M kernel: RPC: ccc4c800 connect status 115 connected 0 sock state 2 > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) > Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = -11 > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 left of 136) > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c800 is not connected > Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 192 seconds > Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change client ccc4c800... > Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 zapped 1 > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) > Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = 136 > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit complete > Aug 2 12:53:08 TEST-M kernel: RPC: 229 xprt_prepare_transmit > > As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds" > means a connecting work have be delayed at connect_worker list. > "state 1 conn 0 dead 0 zapped 1" shows the connect have successed > but a delayed work still alive at connect_worker list. > > Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> > > --- > net/sunrpc/xprtsock.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 49a62f0..823f1db 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk) > transport->tcp_flags = > TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; > > + if (xprt_connecting(xprt) && > + cancel_delayed_work(&transport->connect_worker)) > + xprt_clear_connecting(xprt); > + > xprt_wake_pending_tasks(xprt, -EAGAIN); > } > spin_unlock_bh(&xprt->transport_lock); Wait... According to the above trace, the connect request is _failing_ due to an ECONNREFUSED error. In that case, we _want_ to delay the reconnection in order to give the server time to set itself up. Cheers Trond ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success 2010-08-30 19:18 ` Trond Myklebust @ 2010-08-31 0:47 ` Mi Jinlong 2010-09-09 10:11 ` Mi Jinlong 0 siblings, 1 reply; 4+ messages in thread From: Mi Jinlong @ 2010-08-31 0:47 UTC (permalink / raw) To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton Hi Trond, Trond Myklebust 写道: > On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote: >> As network partition or some other reason, when client connect >> success, maybe there is some delayed connect working in connect_work list. >> >> Aug 2 12:51:32 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 96 seconds >> Aug 2 12:51:32 TEST-M kernel: RPC: xs_error_report client ccc4c800... >> Aug 2 12:51:32 TEST-M kernel: RPC: error 111 >> ... snip ... >> Aug 2 12:53:08 TEST-M kernel: RPC: disconnected transport ccc4c800 >> Aug 2 12:53:08 TEST-M kernel: RPC: worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049) >> Aug 2 12:53:08 TEST-M kernel: RPC: ccc4c800 connect status 115 connected 0 sock state 2 >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) >> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = -11 >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 left of 136) >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c800 is not connected >> Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 192 seconds >> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change client ccc4c800... >> Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 zapped 1 >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) >> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = 136 >> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit complete >> Aug 2 12:53:08 TEST-M kernel: RPC: 229 xprt_prepare_transmit >> >> As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds" >> means a connecting work have be delayed at connect_worker list. >> "state 1 conn 0 dead 0 zapped 1" shows the connect have successed >> but a delayed work still alive at connect_worker list. >> >> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> >> >> --- >> net/sunrpc/xprtsock.c | 4 ++++ >> 1 files changed, 4 insertions(+), 0 deletions(-) >> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index 49a62f0..823f1db 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk) >> transport->tcp_flags = >> TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; >> >> + if (xprt_connecting(xprt) && >> + cancel_delayed_work(&transport->connect_worker)) >> + xprt_clear_connecting(xprt); >> + >> xprt_wake_pending_tasks(xprt, -EAGAIN); >> } >> spin_unlock_bh(&xprt->transport_lock); > > Wait... According to the above trace, the connect request is _failing_ > due to an ECONNREFUSED error. In that case, we _want_ to delay the > reconnection in order to give the server time to set itself up. Yes, that's right. But, the important part of the trace is " Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 left of 136) Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c800 is not connected Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt ccc4c800 for 192 seconds Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change client ccc4c800... Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 zapped 1 Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retrying ". The SUNRPC's TCP connecting is asynchronous, but the tcp_connect() only send a SYN but don't waiting for the ACK reply. CLIENT SERVER 1. The first connecting |-xs_connect() |-kernel_connect(O_NONBLOCK) |-tcp_connet() -------- SYN ---------> xs_connect() return with EINPROGRESS and the ACK have not reply. 2. a reconnecting of 1 |-xs_connect() queue_delayed_work(rpciod_workqueue, &transport->connect_worker, xprt->reestablish_timeout); <---------------ACK SYN-------------- After the reconnecting have put the connect working to connect_worker, the ACK-SYN for the first connecting reply, the connect is OK now. At this instance, a delayed connect working will be exist at connect_worker after connecting success, we should cancel this working. thanks, Mi Jinlong ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success 2010-08-31 0:47 ` Mi Jinlong @ 2010-09-09 10:11 ` Mi Jinlong 0 siblings, 0 replies; 4+ messages in thread From: Mi Jinlong @ 2010-09-09 10:11 UTC (permalink / raw) To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton Hi Trond, what do you think about this problem ? thanks, Mi Jinlong Mi Jinlong =E5=86=99=E9=81=93: > Hi Trond, >=20 > Trond Myklebust =E5=86=99=E9=81=93: >> On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote: >>> As network partition or some other reason, when client connect=20 >>> success, maybe there is some delayed connect working in connect_wor= k list. >>> >>> Aug 2 12:51:32 TEST-M kernel: RPC: xs_connect delayed xprt c= cc4c800 for 96 seconds >>> Aug 2 12:51:32 TEST-M kernel: RPC: xs_error_report client cc= c4c800... >>> Aug 2 12:51:32 TEST-M kernel: RPC: error 111 >>> ... snip ... >>> Aug 2 12:53:08 TEST-M kernel: RPC: disconnected transport cc= c4c800 >>> Aug 2 12:53:08 TEST-M kernel: RPC: worker connecting xprt cc= c4c800 via tcp to 192.168.0.21 (port 2049) >>> Aug 2 12:53:08 TEST-M kernel: RPC: ccc4c800 connect status 1= 15 connected 0 sock state 2 >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retr= ying >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) >>> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = =3D -11 >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 left= of 136) >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c80= 0 is not connected >>> Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt c= cc4c800 for 192 seconds >>> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change clien= t ccc4c800... >>> Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 zap= ped 1 >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: retr= ying >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_prepare_transmit >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_transmit(136) >>> Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_send_request(136) = =3D 136 >>> Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit complete >>> Aug 2 12:53:08 TEST-M kernel: RPC: 229 xprt_prepare_transmit >>> >>> As the debug message show, "xs_connect delayed xprt ccc4c800 for 19= 2 seconds" >>> means a connecting work have be delayed at connect_worker list. >>> "state 1 conn 0 dead 0 zapped 1" shows the connect have successed=20 >>> but a delayed work still alive at connect_worker list. >>> >>> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> >>> >>> --- >>> net/sunrpc/xprtsock.c | 4 ++++ >>> 1 files changed, 4 insertions(+), 0 deletions(-) >>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>> index 49a62f0..823f1db 100644 >>> --- a/net/sunrpc/xprtsock.c >>> +++ b/net/sunrpc/xprtsock.c >>> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock = *sk) >>> transport->tcp_flags =3D >>> TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; >>> =20 >>> + if (xprt_connecting(xprt) && >>> + cancel_delayed_work(&transport->connect_worker)) >>> + xprt_clear_connecting(xprt); >>> + >>> xprt_wake_pending_tasks(xprt, -EAGAIN); >>> } >>> spin_unlock_bh(&xprt->transport_lock); >> Wait... According to the above trace, the connect request is _failin= g_ >> due to an ECONNREFUSED error. In that case, we _want_ to delay the >> reconnection in order to give the server time to set itself up. >=20 > Yes, that's right. >=20 > But, the important part of the trace is=20 > " > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xmit incomplete (136 lef= t of 136) > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect xprt ccc4c8= 00 is not connected > Aug 2 12:53:08 TEST-M kernel: RPC: xs_connect delayed xprt = ccc4c800 for 192 seconds > Aug 2 12:53:08 TEST-M kernel: RPC: xs_tcp_state_change clie= nt ccc4c800... > Aug 2 12:53:08 TEST-M kernel: RPC: state 1 conn 0 dead 0 za= pped 1 > Aug 2 12:53:08 TEST-M kernel: RPC: 228 xprt_connect_status: ret= rying > ". >=20 > The SUNRPC's TCP connecting is asynchronous, but the tcp_connect() > only send a SYN but don't waiting for the ACK reply. >=20 > CLIENT SERVER >=20 > 1. The first connecting >=20 > |-xs_connect() > |-kernel_connect(O_NONBLOCK) > |-tcp_connet() -------- SYN ---------> >=20 > xs_connect() return with EINPROGRESS and the ACK have not reply.=20 >=20 > 2. a reconnecting of 1 > |-xs_connect() > queue_delayed_work(rpciod_workqueue, > &transport->connect_worker, > xprt->reestablish_timeout); >=20 > <---------------ACK SYN-------------- >=20 > After the reconnecting have put the connect working to connect_wor= ker, > the ACK-SYN for the first connecting reply, the connect is OK now. >=20 > At this instance, a delayed connect working will be exist at connect_= worker > after connecting success, we should cancel this working. >=20 > thanks, > Mi Jinlong >=20 > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" = in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >=20 ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-09-09 10:09 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-18 9:49 [PATCH] sunrpc: cancel delayed connect working when conncet success Mi Jinlong 2010-08-30 19:18 ` Trond Myklebust 2010-08-31 0:47 ` Mi Jinlong 2010-09-09 10:11 ` Mi Jinlong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox