netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: TCP hangs
       [not found]   ` <Pine.LNX.4.58.0405031238110.18691@artax.karlin.mff.cuni.cz>
@ 2004-05-04 15:37     ` Nivedita Singhvi
  2004-05-04 16:26       ` Mikulas Patocka
  0 siblings, 1 reply; 4+ messages in thread
From: Nivedita Singhvi @ 2004-05-04 15:37 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: linux-kernel, netdev

[cc'ing netdev]

Mikulas Patocka wrote:

>>Which Linux kernel (distro, version)? What hangs?  Is it just that
>>connection? All the kernel? Client end or remote server end?

> I don't know what version is that. Have no access to it. Neither server
> nor client crashes, it just stops receiving data in connection.

That is strange - since your trace clearly showed the client
sending a FIN and a reset, so the client socket should have
gone away. Does netstat still show the connection? What state
is it in?

>>It is not wrong to send no data in a zero window probe. TCP MUST,
>>however, continue sending the probes while the window is zero.
>>
>>Assuming some reordering (see embedded comments in the trace,
>>below), all of the following looks correct on the Linux end.
>>Also, since the client side responds to the data coming
>>in with resets and zero windows, the client socket has gone
>>away too.
>>
>>Once the resets reach the server, it presumably has torn down its
>>socket - and there are no more packets exchanged. All done.
>>
>>So nothing in the trace looks like a hang or an incorrect
>>resolution. (other than the fact that the app seems to have died,
>>and it doesn't respond to the zero window probes as it should).
>>Client/app broken, seems like.
> 
> 
> Yes. So if client receives ACK, it should respond with other ACK?
> How does the TCP prevent ping-pong effect --- clients sending ACKs to each
> other indefinitely? How should the client know if the ACK is window probe
> (to which it should respond) or normal ACK (to which it shoudn't respond).
> What RFC part does say that?

You are right, it should send data. What the implementations
do (at least recent Linux 2.4, 2.6) is send an out of window
sequence number (just the previous one acked by the client)
to force the receiver to ack. Not sure why its not doing
that in this case - but could be an old Linux. Its confusing
since in the trace, the window seems to open up again (2 + scale).
I'll check to see if we send acks or zerop probes under any
circumstances in this way in the current code again, but don't
think so.

But since the client kernel has seen the app go away, and
has sent resets to the server, and presumably the server tears
down the connection when it gets the reset and never sends
anything again, why is the client having a problem at all?
Nothing need hang here, or even seems to from the kernel
point of view.

Note that the window scale factor of 10 implies a pretty
big window, and so for that to come down to zero implies
the app has really crashed or aborted..

Can you recreate the problem? What was happening in user
space?

thanks,
Nivedita


> Mikulas
> 
> 
>>thanks,
>>Nivedita
>>
>>Edited trace:
>>
>>par = paranoia.kolej.mff.cuni.cz.65461
>>http = 213.29.7.213.http	[Linux box]
>>
>>1.
>>16:34:49.832097  par > http: SWE 1711254266:1711254266(0)
>>win 8192 <mss 1460,sackOK,wscale 10,eol>
>>
>>2.
>>16:34:49.838957  http > par: S 1163781419:1163781419(0)
>>ack 1711254267 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 0>
>>
>>3.
>>16:34:49.838968  par > http: P ack 1 win 8
>>
>>4.
>>16:34:49.840002  par > http: P 1:500(499) ack 1 win 8
>>
>>5.
>>16:34:49.847349  http > par: . ack 500 win 6432
>>
>>6.
>>16:34:49.863592  http > par: . 1:1461(1460) ack 500 win 6432
>>
>>7.
>>16:34:49.863651  par > http: P ack 1461 win 6
>>
>>8.
>>16:34:49.867490  http > par: . 1461:2921(1460) ack 500 win 6432
>>
>>9.
>>16:34:49.867558  par > http: P ack 2921 win 6
>>
>>10.
>>16:34:49.871498  http > par: . 2921:4381(1460) ack 500 win 6432
>>
>>11.
>>16:34:49.871567  par > http: P ack 4381 win 5
>>
>>12.
>>16:34:49.872729  http > par: . 4381:5841(1460) ack 500 win 6432
>>
>>13.
>>16:34:49.872777  par > http: P ack 5841 win 3
>>
>>14.
>>16:34:49.875631  http > par: . 7301:8761(1460) ack 500 win 6432
>>
>>15.
>>16:34:49.875714  par > http: P ack 5841 win 3 <nop,nop,sack sack 1 {7301:8761} >
>>
>>16.
>>16:34:49.876881  http > par: . 5841:7301(1460) ack 500 win 6432
>>
>>17.
>>16:34:49.876953  par > http: P ack 8761 win 0
>>
>>18.
>>16:34:49.907290  par > http: P ack 8761 win 2
>>^^^^ this packet was probably lost or the last two were reordered
>>
>>19.
>>16:34:50.088544  http > par: . ack 500 win 6432
>>
>>20.
>>16:34:50.512936  http > par: . ack 500 win 6432
>>^^^ this looks to me like a bug --- window probe doesn't contain data
>>==> not a problem, the receiving client should respond
>>==> with an ack and updated window. But why is not the
>>==> client responding to the window probes?
>>
>>21.
>>16:34:51.348911  http > par: . ack 500 win 6432
>>
>>22.
>>16:34:53.028754  http > par: . ack 500 win 6432
>>
>>23.
>>16:34:56.389624  http > par: . ack 500 win 6432
>>
>>24.
>>16:35:03.110512  http > par: . ack 500 win 6432
>>^^^ exponential backoff on window probes is fine, except that
>>the packets are pure acks
>>
>>25.
>>16:35:16.552095  http > par: . ack 500 win 6432
>>
>>26.
>>16:35:43.435482  http > par: . ack 500 win 6432
>>
>>27.
>>16:35:58.706896  par > http: FP 500:500(0) ack 8761 win 17
>>^^^ paranoia closed the connection without receiving any data
>>
>>==> So presumably the client application did a close or
>>==> has gone away?
> 
> 
> Did close().
> 
> 
>>28.
>>16:35:58.717487  http > par: . 10221:11681(1460) ack 501 win 6432
>>==> missing/expected 8761:10221
>>
>>29.
>>16:35:58.717569  par > http: R 501:501(0) ack 11681 win 0
>>==> clearly reordered trace since client is acking 11681
>>==> which we have not yet seen arrive in the trace
> 
> 
> This is reset, not ack. It just means that client is not willing to
> receive more data after shutdown(SHUT_RD).
> 
> 
>>30.
>>16:35:58.718673  http > par: . 8761:10221(1460) ack 501 win 6432
>>==> Nooo, if the previous reset (R) reached http, it
>>==> should not be barfing more data at us. Going by
>>==> the ack from the client above, this was sent first.
>>
>>31.
>>16:35:58.718692  par > http: R 501:501(0) ack 10221 win 0
>>==> reset, continued window of 0, implies no socket
>>==> remaining here (?).
>>
>>32.
>>16:35:58.720054  http > par: . 11681:13141(1460) ack 501 win 6432
>>
>>33.
>>16:35:58.720074  par > http: R 501:501(0) ack 13141 win 0
>>
>>==> A more likely sequence of events is:
>>==> packet #30, #28, #32 are sent by the http server, and
>>==> packets #31, #29, #33 are sent in response when they
>>==> reach the client.
>>
>>
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TCP hangs
  2004-05-04 15:37     ` TCP hangs Nivedita Singhvi
@ 2004-05-04 16:26       ` Mikulas Patocka
  2004-05-04 16:48         ` Nivedita Singhvi
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2004-05-04 16:26 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: linux-kernel, netdev



On Tue, 4 May 2004, Nivedita Singhvi wrote:

> Mikulas Patocka wrote:
>
> >>Which Linux kernel (distro, version)? What hangs?  Is it just that
> >>connection? All the kernel? Client end or remote server end?
>
> > I don't know what version is that. Have no access to it. Neither server
> > nor client crashes, it just stops receiving data in connection.
>
> That is strange - since your trace clearly showed the client
> sending a FIN and a reset, so the client socket should have
> gone away. Does netstat still show the connection? What state
> is it in?

TCP should send RST on received data after shutdown(SHUT_RD) ---
RFC2525, sections 2.16, 2.17.

> > Yes. So if client receives ACK, it should respond with other ACK?
> > How does the TCP prevent ping-pong effect --- clients sending ACKs to each
> > other indefinitely? How should the client know if the ACK is window probe
> > (to which it should respond) or normal ACK (to which it shoudn't respond).
> > What RFC part does say that?
>
> You are right, it should send data. What the implementations
> do (at least recent Linux 2.4, 2.6) is send an out of window
> sequence number (just the previous one acked by the client)
> to force the receiver to ack. Not sure why its not doing
> that in this case - but could be an old Linux. Its confusing
> since in the trace, the window seems to open up again (2 + scale).
> I'll check to see if we send acks or zerop probes under any
> circumstances in this way in the current code again, but don't
> think so.

It happens that the stack at the client ignores seq number if packet
doesn't contain any data. I fixed the client so that it replies with ack,
if sequence number doesn't match. Is it correct fix?

> But since the client kernel has seen the app go away, and

The app didn't go away, it just called close.

> has sent resets to the server, and presumably the server tears
> down the connection when it gets the reset and never sends
> anything again, why is the client having a problem at all?
> Nothing need hang here, or even seems to from the kernel
> point of view.
>
> Note that the window scale factor of 10 implies a pretty
> big window, and so for that to come down to zero implies
> the app has really crashed or aborted..

It has initial window just 8kb (8 << 10), lets it slightly close if app
doesn't read data, and opens it up to 512kb proportinaly to the rate at
which app is reading data. I think Linux is doing something similar.

What happened at the trace was, that the app was receiving data aith
read() slower than the server sent them.

> Can you recreate the problem?

Probably not --- it happens once in few days. Maybe I could recreate it
artifically by modifying the client to not send open-window ACKs. But I
think now it's not necessary, because I know what caused it and fixed it.

> What was happening in user space?

App was just not receiving any data, until I stopped the connection in
browser.

Mikulas


> thanks,
> Nivedita
>
>
> > Mikulas
> >
> >
> >>thanks,
> >>Nivedita
> >>
> >>Edited trace:
> >>
> >>par = paranoia.kolej.mff.cuni.cz.65461
> >>http = 213.29.7.213.http	[Linux box]
> >>
> >>1.
> >>16:34:49.832097  par > http: SWE 1711254266:1711254266(0)
> >>win 8192 <mss 1460,sackOK,wscale 10,eol>
> >>
> >>2.
> >>16:34:49.838957  http > par: S 1163781419:1163781419(0)
> >>ack 1711254267 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 0>
> >>
> >>3.
> >>16:34:49.838968  par > http: P ack 1 win 8
> >>
> >>4.
> >>16:34:49.840002  par > http: P 1:500(499) ack 1 win 8
> >>
> >>5.
> >>16:34:49.847349  http > par: . ack 500 win 6432
> >>
> >>6.
> >>16:34:49.863592  http > par: . 1:1461(1460) ack 500 win 6432
> >>
> >>7.
> >>16:34:49.863651  par > http: P ack 1461 win 6
> >>
> >>8.
> >>16:34:49.867490  http > par: . 1461:2921(1460) ack 500 win 6432
> >>
> >>9.
> >>16:34:49.867558  par > http: P ack 2921 win 6
> >>
> >>10.
> >>16:34:49.871498  http > par: . 2921:4381(1460) ack 500 win 6432
> >>
> >>11.
> >>16:34:49.871567  par > http: P ack 4381 win 5
> >>
> >>12.
> >>16:34:49.872729  http > par: . 4381:5841(1460) ack 500 win 6432
> >>
> >>13.
> >>16:34:49.872777  par > http: P ack 5841 win 3
> >>
> >>14.
> >>16:34:49.875631  http > par: . 7301:8761(1460) ack 500 win 6432
> >>
> >>15.
> >>16:34:49.875714  par > http: P ack 5841 win 3 <nop,nop,sack sack 1 {7301:8761} >
> >>
> >>16.
> >>16:34:49.876881  http > par: . 5841:7301(1460) ack 500 win 6432
> >>
> >>17.
> >>16:34:49.876953  par > http: P ack 8761 win 0
> >>
> >>18.
> >>16:34:49.907290  par > http: P ack 8761 win 2
> >>^^^^ this packet was probably lost or the last two were reordered
> >>
> >>19.
> >>16:34:50.088544  http > par: . ack 500 win 6432
> >>
> >>20.
> >>16:34:50.512936  http > par: . ack 500 win 6432
> >>^^^ this looks to me like a bug --- window probe doesn't contain data
> >>==> not a problem, the receiving client should respond
> >>==> with an ack and updated window. But why is not the
> >>==> client responding to the window probes?
> >>
> >>21.
> >>16:34:51.348911  http > par: . ack 500 win 6432
> >>
> >>22.
> >>16:34:53.028754  http > par: . ack 500 win 6432
> >>
> >>23.
> >>16:34:56.389624  http > par: . ack 500 win 6432
> >>
> >>24.
> >>16:35:03.110512  http > par: . ack 500 win 6432
> >>^^^ exponential backoff on window probes is fine, except that
> >>the packets are pure acks
> >>
> >>25.
> >>16:35:16.552095  http > par: . ack 500 win 6432
> >>
> >>26.
> >>16:35:43.435482  http > par: . ack 500 win 6432
> >>
> >>27.
> >>16:35:58.706896  par > http: FP 500:500(0) ack 8761 win 17
> >>^^^ paranoia closed the connection without receiving any data
> >>
> >>==> So presumably the client application did a close or
> >>==> has gone away?
> >
> >
> > Did close().
> >
> >
> >>28.
> >>16:35:58.717487  http > par: . 10221:11681(1460) ack 501 win 6432
> >>==> missing/expected 8761:10221
> >>
> >>29.
> >>16:35:58.717569  par > http: R 501:501(0) ack 11681 win 0
> >>==> clearly reordered trace since client is acking 11681
> >>==> which we have not yet seen arrive in the trace
> >
> >
> > This is reset, not ack. It just means that client is not willing to
> > receive more data after shutdown(SHUT_RD).
> >
> >
> >>30.
> >>16:35:58.718673  http > par: . 8761:10221(1460) ack 501 win 6432
> >>==> Nooo, if the previous reset (R) reached http, it
> >>==> should not be barfing more data at us. Going by
> >>==> the ack from the client above, this was sent first.
> >>
> >>31.
> >>16:35:58.718692  par > http: R 501:501(0) ack 10221 win 0
> >>==> reset, continued window of 0, implies no socket
> >>==> remaining here (?).
> >>
> >>32.
> >>16:35:58.720054  http > par: . 11681:13141(1460) ack 501 win 6432
> >>
> >>33.
> >>16:35:58.720074  par > http: R 501:501(0) ack 13141 win 0
> >>
> >>==> A more likely sequence of events is:
> >>==> packet #30, #28, #32 are sent by the http server, and
> >>==> packets #31, #29, #33 are sent in response when they
> >>==> reach the client.
> >>
> >>
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TCP hangs
  2004-05-04 16:26       ` Mikulas Patocka
@ 2004-05-04 16:48         ` Nivedita Singhvi
  2004-05-04 21:20           ` Mikulas Patocka
  0 siblings, 1 reply; 4+ messages in thread
From: Nivedita Singhvi @ 2004-05-04 16:48 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: linux-kernel, netdev

Mikulas Patocka wrote:

>TCP should send RST on received data after shutdown(SHUT_RD) ---
>RFC2525, sections 2.16, 2.17.
>  
>

Yes, but that should lead to a shutdown on both ends. If you
have sent a reset, why are you not tearing down your end of
whatever remains of the connection? You have asked the
other side to tear down. RFC 793:

"The receiver of a RST first validates it, then changes 
state.  If the receiver was in the LISTEN state, it ignores it.  
If the receiver was in SYN-RECEIVED state and had previously 
been in the LISTEN state, then the receiver returns to the 
LISTEN state, otherwise the receiver aborts the connection 
and goes to the CLOSED state.  If the receiver was in any 
other state, it aborts the connection and advises the user
and goes to the CLOSED state."


>It happens that the stack at the client ignores seq number if packet
>doesn't contain any data. I fixed the client so that it replies with ack,
>if sequence number doesn't match. Is it correct fix?
>

That will work. 

>The app didn't go away, it just called close.
>

See above.

>App was just not receiving any data, until I stopped the connection in
>browser.
>
>Mikulas
>  
>

Hmm, that sounds like a root cause issue, then (why the app wasn't
receiving data in the first place)..

Thanks for the info..

Nivedita

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TCP hangs
  2004-05-04 16:48         ` Nivedita Singhvi
@ 2004-05-04 21:20           ` Mikulas Patocka
  0 siblings, 0 replies; 4+ messages in thread
From: Mikulas Patocka @ 2004-05-04 21:20 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: linux-kernel, netdev



On Tue, 4 May 2004, Nivedita Singhvi wrote:

> Mikulas Patocka wrote:
>
> >TCP should send RST on received data after shutdown(SHUT_RD) ---
> >RFC2525, sections 2.16, 2.17.
> >
> >
>
> Yes, but that should lead to a shutdown on both ends. If you
> have sent a reset, why are you not tearing down your end of
> whatever remains of the connection? You have asked the
> other side to tear down. RFC 793:
>
> "The receiver of a RST first validates it, then changes
> state.  If the receiver was in the LISTEN state, it ignores it.
> If the receiver was in SYN-RECEIVED state and had previously
> been in the LISTEN state, then the receiver returns to the
> LISTEN state, otherwise the receiver aborts the connection
> and goes to the CLOSED state.  If the receiver was in any
> other state, it aborts the connection and advises the user
> and goes to the CLOSED state."

Good point. Now I see that in client's code, that it doesn't kill the
connection after sending reset. However if it did, the trace would look
exactly the same, because when client receives packet for port without
connection, it would reply with RST anyway.

Mikulas

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-05-04 21:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.58.0405021602120.20423@artax.karlin.mff.cuni.cz>
     [not found] ` <409583B1.5040906@us.ibm.com>
     [not found]   ` <Pine.LNX.4.58.0405031238110.18691@artax.karlin.mff.cuni.cz>
2004-05-04 15:37     ` TCP hangs Nivedita Singhvi
2004-05-04 16:26       ` Mikulas Patocka
2004-05-04 16:48         ` Nivedita Singhvi
2004-05-04 21:20           ` Mikulas Patocka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).