public inbox for linux-kernel@vger.kernel.org
* TCP stack behaviour question
@ 2006-09-15 17:28 Stuart MacDonald
  2006-09-18  8:29 ` Andi Kleen
From: Stuart MacDonald @ 2006-09-15 17:28 UTC
  To: LKML

I'm having some trouble with a network application I've written. I've
done a lot of research over the last few days: man 7 ip, man 7 tcp,
the kernel 2.4.31 source code, Stevens' TCP/IP Illustrated Vols. 1 & 3
(for some reason we don't have Vol. 2), Usenet, and websites. I'm
hoping someone here can help me out, or point me in the right
direction.

Distro: Debian 3.0 r2
Kernel: Stock 2.4.24

tcp_retries1 == 3
tcp_retries2 == 15
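
Both retry limits are exposed under /proc/sys/net/ipv4/. A minimal
sketch for reading them from an application follows; read_proc_long
is a hypothetical helper name, and the only assumption is that each
file holds a single decimal integer (which is what cat shows).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a /proc/sys file such as
 * /proc/sys/net/ipv4/tcp_retries2 (hypothetical helper).
 * Returns -1 if the file cannot be opened or does not start with an integer.
 */
long read_proc_long(const char *path)
{
    FILE *f;
    long value;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

On this machine, read_proc_long("/proc/sys/net/ipv4/tcp_retries2")
should return 15.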

I have an application that sets up a TCP connection to a server. If
the server has a power failure, TCP starts retransmitting the packet
that wasn't ACKed, and I see the exponential backoff.

Question 1: There's the original packet, plus 7 retransmitted packets
for a total of 8, then TCP gives up. How is 7 (or 8) derived from the
tcp_retries[12] settings?
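
To compare the observed packets against the settings, here is a rough
model of the exponential backoff. It is a sketch under stated
assumptions, not the kernel's code: the RTO doubles after every
retransmission and is clamped at a maximum (120 s, assumed to match
Linux's compile-time TCP_RTO_MAX), and an initial RTO of 0.21 s is
inferred from the 26.88 s gap reported further down this thread. All
function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Gap in seconds between transmission n (0 = original packet) and
 * retransmission n + 1: the RTO doubled n times, clamped to rto_max.
 */
double rto_interval(double rto_initial, int n, double rto_max)
{
    double rto = rto_initial;

    assert(n >= 0);
    for (int i = 0; i < n && rto < rto_max; i++)
        rto *= 2.0;
    return rto < rto_max ? rto : rto_max;
}

/* Seconds from the original transmission until retransmission k (k >= 1). */
double time_of_retransmit(double rto_initial, int k, double rto_max)
{
    double t = 0.0;

    for (int n = 0; n < k; n++)
        t += rto_interval(rto_initial, n, rto_max);
    return t;
}

/* Print when each of `retries` retransmissions would go out under this model. */
void print_schedule(double rto_initial, int retries, double rto_max)
{
    for (int k = 1; k <= retries; k++)
        printf("retransmit %2d at %8.2f s (gap %7.2f s)\n",
               k, time_of_retransmit(rto_initial, k, rto_max),
               rto_interval(rto_initial, k - 1, rto_max));
}
```

With those inputs, print_schedule(0.21, 15, 120.0) puts the 8th
retransmit about 53.55 s after the original packet, hits the 120 s
cap at the 11th, and puts the 15th at roughly 815 s. Under this model,
stopping after 8 packets would not follow from tcp_retries2 = 15.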

Question 1a: The time between the last and second-last retransmitted
packets is only about 27 seconds. I've read that there's a maximum
retransmission interval, usually 100 or 120 seconds. Where can I find
that setting in /proc?

Question 1b: If the maximum RTO is that high, why does retransmission
stop?

Question 2: After retransmission has given up, the app still makes an
occasional write(), which succeeds! However, tearing down the
connection and attempting a new one results in an immediate
EHOSTUNREACH error. Why is the write() succeeding?
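
As far as I understand the sockets API, a successful write() on a TCP
socket only means the data was copied into the kernel's send queue,
not that the peer received it. One way to watch that queue is the
Linux-specific SIOCOUTQ ioctl described in tcp(7); the sketch assumes
it is available on this kernel, and unsent_bytes is a hypothetical
helper name.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>     /* socket(), for callers creating the TCP socket */
#include <linux/sockios.h>  /* SIOCOUTQ (Linux-specific) */

/*
 * Return the number of bytes still queued for sending on a TCP socket
 * (data the peer has not yet acknowledged), or -1 if the ioctl fails,
 * e.g. with EINVAL on a listening socket.
 */
int unsent_bytes(int fd)
{
    int pending = 0;

    assert(fd >= 0);
    if (ioctl(fd, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

If that count keeps climbing across successful write() calls while
the server is down, the data is only piling up locally.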

Question 2a: How can my app learn of the EHOSTUNREACH error
immediately? IP_RECVERR is not implemented for TCP, and SO_ERROR always
reports no error (0).
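
For reference, the usual pattern for fetching a socket's pending error
is getsockopt() with SO_ERROR, optionally after poll() reports POLLERR
or POLLHUP (poll(2) always reports those, whatever is in events). This
is a minimal sketch of that pattern with hypothetical helper names;
whether 2.4.24 ever sets the error in this scenario is exactly what
the question asks, so the sketch does not settle it.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Return the pending error on a socket (0 if none), or -1 if
 * getsockopt() itself fails. Reading SO_ERROR also clears it.
 */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}

/*
 * Wait up to timeout_ms for an error/hangup condition on the socket.
 * Returns the pending error, 0 if nothing was reported, or -1 on failure.
 */
int wait_for_socket_error(int fd, int timeout_ms)
{
    /* POLLERR and POLLHUP are reported even though events is empty. */
    struct pollfd pfd = { .fd = fd, .events = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n > 0 && (pfd.revents & (POLLERR | POLLHUP)))
        return pending_socket_error(fd);
    return 0;
}
```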

..Stu

* RE: TCP stack behaviour question
@ 2006-09-18 18:29 Stuart MacDonald
  2006-09-19 12:03 ` Samuel Tardieu
From: Stuart MacDonald @ 2006-09-18 18:29 UTC
  To: 'Stuart MacDonald', 'Andi Kleen'; +Cc: linux-kernel

From: Stuart MacDonald [mailto:stuartm@connecttech.com] 
> What happened was this: I had a run where I captured output with
> tcpdump. My original post was based on that and on the debug output
> from my app. For whatever reason, it appears the stack didn't
> generate all of the packets it should have. When the log showed a
> second-last to last retransmit gap of about 27 seconds, followed by a
> gap of about 400 seconds to the very next packet of any kind, I
> assumed that meant the stack had given up on the retransmits, when
> in fact something else appears to have been going on.

I did another run and confirmed this. The tcpdump capture shows that
seven retransmits are sent, obeying the exponential backoff. Then
something odd happens. Instead of the 8th retransmit at 7th + 26.88
seconds, there is an ARP request at 7th + 4.159722 seconds. In fact
there are three ARP requests, one second apart, sent to the MAC
address of the powered-off machine. After these come further ARP
requests (in groups of three, one second apart), but they are
broadcast and follow a backoff schedule.
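
The 26.88 s figure can be checked against plain doubling. Assuming
t_0 is the original transmission, t_k the k-th retransmit, the gap
before retransmit k is RTO_0 * 2^(k-1), and no cap has been reached:

```latex
\begin{aligned}
t_8 - t_7 &= \mathrm{RTO}_0 \cdot 2^{7} = 26.88\ \mathrm{s}
  \;\Rightarrow\; \mathrm{RTO}_0 = \frac{26.88}{128} = 0.21\ \mathrm{s},\\
t_7 - t_0 &= \mathrm{RTO}_0 \,(2^{7} - 1) = 0.21 \times 127 = 26.67\ \mathrm{s}.
\end{aligned}
```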

The kernel debugging shows that tcp_write_timeout() and
tcp_retransmit_timer() are still being called though, right up to what
would be the 16th retransmit.

I suppose the TCP retransmits aren't being sent because the Ethernet
and/or IP layers can no longer reach the host, which is what's
producing the ARPs. Is that correct? Is that documented anywhere?

I was expecting to see all 15 retransmits, and was confused when I
didn't see them.

..Stu


Thread overview: 16+ messages
2006-09-15 17:28 TCP stack behaviour question Stuart MacDonald
2006-09-18  8:29 ` Andi Kleen
2006-09-18 13:20   ` Stuart MacDonald
2006-09-18 13:54     ` Andi Kleen
2006-09-18 14:19       ` Stuart MacDonald
2006-09-18 14:31         ` Andi Kleen
2006-09-18 15:38           ` Michael Kerrisk
2006-09-18 17:01             ` Stuart MacDonald
2006-09-19  6:13               ` Michael Kerrisk
2006-09-19  6:47                 ` Andi Kleen
2006-09-19 14:50               ` Michael Kerrisk
2006-09-20  9:55                 ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2006-09-18 18:29 Stuart MacDonald
2006-09-19 12:03 ` Samuel Tardieu
2006-09-19 14:00   ` Stuart MacDonald
2006-09-20  9:54     ` Andi Kleen