Re: TCP keepalive timer problem

netdev.vger.kernel.org archive mirror

* Re: TCP keepalive timer problem
       [not found] <0939B589FC103041945B9F13274963E303B1A9D4@CORPUSMX90A.corp.emc.com>
@ 2009-08-25 13:13 ` Eric Dumazet
  2009-08-25 14:05   ` Li_Xin2
  2009-08-25 14:04 ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-08-25 13:13 UTC
  To: Li_Xin2; +Cc: linux-kernel, Linux Netdev List

Li_Xin2@emc.com wrote:
> Greetings,
> 
> I found one problem in Linux TCP keepalive timer processing, after
> searching on google, I found Daniel Stempel reported the same problem in
> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
> but got no answer. So I have to reraise it.
> 
> Can anyone help answer this two-years long question?
> 
>

You should explain your problem in detail, since Daniel's was probably different.

He mentioned "(timeout is set to e.g. 30 seconds)", which is kind of nasty given that the normal one is 7200 seconds.

If some packets are in flight, keepalive is not fired at all, since normal
retransmits should take place (check the tcp_retries2 sysctl).

TCP keepalive is only fired when no traffic has occurred for a long time, and only if
the SO_KEEPALIVE socket option was enabled by the application.

tcp_retries2 (integer; default: 15)
    The maximum number of times a TCP packet is retransmitted in established state
before giving up. The default value is 15, which corresponds to a duration of
approximately between 13 to 30 minutes, depending on the retransmission timeout.
The RFC 1122 specified minimum limit of 100 seconds is typically deemed too short. 
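
A rough sketch of where a figure in that range can come from, assuming the retransmission timeout starts at Linux's minimum of 200 ms, doubles after every timeout, and is capped at 120 seconds. The function name and the millisecond units are illustrative only; a real connection starts from an RTO derived from its measured round-trip time, so actual durations are longer.

```c
#include <stdint.h>

/*
 * Total time (in ms) spent waiting before giving up, assuming the RTO
 * starts at rto_min_ms, doubles after every timeout, and is capped at
 * rto_max_ms. The name and units are illustrative, not kernel API.
 */
uint64_t retransmit_give_up_ms(unsigned int retries,
                               uint64_t rto_min_ms, uint64_t rto_max_ms)
{
    uint64_t total = 0;
    uint64_t rto = rto_min_ms;

    /* One wait after the original send, then one per retransmission. */
    for (unsigned int i = 0; i <= retries; i++) {
        total += rto;
        rto = (rto * 2 > rto_max_ms) ? rto_max_ms : rto * 2;
    }
    return total;
}
```

Under these assumptions, retransmit_give_up_ms(15, 200, 120000) comes to 924600 ms, a little over 15 minutes.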


* RE: TCP keepalive timer problem
  2009-08-25 13:13 ` TCP keepalive timer problem Eric Dumazet
@ 2009-08-25 14:05   ` Li_Xin2
  2009-08-27 12:45     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Li_Xin2 @ 2009-08-25 14:05 UTC
  To: eric.dumazet; +Cc: linux-kernel, netdev

Thanks for your quick reply, let me explain my problem in detail.

Suppose the client side of the connection sets the keepalive socket option and connects to the server, and then we pull out the network cable of the server box. After the connection has been idle for TCP_KEEPIDLE seconds, the first keepalive probe is sent, and of course no reply is received. Just after the first probe, the client sends some data. No response is received and, as you said, normal retransmission takes place and no further keepalive probes are sent.

The problem is: an application that uses the keepalive mechanism expects a peer crash to be detected within TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds. The application may set relatively small TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL values so that a peer crash is detected quickly, for example within 60 seconds. But if keepalive is interleaved with retransmission, the latter takes priority, so the peer crash is only detected after 13 to 30 minutes, which may not be acceptable for some applications.

We tried the TCP implementation on Windows XP SP3, and there keepalive and retransmission do not interfere with each other.
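
For reference, a minimal sketch of the client-side setup described above, assuming a Linux system where the per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are available. The two function names are made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Detection time the application expects from keepalive alone (seconds). */
int keepalive_detection_bound(int idle, int intvl, int cnt)
{
    return idle + cnt * intvl;
}

/*
 * Enable keepalive on fd and set the per-socket timers.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

With idle = 10, intvl = 10 and cnt = 5, the application would expect detection within 60 seconds, which is the kind of budget that the retransmission path overrides in the scenario above.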

Regards,
Xin Li
EMC Shanghai R&D Centre
Email: Li_Xin2@emc.com
Tel: 86 21 6095 1100 x 2257


* Re: TCP keepalive timer problem
  2009-08-25 14:05   ` Li_Xin2
@ 2009-08-27 12:45     ` Eric Dumazet
  2009-08-27 13:35       ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-08-27 12:45 UTC
  To: Li_Xin2; +Cc: linux-kernel, netdev

Please don't top-post on these lists; find my answers below.

Li_Xin2@emc.com wrote:
>
> Thanks for your quick reply, let me explain my problem in detail.
>
> Suppose the client side of the connection sets the keepalive socket option
> and connects to the server, and then we pull out the network cable of the
> server box. After the connection has been idle for TCP_KEEPIDLE seconds, the
> first keepalive probe is sent, and of course no reply is received. Just after
> the first probe, the client sends some data. No response is received and, as
> you said, normal retransmission takes place and no further keepalive probes
> are sent.
>
> The problem is: an application that uses the keepalive mechanism expects a
> peer crash to be detected within TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL
> seconds. The application may set relatively small TCP_KEEPIDLE, TCP_KEEPCNT
> and TCP_KEEPINTVL values so that a peer crash is detected quickly, for
> example within 60 seconds. But if keepalive is interleaved with
> retransmission, the latter takes priority, so the peer crash is only detected
> after 13 to 30 minutes, which may not be acceptable for some applications.
>
> We tried the TCP implementation on Windows XP SP3, and there keepalive and
> retransmission do not interfere with each other.
>


RFC 1122, section 4.2.3.6, says:

Keep-alive packets MUST only be sent when no data or acknowledgement packets have
been received for the connection within an interval. This interval MUST be
configurable and MUST default to no less than two hours.

So:

With normal tcp_retries2 settings, a connection whose packets in flight are not acknowledged should be reset well before TCP_KEEPIDLE (>= 7200 seconds) expires.


Now, 7200 seconds might be inappropriate for special needs, and considering
there is no way to change tcp_retries2 for a given socket (only choice being the global
tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
and get smaller keepalive timers if possible.

So when keepalive_timer fires, we should not care about outgoing packets,
only about tp->rcv_tstamp, the timestamp of the last received ACK.


diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b144a26..719f198 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -484,18 +484,13 @@ static void tcp_keepalive_timer (unsigned long data)
 			}
 		}
 		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
+		tcp_done(sk);
+		goto out;
 	}
 
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || tcp_send_head(sk))
-		goto resched;
-
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
@@ -522,13 +517,7 @@ static void tcp_keepalive_timer (unsigned long data)
 	TCP_CHECK_TIMER(sk);
 	sk_mem_reclaim(sk);
 
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
-	goto out;
-
-death:
-	tcp_done(sk);
-
 out:
 	bh_unlock_sock(sk);
 	sock_put(sk);
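
The behavioural difference made by the patch above can be illustrated with a small userspace model. This is a simplified sketch, not the kernel code: struct ka_state, the enum and both functions are hypothetical, and sending a probe and resetting the connection are collapsed into a single KA_PROBE outcome.

```c
#include <stdbool.h>

enum ka_action { KA_RESCHED, KA_PROBE };

/* Hypothetical stand-in for the fields the keepalive timer looks at. */
struct ka_state {
    bool data_pending;            /* packets in flight or queued to send */
    unsigned long now;            /* current time, arbitrary ticks */
    unsigned long last_ack;       /* time of the last received ACK */
    unsigned long keepalive_time; /* idle time before the first probe */
};

/* Before the patch: pending data always defers to retransmission. */
enum ka_action ka_decide_old(const struct ka_state *s)
{
    if (s->data_pending)
        return KA_RESCHED;
    return (s->now - s->last_ack >= s->keepalive_time) ? KA_PROBE
                                                       : KA_RESCHED;
}

/* After the patch: only the time since the last ACK matters. */
enum ka_action ka_decide_new(const struct ka_state *s)
{
    return (s->now - s->last_ack >= s->keepalive_time) ? KA_PROBE
                                                       : KA_RESCHED;
}
```

In the reported scenario (data queued, peer silent for longer than keepalive_time), ka_decide_old() keeps rescheduling while ka_decide_new() moves on to probing.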


* Re: TCP keepalive timer problem
  2009-08-27 12:45     ` Eric Dumazet
@ 2009-08-27 13:35       ` Andi Kleen
  2009-08-27 14:17         ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2009-08-27 13:35 UTC
  To: Eric Dumazet; +Cc: Li_Xin2, linux-kernel, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> Now, 7200 seconds might be inappropriate for special needs, and considering
> there is no way to change tcp_retries2 for a given socket (only choice being the global
> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
> and get smaller keepalive timers if possible.

I think the better fix would be to just do that only when
tcp_retries2 > keep alive time. So keep the existing behaviour
with default keep alive, but switch when the user defined
a very short keep alive.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: TCP keepalive timer problem
  2009-08-27 13:35       ` Andi Kleen
@ 2009-08-27 14:17         ` Eric Dumazet
  2009-08-27 14:29           ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-08-27 14:17 UTC
  To: Andi Kleen; +Cc: Li_Xin2, linux-kernel, netdev

Andi Kleen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>> Now, 7200 seconds might be inappropriate for special needs, and considering
>> there is no way to change tcp_retries2 for a given socket (only choice being the global
>> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
>> and get smaller keepalive timers if possible.
> 
> I think the better fix would be to just do that only when
> tcp_retries2 > keep alive time. So keep the existing behaviour
> with default keep alive, but switch when the user defined
> a very short keep alive.
> 

tcp_retries2 is a number of retries; it's difficult to derive a time from it.

Also, it's not clear what behavior you are referring to.
Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?

if (tp->packets_out || tcp_send_head(sk))
	if (tcp_retries2_time < keepalive_time_when(tp))
		goto resched;
elapsed = tcp_time_stamp - tp->rcv_tstamp;
...

What would be the gain?
Arming the timer exactly every keepalive_time_when(tp)
instead of every keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp)?



* Re: TCP keepalive timer problem
  2009-08-27 14:17         ` Eric Dumazet
@ 2009-08-27 14:29           ` Andi Kleen
  2009-08-27 14:49             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2009-08-27 14:29 UTC
  To: Eric Dumazet; +Cc: Andi Kleen, Li_Xin2, linux-kernel, netdev

On Thu, Aug 27, 2009 at 04:17:10PM +0200, Eric Dumazet wrote:
> Andi Kleen wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> writes:
> >> Now, 7200 seconds might be inappropriate for special needs, and considering
> >> there is no way to change tcp_retries2 for a given socket (only choice being the global
> >> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
> >> and get smaller keepalive timers if possible.
> > 
> > I think the better fix would be to just do that only when
> > tcp_retries2 > keep alive time. So keep the existing behaviour
> > with default keep alive, but switch when the user defined
> > a very short keep alive.
> > 
> 
> tcp_retries2 is a number of retries; it's difficult to derive a time from it.

That shouldn't be too hard. 

> 
> Also, it's not clear what behavior you are referring to.
> Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
> If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?

I didn't suggest changing anything at firing time, just guarding
the code you removed with if (keepalive_time > retries2 time).

That's not perfect, but likely good enough.


> if (tp->packets_out || tcp_send_head(sk))
> 	if (tcp_retries2_time < keepalive_time_when(tp))
> 		goto resched;
> elapsed = tcp_time_stamp - tp->rcv_tstamp;
> ...
> 
> What would be the gain ?
> Arming timer exactly every keepalive_time_when(tp)
> instead of keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp) ?

The gain would be that you don't send unnecessary packets by default (following the RFC), but
still give the expected behaviour to users who explicitly set short keepalives.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: TCP keepalive timer problem
  2009-08-27 14:29           ` Andi Kleen
@ 2009-08-27 14:49             ` Eric Dumazet
  2009-08-28  1:55               ` Li_Xin2
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-08-27 14:49 UTC
  To: Andi Kleen; +Cc: Li_Xin2, linux-kernel, netdev

Andi Kleen wrote:
> On Thu, Aug 27, 2009 at 04:17:10PM +0200, Eric Dumazet wrote:
>> Andi Kleen wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>>> Now, 7200 seconds might be inappropriate for special needs, and considering
>>>> there is no way to change tcp_retries2 for a given socket (only choice being the global
>>>> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
>>>> and get smaller keepalive timers if possible.
>>> I think the better fix would be to just do that only when
>>> tcp_retries2 > keep alive time. So keep the existing behaviour
>>> with default keep alive, but switch when the user defined
>>> a very short keep alive.
>>>
>> tcp_retries2 is a number of retries; it's difficult to derive a time from it.
> 
> That shouldn't be too hard. 
> 
>> Also, it's not clear what behavior you are referring to.
>> Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
>> If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?
> 
> I didn't suggest changing anything at firing time, just guarding
> the code you removed with if (keepalive_time > retries2 time).
> 
> That's not perfect, but likely good enough.
> 
> 
>> if (tp->packets_out || tcp_send_head(sk))
>> 	if (tcp_retries2_time < keepalive_time_when(tp))
>> 		goto resched;
>> elapsed = tcp_time_stamp - tp->rcv_tstamp;
>> ...
>>
>> What would be the gain ?
>> Arming timer exactly every keepalive_time_when(tp)
>> instead of keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp) ?
> 
> The gain would be that you don't send unnecessary packets by default (following the RFC), but
> still give the expected behaviour to users who explicitly set short keepalives.
> 

Yep, so to recap we have two changes:

1) The one I sent, which uses the time of the last ACK we received to compute the
timer delays.

2) The one you suggest, which avoids sending a probe if we have packets in flight and relies
on the normal retransmit timers.
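
The two changes can be sketched side by side as plain functions. This is only an illustration under assumptions: the names are made up, all times are in the same arbitrary unit, and retries2_time stands for a duration somehow derived from tcp_retries2 (the tcp_retries2_time discussed earlier in this thread).

```c
#include <stdbool.h>

/*
 * Change 2: with data pending, leave failure detection to the retransmit
 * timer only when it would give up sooner than keepalive starts probing.
 */
bool ka_defer_to_retransmit(bool data_pending,
                            unsigned long retries2_time,
                            unsigned long keepalive_time)
{
    return data_pending && retries2_time < keepalive_time;
}

/*
 * Change 1: delay until the next keepalive timer run, measured from the
 * last ACK received. Returns 0 when a probe is already due.
 */
unsigned long ka_next_delay(unsigned long now, unsigned long last_ack,
                            unsigned long keepalive_time)
{
    unsigned long elapsed = now - last_ack;

    return elapsed >= keepalive_time ? 0 : keepalive_time - elapsed;
}
```

With the default 7200 second keepalive and a retransmit give-up time of roughly 15 minutes, ka_defer_to_retransmit() keeps the existing behaviour; with a 60 second keepalive it does not defer, so probing proceeds.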


* RE: TCP keepalive timer problem
  2009-08-27 14:49             ` Eric Dumazet
@ 2009-08-28  1:55               ` Li_Xin2
  2009-08-28  7:05                 ` Damian Lukowski
  0 siblings, 1 reply; 10+ messages in thread
From: Li_Xin2 @ 2009-08-28  1:55 UTC
  To: eric.dumazet, andi; +Cc: linux-kernel, netdev

> Yep, so to recap we have two changes:
>
> 1) The one I sent, which uses the time of the last ACK we received to
> compute the timer delays.
>
> 2) The one you suggest, which avoids sending a probe if we have packets
> in flight and relies on the normal retransmit timers.

I agree with these two changes, but I think the patch for point 2 given
by Eric is not correct:

elapsed = keepalive_time_when(tp);

 if (tp->packets_out || tcp_send_head(sk))
 	if (tcp_retries2_time < keepalive_time_when(tp))
 		goto resched;
 elapsed = tcp_time_stamp - tp->rcv_tstamp;

The problem is: you should not always compare tcp_retries2_time with
keepalive_time_when. If the first keepalive probe has already been sent, you
should compare with keepalive_intvl_when (or with
keepalive_intvl_when * ((tp->keepalive_probes ?: sysctl_tcp_keepalive_probes) - icsk->icsk_probes_out)),
and the elapsed time also needs to take that into account.

elapsed = icsk->icsk_probes_out ? keepalive_intvl_when(tp) : keepalive_time_when(tp);

 if (tp->packets_out || tcp_send_head(sk))
 	if (tcp_retries2_time < (icsk->icsk_probes_out ?
 				 keepalive_intvl_when(tp) : keepalive_time_when(tp)))
 		goto resched;
 elapsed = tcp_time_stamp - tp->rcv_tstamp;

What do you think?
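
To make the two candidate comparison values concrete, here is a small sketch with made-up function names: the first picks the interval that applies to the next timer run, the second computes the remaining probe budget mentioned in parentheses above.

```c
/*
 * Interval that applies to the next keepalive timer run: the idle time
 * before the first probe, the probe interval once probing has started.
 */
unsigned long ka_current_interval(unsigned int probes_out,
                                  unsigned long keepalive_time,
                                  unsigned long keepalive_intvl)
{
    return probes_out ? keepalive_intvl : keepalive_time;
}

/*
 * Time left until the probe budget is exhausted, assuming probes are
 * spaced keepalive_intvl apart. Returns 0 once all probes are out.
 */
unsigned long ka_remaining_budget(unsigned int probes_out,
                                  unsigned int max_probes,
                                  unsigned long keepalive_intvl)
{
    if (probes_out >= max_probes)
        return 0;
    return (unsigned long)(max_probes - probes_out) * keepalive_intvl;
}
```

For example, with a 60 second idle time, a 10 second interval and 5 probes, the comparison value drops from 60 to 10 after the first probe, and the remaining budget at that point is 40 seconds.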


* Re: TCP keepalive timer problem
  2009-08-28  1:55               ` Li_Xin2
@ 2009-08-28  7:05                 ` Damian Lukowski
  0 siblings, 0 replies; 10+ messages in thread
From: Damian Lukowski @ 2009-08-28  7:05 UTC
  To: Li_Xin2; +Cc: eric.dumazet, andi, netdev

Li_Xin2@emc.com wrote:
>> Yep, so to recap we have two changes:
>>
>> 1) The one I sent, which uses the time of the last ACK we received to
>> compute the timer delays.
>>
>> 2) The one you suggest, which avoids sending a probe if we have packets
>> in flight and relies on the normal retransmit timers.
> 
> I agree with these two changes, but I think the patch for point 2 given
> by Eric is not correct:
> 
> 
> elapsed = keepalive_time_when(tp);
> 
>  if (tp->packets_out || tcp_send_head(sk))
>  	if (tcp_retries2_time < keepalive_time_when(tp))
>  		goto resched;
>  elapsed = tcp_time_stamp - tp->rcv_tstamp;

Hello,
I didn't follow the conversation in full detail, but I recently submitted a series of
patches, one of which calculates a time value from tcp_retries. It's the post
"[PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a
time value." from Aug 26, 2009 at 12:16 CEST.

This is the code with the calculation procedure:

--
> static inline bool retransmits_timed_out(const struct sock *sk,
> 					 unsigned int boundary)
> {
> 	int limit, K;
> 	if (!inet_csk(sk)->icsk_retransmits)
> 		return false;
> 
> 	K = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
> 
> 	if (boundary <= K)
> 		limit = ((2 << boundary) - 1) * TCP_RTO_MIN;
> 	else
> 		limit = ((2 << K) - 1) * TCP_RTO_MIN +
> 			(boundary - K) * TCP_RTO_MAX;
> 
> 	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= limit;
> }
--

Given some boundary like tcp_retries2, it computes a limit, which should
be your tcp_retries2_time.
If that's not what you meant, then I'm sorry, and you can ignore it. :)
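
As a quick sanity check of that calculation, here is a userspace sketch of the same limit formula. The function names are made up, times are in milliseconds instead of jiffies, and floor_log2() is a stand-in for the kernel's ilog2(); the 200 ms and 120 s values in the example correspond to TCP_RTO_MIN and TCP_RTO_MAX.

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0; userspace stand-in for the kernel's ilog2(). */
static unsigned int floor_log2(uint64_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Time limit (ms) that corresponds to `boundary` retransmissions. */
uint64_t retries_to_limit_ms(unsigned int boundary,
                             uint64_t rto_min_ms, uint64_t rto_max_ms)
{
    unsigned int k = floor_log2(rto_max_ms / rto_min_ms);

    /* While the RTO is still doubling, the waits form a geometric sum. */
    if (boundary <= k)
        return ((2ULL << boundary) - 1) * rto_min_ms;

    /* After that, every further retry waits for the capped maximum. */
    return ((2ULL << k) - 1) * rto_min_ms +
           (uint64_t)(boundary - k) * rto_max_ms;
}
```

For a boundary of 15 with 200 ms and 120000 ms this gives 924600 ms, about 15 minutes.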

Regards


* Re: TCP keepalive timer problem
       [not found] <0939B589FC103041945B9F13274963E303B1A9D4@CORPUSMX90A.corp.emc.com>
  2009-08-25 13:13 ` TCP keepalive timer problem Eric Dumazet
@ 2009-08-25 14:04 ` Andi Kleen
  1 sibling, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-08-25 14:04 UTC
  To: Li_Xin2; +Cc: linux-kernel, netdev

<Li_Xin2@emc.com> writes:

[cc netdev]

> Greetings,
>
> I found one problem in Linux TCP keepalive timer processing, after
> searching on google, I found Daniel Stempel reported the same problem in
> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
> but got no answer. So I have to reraise it.
>
> Can anyone help answer this two-years long question?

I think the idea behind the tcp_keepalive_timer check referenced in the
other mail is: when there are outstanding unacknowledged packets, the
normal retransmit timer does the job of keepalive, because it will
retransmit again and again with exponential backoff and eventually notice that
something is wrong.

The obvious hole is that if the keepalive is shorter than the worst
case retransmit timeout then you'll have to wait for the longer
timeout.  I presume that's what is happening for you? You set the 
keep alive timeout very low and expect the timeout to be very low,
but it's 30+mins (default retransmit timeout)?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

