* Re: TCP keepalive timer problem
From: Eric Dumazet @ 2009-08-25 13:13 UTC
To: Li_Xin2; +Cc: linux-kernel, Linux Netdev List

Li_Xin2@emc.com wrote:
> Greetings,
>
> I found one problem in Linux TCP keepalive timer processing, after
> searching on google, I found Daniel Stempel reported the same problem in
> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
> but got no answer. So I have to reraise it.
>
> Can anyone help answer this two-years long question?

You should explain your problem in detail, since Daniel's was probably different.

He mentioned "(timeout is set to e.g. 30 seconds)", which is kind of nasty, given that the normal one is 7200 seconds.

If some packets are in flight, keepalive is not fired at all, since normal
retransmits should take place (check the tcp_retries2 sysctl).

TCP keepalive is only fired when no traffic has occurred for a long time, and only if
the SO_KEEPALIVE socket option was enabled by the application.

tcp_retries2 (integer; default: 15)
    The maximum number of times a TCP packet is retransmitted in established state
    before giving up. The default value is 15, which corresponds to a duration of
    approximately between 13 to 30 minutes, depending on the retransmission timeout.
    The RFC 1122 specified minimum limit of 100 seconds is typically deemed too short.
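The 13 to 30 minute range in the excerpt above can be reproduced with a little arithmetic. The following is only a sketch, not kernel code: it assumes one backoff interval per retransmission, with the timeout doubling after every attempt and capped at 120 seconds (the kernel's TCP_RTO_MAX), and 200 ms (TCP_RTO_MIN) as the smallest starting timeout. The function name is made up for illustration.

```c
#include <stdint.h>

/* Assumed bounds, mirroring the kernel's TCP_RTO_MIN (HZ/5) and
 * TCP_RTO_MAX (120*HZ), expressed here in milliseconds. */
#define RTO_MIN_MS 200u
#define RTO_MAX_MS 120000u

/* Hypothetical helper: total time spent waiting across `retries`
 * retransmissions when the timeout starts at `initial_rto_ms` and
 * doubles after every attempt, capped at RTO_MAX_MS. */
uint64_t retransmit_duration_ms(unsigned int retries, uint32_t initial_rto_ms)
{
    uint64_t total = 0;
    uint32_t rto = initial_rto_ms;

    /* Clamp the starting timeout into the assumed valid range. */
    if (rto < RTO_MIN_MS)
        rto = RTO_MIN_MS;
    if (rto > RTO_MAX_MS)
        rto = RTO_MAX_MS;

    for (unsigned int i = 0; i < retries; i++) {
        total += rto;
        /* Exponential backoff with an upper bound. */
        rto = (rto > RTO_MAX_MS / 2) ? RTO_MAX_MS : rto * 2;
    }
    return total;
}
```

Under these assumptions, retransmit_duration_ms(15, 200) gives 804600 ms (about 13.4 minutes) and retransmit_duration_ms(15, 120000) gives 1800000 ms (30 minutes), which lines up with the quoted range.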
* RE: TCP keepalive timer problem
From: Li_Xin2 @ 2009-08-25 14:05 UTC
To: eric.dumazet; +Cc: linux-kernel, netdev

Thanks for your quick reply, let me explain my problem in detail.

Suppose the client side of a connection sets the keepalive socket option and connects to the server, then we pull out the network cable of the server box. After the connection has been idle for TCP_KEEPIDLE seconds, the first keepalive probe packet is sent, and of course no reply is received. Just after the first probe packet, the client sends some data. No response is received and, as you said, normal retransmission takes place and no further keepalive probes are sent.

The problem is: an application that uses the keepalive mechanism expects to detect a peer crash within TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds. The application may set relatively small TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL values so that a peer crash can be detected quickly, for example within 60 seconds. But if keepalive is interleaved with retransmission, the latter takes priority, so the peer crash is only detected after 13 to 30 minutes, which may not be acceptable for some applications.

We tried the TCP implementation on Windows XP SP3; there, keepalive and retransmission don't interfere with each other.

Regards,
Xin Li
EMC Shanghai R&D Centre
Email: Li_Xin2@emc.com
Tel: 86 21 6095 1100 x 2257
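
For readers who have not used these options, here is a minimal sketch of how an application might request the short detection window described above. It assumes Linux and its per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options; the two function names are hypothetical, and the values in the usage note are examples only.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Detection time the application expects from keepalive alone:
 * idle time before the first probe plus one interval per probe. */
int keepalive_detect_seconds(int idle, int intvl, int cnt)
{
    return idle + cnt * intvl;
}

/* Enable keepalive on an existing TCP socket and override the
 * per-socket timers (Linux-specific options).
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_fast_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

Calling enable_fast_keepalive(fd, 30, 10, 3) asks for a first probe after 30 idle seconds and three probes 10 seconds apart, so keepalive_detect_seconds(30, 10, 3) is 60. As the message explains, that figure only holds while no unacknowledged data is queued on the socket.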
* Re: TCP keepalive timer problem
From: Eric Dumazet @ 2009-08-27 12:45 UTC
To: Li_Xin2; +Cc: linux-kernel, netdev

Please don't top-post on these lists; find my answers below.

Li_Xin2@emc.com wrote:
>
> Thanks for your quick reply, let me explain my problem in detail.
>
> Suppose the client side of a connection sets the keepalive socket option and connects to
> the server, then we pull out the network cable of the server box. After the connection has
> been idle for TCP_KEEPIDLE seconds, the first keepalive probe packet is sent, and of course
> no reply is received. Just after the first probe packet, the client sends some data. No
> response is received and, as you said, normal retransmission takes place and no further
> keepalive probes are sent.
>
> The problem is: an application that uses the keepalive mechanism expects to detect a peer
> crash within TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds. The application may set
> relatively small TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL values so that a peer crash can
> be detected quickly, for example within 60 seconds. But if keepalive is interleaved with
> retransmission, the latter takes priority, so the peer crash is only detected after 13 to 30
> minutes, which may not be acceptable for some applications.
>
> We tried the TCP implementation on Windows XP SP3; there, keepalive and retransmission
> don't interfere with each other.
>
> Regards,
> Xin Li
> EMC Shanghai R&D Centre
> Email: Li_Xin2@emc.com
> Tel: 86 21 6095 1100 x 2257

RFC 1122, section 4.2.3.6, says:

    Keep-alive packets MUST only be sent when no data or
    acknowledgement packets have been received for the
    connection within an interval. This interval MUST be
    configurable and MUST default to no less than two hours.

So: normal tcp_retries2 settings should make sure the connection is reset, if packets
in flight are not acknowledged, well before TCP_KEEPIDLE (>= 7200 seconds) expires.

Now, 7200 seconds might be inappropriate for special needs, and considering
there is no way to change tcp_retries2 for a given socket (the only choice being the global
tcp_retries2 setting), I would vote for a change in our stack, to *relax* the RFC
and get smaller keepalive timers if possible.

So when keepalive_timer fires, we should not care about outgoing packets, only about
tp->rcv_tstamp, the timestamp of the last received ACK.

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b144a26..719f198 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -484,18 +484,13 @@ static void tcp_keepalive_timer (unsigned long data)
 			}
 		}
 		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
+		tcp_done(sk);
+		goto out;
 	}
 
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || tcp_send_head(sk))
-		goto resched;
-
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
@@ -522,13 +517,7 @@ static void tcp_keepalive_timer (unsigned long data)
 	TCP_CHECK_TIMER(sk);
 	sk_mem_reclaim(sk);
 
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
-	goto out;
-
-death:
-	tcp_done(sk);
-
 out:
 	bh_unlock_sock(sk);
 	sock_put(sk);
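
To make the effect of the patch easier to see outside the kernel, here is a simplified user-space model of the decision the timer takes before and after the change. It is a sketch only: struct ka_state, the enum and both function names are hypothetical, time is in arbitrary ticks, and the probe-counting and reset paths of the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket state the timer looks at. */
struct ka_state {
    bool     data_in_flight;  /* models tp->packets_out || tcp_send_head(sk) */
    uint32_t now;             /* models tcp_time_stamp */
    uint32_t last_rcv;        /* models tp->rcv_tstamp */
    uint32_t keepalive_time;  /* models keepalive_time_when(tp) */
};

enum ka_action { KA_RESCHEDULE, KA_PROBE };

/* Before the patch: unacknowledged data suppresses keepalive and the
 * timer is simply re-armed for a full keepalive period. */
enum ka_action ka_decide_old(const struct ka_state *s, uint32_t *delay)
{
    uint32_t elapsed;

    if (s->data_in_flight) {
        *delay = s->keepalive_time;
        return KA_RESCHEDULE;
    }
    elapsed = s->now - s->last_rcv;
    if (elapsed >= s->keepalive_time) {
        *delay = 0;  /* probe handling is outside this sketch */
        return KA_PROBE;
    }
    *delay = s->keepalive_time - elapsed;
    return KA_RESCHEDULE;
}

/* With the patch: only the time since the last received ACK matters. */
enum ka_action ka_decide_patched(const struct ka_state *s, uint32_t *delay)
{
    uint32_t elapsed = s->now - s->last_rcv;

    if (elapsed >= s->keepalive_time) {
        *delay = 0;  /* probe handling is outside this sketch */
        return KA_PROBE;
    }
    *delay = s->keepalive_time - elapsed;
    return KA_RESCHEDULE;
}
```

With data in flight and nothing received for longer than the keepalive time, the old model keeps rescheduling while the patched model moves on to probing, which is the scenario described earlier in the thread.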
* Re: TCP keepalive timer problem
From: Andi Kleen @ 2009-08-27 13:35 UTC
To: Eric Dumazet; +Cc: Li_Xin2, linux-kernel, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> Now, 7200 seconds might be inappropriate for special needs, and considering
> there is no way to change tcp_retries2 for a given socket (only choice being the global
> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
> and get smaller keepalive timers if possible.

I think the better fix would be to do that only when
tcp_retries2 > keepalive time. So keep the existing behaviour
with the default keepalive, but switch when the user has defined
a very short keepalive.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: TCP keepalive timer problem
From: Eric Dumazet @ 2009-08-27 14:17 UTC
To: Andi Kleen; +Cc: Li_Xin2, linux-kernel, netdev

Andi Kleen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>> Now, 7200 seconds might be inappropriate for special needs, and considering
>> there is no way to change tcp_retries2 for a given socket (only choice being the global
>> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
>> and get smaller keepalive timers if possible.
>
> I think the better fix would be to do that only when
> tcp_retries2 > keepalive time. So keep the existing behaviour
> with the default keepalive, but switch when the user has defined
> a very short keepalive.
>

tcp_retries2 is a number of retries; it's difficult to derive a time from it.

Also, it's not clear what behaviour you are referring to.
Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?

	if (tp->packets_out || tcp_send_head(sk))
		if (tcp_retries2_time < keepalive_time_when(tp))
			goto resched;
	elapsed = tcp_time_stamp - tp->rcv_tstamp;
	...

What would be the gain?
Arming the timer exactly every keepalive_time_when(tp)
instead of keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp)?
* Re: TCP keepalive timer problem
From: Andi Kleen @ 2009-08-27 14:29 UTC
To: Eric Dumazet; +Cc: Andi Kleen, Li_Xin2, linux-kernel, netdev

On Thu, Aug 27, 2009 at 04:17:10PM +0200, Eric Dumazet wrote:
> Andi Kleen wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> writes:
> >> Now, 7200 seconds might be inappropriate for special needs, and considering
> >> there is no way to change tcp_retries2 for a given socket (only choice being the global
> >> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
> >> and get smaller keepalive timers if possible.
> >
> > I think the better fix would be to do that only when
> > tcp_retries2 > keepalive time. So keep the existing behaviour
> > with the default keepalive, but switch when the user has defined
> > a very short keepalive.
> >
>
> tcp_retries2 is a number of retries; it's difficult to derive a time from it.

That shouldn't be too hard.

>
> Also, it's not clear what behaviour you are referring to.
> Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
> If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?

I didn't suggest changing anything at firing time, just guarding
the code you removed with if (keepalive_time > retries2 time).

That's not perfect, but likely good enough.

> 	if (tp->packets_out || tcp_send_head(sk))
> 		if (tcp_retries2_time < keepalive_time_when(tp))
> 			goto resched;
> 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
> 	...
>
> What would be the gain?
> Arming the timer exactly every keepalive_time_when(tp)
> instead of keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp)?

The gain would be that you don't send unnecessary packets by default (following the RFC), but
still give the expected behaviour to users who explicitly set short keepalives.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: TCP keepalive timer problem
From: Eric Dumazet @ 2009-08-27 14:49 UTC
To: Andi Kleen; +Cc: Li_Xin2, linux-kernel, netdev

Andi Kleen wrote:
> On Thu, Aug 27, 2009 at 04:17:10PM +0200, Eric Dumazet wrote:
>> Andi Kleen wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>>> Now, 7200 seconds might be inappropriate for special needs, and considering
>>>> there is no way to change tcp_retries2 for a given socket (only choice being the global
>>>> tcp_retries2 setting), I would vote for a change in our stack, to *relax* RFC,
>>>> and get smaller keepalive timers if possible.
>>> I think the better fix would be to do that only when
>>> tcp_retries2 > keepalive time. So keep the existing behaviour
>>> with the default keepalive, but switch when the user has defined
>>> a very short keepalive.
>>>
>> tcp_retries2 is a number of retries; it's difficult to derive a time from it.
>
> That shouldn't be too hard.
>
>> Also, it's not clear what behaviour you are referring to.
>> Imagine we can be smart and compute tcp_retries2_time (in jiffies) from tcp_retries2.
>> If keepalive_timer fires and we have packets in flight, what heuristic do you suggest?
>
> I didn't suggest changing anything at firing time, just guarding
> the code you removed with if (keepalive_time > retries2 time).
>
> That's not perfect, but likely good enough.
>
>> 	if (tp->packets_out || tcp_send_head(sk))
>> 		if (tcp_retries2_time < keepalive_time_when(tp))
>> 			goto resched;
>> 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
>> 	...
>>
>> What would be the gain?
>> Arming the timer exactly every keepalive_time_when(tp)
>> instead of keepalive_time_when(tp) - (tcp_time_stamp - tp->rcv_tstamp)?
>
> The gain would be that you don't send unnecessary packets by default (following the RFC), but
> still give the expected behaviour to users who explicitly set short keepalives.
>

Yep, so to recap, we have two changes:

1) The one I sent (taking into account the time of the last ACK we received) to compute the
   timer delays.

2) The one you suggest: avoiding sending a probe if we have packets in flight, relying
   on the normal retransmit timers.
* RE: TCP keepalive timer problem
From: Li_Xin2 @ 2009-08-28 1:55 UTC
To: eric.dumazet, andi; +Cc: linux-kernel, netdev

> Yep, so to recap, we have two changes:
> 1) The one I sent (taking into account the time of the last ACK we received) to compute the
>    timer delays.
> 2) The one you suggest: avoiding sending a probe if we have packets in flight, relying
>    on the normal retransmit timers.

I agree with these two changes, but I think the patch for point 2 given by Eric is not correct:

	elapsed = keepalive_time_when(tp);
	if (tp->packets_out || tcp_send_head(sk))
		if (tcp_retries2_time < keepalive_time_when(tp))
			goto resched;
	elapsed = tcp_time_stamp - tp->rcv_tstamp;

The problem is: you should not always compare tcp_retries2_time with keepalive_time_when. If the first keepalive probe has already been sent, you should compare with keepalive_intvl_when (or keepalive_intvl_when * ((tp->keepalive_probes ? : sysctl_tcp_keepalive_probes) - icsk->icsk_probes_out)), and the elapsed time also needs to take that into account.

	elapsed = icsk->icsk_probes_out ? keepalive_intvl_when(tp) : keepalive_time_when(tp);
	if (tp->packets_out || tcp_send_head(sk))
		if (tcp_retries2_time < (icsk->icsk_probes_out ?
					 keepalive_intvl_when(tp) : keepalive_time_when(tp)))
			goto resched;
	elapsed = tcp_time_stamp - tp->rcv_tstamp;

What do you think?
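
The distinction drawn in this message (compare against the idle time before the first probe, but against the probe interval afterwards) can be sketched in user space as follows. This is an illustration only: struct ka_params and both function names are hypothetical, retries2_time stands for the not-yet-existing tcp_retries2_time value discussed in the thread, and all times share one unit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; field names loosely follow the kernel names
 * used in the thread. */
struct ka_params {
    uint32_t keepalive_time;   /* idle time before the first probe */
    uint32_t keepalive_intvl;  /* interval between probes */
    uint32_t probes_out;       /* probes already sent and unanswered */
    uint32_t retries2_time;    /* assumed: tcp_retries2 expressed as a time */
    bool     data_in_flight;   /* unacknowledged or unsent data queued */
};

/* Timer period that applies right now: the probe interval once a probe
 * has gone out, the idle time otherwise. */
uint32_t ka_current_period(const struct ka_params *p)
{
    return p->probes_out ? p->keepalive_intvl : p->keepalive_time;
}

/* True when the keepalive timer should stand back and let the normal
 * retransmit timer detect the dead peer. */
bool ka_defer_to_retransmit(const struct ka_params *p)
{
    return p->data_in_flight && p->retries2_time < ka_current_period(p);
}
```

With the defaults from the thread (7200 seconds idle) and an assumed 75 second probe interval and 900 second retransmit limit, the model defers to retransmission before any probe is sent, but stops deferring once a probe is outstanding, because 900 is not smaller than 75.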
* Re: TCP keepalive timer problem
From: Damian Lukowski @ 2009-08-28 7:05 UTC
To: Li_Xin2; +Cc: eric.dumazet, andi, netdev

Li_Xin2@emc.com wrote:
>> Yep, so to recap, we have two changes:
>> 1) The one I sent (taking into account the time of the last ACK we received) to compute the
>>    timer delays.
>> 2) The one you suggest: avoiding sending a probe if we have packets in flight, relying
>>    on the normal retransmit timers.
>
> I agree with these two changes, but I think the patch for point 2 given
> by Eric is not correct:
>
> 	elapsed = keepalive_time_when(tp);
> 	if (tp->packets_out || tcp_send_head(sk))
> 		if (tcp_retries2_time < keepalive_time_when(tp))
> 			goto resched;
> 	elapsed = tcp_time_stamp - tp->rcv_tstamp;

Hello,

I didn't follow the conversation in full detail, but I recently submitted a series
of patches, in one of which a time value is calculated from tcp_retries.
It's the post "[PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value."
from Aug 26, 2009 at 12:16 CEST.

This is the code with the calculation procedure:

--
> static inline bool retransmits_timed_out(const struct sock *sk,
> 					 unsigned int boundary)
> {
> 	int limit, K;
> 	if (!inet_csk(sk)->icsk_retransmits)
> 		return false;
>
> 	K = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
>
> 	if (boundary <= K)
> 		limit = ((2 << boundary) - 1) * TCP_RTO_MIN;
> 	else
> 		limit = ((2 << K) - 1) * TCP_RTO_MIN +
> 			(boundary - K) * TCP_RTO_MAX;
>
> 	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= limit;
> }
--

Given some boundary like tcp_retries2, it computes a limit, which should be your tcp_retries2_time.
If that's not what you meant, then I'm sorry, and you can ignore it. :)

Regards
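
The limit calculation quoted above can be tried out in user space. The sketch below copies only the arithmetic; it assumes the kernel values TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s, works in milliseconds rather than jiffies, and uses hypothetical names (ilog2_u32 stands in for the kernel's ilog2()).

```c
#include <stdint.h>

/* Assumed values of the kernel's TCP_RTO_MIN (HZ/5) and TCP_RTO_MAX
 * (120*HZ), expressed here in milliseconds. */
#define RTO_MIN_MS 200u
#define RTO_MAX_MS 120000u

/* floor(log2(v)) for v > 0; stand-in for the kernel's ilog2(). */
unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;

    while ((v >>= 1) != 0)
        r++;
    return r;
}

/* Time limit in milliseconds for `boundary` retransmissions, using the
 * same closed form as the quoted retransmits_timed_out(). */
uint64_t retries_to_ms(unsigned int boundary)
{
    unsigned int k = ilog2_u32(RTO_MAX_MS / RTO_MIN_MS);

    if (boundary <= k)
        return ((2ull << boundary) - 1) * RTO_MIN_MS;
    return ((2ull << k) - 1) * RTO_MIN_MS +
           (uint64_t)(boundary - k) * RTO_MAX_MS;
}
```

With these assumptions, retries_to_ms(15) evaluates to 924600 ms, roughly 15.4 minutes. Note that the closed form adds up boundary + 1 backoff intervals, since a boundary of 0 already yields one TCP_RTO_MIN.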
* Re: TCP keepalive timer problem
From: Andi Kleen @ 2009-08-25 14:04 UTC
To: Li_Xin2; +Cc: linux-kernel, netdev

<Li_Xin2@emc.com> writes:

[cc netdev]

> Greetings,
>
> I found one problem in Linux TCP keepalive timer processing, after
> searching on google, I found Daniel Stempel reported the same problem in
> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
> but got no answer. So I have to reraise it.
>
> Can anyone help answer this two-years long question?

I think the idea behind the tcp_keepalive_timer check referenced
in the other mail is: when there are outstanding unacknowledged packets,
the normal retransmit timer will do the job of keepalive, because it will
retransmit again and again with exponential backoff and eventually
notice that something is wrong.

The obvious hole is that if the keepalive is shorter than
the worst-case retransmit timeout, then you'll have to wait
for the longer timeout.

I presume that's what is happening for you? You set the keepalive
timeout very low and expect the timeout to be very low, but it's 30+ minutes
(the default retransmit timeout)?

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

Thread overview: 10+ messages
[not found] <0939B589FC103041945B9F13274963E303B1A9D4@CORPUSMX90A.corp.emc.com>
2009-08-25 13:13 ` TCP keepalive timer problem Eric Dumazet
2009-08-25 14:05 ` Li_Xin2
2009-08-27 12:45 ` Eric Dumazet
2009-08-27 13:35 ` Andi Kleen
2009-08-27 14:17 ` Eric Dumazet
2009-08-27 14:29 ` Andi Kleen
2009-08-27 14:49 ` Eric Dumazet
2009-08-28 1:55 ` Li_Xin2
2009-08-28 7:05 ` Damian Lukowski
2009-08-25 14:04 ` Andi Kleen