netdev.vger.kernel.org archive mirror
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Li_Xin2@emc.com
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: TCP keepalive timer problem
Date: Thu, 27 Aug 2009 14:45:02 +0200
Message-ID: <4A967FCE.3000807@gmail.com>
In-Reply-To: <0939B589FC103041945B9F13274963E303B1AD89@CORPUSMX90A.corp.emc.com>

Please don't top-post on these lists; find my answers below.

Li_Xin2@emc.com wrote:
>
> Thanks for your quick reply, let me explain my problem in detail.
>
> Suppose the client side of the communication sets the keepalive socket
> option and connects to the server, then we pull out the network cable of
> the server box. After the connection has been idle for TCP_KEEPIDLE
> seconds, the first keepalive probe packet is sent, and of course no reply
> is received. Just after the first probe packet, the client sends some
> data. No response is received and, as you said, the normal retransmission
> takes place and no further keepalive probe will be sent.
>
> The problem is: an application that uses the keepalive mechanism expects
> to detect a crashed peer within
> TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds. The application may set
> relatively small TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL values so
> that a peer crash can be detected quickly, for example within 60 seconds.
> But if keepalive is interleaved with retransmission, the latter takes
> priority, so the peer crash is only detected after 13 to 30 minutes,
> which may not be acceptable for some applications.
>
> We tried the TCP implementation on Windows XP SP3; there, keepalive and
> retransmission do not interfere with each other.
> 


> Regards,
> Xin Li
> EMC Shanghai R&D Centre
> Email: Li_Xin2@emc.com
> Tel: 86 21 6095 1100 x 2257
> 
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> Sent: August 25, 2009 21:13
> To: Li, Xin
> Cc: linux-kernel@vger.kernel.org; Linux Netdev List
> Subject: Re: TCP keepalive timer problem
> 
> Li_Xin2@emc.com wrote:
>> Greetings,
>>
>> I found a problem in Linux TCP keepalive timer processing. After
>> searching on Google, I found that Daniel Stempel reported the same problem in
>> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
>> but got no answer. So I have to raise it again.
>>
>> Can anyone help answer this two-year-old question?
>>
>>
> 
> You should explain your problem in detail, since Daniel's was probably different.
>
> He mentioned "(timeout is set to e.g. 30 seconds)", which is kind of nasty, given that the normal one is 7200 seconds.
>
> If some packets are in flight, keepalive is not fired at all, since normal
> retransmits should take place (check the tcp_retries2 sysctl).
>
> TCP keepalive is only fired when no traffic has occurred for a long time, and only if
> the SO_KEEPALIVE socket option was enabled by the application.
> 
> tcp_retries2 (integer; default: 15)
>     The maximum number of times a TCP packet is retransmitted in established state
>     before giving up. The default value is 15, which corresponds to a duration of
>     approximately between 13 to 30 minutes, depending on the retransmission timeout.
>     The RFC 1122 specified minimum limit of 100 seconds is typically deemed too short.
> 
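As a rough check on the "13 to 30 minutes" figure quoted above, here is a minimal sketch of the backoff arithmetic. It assumes the retransmission timeout starts at 200 ms, doubles after every attempt, and is capped at 120 seconds (the values of TCP_RTO_MIN and TCP_RTO_MAX in Linux); the function name is hypothetical. Real connections start from an RTO derived from the measured round-trip time, so this is only a lower bound.

```c
/* Hypothetical lower bound, in milliseconds, on how long an unacknowledged
 * segment is retried: the initial timeout plus `retries` retransmissions,
 * each waiting twice as long as the previous one, capped at rto_max_ms.
 * Assumption: the RTO starts at rto_min_ms. */
static long retrans_timeout_ms(int retries, long rto_min_ms, long rto_max_ms)
{
	long total = 0;
	long rto = rto_min_ms;
	int i;

	/* One wait for the original transmission, one per retransmission. */
	for (i = 0; i <= retries; i++) {
		total += rto;
		rto = (rto * 2 > rto_max_ms) ? rto_max_ms : rto * 2;
	}
	return total;
}
```

Under these assumptions retrans_timeout_ms(15, 200, 120000) gives 924600 ms, i.e. about 15.4 minutes, which falls inside the quoted range.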

RFC 1122, section 4.2.3.6, says:

Keep-alive packets MUST only be sent when no data or acknowledgement packets have
been received for the connection within an interval. This interval MUST be
configurable and MUST default to no less than two hours.

So:

Normal tcp_retries2 settings should make sure the connection is reset, if packets
in flight are not acknowledged, well before TCP_KEEPIDLE (>= 7200 seconds) expires.


Now, 7200 seconds might be inappropriate for special needs, and considering that
there is no way to change tcp_retries2 for a given socket (the only choice being the
global tcp_retries2 setting), I would vote for a change in our stack, to *relax* the RFC
and get smaller keepalive timers if possible.
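For reference, a minimal sketch of how an application might request short per-socket keepalive timers on Linux, together with the detection bound TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL that the report expects from them. The helper names are hypothetical, not part of any API.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Seconds before keepalive alone would declare an idle peer dead:
 * one idle period, then `cnt` unanswered probes `intvl` seconds apart. */
static int keepalive_detect_secs(int idle, int intvl, int cnt)
{
	return idle + cnt * intvl;
}

/* Enable keepalive on a TCP socket and set its per-socket timers.
 * Returns 0 on success, -1 on the first setsockopt() failure. */
static int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	return 0;
}
```

With the illustrative values idle = 30, intvl = 10 and cnt = 3 (assumptions chosen to match the 60-second example), keepalive_detect_secs() returns 60. That bound only holds while the connection is idle; once data is in flight, the global tcp_retries2 limit governs instead.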

So when keepalive_timer fires, we should not care about outgoing packets,
only about tp->rcv_tstamp, the timestamp of the last received ACK.
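To make the difference concrete, here is a small userspace model of the decision the keepalive timer takes before and after the proposed change. It is a sketch only: struct ka_state and both function names are hypothetical stand-ins for the few tcp_sock fields involved, and it ignores the probe counting and rescheduling that the real timer also does.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the kernel state involved. */
struct ka_state {
	unsigned long now;            /* current time, in ticks */
	unsigned long rcv_tstamp;     /* time the last ACK was received */
	unsigned long keepalive_time; /* idle threshold, in ticks */
	unsigned int packets_out;     /* segments sent but not yet acknowledged */
	bool send_queue_pending;      /* unsent data is still queued */
};

/* Current rule: do nothing while data is outstanding or queued,
 * and leave the connection to the retransmit timer. */
static bool keepalive_due_old(const struct ka_state *s)
{
	if (s->packets_out || s->send_queue_pending)
		return false;
	return s->now - s->rcv_tstamp >= s->keepalive_time;
}

/* Proposed rule: only the time since the last received ACK matters. */
static bool keepalive_due_new(const struct ka_state *s)
{
	return s->now - s->rcv_tstamp >= s->keepalive_time;
}
```

In the reported scenario (unacknowledged data in flight, no ACK for longer than the keepalive time) the old rule never fires, while the new one does.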


diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b144a26..719f198 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -484,18 +484,13 @@ static void tcp_keepalive_timer (unsigned long data)
 			}
 		}
 		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
+		tcp_done(sk);
+		goto out;
 	}
 
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || tcp_send_head(sk))
-		goto resched;
-
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
@@ -522,13 +517,7 @@ static void tcp_keepalive_timer (unsigned long data)
 	TCP_CHECK_TIMER(sk);
 	sk_mem_reclaim(sk);
 
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
-	goto out;
-
-death:
-	tcp_done(sk);
-
 out:
 	bh_unlock_sock(sk);
 	sock_put(sk);
