All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Heffner <jheffner@psc.edu>
To: TJ <linux@tjworld.net>
Cc: netdev@vger.kernel.org
Subject: Re: Problem with implementation of TCP_DEFER_ACCEPT?
Date: Fri, 24 Aug 2007 00:40:14 -0400	[thread overview]
Message-ID: <46CE612E.60208@psc.edu> (raw)
In-Reply-To: <1187914105.26866.41.camel@bagoas.tjworld.net>

[-- Attachment #1: Type: text/plain, Size: 2665 bytes --]

TJ wrote:
> client SYN > server LISTENING
> client < SYN ACK server SYN_RECEIVED (time-out 3s)
>                  server: inet_rsk(req)->acked = 1
> 
> client ACK > server (discarded)
> 
> client < SYN ACK (DUP) server (time-out 6s)
> client ACK (DUP) > server (discarded)
> 
> client < SYN ACK (DUP) server (time-out 12s)
> client ACK (DUP) > server (discarded)
> 
> client < SYN ACK (DUP) server (time-out 24s)
> client ACK (DUP) > server (discarded)
> 
> client < SYN ACK (DUP) server (time-out 48s)
> client ACK (DUP) > server (discarded)
> 
> client < SYN ACK (DUP) server (time-out 96s)
> client ACK (DUP) > server (discarded)
> 
> server: half-open socket closed.
> 
> With each client ACK being dropped by the kernel's TCP_DEFER_ACCEPT
> mechanism eventually the handshake fails after the 'SYN ACK' retries and
> time-outs expire.
> 
> There is a case for arguing the kernel should be operating in an
> enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an
> alternative mode, and therefore should accept *both* RFC 793 and
> TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for
> implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm
> guidance.
> 
> It seems incorrect to penalise a client that is trying to complete the
> handshake according to the RFC 793 specification, especially as the
> client has no way of knowing ahead of time whether or not the server is
> operating deferred accept.

Interesting problem.  TCP_DEFER_ACCEPT does not conform to any standard 
I'm aware of.  (In fact, I'd say it's in violation of RFC 793.)  The 
implementation does exactly what it claims, though -- it "allows a 
listener to be awakened only  when  data  arrives  on  the  socket."

I think a more useful spec might have been "allows a listener to be 
awakened only when data arrives on the socket, unless the specified 
timeout has expired."  Once the timeout expires, it should process the 
embryonic connection as if TCP_DEFER_ACCEPT is not set.  Unfortunately, 
I don't think we can retroactively change this definition, as an 
application might depend on data being available and do a non-blocking 
read() after the accept(), expecting data to be there.  Is this worth 
trying to fix?

Also, a listen socket with a backlog and TCP_DEFER_ACCEPT will have reqs 
sit in the backlog for the full defer timeout, even if they've received 
data, which is not really the right thing to do.

I've attached a patch implementing this suggestion (compile tested only 
-- I think I got the logic right but it's late ;).  Kind of ugly, and 
uses up a bit in struct inet_request_sock.  Maybe can be done better...

   -John

[-- Attachment #2: tcp_defer_accept.patch --]
[-- Type: text/plain, Size: 2737 bytes --]

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..f9f64a5 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -72,7 +72,8 @@ struct inet_request_sock {
 				sack_ok	   : 1,
 				wscale_ok  : 1,
 				ecn_ok	   : 1,
-				acked	   : 1;
+				acked	   : 1,
+				deferred   : 1;
 	struct ip_options	*opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 185c7ec..cad2490 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -978,6 +978,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	ireq->snd_wscale = rx_opt->snd_wscale;
 	ireq->wscale_ok = rx_opt->wscale_ok;
 	ireq->acked = 0;
+	ireq->deferred = 0;
 	ireq->ecn_ok = 0;
 	ireq->rmt_port = tcp_hdr(skb)->source;
 }
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fbe7714..1207fb8 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -444,9 +444,6 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
-	if (queue->rskq_defer_accept)
-		max_retries = queue->rskq_defer_accept;
-
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -455,7 +452,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
 				if ((req->retrans < thresh ||
-				     (inet_rsk(req)->acked && req->retrans < max_retries))
+				     (inet_rsk(req)->acked && req->retrans < max_retries) ||
+				     (inet_rsk(req)->deferred && req->retrans <
+				      queue->rskq_defer_accept + max_retries))
 				    && !req->rsk_ops->rtx_syn_ack(parent, req, NULL)) {
 					unsigned long timeo;
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a12b08f..c4867f3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -637,8 +637,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 
 		/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
 		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-			inet_rsk(req)->acked = 1;
+		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1 &&
+		    !inet_rsk(req)->acked && req->retrans <
+		    inet_csk(sk)->icsk_accept_queue.rskq_defer_accept) {
+			inet_rsk(req)->deferred = 1;
 			return NULL;
 		}
 
@@ -686,6 +688,9 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	listen_overflow:
 		if (!sysctl_tcp_abort_on_overflow) {
 			inet_rsk(req)->acked = 1;
+			/* If deferred, ACK must contain data.  Shortcut defer. */
+			if (inet_rsk(req)->deferred)
+				req->retrans = inet_csk(sk)->icsk_accept_queue.rskq_defer_accept;
 			return NULL;
 		}
 

  reply	other threads:[~2007-08-24  4:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-24  0:08 Problem with implementation of TCP_DEFER_ACCEPT? TJ
2007-08-24  4:40 ` John Heffner [this message]
2007-08-24  7:15 ` Lennert Buytenhek
2007-08-24  8:40   ` Alexey Kuznetsov
2007-08-24  9:31     ` TJ
2007-08-24 16:09       ` John Heffner
2007-09-02  7:30       ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46CE612E.60208@psc.edu \
    --to=jheffner@psc.edu \
    --cc=linux@tjworld.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.