From: Wayne Badger <badger@yahoo-inc.com>
To: <netdev@vger.kernel.org>
Subject: TCP_DEFER_ACCEPT wakes up without data
Date: Wed, 4 Jun 2014 19:14:30 -0500 [thread overview]
Message-ID: <538FB666.9050303@yahoo-inc.com> (raw)
I would like to revisit the current state of TCP_DEFER_ACCEPT. I know
that it has been discussed in the past and the discussion about 4.5
years ago led to commit d1b99ba41d6c5aa1ed2fc634323449dd656899e9 (tcp:
accept socket after TCP_DEFER_ACCEPT period) being introduced which
changed the semantics of TCP_DEFER_ACCEPT.
Prior to commit d1b99ba41d, the open request was never queued to the
listener for accepting if no data was ever received. After the commit,
the open request is queued to the listener upon receipt of an ACK from
the retransmitted SYN-ACK, which is sent TCP_DEFER_ACCEPT seconds
(which value is provided in the TCP_DEFER_ACCEPT setsockopt) after the
3-way handshake occurs (with appropriate rounding up to account for the
geometric rexmit algorithm).
The tcp(7) man page says that the listener is awakened _only_ when data
arrives on the socket. We prefer this behavior because it reduces
memory (minisock vs. sock) and scheduling/context switch overhead. When
no data is ever sent by the client, letting the kernel dispose of the
open request is more efficient than letting the listener dispose of it.
Is there a way to get this behavior with a recent kernel that I have not
discovered?
This discussion is not just academic. We constantly receive dataless
connection requests from clients. On some machines, we have seen these
types of connections comprise up to 20% of our connections. With the
current TCP_DEFER_ACCEPT behavior, apache creates many threads that end
up just waiting for data, timing out, and then closing the connection.
We can increase the TCP_DEFER_ACCEPT value, but that means that we are
just consuming kernel resources for longer than we want and end up
having apache close the connection anyway.
A few questions.
If the new behavior was desired at the time of the commit, why was the
existing TCP_DEFER_ACCEPT behavior changed instead of just adding a new
sockopt that implements the new behavior? Why was the tcp(7) man page
not modified to describe the new behavior?
I propose the following patch to regain the tcp(7) semantics. The patch
resolves dataless connections in one of two ways. After the initial
timeout, a SYN-ACK is retransmitted and if a bare ACK is received, then
a RST is sent to the client and the connection is immediately dropped.
If nothing is received from the client, then the connection is silently
dropped after another rexmit timeout (the next one in the geometric
sequence).
I understand that the patch is incomplete. For example, it doesn't
handle syncookies but I wanted to initiate the discussion.
Thanks,
Wayne
--- tcp_minisocks.c 2014-05-01 15:53:47.000000000 -0500
+++ tcp_minisocks.c.new 2014-05-16 16:08:59.000000000 -0500
@@ -672,12 +672,31 @@ struct sock *tcp_check_req(struct sock *
if (fastopen)
return sk;
- /* While TCP_DEFER_ACCEPT is active, drop bare ACK. */
- if (req->num_timeout < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
- TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
- inet_rsk(req)->acked = 1;
- NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPDEFERACCEPTDROP);
- return NULL;
+ if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept) {
+ /* While TCP_DEFER_ACCEPT is active, drop bare ACK.
+ * Send a RST and drop the connection if the timer has expired.
+ */
+ if (TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+ if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept) {
+ inet_rsk(req)->acked = 1;
+ NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPDEFERACCEPTDROP);
+ } else {
+ req->rsk_ops->send_reset(sk, skb);
+ inet_csk_reqsk_queue_drop(sk, req, prev);
+ }
+ return NULL;
+ }
+
+ /* Make a finished and dataless TCP_DEFER_ACCEPT connection
+ * request "disappear" without instantiating the socket or
+ * bothering user space.
+ */
+ if ((TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 2) &&
+ (tcp_flag_word(th) & TCP_FLAG_FIN)) {
+ req->rsk_ops->send_reset(sk, skb);
+ inet_csk_reqsk_queue_drop(sk, req, prev);
+ return NULL;
+ }
}
/* OK, ACK is valid, create big socket and
next reply other threads:[~2014-06-05 0:26 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-05 0:14 Wayne Badger [this message]
2014-06-07 15:41 ` TCP_DEFER_ACCEPT wakes up without data Julian Anastasov
2014-06-11 0:57 ` Wayne Badger
2014-06-12 6:00 ` Julian Anastasov
[not found] ` <58a4abb51fe9411fbc7b1a58a2a6f5da@UCL-MBX03.OASIS.UCLOUVAIN.BE>
2020-06-04 23:18 ` Christoph Paasch
2020-06-05 1:27 ` Eric Dumazet
2020-06-05 14:57 ` Christoph Paasch
2020-06-05 16:48 ` Eric Dumazet
2020-06-07 10:00 ` Florian Westphal
2020-06-07 15:05 ` Leif Hedstrom
2020-06-09 16:45 ` Christoph Paasch
2020-06-05 19:47 ` Julian Anastasov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=538FB666.9050303@yahoo-inc.com \
--to=badger@yahoo-inc.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).