From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>,
"Tom Herbert" <therbert@google.com>,
"Neal Cardwell" <ncardwell@google.com>,
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>,
"H.K. Jerry Chu" <hkchu@google.com>,
"Yuchung Cheng" <ycheng@google.com>
Subject: [PATCH net-next] tcp: reduce out_of_order memory use
Date: Sun, 18 Mar 2012 06:37:34 -0700 [thread overview]
Message-ID: <1332077854.3722.52.camel@edumazet-laptop> (raw)
With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.
Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.
When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.
Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.
Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.
This also reduced average memory used by tcp sockets.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/linux/snmp.h | 1 +
net/ipv4/proc.c | 1 +
net/ipv4/tcp_input.c | 19 +++++++++++++++++--
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 8ee8af4..2e68f5b 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -233,6 +233,7 @@ enum
LINUX_MIB_TCPREQQFULLDOCOOKIES, /* TCPReqQFullDoCookies */
LINUX_MIB_TCPREQQFULLDROP, /* TCPReqQFullDrop */
LINUX_MIB_TCPRETRANSFAIL, /* TCPRetransFail */
+ LINUX_MIB_TCPRCVCOALESCE, /* TCPRcvCoalesce */
__LINUX_MIB_MAX
};
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 02d6107..8af0d44 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -257,6 +257,7 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("TCPReqQFullDoCookies", LINUX_MIB_TCPREQQFULLDOCOOKIES),
SNMP_MIB_ITEM("TCPReqQFullDrop", LINUX_MIB_TCPREQQFULLDROP),
SNMP_MIB_ITEM("TCPRetransFail", LINUX_MIB_TCPRETRANSFAIL),
+ SNMP_MIB_ITEM("TCPRcvCoalesce", LINUX_MIB_TCPRCVCOALESCE),
SNMP_MIB_SENTINEL
};
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 68d4057..51fe686 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4590,8 +4590,23 @@ drop:
u32 end_seq = TCP_SKB_CB(skb)->end_seq;
if (seq == TCP_SKB_CB(skb1)->end_seq) {
- __skb_queue_after(&tp->out_of_order_queue, skb1, skb);
-
+ /* Packets in ofo can stay in queue a long time.
+ * Better try to coalesce them right now
+ * to avoid future tcp_collapse_ofo_queue(),
+ * probably the most expensive function in tcp stack.
+ */
+ if (skb->len <= skb_tailroom(skb1) && !th->fin) {
+ NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
+ BUG_ON(skb_copy_bits(skb, 0,
+ skb_put(skb1, skb->len),
+ skb->len));
+ TCP_SKB_CB(skb1)->end_seq = end_seq;
+ TCP_SKB_CB(skb1)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
+ __kfree_skb(skb);
+ skb = NULL;
+ } else {
+ __skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+ }
if (!tp->rx_opt.num_sacks ||
tp->selective_acks[0].end_seq != seq)
goto add_sack;
next reply other threads:[~2012-03-18 13:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-18 13:37 Eric Dumazet [this message]
2012-03-18 16:55 ` [PATCH net-next] tcp: reduce out_of_order memory use Neal Cardwell
2012-03-18 19:40 ` Eric Dumazet
2012-03-18 21:06 ` [PATCH v2 net-next 1/2] tcp: introduce tcp_data_queue_ofo Eric Dumazet
2012-03-19 3:49 ` Neal Cardwell
2012-03-19 20:57 ` David Miller
2012-03-18 21:07 ` [PATCH v2 net-next 2/2] tcp: reduce out_of_order memory use Eric Dumazet
2012-03-19 3:53 ` Neal Cardwell
2012-03-19 20:57 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1332077854.3722.52.camel@edumazet-laptop \
--to=eric.dumazet@gmail.com \
--cc=davem@davemloft.net \
--cc=hkchu@google.com \
--cc=ilpo.jarvinen@helsinki.fi \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=therbert@google.com \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox