From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.10 02/54] tcp: TSQ can use a dynamic limit
Date: Fri,  1 Nov 2013 15:03:30 -0700
Message-ID: <20131101220211.591152354@linuxfoundation.org>
In-Reply-To: <20131101220211.311926234@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit c9eeec26e32e087359160406f96e0949b3cc6f10 ]

When TCP Small Queues was added, we used a sysctl to limit the number of
packets queued on Qdisc/device queues for a given TCP flow.

The problem is that this fixed limit is either too big for low rates, or
too small for high rates.

Now that the TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control the number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.

Low-rate flows benefit from this patch by having an even smaller
number of packets in queues, allowing for faster recovery and
better RTT estimation.

High-rate flows benefit from this patch because it allows more than 2
packets in flight; we had reports that the old limit was a limiting factor
in reaching line rate. [ In particular if TX completion is delayed because
of coalescing parameters ]
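
A minimal userspace sketch of the limit described above, modelled on the
check this patch adds to tcp_write_xmit(). It is not kernel code: the
helper names tsq_limit() and tsq_throttled() are hypothetical, and plain
integers stand in for skb->truesize, sk->sk_pacing_rate (assumed to be in
bytes per second) and sk->sk_wmem_alloc.

```c
#include <stdint.h>

/* Hypothetical model of the new TSQ limit, in bytes.
 * truesize:    bytes accounted for the skb about to be sent
 * pacing_rate: bytes per second (stands in for sk->sk_pacing_rate)
 */
static uint32_t tsq_limit(uint32_t truesize, uint32_t pacing_rate)
{
	/* Shifting by 10 divides by 1024: roughly 1 ms worth of bytes. */
	uint32_t by_rate = pacing_rate >> 10;

	/* Never go below the size of the skb being sent. */
	return truesize > by_rate ? truesize : by_rate;
}

/* Hypothetical model of the throttle test: stop sending once the bytes
 * already queued below TCP exceed the limit.
 */
static int tsq_throttled(uint32_t wmem_alloc, uint32_t limit)
{
	return wmem_alloc > limit;
}
```

With an illustrative truesize of 2304 bytes, tsq_limit(2304, 125000)
(a 1 Mbit/s pacing rate) returns 2304, so the floor applies, while
tsq_limit(2304, 1250000000) (10 Gbit/s) returns 1220703 bytes.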

Example for a single flow on a 10Gbit link controlled by FQ/pacing:

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.
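
The "1 to 2 ms" figure follows from that factor of two. A small userspace
sketch of the arithmetic, assuming sk_pacing_rate is in bytes per second
and exactly twice the actual rate (all helper names here are hypothetical,
not kernel API):

```c
#include <stdint.h>

/* Assumption taken from the note above: pacing rate = 2 * actual rate.
 * The caller must keep actual_rate below 2^31 to avoid overflow.
 */
static uint32_t pacing_rate_from_actual(uint32_t actual_rate)
{
	return 2 * actual_rate;
}

/* Rate-based part of the TSQ limit: bytes sent in 1/1024 s. */
static uint32_t tsq_rate_limit(uint32_t pacing_rate)
{
	return pacing_rate >> 10;
}

/* Microseconds needed to drain 'bytes' at 'rate' bytes per second. */
static uint64_t drain_time_us(uint32_t bytes, uint32_t rate)
{
	return (uint64_t)bytes * 1000000ULL / rate;
}
```

For an actual rate of 125000000 bytes/s (1 Gbit/s), the rate-based limit
is 244140 bytes. That drains in 976 us at the pacing rate and in 1953 us
at the actual rate, hence roughly 1 to 2 ms worth of packets.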

Additional change: skb->destructor is now always set to tcp_wfree().

A future patch (for Linux 3.13+) might remove tcp_limit_output_bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_output.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -887,8 +887,7 @@ static int tcp_transmit_skb(struct sock
 
 	skb_orphan(skb);
 	skb->sk = sk;
-	skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
-			  tcp_wfree : sock_wfree;
+	skb->destructor = tcp_wfree;
 	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
 	/* Build TCP header and checksum it. */
@@ -1832,7 +1831,6 @@ static bool tcp_write_xmit(struct sock *
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
 
-
 		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
 		BUG_ON(!tso_segs);
 
@@ -1861,13 +1859,20 @@ static bool tcp_write_xmit(struct sock *
 				break;
 		}
 
-		/* TSQ : sk_wmem_alloc accounts skb truesize,
-		 * including skb overhead. But thats OK.
+		/* TCP Small Queues :
+		 * Control number of packets in qdisc/devices to two packets / or ~1 ms.
+		 * This allows for :
+		 *  - better RTT estimation and ACK scheduling
+		 *  - faster recovery
+		 *  - high rates
 		 */
-		if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) {
+		limit = max(skb->truesize, sk->sk_pacing_rate >> 10);
+
+		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
 			break;
 		}
+
 		limit = mss_now;
 		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,



