netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next PATCH V6 0/2] qdisc: bulk dequeue support
@ 2014-10-01 20:35 Jesper Dangaard Brouer
  2014-10-01 20:35 ` [net-next PATCH V6 1/2] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE Jesper Dangaard Brouer
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Jesper Dangaard Brouer @ 2014-10-01 20:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, David S. Miller, Tom Herbert,
	Eric Dumazet, Hannes Frederic Sowa, Florian Westphal,
	Daniel Borkmann
  Cc: Jamal Hadi Salim, Alexander Duyck, John Fastabend, Dave Taht,
	toke

This patchset uses DaveM's recent API changes to dev_hard_start_xmit(),
from the qdisc layer, to implement dequeue bulking.

Patch01: "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"
 - Implement basic qdisc dequeue bulking
 - This time, 100% relying on BQL limits, no magic safe-guard constants

Patch02: "qdisc: dequeue bulking also pickup GSO/TSO packets"
 - Extend bulking to bulk several GSO/TSO packets
 - Seperate patch, as it introduce a small regression, see test section.

We do have a patch03, which exports a userspace tunable as a BQL
tunable, that can byte-cap or disable the bulking/bursting.  But we
could not agree on it internally, thus not sending it now.  We
basically strive to avoid adding any new userspace tunable.


Testing patch01:
================
 Demonstrating the performance improvement of qdisc dequeue bulking, is
tricky because the effect only "kicks-in" once the qdisc system have a
backlog. Thus, for a backlog to form, we need either 1) to exceed wirespeed
of the link or 2) exceed the capability of the device driver.

For practical use-cases, the measureable effect of this will be a
reduction in CPU usage

01-TCP_STREAM:
--------------
Testing effect for TCP involves disabling TSO and GSO, because TCP
already benefit from bulking, via TSO and especially for GSO segmented
packets.  This patch view TSO/GSO as a seperate kind of bulking, and
avoid further bulking of these packet types.

The measured perf diff benefit (at 10Gbit/s) for a single netperf
TCP_STREAM were 9.24% less CPU used on calls to _raw_spin_lock()
(mostly from sch_direct_xmit).

If my E5-2695v2(ES) CPU is tuned according to:
 http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
Then it is possible that a single netperf TCP_STREAM, with GSO and TSO
disabled, can utilize all bandwidth on a 10Gbit/s link.  This will
then cause a standing backlog queue at the qdisc layer.

Trying to pressure the system some more CPU util wise, I'm starting
24x TCP_STREAMs and monitoring the overall CPU utilization.  This
confirms bulking saves CPU cycles when it "kicks-in".

Tool mpstat, while stressing the system with netperf 24x TCP_STREAM, shows:
 * Disabled bulking: sys:2.58%  soft:8.50%  idle:88.78%
 * Enabled  bulking: sys:2.43%  soft:7.66%  idle:89.79%

02-UDP_STREAM
-------------
The measured perf diff benefit for UDP_STREAM were 6.41% less CPU used
on calls to _raw_spin_lock().  24x UDP_STREAM with packet size -m 1472 (to
avoid sending UDP/IP fragments).

03-trafgen driver test
----------------------
The performance of the 10Gbit/s ixgbe driver is limited due to
updating the HW ring-queue tail-pointer on every packet.  As
previously demonstrated with pktgen.

Using trafgen to send RAW frames from userspace (via AF_PACKET), and
forcing it through qdisc path (with option --qdisc-path and -t0),
sending with 12 CPUs.

I can demonstrate this driver layer limitation:
 * 12.8 Mpps with no qdisc bulking
 * 14.8 Mpps with qdisc bulking (full 10G-wirespeed)


Testing patch02:
================
Testing Bulking several GSO/TSO packets:

Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec for 10G while transmitting
approx 813Kpps).

Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec, for 10G
while transmitting approx 813Kpps).

 Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms.  Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.

Corrosponding to:
 (10000*10^6)*((0.50-0.47)/10^3)/8 = 37500 bytes
 (10000*10^6)*((0.50-0.38)/10^3)/8 = 150000 bytes
 37500/1500  = 25 pkts
 150000/1500 = 100 pkts

 Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
diff of 0.12ms corrosponding to 1500 bytes at 100Mbit/s. Bulking
several GSOs together, result in a stable high-prio queue delay of
2.23ms.

---

Jesper Dangaard Brouer (2):
      qdisc: dequeue bulking also pickup GSO/TSO packets
      qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE


 include/net/sch_generic.h |   16 ++++++++++++++++
 net/sched/sch_generic.c   |   40 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 2 deletions(-)

-- 

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-10-08 17:39 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-01 20:35 [net-next PATCH V6 0/2] qdisc: bulk dequeue support Jesper Dangaard Brouer
2014-10-01 20:35 ` [net-next PATCH V6 1/2] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE Jesper Dangaard Brouer
2014-10-01 20:36 ` [net-next PATCH V6 2/2] qdisc: dequeue bulking also pickup GSO/TSO packets Jesper Dangaard Brouer
2014-10-02 14:35   ` Eric Dumazet
2014-10-02 14:38     ` Daniel Borkmann
2014-10-02 14:42 ` [net-next PATCH V6 0/2] qdisc: bulk dequeue support Tom Herbert
2014-10-02 15:04   ` Eric Dumazet
2014-10-02 15:24     ` [PATCH net-next] mlx4: add a new xmit_more counter Eric Dumazet
2014-10-05  0:04       ` David Miller
2014-10-02 15:27     ` [net-next PATCH V6 0/2] qdisc: bulk dequeue support Tom Herbert
2014-10-02 16:52   ` Florian Westphal
2014-10-02 17:32     ` Eric Dumazet
2014-10-02 17:35     ` Tom Herbert
2014-10-03 19:38 ` David Miller
2014-10-03 20:57   ` Eric Dumazet
2014-10-03 21:56     ` David Miller
2014-10-03 21:57       ` Eric Dumazet
2014-10-03 22:15         ` Eric Dumazet
2014-10-03 22:19           ` Tom Herbert
2014-10-03 22:56             ` Eric Dumazet
2014-10-03 22:30           ` David Miller
2014-10-03 22:31         ` [PATCH net-next] qdisc: validate skb without holding lock Eric Dumazet
2014-10-03 22:36           ` David Miller
2014-10-03 23:30             ` Eric Dumazet
2014-10-07  7:34               ` Quota in __qdisc_run() (was: qdisc: validate skb without holding lock) Jesper Dangaard Brouer
2014-10-07 12:47                 ` Eric Dumazet
2014-10-07 13:30                   ` Jesper Dangaard Brouer
2014-10-07 14:43                     ` Hannes Frederic Sowa
2014-10-07 15:01                       ` Eric Dumazet
2014-10-07 15:06                         ` Eric Dumazet
2014-10-07 17:19                         ` Quota in __qdisc_run() David Miller
2014-10-07 17:32                           ` Eric Dumazet
2014-10-07 18:37                             ` Jesper Dangaard Brouer
2014-10-07 20:07                               ` Jesper Dangaard Brouer
2014-10-07 18:03                           ` Jesper Dangaard Brouer
2014-10-07 19:10                             ` Eric Dumazet
2014-10-07 19:34                               ` Jesper Dangaard Brouer
2014-10-07 15:26                       ` Quota in __qdisc_run() (was: qdisc: validate skb without holding lock) Jesper Dangaard Brouer
2014-10-08 17:38                   ` Quota in __qdisc_run() John Fastabend
2014-10-06 14:12             ` [PATCH net-next] qdisc: validate skb without holding lock Jesper Dangaard Brouer
2014-10-04  3:59           ` [PATCH net-next] net: skb_segment() provides list head and tail Eric Dumazet
2014-10-06  4:38             ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).