From: John Fastabend
Subject: [RFC PATCH 00/17] lockless qdisc
Date: Mon, 13 Nov 2017 12:07:38 -0800
Message-ID: <20171113195256.6245.64676.stgit@john-Precision-Tower-5810>
To: willemdebruijn.kernel@gmail.com, daniel@iogearbox.net, eric.dumazet@gmail.com
Cc: make0818@gmail.com, netdev@vger.kernel.org, jiri@resnulli.us, xiyou.wangcong@gmail.com

Multiple folks asked me about this series at net(dev)conf, so with a 10+ hour flight and a bit of testing once back home I think these are ready to be submitted. Net-next is closed at the moment (http://vger.kernel.org/~davem/net-next.html) but, once it opens up, we can get these in first thing and have plenty of time to resolve any fallout, although I haven't seen any issues in my latest testing.

My first test case uses multiple containers (via cilium), where multiple client containers use 'wrk' to benchmark connections against a server container running lighttpd, which is configured to use multiple threads, one per core. Additionally, this test has a proxy agent running, so all traffic takes an extra hop through a proxy container. In these cases each TCP packet traverses the egress qdisc layer at least four times and the ingress qdisc layer an additional four times. This makes for a good stress test IMO; perf details below.

The other micro-benchmark I run is injecting packets directly into the qdisc layer using pktgen.
This uses the benchmark script ./pktgen_bench_xmit_mode_queue_xmit.sh.

Benchmarks were taken in two cases: "base", running the latest net-next with no changes to the qdisc layer, and "qdisc", run with the lockless qdisc updates. Numbers are reported in req/sec. All virtual 'veth' devices run with pfifo_fast in the qdisc test case.

`wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`

conns       16     32     64   1024
-----------------------------------------------
base:    18831  20201  21393  29151
qdisc:   19309  21063  23899  29265

Notice that in all cases we see a performance improvement when running with the qdisc case.

Microbenchmarks using pktgen are as follows:

`pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000`

base(mq):          2.1Mpps
base(pfifo_fast):  2.1Mpps
qdisc(mq):         2.6Mpps
qdisc(pfifo_fast): 2.6Mpps

Notice the numbers are the same for mq and pfifo_fast because only a single thread is tested here.

Comments and feedback welcome. Anyone willing to do additional testing would be greatly appreciated. The patches can be pulled here,

https://github.com/cilium/linux/tree/qdisc

Thanks,
John

---

John Fastabend (17):
      net: sched: cleanup qdisc_run and __qdisc_run semantics
      net: sched: allow qdiscs to handle locking
      net: sched: remove remaining uses for qdisc_qlen in xmit path
      net: sched: provide per cpu qstat helpers
      net: sched: a dflt qdisc may be used with per cpu stats
      net: sched: explicit locking in gso_cpu fallback
      net: sched: drop qdisc_reset from dev_graft_qdisc
      net: sched: use skb list for skb_bad_tx
      net: sched: check for frozen queue before skb_bad_txq check
      net: sched: qdisc_qlen for per cpu logic
      net: sched: helper to sum qlen
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
      net: skb_array: expose peek API
      net: sched: pfifo_fast use skb_array
      net: skb_array additions for unlocked consumer
      net: sched: lock once per bulk dequeue

 0 files changed
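As a quick arithmetic sanity check on the benchmark numbers above, the relative improvements of the lockless runs over base work out as follows (this is just a sketch over the reported figures, not part of the test harness):

```python
# Relative improvement of the lockless ("qdisc") runs over "base",
# computed from the req/sec table and the pktgen Mpps numbers above.
base  = {16: 18831, 32: 20201, 64: 21393, 1024: 29151}
qdisc = {16: 19309, 32: 21063, 64: 23899, 1024: 29265}

for conns in sorted(base):
    gain = (qdisc[conns] - base[conns]) / base[conns] * 100
    print(f"wrk {conns:>4} conns: +{gain:.1f}%")

# pktgen single-thread case: 2.1 Mpps (base) vs 2.6 Mpps (qdisc)
pktgen_gain = (2.6 - 2.1) / 2.1 * 100
print(f"pktgen single thread: +{pktgen_gain:.1f}%")
```

The largest relative win in the wrk test shows up in the mid-range connection counts, while the single-thread pktgen injection sees the biggest jump overall.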