From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: "David S. Miller" <davem@davemloft.net>,
John Fastabend <john.r.fastabend@intel.com>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, brouer@redhat.com
Subject: Re: [PATCH v2 net-next] net: sched: run ingress qdisc without locks
Date: Mon, 4 May 2015 13:04:05 +0200
Message-ID: <20150504130405.3ff6672e@redhat.com>
In-Reply-To: <5546FFCB.50903@plumgrid.com>
On Sun, 03 May 2015 22:12:43 -0700
Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 5/3/15 8:42 AM, Jesper Dangaard Brouer wrote:
> >
> > I was actually expecting to see a higher performance boost.
> > improvement diff = -2.85 ns
> ...
> > The patch is removing two atomic operations, spin_{un,}lock, which I
> > have benchmarked[1] to cost approx 14ns on my system. Your system
> > likely is faster, but not that much (p.s. benchmark your own system
> > with [1])
> >
> > [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
>
> I have tried your tight-loop spin_lock test on my box and it showed:
> time_bench: Type:spin_lock_unlock Per elem: 40 cycles(tsc) 11.070 ns
> and yet the total single-CPU gain from removal of spin_lock/unlock
> in the ingress path is smaller than 11ns. I think this observation is
> telling us that tight-loop benchmarking is inherently flawed.
> I'm guessing that the uops that cmpxchg is broken into can execute in
> parallel with uops of other insns, so a tight loop of the same sequence
> of uops has more ALU dependencies, whereas in a more normal insn flow
> these uops can mix and match better. It would be great if Intel microarch
> experts could chime in.

How do you activate the ingress code path?
I'm just doing (is this enough?):

 export DEV=eth4
 tc qdisc add dev $DEV handle ffff: ingress

I re-ran the experiment, and I too can only show a 2.68ns
improvement. This is rather strange, and I cannot explain it.
The lock clearly shows up in perf report[1] with 12.23% in _raw_spin_lock,
and in perf report[2] it is clearly gone, yet we don't see a 12%
improvement in performance, only around 4.7%.

 Before activating qdisc ingress code  : 25.3Mpps (25398057)
 Activating qdisc ingress with lock    : 16.9Mpps (16989315)
 Activating qdisc ingress without lock : 17.8Mpps (17800496)

 (1/17800496*10^9)-(1/16989315*10^9) = -2.68 ns

The "cost" of activating the ingress qdisc is also interesting:

 (1/25398057*10^9)-(1/16989315*10^9) = -19.49 ns
 (1/25398057*10^9)-(1/17800496*10^9) = -16.81 ns
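
For anyone who wants to re-check the arithmetic above, here is a minimal
Python sketch that converts the measured packet rates into per-packet
time and differences. The function names are illustrative only and do not
come from any tool in this thread. The last helper assumes that a symbol's
share in perf report translates linearly into per-packet time, which is an
assumption and not something that was measured.

```python
# Packet rates (packets per second) as reported in this mail.
PPS_NO_QDISC = 25_398_057      # before activating the ingress qdisc
PPS_WITH_LOCK = 16_989_315     # ingress qdisc, with qdisc lock
PPS_WITHOUT_LOCK = 17_800_496  # ingress qdisc, lock removed


def ns_per_packet(pps):
    """Average time spent per packet, in nanoseconds."""
    return 1e9 / pps


def delta_ns(pps_a, pps_b):
    """Per-packet time at rate pps_a minus per-packet time at rate pps_b."""
    return ns_per_packet(pps_a) - ns_per_packet(pps_b)


def speedup_percent(pps_new, pps_old):
    """Relative throughput improvement, in percent."""
    return (pps_new / pps_old - 1.0) * 100.0


def naive_saving_ns(perf_share, pps):
    """Hypothetical saving if a perf overhead share mapped 1:1 to time."""
    return perf_share * ns_per_packet(pps)


if __name__ == "__main__":
    # Gain from removing the lock: about -2.68 ns per packet.
    print("lock removal: %.2f ns" % delta_ns(PPS_WITHOUT_LOCK, PPS_WITH_LOCK))
    # Cost of the ingress qdisc: about -19.49 ns and -16.81 ns.
    print("qdisc cost (lock):    %.2f ns" % delta_ns(PPS_NO_QDISC, PPS_WITH_LOCK))
    print("qdisc cost (no lock): %.2f ns" % delta_ns(PPS_NO_QDISC, PPS_WITHOUT_LOCK))
    # Observed throughput improvement versus what 12.23% would naively imply.
    print("observed speedup: %.2f%%" % speedup_percent(PPS_WITHOUT_LOCK, PPS_WITH_LOCK))
    print("naive saving from 12.23%%: %.2f ns" % naive_saving_ns(0.1223, PPS_WITH_LOCK))
```

Under that linear assumption the 12.23% share would correspond to roughly
7 ns per packet, noticeably more than the 2.68 ns actually observed, which
is the discrepancy described above.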
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
My setup
* Tested on top of commit 4749c3ef854
* gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
* CPU E5-2695(ES) @ 2.8GHz

[1] perf report with ingress qlock

Samples: 2K of event 'cycles', Event count (approx.): 1762298819
  Overhead  Command     Shared Object     Symbol
+   35.86%  kpktgend_0  [kernel.vmlinux]  [k] __netif_receive_skb_core
+   17.81%  kpktgend_0  [kernel.vmlinux]  [k] kfree_skb
+   12.23%  kpktgend_0  [kernel.vmlinux]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 93.54% __netif_receive_skb_core
      + 6.46% __netif_receive_skb
+    5.45%  kpktgend_0  [sch_ingress]     [k] ingress_enqueue
+    4.65%  kpktgend_0  [pktgen]          [k] pktgen_thread_worker
+    4.23%  kpktgend_0  [kernel.vmlinux]  [k] ip_rcv
+    3.95%  kpktgend_0  [kernel.vmlinux]  [k] tc_classify_compat
+    3.71%  kpktgend_0  [kernel.vmlinux]  [k] tc_classify
+    3.03%  kpktgend_0  [kernel.vmlinux]  [k] netif_receive_skb_internal
+    2.65%  kpktgend_0  [kernel.vmlinux]  [k] netif_receive_skb_sk
+    1.97%  kpktgend_0  [kernel.vmlinux]  [k] __netif_receive_skb
+    0.71%  kpktgend_0  [kernel.vmlinux]  [k] __local_bh_enable_ip
+    0.28%  kpktgend_0  [kernel.vmlinux]  [k] kthread_should_stop

[2] perf report without ingress qlock

Samples: 2K of event 'cycles', Event count (approx.): 1633499063
  Overhead  Command     Shared Object     Symbol
+   39.29%  kpktgend_0  [kernel.vmlinux]  [k] __netif_receive_skb_core
+   19.24%  kpktgend_0  [kernel.vmlinux]  [k] kfree_skb
+   11.05%  kpktgend_0  [sch_ingress]     [k] ingress_enqueue
+    4.69%  kpktgend_0  [kernel.vmlinux]  [k] tc_classify
+    4.48%  kpktgend_0  [kernel.vmlinux]  [k] ip_rcv
+    4.43%  kpktgend_0  [kernel.vmlinux]  [k] tc_classify_compat
+    4.19%  kpktgend_0  [pktgen]          [k] pktgen_thread_worker
+    3.50%  kpktgend_0  [kernel.vmlinux]  [k] netif_receive_skb_internal
+    2.61%  kpktgend_0  [kernel.vmlinux]  [k] netif_receive_skb_sk
+    2.26%  kpktgend_0  [kernel.vmlinux]  [k] __netif_receive_skb
+    0.43%  kpktgend_0  [kernel.vmlinux]  [k] __local_bh_enable_ip
+    0.13%  swapper     [kernel.vmlinux]  [k] mwait_idle

Thread overview:
2015-05-02 5:27 [PATCH v2 net-next] net: sched: run ingress qdisc without locks Alexei Starovoitov
2015-05-03 15:42 ` Jesper Dangaard Brouer
2015-05-04 5:12 ` Alexei Starovoitov
2015-05-04 11:04 ` Jesper Dangaard Brouer [this message]
2015-05-05 1:27 ` Alexei Starovoitov