From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jakub Kicinski
Subject: Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions
Date: Wed, 29 Aug 2018 20:12:11 +0200
Message-ID: <20180829201211.6c58b827@cakuba.netronome.com>
References: <20180809150118.5275.63824.stgit@wsfd-netdev20.ntdv.lab.eng.bos.redhat.com>
 <20180809202608.6b816326@cakuba.netronome.com>
 <20180811.120627.662252154567814394.davem@davemloft.net>
 <4E288F34-0559-4C8A-8B3B-4410364791AA@redhat.com>
 <20180817042722.0e534ce0@cakuba>
 <229BA7FA-916B-47EA-8FD4-3F0B8BDDD145@redhat.com>
 <20180823201446.3802e84b@cakuba.netronome.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "David Miller" , netdev@vger.kernel.org, jhs@mojatatu.com,
 xiyou.wangcong@gmail.com, jiri@resnulli.us, simon.horman@netronome.com,
 "Marcelo Ricardo Leitner" , louis.peens@netronome.com
To: "Eelco Chaudron"
Return-path:
Received: from mail-pf1-f195.google.com ([209.85.210.195]:38216 "EHLO
 mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1727399AbeH2WK0 (ORCPT ); Wed, 29 Aug 2018 18:10:26 -0400
Received: by mail-pf1-f195.google.com with SMTP id x17-v6so2638521pfh.5
 for ; Wed, 29 Aug 2018 11:12:21 -0700 (PDT)
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, 29 Aug 2018 11:43:47 +0200, Eelco Chaudron wrote:
> On 23 Aug 2018, at 20:14, Jakub Kicinski wrote:
>
> > On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:
> >> On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:
> >>> On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:
> >>>> On 11 Aug 2018, at 21:06, David Miller wrote:
> >>>>
> >>>>> From: Jakub Kicinski
> >>>>> Date: Thu, 9 Aug 2018 20:26:08 -0700
> >>>>>
> >>>>>> It is not immediately clear why this is needed. The memory and
> >>>>>> updating two sets of counters won't come for free, so perhaps a
> >>>>>> stronger justification than troubleshooting is due?
> >>>>>> :S
> >>>>>>
> >>>>>> Netdev has counters for fallback vs forwarded traffic, so you'd
> >>>>>> know that traffic hits the SW datapath, plus the rules which are
> >>>>>> in_hw will most likely not match as of today for flower (assuming
> >>>>>> correctness).
> >>>>
> >>>> I strongly believe that these counters are a requirement for a
> >>>> mixed software/hardware (flow) based forwarding environment. The
> >>>> global counters will not help much here as you might have chosen to
> >>>> have certain traffic forwarded by software.
> >>>>
> >>>> These counters are probably the only option you have to figure out
> >>>> why forwarding is not as fast as expected, when you want to blame
> >>>> the TC offload NIC.
> >>>
> >>> The suggested debugging flow would be:
> >>> (1) check that the global counters for fallback are incrementing;
> >>> (2) find a flow with high stats but no in_hw flag set.
> >>>
> >>> The in_hw indication should be sufficient in most cases (unless
> >>> there are shared blocks between netdevs of different ASICs...).
> >>
> >> I guess the aim is to find misbehaving hardware, i.e. having the
> >> in_hw flag set, but flows still coming to the kernel.
> >
> > For misbehaving hardware in_hw will not work indeed. Whether we need
> > these extra always-on stats for such a use case could be debated :)
> >
> >>>>>> I'm slightly concerned about potential performance impact, would
> >>>>>> you be able to share some numbers for a non-trivial number of
> >>>>>> flows (100k active?)?
> >>>>>
> >>>>> Agreed, features used for diagnostics cannot have a harmful
> >>>>> penalty for fast path performance.
> >>>>
> >>>> Fast path performance is not affected as these counters are not
> >>>> incremented there. They are only incremented by the NIC driver
> >>>> when it gathers its statistics from hardware.
> >>>
> >>> Not by much, you are adding state to performance-critical
> >>> structures, though, for what is effectively debugging purposes.
> >>>
> >>> I was mostly talking about the HW offload stat updates (sorry for
> >>> not being clear).
> >>>
> >>> We can have some hundreds of thousands of active offloaded flows,
> >>> each of them can have multiple actions, and stats have to be
> >>> updated multiple times per second and dumped probably around once a
> >>> second, too. On a busy system the stats will get evicted from cache
> >>> between each round.
> >>>
> >>> But I'm speculating; let's see if I can get some numbers on it (if
> >>> you could get some too, that would be great!).
> >>
> >> I'll try to measure some of this later this week/early next week.
> >
> > I asked Louis to run some tests while I'm travelling, and he reports
> > that my worry about reporting the extra stats was unfounded. The
> > update function does not show up in traces at all. It seems under
> > stress (generated with stress-ng) the thread dumping the stats in
> > userspace (in OvS it would be the revalidator) actually consumes less
> > CPU in __gnet_stats_copy_basic (0.4% less for ~2.0% total).
> >
> > Would this match with your results? I'm not sure why dumping would be
> > faster with your change..
>
> Tested with OVS and https://github.com/chaudron/ovs_perf using 300K TC
> rules installed in HW.
>
> For __gnet_stats_copy_basic() being faster I have (had) a theory. Now
> this function is called twice, and I assumed the first call would warm
> the cache and the second call would be faster.
>
> Sampling a lot of perf data, I get an average of 1115ns with the base
> kernel and 954ns with the fix applied, so about 14% faster.
>
> I then thought I would perf tcf_action_copy_stats(), as it is the
> place updating the additional counter.
> But even in this case, I see better performance with the patch
> applied.
>
> On average 13581ns with the fix, vs the base kernel at 13910ns, so
> about 2.3% faster.
>
> I guess the changes to the tc_action structure got better cache
> alignment.

Interesting that you could reproduce the speed-up too! +1 for the guess.

Seems like my caution about slowing down SW paths to support HW offload
landed on a very unfortunate patch :)