From: Jakub Kicinski
Subject: Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions
Date: Wed, 29 Aug 2018 20:06:56 +0200
Message-ID: <20180829200656.3d7e87d5@cakuba.netronome.com>
References: <20180809150118.5275.63824.stgit@wsfd-netdev20.ntdv.lab.eng.bos.redhat.com>
 <20180809202608.6b816326@cakuba.netronome.com>
 <20180811.120627.662252154567814394.davem@davemloft.net>
 <4E288F34-0559-4C8A-8B3B-4410364791AA@redhat.com>
 <20180817042722.0e534ce0@cakuba>
 <229BA7FA-916B-47EA-8FD4-3F0B8BDDD145@redhat.com>
 <20180823201446.3802e84b@cakuba.netronome.com>
 <3d74699054d901e963bdac39898cac441f00658f.camel@redhat.com>
In-Reply-To: <3d74699054d901e963bdac39898cac441f00658f.camel@redhat.com>
To: Paolo Abeni
Cc: Eelco Chaudron, David Miller, netdev@vger.kernel.org, jhs@mojatatu.com,
 xiyou.wangcong@gmail.com, jiri@resnulli.us, simon.horman@netronome.com,
 Marcelo Ricardo Leitner, louis.peens@netronome.com

On Wed, 29 Aug 2018 12:23:15 +0200, Paolo Abeni wrote:
> On Thu, 2018-08-23 at 20:14 +0200, Jakub Kicinski wrote:
> > I asked Louis to run some tests while I'm travelling, and he reports
> > that my worry about reporting the extra stats was unfounded. The update
> > function does not show up in traces at all. It seems that under stress
> > (generated with stress-ng) the thread dumping the stats in userspace
> > (in OvS it would be the revalidator) actually consumes less CPU in
> > __gnet_stats_copy_basic (0.4% less for ~2.0% total).
> >
> > Would this match your results? I'm not sure why dumping would be
> > faster with your change.
>
> Wild guess on my side: the relevant patch slightly changes the binary
> layout of the 'tc_action' struct, possibly (I still need to check with
> pahole) moving the tcf_lock and the stats field onto different
> cachelines, reducing false sharing that could badly affect such a test.

I think in our tests we tried both with and without pinning the relevant
processing to one core, and both runs showed an improvement. I no longer
have the actual samples, just a perf script dump without the CPU IDs that
would confirm things were pinned correctly. :(