linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	"Namhyung Kim" <namhyung@kernel.org>,
	Joe Mario <jmario@redhat.com>, Leo Yan <leo.yan@linaro.org>,
	<linux-perf-users@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	<tim.c.chen@intel.com>
Subject: Re: [PATCH v2] perf c2c: Add report option to show false sharing in adjacent cachelines
Date: Wed, 15 Feb 2023 09:03:16 +0800	[thread overview]
Message-ID: <Y+wvVNWqXb70l4uy@feng-clx> (raw)
In-Reply-To: <Y+vhcAYQrX6Lv7cL@kernel.org>

Hi Arnaldo,

On Tue, Feb 14, 2023 at 04:30:56PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 14, 2023 at 03:58:23PM +0800, Feng Tang escreveu:
> > Many platforms have feature of adjacent cachelines prefetch, when it
> > is enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity,
> > if one is fetched to cache, the other one could likely be fetched too,
> > which sort of extends the cacheline size to double, thus the false
> > sharing could happens in adjacent cachelines.
> > 
> > 0Day has captured performance changed related with this [1], and some
> > commercial software explicitly makes its hot global variables 128 bytes
> > aligned (2 cache lines) to avoid this kind of extended false sharing.
> > 
> > So add an option "-a" or "--double-cl" for c2c report to show false
> > sharing in double cache line granularity, which acts just like the
> > cacheline size is doubled. There is no change to c2c record. The
> > hardware events of shared cacheline are still per cacheline, and
> > this option just changes the granularity of how events are grouped
> > and displayed.
> > 
> > In the c2c report below (will-it-scale's 'pagefault2' case on old kernel):
> > 
> >   ----------------------------------------------------------------------
> >      26       31        2        0        0        0  0xffff888103ec6000
> >   ----------------------------------------------------------------------
> >    35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b      1153        66       971     3748        74  [k] get_mem_cgroup_from_mm
> >     6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4       570         0      1531      879        75  [k] mem_cgroup_charge
> >    25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472       949        70       593     3359        74  [k] get_mem_cgroup_from_mm
> >    19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686      1352         0      1073     1022        74  [k] mem_cgroup_charge
> >     9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6      1401         0       863      768        74  [k] mem_cgroup_charge
> >     3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106       618         0       804       11         9  [k] uncharge_batch
> > 
> > The offset 0x10 and 0x54 used to displayed in 2 groups, and now they
> > are listed together to give users a hint of extended false sharing.
> > 
> > [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Reviewed-by: Andi Kleen <ak@linux.intel.com>
> > Reviewed-by: Leo Yan <leo.yan@linaro.org>
> > Tested-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > Changelog:
> > 
> >   v1 -> v2:
> >   * Refine comments and fix typos (Leo Yan)
> >   * Add reviewd-by and tested-by(for Arm64) tag from Leo Yan
> >   * Refine cmd description and commit log to avoid using
> >     architecture specific name (Joe Mario)
> > 
> >  tools/perf/Documentation/perf-c2c.txt |  7 +++++++
> >  tools/perf/builtin-c2c.c              | 22 +++++++++++++---------
> >  tools/perf/util/cacheline.h           | 25 ++++++++++++++++++++-----
> >  tools/perf/util/sort.c                | 13 ++++++++++---
> >  tools/perf/util/sort.h                |  1 +
> >  5 files changed, 51 insertions(+), 17 deletions(-)
> > 
> > diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
> > index 4e8c263e1721..cc18d61ec5d5 100644
> > --- a/tools/perf/Documentation/perf-c2c.txt
> > +++ b/tools/perf/Documentation/perf-c2c.txt
> > @@ -130,6 +130,13 @@ REPORT OPTIONS
> >  	The known limitations include exception handing such as
> >  	setjmp/longjmp will have calls/returns not match.
> >  
> > +-a::
> > +--double-cl::
> > +	Group the detection of shared cacheline events into double cacheline
> > +	granularity. Some architectures have an Adjacent Cacheline Prefetch
> > +	feature, which causes cacheline sharing to behave like the cacheline
> > +	size is doubled.
> > +
> 
> Humm, this is something not that usual, so I think we should have it
> just as --double-cl, ok?
> 
> I can do the adjustment here if you agree.

Sure. Many thanks for the review and suggestion!

- Feng

> Thanks,
> 
> - Arnaldo
 

      reply	other threads:[~2023-02-15  1:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-14  7:58 [PATCH v2] perf c2c: Add report option to show false sharing in adjacent cachelines Feng Tang
2023-02-14 19:30 ` Arnaldo Carvalho de Melo
2023-02-15  1:03   ` Feng Tang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+wvVNWqXb70l4uy@feng-clx \
    --to=feng.tang@intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).