From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
Kan Liang <kan.liang@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 4/6] perf annotate-data: Check memory access with two registers
Date: Sat, 4 May 2024 15:26:58 -0300
Message-ID: <ZjZ98gLSmr0qXih2@x1>
In-Reply-To: <CAM9d7cg_YL1x8YfJ5+7+o+0dccFJJxUye8L_FLrgdGeAh81LBA@mail.gmail.com>
On Thu, May 02, 2024 at 11:14:50AM -0700, Namhyung Kim wrote:
> On Thu, May 2, 2024 at 7:05 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >
> > On Wed, May 01, 2024 at 11:00:09PM -0700, Namhyung Kim wrote:
> > > The following instruction pattern is used to access a global variable.
> > >
> > > mov $0x231c0, %rax
> > > movslq %edi, %rcx
> > > mov -0x7dc94ae0(,%rcx,8), %rcx
> > > cmpl $0x0, 0xa60(%rcx,%rax,1) <<<--- here
> > >
> > > The first instruction set the address of the per-cpu variable (here, it
> > > is 'runqueus' of struct rq). The second instruction seems like a cpu
> >
> > You mean 'runqueues', i.e. this one:
> >
> > kernel/sched/core.c
> > DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> >
> > ?
>
> Right, sorry for the typo.
>
> >
> > But that 0xa60 would be in an alignment hole, at least in:
> >
> > $ pahole --hex rq | egrep 0xa40 -A12
> > struct mm_struct * prev_mm; /* 0xa40 0x8 */
> > unsigned int clock_update_flags; /* 0xa48 0x4 */
> >
> > /* XXX 4 bytes hole, try to pack */
> >
> > u64 clock; /* 0xa50 0x8 */
> >
> > /* XXX 40 bytes hole, try to pack */
> >
> > /* --- cacheline 42 boundary (2688 bytes) --- */
> > u64 clock_task __attribute__((__aligned__(64))); /* 0xa80 0x8 */
> > u64 clock_pelt; /* 0xa88 0x8 */
> > long unsigned int lost_idle_time; /* 0xa90 0x8 */
> > $ uname -a
> > Linux toolbox 6.7.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 27 16:50:39 UTC 2024 x86_64 GNU/Linux
> > $
>
> This would differ depending on the kernel version, config, and
> other changes like backports or local modifications.
>
> On my system, it was cpu_stop_work.arg.

Sure, so please include the pahole output for the data that led you to
the conclusions in the explanation of the results obtained, so that we
can have a better mental map of all the pieces, get convinced of the
results, and have a way to try to reproduce them on our systems.

In the future we will be grateful for this effort when looking back at
these patches :-)

Thanks for all your work on these features!

- Arnaldo

> $ pahole --hex rq | grep 0xa40 -C1
> /* --- cacheline 41 boundary (2624 bytes) --- */
> struct cpu_stop_work active_balance_work; /* 0xa40 0x30 */
> int cpu; /* 0xa70 0x4 */
>
> $ pahole --hex cpu_stop_work
> struct cpu_stop_work {
> struct list_head list; /* 0 0x10 */
> cpu_stop_fn_t fn; /* 0x10 0x8 */
> long unsigned int caller; /* 0x18 0x8 */
> void * arg; /* 0x20 0x8 */
> struct cpu_stop_done * done; /* 0x28 0x8 */
>
> /* size: 48, cachelines: 1, members: 5 */
> /* last cacheline: 48 bytes */
> };
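
The offsets in the two pahole listings quoted in this thread can be
cross-checked mechanically. What follows is only a minimal sketch, not
how perf itself resolves types: the Member tuple and the find_member()
and resolve() helpers are hypothetical names made up for illustration,
and the tables hold just the members visible in the quoted output.

```python
from typing import NamedTuple, Optional


class Member(NamedTuple):
    name: str
    offset: int
    size: int


def find_member(members, offset) -> Optional[Member]:
    """Return the member covering `offset`, or None if it lands in a hole."""
    for m in members:
        if m.offset <= offset < m.offset + m.size:
            return m
    return None


def resolve(offset, outer_layout, inner_layouts):
    """Resolve `offset` to 'outer' or 'outer.inner'; None means a hole."""
    outer = find_member(outer_layout, offset)
    if outer is None:
        return None
    inner_layout = inner_layouts.get(outer.name)
    if inner_layout is None:
        return outer.name
    inner = find_member(inner_layout, offset - outer.offset)
    return outer.name if inner is None else f"{outer.name}.{inner.name}"


# Partial 'struct rq' layout from the 6.7.11-200.fc39 pahole output.
RQ_FC39 = [
    Member("prev_mm", 0xa40, 0x8),
    Member("clock_update_flags", 0xa48, 0x4),
    Member("clock", 0xa50, 0x8),
    Member("clock_task", 0xa80, 0x8),
    Member("clock_pelt", 0xa88, 0x8),
    Member("lost_idle_time", 0xa90, 0x8),
]

# Partial 'struct rq' layout from the other system's pahole output.
RQ_OTHER = [
    Member("active_balance_work", 0xa40, 0x30),
    Member("cpu", 0xa70, 0x4),
]

# 'struct cpu_stop_work' as printed by pahole above.
CPU_STOP_WORK = [
    Member("list", 0x00, 0x10),
    Member("fn", 0x10, 0x8),
    Member("caller", 0x18, 0x8),
    Member("arg", 0x20, 0x8),
    Member("done", 0x28, 0x8),
]

if __name__ == "__main__":
    nested = {"active_balance_work": CPU_STOP_WORK}
    print(resolve(0xa60, RQ_FC39, {}))
    print(resolve(0xa60, RQ_OTHER, nested))
```

With these tables, 0xa60 falls between 'clock' and 'clock_task' in the
fc39 layout, while in the other layout it is 0x20 bytes into
'active_balance_work', which is the 'arg' member of struct
cpu_stop_work.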
>
>
> >
> > The paragraph then reads:
> >
> > ----
> > The first instruction sets the address of the per-cpu variable (here, it
> > is 'runqueues' of type 'struct rq'). The second instruction seems to
> > load the cpu number used to index the per-cpu base. The third instruction
> > gets the base offset of the per-cpu area for that cpu. The last
> > instruction compares the value of the per-cpu variable at offset 0xa60.
> > ----
> >
> > Ok?
>
> Yep, looks good.
>
> Thanks,
> Namhyung
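
The address arithmetic of the four-instruction pattern discussed above
can be sketched as below. This is an illustration under stated
assumptions, not perf code: effective_address() and simulate() are
hypothetical helpers, the per-CPU base values are invented, and the
table read by the third instruction is only presumed to be the kernel's
per-CPU offset array, since the thread does not name it. The one fact
relied on is that an AT&T operand disp(base,index,scale) addresses
disp + base + index * scale.

```python
MASK64 = (1 << 64) - 1


def effective_address(disp, base=0, index=0, scale=1):
    """AT&T operand disp(base,index,scale), truncated to 64 bits."""
    return (disp + base + index * scale) & MASK64


# Address of the table indexed by cpu number (sign-extended displacement).
TABLE_ADDR = effective_address(-0x7dc94ae0)

# Hypothetical per-CPU base offsets, one table slot per CPU (made-up values).
PER_CPU_BASE = [0xffff888100000000, 0xffff888100040000]

VAR_OFFSET = 0x231c0   # immediate loaded by the first instruction
FIELD_OFFSET = 0xa60   # displacement in the last instruction


def simulate(cpu):
    """Return the address compared by the final cmpl for a given cpu."""
    rax = VAR_OFFSET                    # mov $0x231c0, %rax
    rcx = cpu                           # second instruction: cpu number, assumed >= 0
    slot = effective_address(-0x7dc94ae0, index=rcx, scale=8)
    rcx = PER_CPU_BASE[(slot - TABLE_ADDR) // 8]   # mov -0x7dc94ae0(,%rcx,8), %rcx
    return effective_address(FIELD_OFFSET, base=rcx, index=rax, scale=1)


if __name__ == "__main__":
    for cpu in range(len(PER_CPU_BASE)):
        addr = simulate(cpu)
        # Subtracting the per-CPU base and the variable offset leaves the
        # offset inside the variable's type.
        print(cpu, hex(addr), hex(addr - PER_CPU_BASE[cpu] - VAR_OFFSET))
```

Because the last operand uses two registers with a scale of 1, neither
register alone identifies the variable: one holds the per-CPU base and
the other the variable's offset, and only the displacement (0xa60 here)
is the offset within the type.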