From: Ian Rogers <irogers@google.com>
To: Vince Weaver <vincent.weaver@maine.edu>,
"Liang, Kan" <kan.liang@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: perf_event: avoiding gpf on rdpmc for top-down-events on hybrid
Date: Sun, 15 Jun 2025 16:23:54 -0700 [thread overview]
Message-ID: <CAP-5=fVa14Y+0a=nrxo85uEOHfiEJRbQXt6fOV7E3dQJDf4fdQ@mail.gmail.com> (raw)
In-Reply-To: <CAP-5=fVivxz-TPVap6-+6U=ueU9W9fdZSH3w5F7e-h=uN--8kA@mail.gmail.com>
On Wed, Dec 4, 2024 at 4:57 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Dec 4, 2024 at 2:06 PM Vince Weaver <vincent.weaver@maine.edu> wrote:
> >
> > Hello
> >
> > so the PAPI team is working on trying to get Intel top-down event support
> > working.
> >
> > We ran into a problem where on hybrid machines (Alder/Raptor Lake) topdown
> > events are only supported on P-cores but not E-cores.
> >
> > So you have code that is happily using rdpmc to read the data on P-cores
> > but if you have bad luck and get rescheduled to an E-core then the rdpmc
> > instruction will segfault/gpf the whole program.
> >
> > Is there any way, short of setting up a complex segfault signal handler,
> > to avoid this happening?
> >
> > In theory you could try to check what core type you are on before doing
> > the rdpmc but there's a race there if you get rescheduled after the check
> > but before the actual rdpmc instruction.
>
> Perhaps this is a use-case for restartable sequences? The current
> logic in libperf doesn't handle this, nor hybrid:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/tests/test-evsel.c?h=perf-tools-next#n127
> which is a shame.
So I spoke to Kan and posted a revised perf test that works on hybrid:
https://lore.kernel.org/lkml/20250614004528.1652860-1-irogers@google.com/
Basically, on hybrid, before executing the rdpmc instruction the test
now ensures the thread's affinity matches the set of CPUs on which the
perf event can be scheduled.
I do think that there is still a race, and the race exists even
without hybrid. One issue is that an event may be placed on a fixed or
a generic counter depending on what was previously scheduled during
sched_in. The mmap page has an "index" value described as the
"hardware event identifier":
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n632
and this value minus 1 is what gets passed to rdpmc in the read loop
(as in libperf and as shown in perf_event.h).

If a pinned event were using a fixed counter on certain CPUs, while
the same event were being read via rdpmc on all/any CPUs, then the
particular counter backing the event may vary: "index" may name the
fixed counter, while on the CPU where the rdpmc executes the event was
scheduled on a generic counter, which is the one that should actually
be read. I think something similar could happen if an event were
deleted on a CPU between the read of "index" and its use by rdpmc.

Perhaps I'm ignorant of the inner workings of the user page and
scheduling, but it seems a restartable sequence is needed to make this
somewhat atomic. Even inside a restartable sequence, I think a remote
delete of an event could cause the counter/"index" to change while the
reader stays on the same CPU (ie the restartable sequence needn't
restart but the counters changed).

Perhaps there needs to be more "buyer beware" language around the
rdpmc instruction in perf_event.h and the associated man pages, while
the perf tool should avoid rdpmc given the need for at least thread
affinity calls.
Thanks,
Ian
2024-12-04 21:20 perf_event: avoiding gpf on rdpmc for top-down-events on hybrid Vince Weaver
2024-12-05 0:57 ` Ian Rogers
2025-06-15 23:23 ` Ian Rogers [this message]
2025-06-16 19:34 ` Vince Weaver
2025-06-17 4:36 ` Ian Rogers
2025-06-17 16:37 ` Ian Rogers
2025-06-17 18:17 ` Peter Zijlstra
2025-06-17 20:36 ` Ian Rogers
2025-06-18 8:45 ` Peter Zijlstra
2025-06-18 11:55 ` Peter Zijlstra
2025-06-18 13:57 ` Vince Weaver