* perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-17 22:04 UTC (permalink / raw)
To: peterz
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
linux-kernel
Hello
so we've been working on PAPI support for Intel Top-Down events, which
let's say does "exciting" things involving the rdpmc instruction.
One issue we are having is that on a hybrid machine (Raptor Lake in this
case with performance/efficiency cores) there is no top-down support
for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
events.
Obviously PAPI would like to avoid this, and somehow only run the rdpmc
from userspace if scheduled on a P-core.
Is there any way to atomically do this? Somehow detect what core we are
on and atomically execute a userspace instruction before a core-reschedule
can happen?
Or barring that, is there any other way to handle this that won't crash
and doesn't require users to bind to a core any time they want to run
PAPI?
Thanks
Vince Weaver
vincent.weaver@maine.edu
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Liang, Kan @ 2025-01-20 16:44 UTC (permalink / raw)
To: Vince Weaver, peterz
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
Andi Kleen
On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> Hello
>
> so we've been working on PAPI support for Intel Top-Down events, which
> let's say does "exciting" things involving the rdpmc instruction.
>
> One issue we are having is that on a hybrid machine (Raptor Lake in this
> case with performance/efficiency cores) there is no top-down support
> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
> events.
>
> Obviously PAPI would like to avoid this, and somehow only run the rdpmc
> from userspace if scheduled on a P-core.
>
> Is there any way to atomically do this? Somehow detect what core we are
> on and atomically execute a userspace instruction before a core-reschedule
> can happen?
>
> Or barring that, is there any other way to handle this that won't crash
> and doesn't require users to bind to a core any time they want to run
> PAPI?
Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
does? For a stopped event, the index is always 0.
https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c#L117
Thanks,
Kan
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Peter Zijlstra @ 2025-01-21 12:52 UTC (permalink / raw)
To: Liang, Kan
Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
linux-kernel, Andi Kleen, mathieu.desnoyers
On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
>
>
> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> > Hello
> >
> > so we've been working on PAPI support for Intel Top-Down events, which
> > let's say does "exciting" things involving the rdpmc instruction.
> >
> > One issue we are having is that on a hybrid machine (Raptor Lake in this
> > case with performance/efficiency cores) there is no top-down support
> > for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
> > events.
> >
> > Obviously PAPI would like to avoid this, and somehow only run the rdpmc
> > from userspace if scheduled on a P-core.
> >
> > Is there any way to atomically do this? Somehow detect what core we are
> > on and atomically execute a userspace instruction before a core-reschedule
> > can happen?
> >
> > Or barring that, is there any other way to handle this that won't crash
> > and doesn't require users to bind to a core any time they want to run
> > PAPI?
>
> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
> does? For a stopped event, the index is always 0.
That's not race-free; the task can get migrated to an E-core the moment
after you've done the load and before the rdpmc instruction.
I suppose you can wrap the whole thing in RSEQ though. It's a bit of a
pain, but RSEQ can be configured to abort on migration.
Recent glibc (2.35+) should have rseq registered by default; with older
versions the application has to register it itself. There is example code in
tools/testing/selftests/rseq but also
https://git.kernel.org/pub/scm/libs/librseq/librseq.git
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Mathieu Desnoyers @ 2025-01-21 14:30 UTC (permalink / raw)
To: Peter Zijlstra, Liang, Kan
Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
linux-kernel, Andi Kleen
On 2025-01-21 07:52, Peter Zijlstra wrote:
> On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
>>
>>
>> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
>>> Hello
>>>
>>> so we've been working on PAPI support for Intel Top-Down events, which
>>> let's say does "exciting" things involving the rdpmc instruction.
>>>
>>> One issue we are having is that on a hybrid machine (Raptor Lake in this
>>> case with performance/efficiency cores) there is no top-down support
>>> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
>>> events.
>>>
>>> Obviously PAPI would like to avoid this, and somehow only run the rdpmc
>>> from userspace if scheduled on a P-core.
>>>
>>> Is there any way to atomically do this? Somehow detect what core we are
>>> on and atomically execute a userspace instruction before a core-reschedule
>>> can happen?
>>>
>>> Or barring that, is there any other way to handle this that won't crash
>>> and doesn't require users to bind to a core any time they want to run
>>> PAPI?
>>
>> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
>> does? For a stopped event, the index is always 0.
>
> That's not race-free; the task can get migrated to an E-core the moment
> after you've done the load and before the rdpmc instruction.
>
> I suppose you can wrap the whole thing in RSEQ though. It's a bit of a
> pain, but RSEQ can be configured to abort on migration.
>
> Recent glibc (2.35+) should have rseq registered by default; with older
> versions the application has to register it itself. There is example code in
> tools/testing/selftests/rseq but also
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git
Indeed, you could start from a copy of this function:
https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/include/rseq/arch/x86/bits.h#n161
and tweak it to issue "rdpmc" rather than "addq", thus creating a helper
such as:
int rseq_try_rdpmc(params..., int cpu);
(e.g. return 0 on success, -1 on abort)
and use it as such from C (untested code snippet):
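static inline bool rseq_rdpmc(params...)
{
	bool rdpmc_issued = false;

	for (;;) {
		/* CPU the thread is on right now; this can change at any time. */
		int cpu = rseq_current_cpu();

		/* On an E-core: give up rather than fault. */
		if (!cpu_is_p_core(cpu))
			break;
		/* 0 means rdpmc ran on 'cpu'; nonzero means abort, so retry. */
		if (!rseq_try_rdpmc(params..., cpu)) {
			rdpmc_issued = true;
			break;
		}
	}
	return rdpmc_issued;
}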
The rseq critical section in rseq_try_rdpmc either aborts if the thread
has been migrated elsewhere, or issues the rdpmc instruction if it is still on
the right cpu when the instruction is executed.
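The loop above leaves cpu_is_p_core() undefined. Here is a minimal sketch of one
way to fill it in. It assumes a hybrid Intel system where the kernel lists the
P-core PMU's CPUs in /sys/bus/event_source/devices/cpu_core/cpus. That path, the
helper names cpulist_contains() and cpu_in_p_core_pmu(), and the -1 "unknown"
return convention are assumptions made for illustration; none of them come from
librseq or PAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: true if 'cpu' is in a kernel-style CPU list
 * such as "0-7,16-19\n". Malformed input is treated as "not found". */
static bool cpulist_contains(const char *list, int cpu)
{
    const char *p = list;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)
            return false;           /* no digits where a number was expected */
        if (*end == '-') {
            const char *q = end + 1;

            hi = strtol(q, &end, 10);
            if (end == q)
                return false;       /* dangling "N-" with no upper bound */
        }
        if (cpu >= lo && cpu <= hi)
            return true;
        /* Step over the separator; anything else fails on the next pass. */
        p = (*end == ',') ? end + 1 : end;
    }
    return false;
}

/* Hypothetical helper: 1 if 'cpu' is listed for the cpu_core PMU, 0 if not,
 * -1 if the list cannot be read (for example on a non-hybrid system).
 * The sysfs path is an assumption about hybrid Intel machines. */
static int cpu_in_p_core_pmu(int cpu)
{
    char buf[256];
    FILE *f = fopen("/sys/bus/event_source/devices/cpu_core/cpus", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cpulist_contains(buf, cpu) ? 1 : 0;
}
```

A caller would probably read the file once at startup and cache the result in a
per-CPU table, so that the retry loop itself does no I/O. How to treat the -1
case is a policy choice left to the caller.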
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-22 21:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
linux-kernel, Andi Kleen, mathieu.desnoyers
On Tue, 21 Jan 2025, Peter Zijlstra wrote:
> That's not race-free; the task can get migrated to an E-core the moment
> after you've done the load and before the rdpmc instruction.
>
> I suppose you can wrap the whole thing in RSEQ though. It's a bit of a
> pain, but RSEQ can be configured to abort on migration.
>
> Recent glibc (2.35+) should have rseq registered by default; with older
> versions the application has to register it itself. There is example code in
> tools/testing/selftests/rseq but also
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git
thanks, I had forgotten all about RSEQ, it's more or less exactly what we
were looking for.
I have a student working on this for PAPI. If we get it working we can
see if perf could use support too if it doesn't have it already.
Vince Weaver
vincent.weaver@maine.edu
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Andi Kleen @ 2025-01-23 18:14 UTC (permalink / raw)
To: Vince Weaver
Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, linux-kernel, mathieu.desnoyers
> I have a student working on this for PAPI. If we get it working we can
> see if perf could use support too if it doesn't have it already.
perf user space doesn't have a ring 3 self access library.
-Andi
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-23 19:45 UTC (permalink / raw)
To: Andi Kleen
Cc: Vince Weaver, Peter Zijlstra, Liang, Kan, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
mathieu.desnoyers
On Thu, 23 Jan 2025, Andi Kleen wrote:
> > I have a student working on this for PAPI. If we get it working we can
> > see if perf could use support too if it doesn't have it already.
>
> perf user space doesn't have a ring 3 self access library.
what happens if you're doing top-down measurements with perf on a hybrid
system and perf gets migrated to an E-core?
or are you saying perf always uses a syscall to read the top-down values
and doesn't use rdpmc in that case? I guess that makes sense, I was
confused because the documentation for userspace topdown support is in the
tools/perf/Documentation directory.
Vince
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Andi Kleen @ 2025-01-24 5:18 UTC (permalink / raw)
To: Vince Weaver
Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, linux-kernel, mathieu.desnoyers
On Thu, Jan 23, 2025 at 02:45:33PM -0500, Vince Weaver wrote:
> On Thu, 23 Jan 2025, Andi Kleen wrote:
>
> > > I have a student working on this for PAPI. If we get it working we can
> > > see if perf could use support too if it doesn't have it already.
> >
> > perf user space doesn't have a ring 3 self access library.
>
> what happens if you're doing top-down measurements with perf on a hybrid
> system and perf gets migrated to an E-core?
>
> or are you saying perf always uses a syscall to read the top-down values
> and doesn't use rdpmc in that case?
perf always uses system calls or mmap.
-Andi