public inbox for linux-kernel@vger.kernel.org
* perf: is it possible to userspace rdpmc but only on a certain core type
@ 2025-01-17 22:04 Vince Weaver
  2025-01-20 16:44 ` Liang, Kan
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2025-01-17 22:04 UTC
  To: peterz
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	linux-kernel

Hello

so we've been working on PAPI support for Intel Top-Down events, which
let's say does "exciting" things involving the rdpmc instruction.

One issue we are having is that on a hybrid machine (Raptor Lake in this 
case with performance/efficiency cores) there is no top-down support
for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down 
events.

Obviously PAPI would like to avoid this, and somehow only run the rdpmc 
from userspace if scheduled on a P-core.

Is there any way to atomically do this?  Somehow detect what core we are 
on and atomically execute a userspace instruction before a core-reschedule 
can happen?
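One way to see which kind of core a thread is on is CPUID: on Intel hybrid parts, CPUID.(EAX=07H,ECX=0):EDX[15] flags a hybrid processor, and CPUID.1AH:EAX[31:24] reports the core type of the executing CPU (0x20 for Atom/E-core, 0x40 for Core/P-core, per the Intel SDM). The sketch below assumes GCC or Clang's <cpuid.h>; the enum and function names are made up for illustration. The answer only describes the CPU at the instant CPUID executes, so gating a later rdpmc on it has exactly the check-then-act race being asked about.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Core-type encodings from CPUID.1AH:EAX[31:24] (Intel SDM). */
enum core_type {
    CORE_TYPE_UNKNOWN = 0x00,
    CORE_TYPE_ATOM    = 0x20,   /* E-core */
    CORE_TYPE_CORE    = 0x40,   /* P-core */
};

/* Report the type of the core executing this function right now.
 * The result may be stale by the time the caller uses it: nothing stops
 * the scheduler from migrating the thread immediately afterwards. */
static enum core_type current_core_type(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=07H,ECX=0):EDX[15] is the "hybrid part" flag. */
    if (!__get_cpuid_count(0x07, 0, &eax, &ebx, &ecx, &edx) ||
        !(edx & (1u << 15)))
        return CORE_TYPE_UNKNOWN;

    /* CPUID.1AH:EAX[31:24] holds the core type of the current CPU. */
    if (!__get_cpuid_count(0x1a, 0, &eax, &ebx, &ecx, &edx))
        return CORE_TYPE_UNKNOWN;

    switch (eax >> 24) {
    case CORE_TYPE_ATOM:
        return CORE_TYPE_ATOM;
    case CORE_TYPE_CORE:
        return CORE_TYPE_CORE;
    default:
        return CORE_TYPE_UNKNOWN;
    }
#else
    return CORE_TYPE_UNKNOWN;
#endif
}
```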

Or, barring that, is there any other way to handle this that won't crash, 
without requiring users to bind to a core every time they want to run 
PAPI?

Thanks

Vince Weaver
vincent.weaver@maine.edu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-17 22:04 perf: is it possible to userspace rdpmc but only on a certain core type Vince Weaver
@ 2025-01-20 16:44 ` Liang, Kan
  2025-01-21 12:52   ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Liang, Kan @ 2025-01-20 16:44 UTC
  To: Vince Weaver, peterz
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	Andi Kleen



On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> Hello
> 
> so we've been working on PAPI support for Intel Top-Down events, which
> let's say does "exciting" things involving the rdpmc instruction.
> 
> One issue we are having is that on a hybrid machine (Raptor Lake in this 
> case with performance/efficiency cores) there is no top-down support
> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down 
> events.
> 
> Obviously PAPI would like to avoid this, and somehow only run the rdpmc 
> from userspace if scheduled on a P-core.
> 
> Is there any way to atomically do this?  Somehow detect what core we are 
> on and atomically execute a userspace instruction before a core-reschedule 
> can happen?
> 
> Or, barring that, is there any other way to handle this that won't crash, 
> without requiring users to bind to a core every time they want to run 
> PAPI?

Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
does? For a stopped event, the index is always 0.
https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c#L117
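For reference, a self-monitoring read built on that index usually looks like the sketch below, modeled on the seqlock loop documented in the perf_event_mmap_page comments of <linux/perf_event.h>: bail out to read(2) when index is 0 (or cap_user_rdpmc is clear), otherwise rdpmc(index - 1) and add it to offset. It assumes x86 and that `pc` points at the first page of the event's mmap; the function names are illustrative. Note that the seqlock only detects a change after rdpmc has already run, so it does not by itself close the migration window on a hybrid system.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* Compiler barrier, as used in the <linux/perf_event.h> example loop. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Execute rdpmc for the given counter (x86 only). */
static inline uint64_t rdpmc_raw(uint32_t counter)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;

    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
#else
    (void)counter;
    return 0;
#endif
}

/* Read the event from user space via the mmap page `pc`.
 * Returns false when user-space rdpmc is not usable (index == 0, e.g. the
 * event is stopped, or cap_user_rdpmc is clear); the caller should then
 * fall back to read(2) on the event fd. */
static bool try_user_read(volatile struct perf_event_mmap_page *pc,
                          int64_t *count)
{
    uint32_t seq, idx;
    uint16_t width;
    int64_t val;

    do {
        seq = pc->lock;
        barrier();

        idx = pc->index;
        if (idx == 0 || !pc->cap_user_rdpmc)
            return false;

        val = pc->offset;
        width = pc->pmc_width;
        /* Window: a migration here means rdpmc runs on another CPU. */
        uint64_t raw = rdpmc_raw(idx - 1);
        /* Sign-extend the width-bit hardware value. */
        val += (int64_t)(raw << (64 - width)) >> (64 - width);

        barrier();
    } while (pc->lock != seq);

    *count = val;
    return true;
}
```

When try_user_read() returns false, read(2) on the event fd returns the same count through the kernel.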

Thanks,
Kan



* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-20 16:44 ` Liang, Kan
@ 2025-01-21 12:52   ` Peter Zijlstra
  2025-01-21 14:30     ` Mathieu Desnoyers
  2025-01-22 21:51     ` Vince Weaver
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2025-01-21 12:52 UTC
  To: Liang, Kan
  Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, Andi Kleen, mathieu.desnoyers

On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
> 
> 
> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> > Hello
> > 
> > so we've been working on PAPI support for Intel Top-Down events, which
> > let's say does "exciting" things involving the rdpmc instruction.
> > 
> > One issue we are having is that on a hybrid machine (Raptor Lake in this 
> > case with performance/efficiency cores) there is no top-down support
> > for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down 
> > events.
> > 
> > Obviously PAPI would like to avoid this, and somehow only run the rdpmc 
> > from userspace if scheduled on a P-core.
> > 
> > Is there any way to atomically do this?  Somehow detect what core we are 
> > on and atomically execute a userspace instruction before a core-reschedule 
> > can happen?
> > 
> > Or, barring that, is there any other way to handle this that won't crash, 
> > without requiring users to bind to a core every time they want to run 
> > PAPI?
> 
> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
> does? For a stopped event, the index is always 0.

That's not race-free: the task can get migrated to an E-core right
after you've done the load and before the rdpmc instruction.

I suppose you can wrap the whole thing in RSEQ, though. It's a bit of a
pain, but RSEQ can be configured to abort on migration.

Recent glibc (2.35+) should have rseq registered by default; older
versions will have to register it themselves. There is example code in
tools/testing/selftests/rseq, and also in
https://git.kernel.org/pub/scm/libs/librseq/librseq.git
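For a C library that does not register rseq itself (glibc before 2.35), registration is a single rseq(2) system call on a per-thread struct rseq. The sketch below assumes x86 (the signature value is the one the rseq selftests use there) and kernel headers that provide <linux/rseq.h>; the helper names are hypothetical. It only registers and reads cpu_id; the abort-on-migration critical section itself still needs the assembly from librseq or the selftests.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/rseq.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Signature used on x86 by tools/testing/selftests/rseq; every rseq abort
 * handler must be preceded by this value. */
#define RSEQ_SIG 0x53053053

/* Per-thread registration area. struct rseq is declared with 32-byte
 * alignment in <linux/rseq.h>; cpu_id starts as "not registered". */
static __thread volatile struct rseq rseq_area = {
    .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
};

/* Register rseq for the calling thread. Returns 0 or -errno.
 * -EBUSY typically means the C library already registered its own area
 * for this thread (glibc 2.35+ does so by default). */
static int register_rseq(void)
{
#ifdef __NR_rseq
    if (syscall(__NR_rseq, (void *)&rseq_area, sizeof(rseq_area), 0,
                RSEQ_SIG) == 0)
        return 0;
    return -errno;
#else
    return -ENOSYS;
#endif
}

/* CPU the thread was on at its last return to user space, or -1 if this
 * area was never registered. */
static int rseq_area_cpu(void)
{
    return (int32_t)rseq_area.cpu_id;
}
```

On a system where glibc already owns the registration, register_rseq() fails with -EBUSY and code should use the library's area (or librseq) instead.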



* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-21 12:52   ` Peter Zijlstra
@ 2025-01-21 14:30     ` Mathieu Desnoyers
  2025-01-22 21:51     ` Vince Weaver
  1 sibling, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2025-01-21 14:30 UTC
  To: Peter Zijlstra, Liang, Kan
  Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, Andi Kleen

On 2025-01-21 07:52, Peter Zijlstra wrote:
> On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
>>
>>
>> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
>>> Hello
>>>
>>> so we've been working on PAPI support for Intel Top-Down events, which
>>> let's say does "exciting" things involving the rdpmc instruction.
>>>
>>> One issue we are having is that on a hybrid machine (Raptor Lake in this
>>> case with performance/efficiency cores) there is no top-down support
>>> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
>>> events.
>>>
>>> Obviously PAPI would like to avoid this, and somehow only run the rdpmc
>>> from userspace if scheduled on a P-core.
>>>
>>> Is there any way to atomically do this?  Somehow detect what core we are
>>> on and atomically execute a userspace instruction before a core-reschedule
>>> can happen?
>>>
>>> Or, barring that, is there any other way to handle this that won't crash,
>>> without requiring users to bind to a core every time they want to run
>>> PAPI?
>>
>> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools
>> does? For a stopped event, the index is always 0.
> 
> That's not race-free: the task can get migrated to an E-core right
> after you've done the load and before the rdpmc instruction.
> 
> I suppose you can wrap the whole thing in RSEQ, though. It's a bit of a
> pain, but RSEQ can be configured to abort on migration.
> 
> Recent glibc (2.35+) should have rseq registered by default; older
> versions will have to register it themselves. There is example code in
> tools/testing/selftests/rseq, and also in
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git

Indeed, you could start from a copy of this function:

https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/include/rseq/arch/x86/bits.h#n161

and tweak it to issue "rdpmc" rather than "addq", thus creating a helper
such as:

int rseq_try_rdpmc(params..., int cpu);
(e.g. return 0 on success, -1 on abort)

and use it as such from C (untested code snippet):

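static inline bool rseq_rdpmc(params...)
{
    bool rdpmc_issued = false;

    for (;;) {
        /* Re-read the CPU on every attempt; an rseq abort (e.g. after a
         * migration) brings us back here. */
        int cpu = rseq_current_cpu();

        if (!cpu_is_p_core(cpu))
            break;          /* not on a P-core: don't issue rdpmc */
        if (!rseq_try_rdpmc(params..., cpu)) {
            rdpmc_issued = true;
            break;
        }
    }
    return rdpmc_issued;
}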

The rseq critical section in rseq_try_rdpmc will either abort if migrated
elsewhere, else it will issue the rdpmc instruction if it is still on the
right cpu when the instruction is executed.
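The snippet above leaves cpu_is_p_core() undefined. One possible implementation, assuming the hybrid PMU layout in which the kernel lists the P-core CPUs in /sys/devices/cpu_core/cpus as a cpulist such as "0-15", is sketched below; cpulist_contains() is a hypothetical helper. On a non-hybrid machine that file does not exist and the sketch answers false, which a real caller may want to handle differently.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Return true if `cpu` appears in a kernel cpulist string like "0-7,16-23". */
static bool cpulist_contains(const char *list, int cpu)
{
    const char *p = list;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10), hi = lo;

        if (end == p)
            return false;           /* not a number: malformed list */
        if (*end == '-') {          /* range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return false;
        }
        if (cpu >= lo && cpu <= hi)
            return true;
        if (*end != ',')
            break;                  /* end of list (or trailing newline) */
        p = end + 1;
    }
    return false;
}

/* Hypothetical cpu_is_p_core(): on hybrid Intel systems the kernel lists
 * the CPUs served by the P-core PMU in /sys/devices/cpu_core/cpus. */
static bool cpu_is_p_core(int cpu)
{
    char buf[1024];
    FILE *f = fopen("/sys/devices/cpu_core/cpus", "r");
    bool found = false;

    if (!f)
        return false;   /* no cpu_core PMU: not a hybrid system */
    if (fgets(buf, sizeof(buf), f))
        found = cpulist_contains(buf, cpu);
    fclose(f);
    return found;
}
```

Since this rereads sysfs on every call, a real library would more likely parse the list once at startup and keep a CPU bitmap for the hot path.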

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-21 12:52   ` Peter Zijlstra
  2025-01-21 14:30     ` Mathieu Desnoyers
@ 2025-01-22 21:51     ` Vince Weaver
  2025-01-23 18:14       ` Andi Kleen
  1 sibling, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2025-01-22 21:51 UTC
  To: Peter Zijlstra
  Cc: Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, Andi Kleen, mathieu.desnoyers

On Tue, 21 Jan 2025, Peter Zijlstra wrote:

> That's not race-free: the task can get migrated to an E-core right
> after you've done the load and before the rdpmc instruction.
> 
> I suppose you can wrap the whole thing in RSEQ, though. It's a bit of a
> pain, but RSEQ can be configured to abort on migration.
> 
> Recent glibc (2.35+) should have rseq registered by default; older
> versions will have to register it themselves. There is example code in
> tools/testing/selftests/rseq, and also in
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git

thanks, I had forgotten all about RSEQ, it's more or less exactly what we 
were looking for.

I have a student working on this for PAPI.  If we get it working we can 
see if perf could use support too if it doesn't have it already.

Vince Weaver
vincent.weaver@maine.edu

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-22 21:51     ` Vince Weaver
@ 2025-01-23 18:14       ` Andi Kleen
  2025-01-23 19:45         ` Vince Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2025-01-23 18:14 UTC
  To: Vince Weaver
  Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-kernel, mathieu.desnoyers

> I have a student working on this for PAPI.  If we get it working we can 
> see if perf could use support too if it doesn't have it already.

perf user space doesn't have a ring 3 self access library.

-Andi

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-23 18:14       ` Andi Kleen
@ 2025-01-23 19:45         ` Vince Weaver
  2025-01-24  5:18           ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2025-01-23 19:45 UTC
  To: Andi Kleen
  Cc: Vince Weaver, Peter Zijlstra, Liang, Kan, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	mathieu.desnoyers

On Thu, 23 Jan 2025, Andi Kleen wrote:

> > I have a student working on this for PAPI.  If we get it working we can 
> > see if perf could use support too if it doesn't have it already.
> 
> perf user space doesn't have a ring 3 self access library.

what happens if you're doing top-down measurements with perf on a hybrid 
system and perf gets migrated to an E-core?

or are you saying perf always uses a syscall to read the top-down values 
and doesn't use rdpmc in that case?  I guess that makes sense; I was 
confused because the documentation for user-space top-down support is in 
the tools/perf/Documentation directory.

Vince

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
  2025-01-23 19:45         ` Vince Weaver
@ 2025-01-24  5:18           ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2025-01-24  5:18 UTC
  To: Vince Weaver
  Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-kernel, mathieu.desnoyers

On Thu, Jan 23, 2025 at 02:45:33PM -0500, Vince Weaver wrote:
> On Thu, 23 Jan 2025, Andi Kleen wrote:
> 
> > > I have a student working on this for PAPI.  If we get it working we can 
> > > see if perf could use support too if it doesn't have it already.
> > 
> > perf user space doesn't have a ring 3 self access library.
> 
> what happens if you're doing top-down measurements with perf on a hybrid 
> system and perf gets migrated to an E-core?
> 
> or are you saying perf always uses a syscall to read the top-down values 
> and doesn't use rdpmc in that case? 

perf always uses system calls or mmap.

-Andi
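To make the contrast concrete, here is a minimal sketch of the syscall path for a thread monitoring itself: perf_event_open(2) through syscall(2) (glibc has no wrapper) and a read(2) of the count, so the counter value comes from the kernel rather than from a user-space rdpmc. It uses the software task-clock event so it runs on any machine; a real top-down measurement would instead open a slots/topdown group on the P-core PMU, which is not shown. The function names are illustrative.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a counting event for the calling thread on whatever CPU it runs on.
 * Returns the event fd, or -1 on failure (e.g. perf_event_paranoid). */
static int open_self_counter(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.exclude_kernel = 1;   /* user-space only */
    attr.exclude_hv = 1;

    /* pid 0 = this thread, cpu -1 = any CPU, no group leader, no flags. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Read the current value through the kernel; no rdpmc is executed in user
 * space. With the default read_format the kernel returns one u64. */
static int64_t read_self_counter(int fd)
{
    uint64_t value;

    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return -1;
    return (int64_t)value;
}
```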

end of thread

Thread overview: 8+ messages
2025-01-17 22:04 perf: is it possible to userspace rdpmc but only on a certain core type Vince Weaver
2025-01-20 16:44 ` Liang, Kan
2025-01-21 12:52   ` Peter Zijlstra
2025-01-21 14:30     ` Mathieu Desnoyers
2025-01-22 21:51     ` Vince Weaver
2025-01-23 18:14       ` Andi Kleen
2025-01-23 19:45         ` Vince Weaver
2025-01-24  5:18           ` Andi Kleen
