* perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-17 22:04 UTC
To: peterz
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan, linux-kernel

Hello

so we've been working on PAPI support for Intel Top-Down events, which
let's say does "exciting" things involving the rdpmc instruction.

One issue we are having is that on a hybrid machine (Raptor Lake in this
case with performance/efficiency cores) there is no top-down support
for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
events.

Obviously PAPI would like to avoid this, and somehow only run the rdpmc
from userspace if scheduled on a P-core.

Is there any way to atomically do this? Somehow detect what core we are
on and atomically execute a userspace instruction before a core-reschedule
can happen?

Or barring that, any other way to handle this in a way that won't crash
without having to have the users have to bind to a core any time they want
to run PAPI?

Thanks

Vince Weaver
vincent.weaver@maine.edu

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Liang, Kan @ 2025-01-20 16:44 UTC
To: Vince Weaver, peterz
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel, Andi Kleen

On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> Hello
>
> so we've been working on PAPI support for Intel Top-Down events, which
> let's say does "exciting" things involving the rdpmc instruction.
>
> One issue we are having is that on a hybrid machine (Raptor Lake in this
> case with performance/efficiency cores) there is no top-down support
> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
> events.
>
> Obviously PAPI would like to avoid this, and somehow only run the rdpmc
> from userspace if scheduled on a P-core.
>
> Is there any way to atomically do this? Somehow detect what core we are
> on and atomically execute a userspace instruction before a core-reschedule
> can happen?
>
> Or barring that, any other way to handle this in a way that won't crash
> without having to have the users have to bind to a core any time they want
> to run PAPI?

Can PAPI rely on event_idx(), similar to what Andi's pmu-tools does?
For a stopped event, the index is always 0.

https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c#L117

Thanks,
Kan

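For reference, the event_idx() approach amounts to the read loop described in the perf_event_mmap_page comments of linux/perf_event.h: load `index` under the `lock` sequence count and only issue the counter read when the index is nonzero. The sketch below is an illustration under assumptions, not PAPI or pmu-tools code: `read_self_count`, `pmc_reader_t` and `stub_read_pmc` are hypothetical names, and the counter read is passed in as a callback so the logic can be exercised without executing RDPMC. Nothing in it prevents a migration between the index load and the counter read.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback type: performs the actual counter read (RDPMC on x86). */
typedef uint64_t (*pmc_reader_t)(uint32_t counter);

/* Compiler barrier, as in the perf_event.h read-loop comment. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in reader (assumption) so the sketch runs without touching a PMU. */
static uint32_t stub_last_counter;
static uint64_t stub_read_pmc(uint32_t counter)
{
    stub_last_counter = counter;
    return 5;
}

/*
 * Returns true and stores the count if the event currently owns a hardware
 * counter that user space may read. Returns false if index is 0 (event
 * stopped or not scheduled) or user-space reads are not permitted.
 */
static inline bool read_self_count(const volatile struct perf_event_mmap_page *pc,
                                   pmc_reader_t read_pmc, int64_t *count)
{
    uint32_t seq, index;
    unsigned int width;
    int64_t value;
    bool ok;

    do {
        seq = pc->lock;
        barrier();

        index = pc->index;
        value = pc->offset;
        width = pc->pmc_width;
        ok = pc->cap_user_rdpmc && index != 0 && width != 0 && width <= 64;
        if (ok) {
            unsigned int shift = 64 - width;
            uint64_t raw = read_pmc(index - 1); /* index is counter + 1 */

            /* Sign-extend the width-bit hardware value before adding. */
            value += (int64_t)(raw << shift) >> shift;
        }

        barrier();
    } while (pc->lock != seq); /* retry if the kernel updated the page */

    if (ok)
        *count = value;
    return ok;
}
```

With a real event, `pc` would be the first page of the mmap()ed perf_event_open() file descriptor and the callback would wrap the rdpmc instruction.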
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Peter Zijlstra @ 2025-01-21 12:52 UTC
To: Liang, Kan
Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel, Andi Kleen, mathieu.desnoyers

On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
>
> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
> > Hello
> >
> > so we've been working on PAPI support for Intel Top-Down events, which
> > let's say does "exciting" things involving the rdpmc instruction.
> >
> > One issue we are having is that on a hybrid machine (Raptor Lake in this
> > case with performance/efficiency cores) there is no top-down support
> > for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
> > events.
> >
> > Obviously PAPI would like to avoid this, and somehow only run the rdpmc
> > from userspace if scheduled on a P-core.
> >
> > Is there any way to atomically do this? Somehow detect what core we are
> > on and atomically execute a userspace instruction before a core-reschedule
> > can happen?
> >
> > Or barring that, any other way to handle this in a way that won't crash
> > without having to have the users have to bind to a core any time they want
> > to run PAPI?
>
> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools does?
> For a stopped event, the index is always 0.

That's not race-free, the task can get migrated to an E core the moment
after you've done the load and before the rdpmc instruction.

I suppose you can wrap the whole thing in RSEQ though, it's a bit of a
pain, but RSEQ can be configured to abort on migration.

The very latest libc (2.35+) should have rseq registered by default,
older versions will have to register it themselves -- there is example
code in tools/testing/selftests/rseq but also
https://git.kernel.org/pub/scm/libs/librseq/librseq.git

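One way a library could decide whether it has to register rseq itself is to look at the running glibc version. The sketch below is only a coarse heuristic under that assumption; `version_at_least` and `glibc_probably_registers_rseq` are hypothetical helper names. On glibc 2.35 and later, checking that `__rseq_size` from <sys/rseq.h> is nonzero is the more direct test, since registration can be disabled (for example through the glibc.pthread.rseq tunable) or fail on kernels without rseq support.

```c
#include <gnu/libc-version.h>
#include <stdbool.h>
#include <stdio.h>

/* True if a "major.minor" version string is at least want_major.want_minor. */
static inline bool version_at_least(const char *version,
                                    int want_major, int want_minor)
{
    int major = 0, minor = 0;

    /* Unparseable strings are treated as "too old". */
    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return false;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/*
 * Heuristic only: a new enough glibc normally registers rseq at thread
 * start, but the registration may still have been disabled or have failed.
 */
static inline bool glibc_probably_registers_rseq(void)
{
    return version_at_least(gnu_get_libc_version(), 2, 35);
}
```

When this returns false, the caller would fall back to registering rseq itself, for instance through librseq.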
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Mathieu Desnoyers @ 2025-01-21 14:30 UTC
To: Peter Zijlstra, Liang, Kan
Cc: Vince Weaver, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel, Andi Kleen

On 2025-01-21 07:52, Peter Zijlstra wrote:
> On Mon, Jan 20, 2025 at 11:44:37AM -0500, Liang, Kan wrote:
>>
>> On 2025-01-17 5:04 p.m., Vince Weaver wrote:
>>> Hello
>>>
>>> so we've been working on PAPI support for Intel Top-Down events, which
>>> let's say does "exciting" things involving the rdpmc instruction.
>>>
>>> One issue we are having is that on a hybrid machine (Raptor Lake in this
>>> case with performance/efficiency cores) there is no top-down support
>>> for the E-cores, and it will gpf/segfault if you try to rdpmc the top-down
>>> events.
>>>
>>> Obviously PAPI would like to avoid this, and somehow only run the rdpmc
>>> from userspace if scheduled on a P-core.
>>>
>>> Is there any way to atomically do this? Somehow detect what core we are
>>> on and atomically execute a userspace instruction before a core-reschedule
>>> can happen?
>>>
>>> Or barring that, any other way to handle this in a way that won't crash
>>> without having to have the users have to bind to a core any time they want
>>> to run PAPI?
>>
>> Can PAPI rely on event_idx(), similar to what Andi's pmu-tools does?
>> For a stopped event, the index is always 0.
>
> That's not race-free, the task can get migrated to an E core the moment
> after you've done the load and before the rdpmc instruction.
>
> I suppose you can wrap the whole thing in RSEQ though, it's a bit of a
> pain, but RSEQ can be configured to abort on migration.
>
> The very latest libc (2.35+) should have rseq registered by default,
> older versions will have to register it themselves -- there is example
> code in tools/testing/selftests/rseq but also
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git

Indeed, you could start from a copy of this function:

https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/include/rseq/arch/x86/bits.h#n161

and tweak it to issue "rdpmc" rather than "addq", thus creating a
helper such as:

    int rseq_try_rdpmc(params..., int cpu);

(e.g. return 0 on success, -1 on abort)

and use it as such from C (untested code snippet):

    static inline bool rseq_rdpmc(params...)
    {
        bool rdpmc_issued = false;

        for (;;) {
            int cpu = rseq_current_cpu();

            /* Not on a P-core: do not attempt the rdpmc. */
            if (!cpu_is_p_core(cpu))
                break;
            /* 0 means rdpmc ran on 'cpu'; on abort, retry. */
            if (!rseq_try_rdpmc(params..., cpu)) {
                rdpmc_issued = true;
                break;
            }
        }
        return rdpmc_issued;
    }

The rseq critical section in rseq_try_rdpmc will either abort if the
thread has been migrated elsewhere, or issue the rdpmc instruction if
it is still on the right CPU when the instruction is executed.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

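The snippet in this message leaves cpu_is_p_core() undefined. A minimal sketch of one way to back it, assuming an Intel hybrid system where the P-core PMU is exposed as `cpu_core` in sysfs and its `cpus` file holds a cpulist such as "0-15": parse the list once into a bitmap and test CPU numbers against it. The sysfs path, `MAX_CPUS`, `parse_cpulist` and `load_p_core_map` are assumptions for illustration, not PAPI or librseq code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024 /* assumed upper bound on CPU numbers */

static unsigned char p_core_map[MAX_CPUS / 8];

/*
 * Parse a kernel cpulist string such as "0-7,16-19" into a bitmap.
 * Returns the number of CPUs parsed, or -1 on malformed input.
 */
static inline int parse_cpulist(const char *list, unsigned char *map,
                                size_t max_cpus)
{
    const char *p = list;
    int count = 0;

    memset(map, 0, max_cpus / 8);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;
        if (*end == '-') { /* range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p)
                return -1;
        }
        if (first < 0 || last < first || (size_t)last >= max_cpus)
            return -1;
        if (*end != ',' && *end != '\0' && *end != '\n')
            return -1;
        for (long c = first; c <= last; c++) {
            map[c / 8] |= (unsigned char)(1u << (c % 8));
            count++;
        }
        p = (*end == ',') ? end + 1 : end;
    }
    return count;
}

/* Fill p_core_map from a sysfs cpulist file; returns CPU count or -1. */
static inline int load_p_core_map(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return parse_cpulist(buf, p_core_map, MAX_CPUS);
}

/* Table lookup, cheap enough to call on every retry of the rseq loop. */
static inline bool cpu_is_p_core(int cpu)
{
    if (cpu < 0 || cpu >= MAX_CPUS)
        return false;
    return (p_core_map[cpu / 8] & (1u << (cpu % 8))) != 0;
}
```

A library would call something like load_p_core_map("/sys/devices/cpu_core/cpus") once at initialization. A return of -1 (file missing) would suggest a non-hybrid system, where this check is not needed in the first place.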
* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-22 21:51 UTC
To: Peter Zijlstra
Cc: Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel, Andi Kleen, mathieu.desnoyers

On Tue, 21 Jan 2025, Peter Zijlstra wrote:

> That's not race-free, the task can get migrated to an E core the moment
> after you've done the load and before the rdpmc instruction.
>
> I suppose you can wrap the whole thing in RSEQ though, it's a bit of a
> pain, but RSEQ can be configured to abort on migration.
>
> The very latest libc (2.35+) should have rseq registered by default,
> older versions will have to register it themselves -- there is example
> code in tools/testing/selftests/rseq but also
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git

thanks, I had forgotten all about RSEQ, it's more or less exactly what we
were looking for.

I have a student working on this for PAPI. If we get it working we can
see if perf could use support too if it doesn't have it already.

Vince Weaver
vincent.weaver@maine.edu

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Andi Kleen @ 2025-01-23 18:14 UTC
To: Vince Weaver
Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
    Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
    Adrian Hunter, linux-kernel, mathieu.desnoyers

> I have a student working on this for PAPI. If we get it working we can
> see if perf could use support too if it doesn't have it already.

perf user space doesn't have a ring 3 self access library.

-Andi

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Vince Weaver @ 2025-01-23 19:45 UTC
To: Andi Kleen
Cc: Vince Weaver, Peter Zijlstra, Liang, Kan, Ingo Molnar,
    Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
    mathieu.desnoyers

On Thu, 23 Jan 2025, Andi Kleen wrote:

> > I have a student working on this for PAPI. If we get it working we can
> > see if perf could use support too if it doesn't have it already.
>
> perf user space doesn't have a ring 3 self access library.

what happens if you're doing top-down measurements with perf on a hybrid
system and perf gets migrated to an E-core?

or are you saying perf always uses a syscall to read the top-down values
and doesn't use rdpmc in that case?

I guess that makes sense, I was confused because the documentation for
userspace topdown support is in the tools/perf/Documentation directory.

Vince

* Re: perf: is it possible to userspace rdpmc but only on a certain core type
From: Andi Kleen @ 2025-01-24 5:18 UTC
To: Vince Weaver
Cc: Peter Zijlstra, Liang, Kan, Ingo Molnar, Arnaldo Carvalho de Melo,
    Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
    Adrian Hunter, linux-kernel, mathieu.desnoyers

On Thu, Jan 23, 2025 at 02:45:33PM -0500, Vince Weaver wrote:
> On Thu, 23 Jan 2025, Andi Kleen wrote:
>
> > > I have a student working on this for PAPI. If we get it working we can
> > > see if perf could use support too if it doesn't have it already.
> >
> > perf user space doesn't have a ring 3 self access library.
>
> what happens if you're doing top-down measurements with perf on a hybrid
> system and perf gets migrated to an E-core?
>
> or are you saying perf always uses a syscall to read the top-down values
> and doesn't use rdpmc in that case?

perf always uses system calls or mmap.

-Andi