* [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
@ 2025-06-18 19:08 Nicolas Frattaroli
2025-06-18 20:14 ` Ian Rogers
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Nicolas Frattaroli @ 2025-06-18 19:08 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter
Cc: kernel, Jonathan Corbet, linux-perf-users, linux-kernel,
Nicolas Frattaroli
Over the years, capability flags for perf PMUs were introduced in a
piecemeal fashion whenever a new driver needed to signal to the perf
core some limitation or special feature.

Since one more undocumented flag that can have its meaning inferred from
the commit message and implementation never seems that bad, it's
understandable that this resulted in a total of 11 undocumented
capability flags, which authors of new perf PMU drivers are expected to
set correctly for their particular device.

Since I am in the process of becoming such an author of a new perf
driver, it feels proper to pay it forward by documenting all
PERF_PMU_CAP_ constants, so that no future person has to go through an
hour or two of git blame + reading perf core code to figure out which
capability flags are right for them.

Add comments in kernel-doc format that describe each flag. This follows
the somewhat verbose "Object-like macro documentation" format, and can
be verified with

  ./scripts/kernel-doc -v -none include/linux/perf_event.h

The current in-tree kernel documentation does not include a page on the
perf subsystem, but once it does, these comments should render as proper
documentation annotation. Until then, they'll also be quite useful for
anyone looking at the header file.

Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
---
There may be more perf documentation patches in the future, but right
now I'm focused on getting a minimally viable driver going for the
hardware I'm working on. Documenting these seemed to have a fairly good
effort-to-future-payoff ratio though.

I Cc'd Corbet in case he has any input on the verbosity of the
kernel-doc syntax here. Maybe I'm missing something and all of these
could be in a single /* comment */, but the form used in this patch
doesn't seem too awful to me either.
---
include/linux/perf_event.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec9d96025683958e909bb2463439dc69634f4ceb..7d749fd5225be12543df6e475277563bf16c05b1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -294,16 +294,90 @@ struct perf_event_pmu_context;
/**
* pmu::capabilities flags
*/
+
+/**
+ * define PERF_PMU_CAP_NO_INTERRUPT - \
+ * PMU is incapable of generating hardware interrupts
+ */
#define PERF_PMU_CAP_NO_INTERRUPT 0x0001
+/**
+ * define PERF_PMU_CAP_NO_NMI - \
+ * PMU is guaranteed to not generate non-maskable interrupts
+ */
#define PERF_PMU_CAP_NO_NMI 0x0002
+/**
+ * define PERF_PMU_CAP_AUX_NO_SG - \
+ * PMU does not support using scatter-gather as the output
+ *
+ * The PERF_PMU_CAP_AUX_NO_SG flag indicates that the PMU does not support
+ * scatter-gather for its output buffer, and needs a larger contiguous buffer
+ * to output to.
+ */
#define PERF_PMU_CAP_AUX_NO_SG 0x0004
+/**
+ * define PERF_PMU_CAP_EXTENDED_REGS - \
+ * PMU is capable of sampling extended registers
+ *
+ * Some architectures have a concept of extended registers, e.g. XMM0 on x86
+ * or VG on arm64. If the PMU is capable of sampling these registers, then the
+ * flag PERF_PMU_CAP_EXTENDED_REGS should be set.
+ */
#define PERF_PMU_CAP_EXTENDED_REGS 0x0008
+/**
+ * define PERF_PMU_CAP_EXCLUSIVE - \
+ * PMU can only have one scheduled event at a time
+ *
+ * Certain PMU hardware cannot track several events at the same time. Such
+ * hardware must set PERF_PMU_CAP_EXCLUSIVE in order to avoid conflicts.
+ */
#define PERF_PMU_CAP_EXCLUSIVE 0x0010
+/**
+ * define PERF_PMU_CAP_ITRACE - PMU traces instructions
+ *
+ * Some PMU hardware does instruction tracing, in that it traces execution of
+ * each instruction. Setting this capability flag makes the perf core generate
+ * a %PERF_RECORD_ITRACE_START event, recording the profiled task's PID and TID,
+ * to allow tools to properly decode such traces.
+ */
#define PERF_PMU_CAP_ITRACE 0x0020
+/**
+ * define PERF_PMU_CAP_NO_EXCLUDE - \
+ * PMU is incapable of excluding events based on context
+ *
+ * Some PMU hardware will count events regardless of context, including e.g.
+ * idle, kernel and guest. Drivers for such hardware should set the
+ * PERF_PMU_CAP_NO_EXCLUDE flag to explicitly advertise that they're unable to
+ * help themselves, so that the perf core can reject requests to exclude events
+ * based on context.
+ */
#define PERF_PMU_CAP_NO_EXCLUDE 0x0040
+/**
+ * define PERF_PMU_CAP_AUX_OUTPUT - PMU non-AUX events generate AUX data
+ *
+ * Drivers for PMU hardware that supports non-AUX events which generate data for
+ * AUX events should set PERF_PMU_CAP_AUX_OUTPUT. This flag tells the perf core
+ * to schedule non-AUX events together with AUX events, so that this data isn't
+ * lost.
+ */
#define PERF_PMU_CAP_AUX_OUTPUT 0x0080
+/**
+ * define PERF_PMU_CAP_EXTENDED_HW_TYPE - \
+ * PMU supports PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ */
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
+/**
+ * define PERF_PMU_CAP_AUX_PAUSE - \
+ * PMU can pause and resume AUX area traces based on events
+ */
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
+/**
+ * define PERF_PMU_CAP_AUX_PREFER_LARGE - PMU prefers contiguous output buffers
+ *
+ * The PERF_PMU_CAP_AUX_PREFER_LARGE capability flag is a less strict variant of
+ * %PERF_PMU_CAP_AUX_NO_SG. PMU drivers for hardware that doesn't strictly
+ * require contiguous output buffers, but find the benefits outweigh the
+ * downside of increased memory fragmentation, may set this capability flag.
+ */
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
/**
---
base-commit: 31d56636e10e92ced06ead14b7541867f955e41d
change-id: 20250618-perf-pmu-cap-docs-a13e4ae939ac
Best regards,
--
Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
* Re: [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
2025-06-18 19:08 [PATCH] perf/headers: Document PERF_PMU_CAP capability flags Nicolas Frattaroli
@ 2025-06-18 20:14 ` Ian Rogers
2025-06-19 14:50 ` Peter Zijlstra
2025-06-20 9:14 ` James Clark
2 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2025-06-18 20:14 UTC
To: Nicolas Frattaroli
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, kernel, Jonathan Corbet, linux-perf-users,
linux-kernel
On Wed, Jun 18, 2025 at 12:09 PM Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> Over the years, capability flags for perf PMUs were introduced in a
> piecemeal fashion whenever a new driver needed to signal to the perf
> core some limitation or special feature.
>
> Since one more undocumented flag that can have its meaning inferred from
> the commit message and implementation never seems that bad, it's
> understandable that this resulted in a total of 11 undocumented
> capability flags, which authors of new perf PMU drivers are expected to
> set correctly for their particular device.
>
> Since I am in the process of becoming such an author of a new perf
> driver, it feels proper to pay it forward by documenting all
> PERF_PMU_CAP_ constants, so that no future person has to go through an
> hour or two of git blame + reading perf core code to figure out which
> capability flags are right for them.
>
> Add comments in kernel-doc format that describe each flag. This follows
> the somewhat verbose "Object-like macro documentation" format, and can
> be verified with
>
> ./scripts/kernel-doc -v -none include/linux/perf_event.h
>
> The current in-tree kernel documentation does not include a page on the
> perf subsystem, but once it does, these comments should render as proper
> documentation annotation. Until then, they'll also be quite useful for
> anyone looking at the header file.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Thanks for taking the effort to do this!
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> There may be more perf documentation patches in the future, but right
> now I'm focused on getting a minimally viable driver going for the
> hardware I'm working on. Documenting these seemed to have a fairly good
> effort-to-future-payoff ratio though.
>
> I Cc'd Corbet in case he has any input on the verbosity of the
> kernel-doc syntax here. Maybe I'm missing something and all of these
> could be in a single /* comment */, but the form used in this patch
> doesn't seem too awful to me either.
> ---
> include/linux/perf_event.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index ec9d96025683958e909bb2463439dc69634f4ceb..7d749fd5225be12543df6e475277563bf16c05b1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -294,16 +294,90 @@ struct perf_event_pmu_context;
> /**
> * pmu::capabilities flags
> */
> +
> +/**
> + * define PERF_PMU_CAP_NO_INTERRUPT - \
> + * PMU is incapable of generating hardware interrupts
> + */
> #define PERF_PMU_CAP_NO_INTERRUPT 0x0001
> +/**
> + * define PERF_PMU_CAP_NO_NMI - \
> + * PMU is guaranteed to not generate non-maskable interrupts
> + */
> #define PERF_PMU_CAP_NO_NMI 0x0002
> +/**
> + * define PERF_PMU_CAP_AUX_NO_SG - \
> + * PMU does not support using scatter-gather as the output
> + *
> + * The PERF_PMU_CAP_AUX_NO_SG flag indicates that the PMU does not support
> + * scatter-gather for its output buffer, and needs a larger contiguous buffer
> + * to output to.
> + */
> #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> +/**
> + * define PERF_PMU_CAP_EXTENDED_REGS - \
> + * PMU is capable of sampling extended registers
> + *
> + * Some architectures have a concept of extended registers, e.g. XMM0 on x86
> + * or VG on arm64. If the PMU is capable of sampling these registers, then the
> + * flag PERF_PMU_CAP_EXTENDED_REGS should be set.
> + */
> #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> +/**
> + * define PERF_PMU_CAP_EXCLUSIVE - \
> + * PMU can only have one scheduled event at a time
> + *
> + * Certain PMU hardware cannot track several events at the same time. Such
> + * hardware must set PERF_PMU_CAP_EXCLUSIVE in order to avoid conflicts.
> + */
> #define PERF_PMU_CAP_EXCLUSIVE 0x0010
> +/**
> + * define PERF_PMU_CAP_ITRACE - PMU traces instructions
> + *
> + * Some PMU hardware does instruction tracing, in that it traces execution of
> + * each instruction. Setting this capability flag makes the perf core generate
> + * a %PERF_RECORD_ITRACE_START event, recording the profiled task's PID and TID,
> + * to allow tools to properly decode such traces.
> + */
> #define PERF_PMU_CAP_ITRACE 0x0020
> +/**
> + * define PERF_PMU_CAP_NO_EXCLUDE - \
> + * PMU is incapable of excluding events based on context
> + *
> + * Some PMU hardware will count events regardless of context, including e.g.
> + * idle, kernel and guest. Drivers for such hardware should set the
> + * PERF_PMU_CAP_NO_EXCLUDE flag to explicitly advertise that they're unable to
> + * help themselves, so that the perf core can reject requests to exclude events
> + * based on context.
> + */
> #define PERF_PMU_CAP_NO_EXCLUDE 0x0040
> +/**
> + * define PERF_PMU_CAP_AUX_OUTPUT - PMU non-AUX events generate AUX data
> + *
> + * Drivers for PMU hardware that supports non-AUX events which generate data for
> + * AUX events should set PERF_PMU_CAP_AUX_OUTPUT. This flag tells the perf core
> + * to schedule non-AUX events together with AUX events, so that this data isn't
> + * lost.
> + */
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> +/**
> + * define PERF_PMU_CAP_EXTENDED_HW_TYPE - \
> + * PMU supports PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
> + */
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> +/**
> + * define PERF_PMU_CAP_AUX_PAUSE - \
> + * PMU can pause and resume AUX area traces based on events
> + */
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +/**
> + * define PERF_PMU_CAP_AUX_PREFER_LARGE - PMU prefers contiguous output buffers
> + *
> + * The PERF_PMU_CAP_AUX_PREFER_LARGE capability flag is a less strict variant of
> + * %PERF_PMU_CAP_AUX_NO_SG. PMU drivers for hardware that doesn't strictly
> + * require contiguous output buffers, but find the benefits outweigh the
> + * downside of increased memory fragmentation, may set this capability flag.
> + */
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
> /**
>
> ---
> base-commit: 31d56636e10e92ced06ead14b7541867f955e41d
> change-id: 20250618-perf-pmu-cap-docs-a13e4ae939ac
>
> Best regards,
> --
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>
* Re: [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
2025-06-18 19:08 [PATCH] perf/headers: Document PERF_PMU_CAP capability flags Nicolas Frattaroli
2025-06-18 20:14 ` Ian Rogers
@ 2025-06-19 14:50 ` Peter Zijlstra
2025-06-19 16:06 ` Nicolas Frattaroli
2025-06-20 9:14 ` James Clark
2 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2025-06-19 14:50 UTC
To: Nicolas Frattaroli
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, kernel,
Jonathan Corbet, linux-perf-users, linux-kernel
Mark just linked this thread from another thread:
https://lkml.kernel.org/r/20250619144254.GK1613376@noisy.programming.kicks-ass.net
On Wed, Jun 18, 2025 at 09:08:34PM +0200, Nicolas Frattaroli wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index ec9d96025683958e909bb2463439dc69634f4ceb..7d749fd5225be12543df6e475277563bf16c05b1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -294,16 +294,90 @@ struct perf_event_pmu_context;
> /**
> * pmu::capabilities flags
> */
> +
> +/**
> + * define PERF_PMU_CAP_NO_INTERRUPT - \
> + * PMU is incapable of generating hardware interrupts
> + */
> #define PERF_PMU_CAP_NO_INTERRUPT 0x0001
This is not quite right; CAP_NO_INTERRUPT means it is not able to
generate samples.

While not being able to generate interrupts and not being able to
generate samples is more or less the same for CPU PMU drivers, this is
not true for uncore drivers. Even if an uncore driver has interrupt
capability to help with counter overflow, it cannot generate samples.
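
To make the "cannot generate samples" reading concrete, here is a small
user-space model of the check the perf core is understood to apply when
an event is opened. It is a sketch under stated assumptions, not the
kernel code: struct model_pmu, struct model_event and
model_check_sampling() are hypothetical stand-ins, only the flag value
comes from the header, and the -EOPNOTSUPP return value is an assumption
about what the core reports.

```c
#include <errno.h>
#include <stdint.h>

/* Flag value as quoted from include/linux/perf_event.h in this patch. */
#define PERF_PMU_CAP_NO_INTERRUPT 0x0001

/* Hypothetical, trimmed-down stand-in for struct pmu. */
struct model_pmu {
	unsigned int capabilities;
};

/* Hypothetical, trimmed-down stand-in for struct perf_event. */
struct model_event {
	const struct model_pmu *pmu;
	uint64_t sample_period;	/* non-zero means the event wants samples */
};

/*
 * Returns 0 if the event may be created, or -EOPNOTSUPP (assumed error
 * code) if a sampling event is requested on a PMU that advertises
 * PERF_PMU_CAP_NO_INTERRUPT.
 */
int model_check_sampling(const struct model_event *event)
{
	int is_sampling = event->sample_period != 0;

	if (is_sampling &&
	    (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT))
		return -EOPNOTSUPP;

	return 0;
}
```

In this model a counting event (sample_period of zero) is always
accepted, while a sampling event is refused only on a PMU that sets the
flag, regardless of whether the hardware has an overflow interrupt.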
> +/**
> + * define PERF_PMU_CAP_NO_NMI - \
> + * PMU is guaranteed to not generate non-maskable interrupts
> + */
> #define PERF_PMU_CAP_NO_NMI 0x0002
> +/**
> + * define PERF_PMU_CAP_AUX_NO_SG - \
> + * PMU does not support using scatter-gather as the output
> + *
> + * The PERF_PMU_CAP_AUX_NO_SG flag indicates that the PMU does not support
> + * scatter-gather for its output buffer, and needs a larger contiguous buffer
> + * to output to.
> + */
> #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> +/**
> + * define PERF_PMU_CAP_EXTENDED_REGS - \
> + * PMU is capable of sampling extended registers
> + *
> + * Some architectures have a concept of extended registers, e.g. XMM0 on x86
> + * or VG on arm64. If the PMU is capable of sampling these registers, then the
> + * flag PERF_PMU_CAP_EXTENDED_REGS should be set.
> + */
> #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> +/**
> + * define PERF_PMU_CAP_EXCLUSIVE - \
> + * PMU can only have one scheduled event at a time
> + *
> + * Certain PMU hardware cannot track several events at the same time. Such
> + * hardware must set PERF_PMU_CAP_EXCLUSIVE in order to avoid conflicts.
> + */
> #define PERF_PMU_CAP_EXCLUSIVE 0x0010
> +/**
> + * define PERF_PMU_CAP_ITRACE - PMU traces instructions
> + *
> + * Some PMU hardware does instruction tracing, in that it traces execution of
> + * each instruction. Setting this capability flag makes the perf core generate
> + * a %PERF_RECORD_ITRACE_START event, recording the profiled task's PID and TID,
> + * to allow tools to properly decode such traces.
> + */
> #define PERF_PMU_CAP_ITRACE 0x0020
> +/**
> + * define PERF_PMU_CAP_NO_EXCLUDE - \
> + * PMU is incapable of excluding events based on context
> + *
> + * Some PMU hardware will count events regardless of context, including e.g.
> + * idle, kernel and guest. Drivers for such hardware should set the
> + * PERF_PMU_CAP_NO_EXCLUDE flag to explicitly advertise that they're unable to
> + * help themselves, so that the perf core can reject requests to exclude events
> + * based on context.
> + */
> #define PERF_PMU_CAP_NO_EXCLUDE 0x0040
More to the point might be saying that it will reject any event that
has: perf_event_attr::exclude_{user,kernel,hv,idle,host,guest} set.
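
As a rough illustration of that wording, the following user-space model
rejects an event as soon as any of the six exclude_* bits is set on a
PMU that advertises PERF_PMU_CAP_NO_EXCLUDE. This is only a sketch:
struct model_attr and the model_* functions are hypothetical names, the
bit-field is a hand-picked subset of perf_event_attr, and the -EINVAL
return value is an assumption about what the core reports.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag value as quoted from include/linux/perf_event.h in this patch. */
#define PERF_PMU_CAP_NO_EXCLUDE 0x0040

/* Hypothetical subset of struct perf_event_attr: only the exclude bits. */
struct model_attr {
	unsigned int exclude_user:1,
		     exclude_kernel:1,
		     exclude_hv:1,
		     exclude_idle:1,
		     exclude_host:1,
		     exclude_guest:1;
};

/* True if the request asks for any kind of context-based exclusion. */
bool model_has_any_exclude_flag(const struct model_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/*
 * Returns 0 if the event is acceptable, or -EINVAL (assumed error code)
 * if the PMU cannot exclude anything but the request asks it to.
 */
int model_try_init_event(unsigned int pmu_capabilities,
			 const struct model_attr *attr)
{
	if ((pmu_capabilities & PERF_PMU_CAP_NO_EXCLUDE) &&
	    model_has_any_exclude_flag(attr))
		return -EINVAL;

	return 0;
}
```

The point of the model is that the driver itself never inspects the
exclude_* bits; setting the flag is enough for such requests to be
refused before they reach it.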
> +/**
> + * define PERF_PMU_CAP_AUX_OUTPUT - PMU non-AUX events generate AUX data
> + *
> + * Drivers for PMU hardware that supports non-AUX events which generate data for
> + * AUX events should set PERF_PMU_CAP_AUX_OUTPUT. This flag tells the perf core
> + * to schedule non-AUX events together with AUX events, so that this data isn't
> + * lost.
> + */
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> +/**
> + * define PERF_PMU_CAP_EXTENDED_HW_TYPE - \
> + * PMU supports PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
> + */
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> +/**
> + * define PERF_PMU_CAP_AUX_PAUSE - \
> + * PMU can pause and resume AUX area traces based on events
> + */
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +/**
> + * define PERF_PMU_CAP_AUX_PREFER_LARGE - PMU prefers contiguous output buffers
> + *
> + * The PERF_PMU_CAP_AUX_PREFER_LARGE capability flag is a less strict variant of
> + * %PERF_PMU_CAP_AUX_NO_SG. PMU drivers for hardware that doesn't strictly
> + * require contiguous output buffers, but find the benefits outweigh the
> + * downside of increased memory fragmentation, may set this capability flag.
> + */
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
> /**
>
> ---
> base-commit: 31d56636e10e92ced06ead14b7541867f955e41d
> change-id: 20250618-perf-pmu-cap-docs-a13e4ae939ac
>
> Best regards,
> --
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>
* Re: [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
2025-06-19 14:50 ` Peter Zijlstra
@ 2025-06-19 16:06 ` Nicolas Frattaroli
2025-06-20 8:04 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Frattaroli @ 2025-06-19 16:06 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, kernel,
Jonathan Corbet, linux-perf-users, linux-kernel
Hello,
On Thursday, 19 June 2025 16:50:44 Central European Summer Time Peter Zijlstra wrote:
>
> Mark just linked this thread from another thread:
>
> https://lkml.kernel.org/r/20250619144254.GK1613376@noisy.programming.kicks-ass.net
>
>
> On Wed, Jun 18, 2025 at 09:08:34PM +0200, Nicolas Frattaroli wrote:
>
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index ec9d96025683958e909bb2463439dc69634f4ceb..7d749fd5225be12543df6e475277563bf16c05b1 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -294,16 +294,90 @@ struct perf_event_pmu_context;
> > /**
> > * pmu::capabilities flags
> > */
> > +
> > +/**
> > + * define PERF_PMU_CAP_NO_INTERRUPT - \
> > + * PMU is incapable of generating hardware interrupts
> > + */
> > #define PERF_PMU_CAP_NO_INTERRUPT 0x0001
>
> This is not quite right; CAP_NO_INTERRUPT means it is not able to
> generate samples.
>
> While not being able to generate interrupts and not being able to
> generate samples is more or less the same for CPU PMU drivers, this is
> not true for uncore drivers. Even if an uncore driver has interrupt
> capability to help with counter overflow, it cannot generate samples.

I'll send a follow-up v2 to fix this, though just to make sure I
understand this right, I have some questions for clarification.

Does "uncore" in this context mean PMU drivers for counters that are not
tied to the CPU instruction flow, but are counting other things like
interconnect statistics?

Also, am I correct in assuming "sample" in this context means the
concept represented by struct perf_sample_data, i.e. what appears to be
a snapshot of current process context, including registers and stack
information? Which would then mean, going by my understanding of uncore,
that basically every uncore driver should set this capability flag, as
they're not performance counter registers on a CPU that are intimately
tied to the ISA's execution state.

To further my understanding: does this mean that
drivers/devfreq/event/rockchip-dfi.c (used for measuring memory
bandwidth) should set PERF_PMU_CAP_NO_INTERRUPT, since it's not a CPU
but a memory controller monitor?

In a more general sense, if anyone has any written resources on writing
PMU drivers, rather than perf from a userspace perspective, I'd be very
happy to get some pointers in their direction.

>
> > +/**
> > + * define PERF_PMU_CAP_NO_NMI - \
> > + * PMU is guaranteed to not generate non-maskable interrupts
> > + */
> > #define PERF_PMU_CAP_NO_NMI 0x0002
> > +/**
> > + * define PERF_PMU_CAP_AUX_NO_SG - \
> > + * PMU does not support using scatter-gather as the output
> > + *
> > + * The PERF_PMU_CAP_AUX_NO_SG flag indicates that the PMU does not support
> > + * scatter-gather for its output buffer, and needs a larger contiguous buffer
> > + * to output to.
> > + */
> > #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> > +/**
> > + * define PERF_PMU_CAP_EXTENDED_REGS - \
> > + * PMU is capable of sampling extended registers
> > + *
> > + * Some architectures have a concept of extended registers, e.g. XMM0 on x86
> > + * or VG on arm64. If the PMU is capable of sampling these registers, then the
> > + * flag PERF_PMU_CAP_EXTENDED_REGS should be set.
> > + */
> > #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> > +/**
> > + * define PERF_PMU_CAP_EXCLUSIVE - \
> > + * PMU can only have one scheduled event at a time
> > + *
> > + * Certain PMU hardware cannot track several events at the same time. Such
> > + * hardware must set PERF_PMU_CAP_EXCLUSIVE in order to avoid conflicts.
> > + */
> > #define PERF_PMU_CAP_EXCLUSIVE 0x0010
> > +/**
> > + * define PERF_PMU_CAP_ITRACE - PMU traces instructions
> > + *
> > + * Some PMU hardware does instruction tracing, in that it traces execution of
> > + * each instruction. Setting this capability flag makes the perf core generate
> > + * a %PERF_RECORD_ITRACE_START event, recording the profiled task's PID and TID,
> > + * to allow tools to properly decode such traces.
> > + */
> > #define PERF_PMU_CAP_ITRACE 0x0020
> > +/**
> > + * define PERF_PMU_CAP_NO_EXCLUDE - \
> > + * PMU is incapable of excluding events based on context
> > + *
> > + * Some PMU hardware will count events regardless of context, including e.g.
> > + * idle, kernel and guest. Drivers for such hardware should set the
> > + * PERF_PMU_CAP_NO_EXCLUDE flag to explicitly advertise that they're unable to
> > + * help themselves, so that the perf core can reject requests to exclude events
> > + * based on context.
> > + */
> > #define PERF_PMU_CAP_NO_EXCLUDE 0x0040
>
> More to the point might be saying that it will reject any event that
> has: perf_event_attr::exclude_{user,kernel,hv,idle,host,guest} set.
I'll reword it around that, thanks.
Kind regards,
Nicolas Frattaroli
>
> > +/**
> > + * define PERF_PMU_CAP_AUX_OUTPUT - PMU non-AUX events generate AUX data
> > + *
> > + * Drivers for PMU hardware that supports non-AUX events which generate data for
> > + * AUX events should set PERF_PMU_CAP_AUX_OUTPUT. This flag tells the perf core
> > + * to schedule non-AUX events together with AUX events, so that this data isn't
> > + * lost.
> > + */
> > #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> > +/**
> > + * define PERF_PMU_CAP_EXTENDED_HW_TYPE - \
> > + * PMU supports PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
> > + */
> > #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> > +/**
> > + * define PERF_PMU_CAP_AUX_PAUSE - \
> > + * PMU can pause and resume AUX area traces based on events
> > + */
> > #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> > +/**
> > + * define PERF_PMU_CAP_AUX_PREFER_LARGE - PMU prefers contiguous output buffers
> > + *
> > + * The PERF_PMU_CAP_AUX_PREFER_LARGE capability flag is a less strict variant of
> > + * %PERF_PMU_CAP_AUX_NO_SG. PMU drivers for hardware that doesn't strictly
> > + * require contiguous output buffers, but find the benefits outweigh the
> > + * downside of increased memory fragmentation, may set this capability flag.
> > + */
> > #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
> >
> > /**
> >
> > ---
> > base-commit: 31d56636e10e92ced06ead14b7541867f955e41d
> > change-id: 20250618-perf-pmu-cap-docs-a13e4ae939ac
> >
> > Best regards,
>
* Re: [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
2025-06-19 16:06 ` Nicolas Frattaroli
@ 2025-06-20 8:04 ` Peter Zijlstra
0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2025-06-20 8:04 UTC
To: Nicolas Frattaroli
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, kernel,
Jonathan Corbet, linux-perf-users, linux-kernel
On Thu, Jun 19, 2025 at 06:06:50PM +0200, Nicolas Frattaroli wrote:
> > > #define PERF_PMU_CAP_NO_INTERRUPT 0x0001
> >
> > This is not quite right; CAP_NO_INTERRUPT means it is not able to
> > generate samples.
> >
> > While not being able to generate interrupts and not being able to
> > generate samples is more or less the same for CPU PMU drivers, this is
> > not true for uncore drivers. Even if an uncore driver has interrupt
> > capability to help with counter overflow, it cannot generate samples.
>
> I'll send a follow-up v2 to fix this, though just to make sure I
> understand this right, I have some questions for clarification.
>
> Does "uncore" in this context mean PMU drivers for counters that are not
> tied to the CPU instruction flow, but are counting other things like
> interconnect statistics?
Correct.
> Also, am I correct in assuming "sample" in this context means the
> concept represented by struct perf_sample_data, i.e. what appears to be
> a snapshot of current process context, including registers and stack
> information?
Right; perf_event_attr::sample_type, filled out with bits from
perf_event_sample_format.
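
For readers coming from the driver side, this is roughly what such a
request looks like from user space. The sketch below only fills in a
struct perf_event_attr using the UAPI header <linux/perf_event.h>; the
function name, the period of 100000 and the particular choice of
sample_type bits are arbitrary illustration values, not something taken
from this thread.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fill in an attr describing a sampling event on the CPU cycle counter.
 * The function name and all chosen values are illustrative assumptions.
 */
void example_init_sampling_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;

	/* A non-zero period is what turns a counting event into a sampling one. */
	attr->sample_period = 100000;

	/* Bits from enum perf_event_sample_format select what each sample holds. */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_CALLCHAIN;
}
```

A PMU that cannot produce this kind of per-sample context is the case
the NO_INTERRUPT flag is meant to advertise.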
> Which would then mean going by my understanding of uncore
> that basically every uncore driver should set this capability flag, as
> they're not performance counter registers on a CPU that are intimately
> tied to the ISA's execution state.
Correct again. There are interconnect, memory and even GPU drivers out
there these days.
> To further my understanding: does this mean that
> drivers/devfreq/event/rockchip-dfi.c (used for measuring memory
> bandwidth) should set PERF_PMU_CAP_NO_INTERRUPT, since it's not a CPU
> but a memory controller monitor?
Yup, that would indeed seem to be so.
> In a more general sense, if anyone has any written resources on writing
> PMU drivers, rather than perf from a userspace perspective, I'd be very
> happy to get some pointers in their direction.
I'm afraid not :/ The best we have is the comments in struct pmu.
* Re: [PATCH] perf/headers: Document PERF_PMU_CAP capability flags
2025-06-18 19:08 [PATCH] perf/headers: Document PERF_PMU_CAP capability flags Nicolas Frattaroli
2025-06-18 20:14 ` Ian Rogers
2025-06-19 14:50 ` Peter Zijlstra
@ 2025-06-20 9:14 ` James Clark
2 siblings, 0 replies; 6+ messages in thread
From: James Clark @ 2025-06-20 9:14 UTC
To: Nicolas Frattaroli
Cc: kernel, Jonathan Corbet, linux-perf-users, linux-kernel,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter
On 18/06/2025 8:08 pm, Nicolas Frattaroli wrote:
> Over the years, capability flags for perf PMUs were introduced in a
> piecemeal fashion whenever a new driver needed to signal to the perf
> core some limitation or special feature.
>
> Since one more undocumented flag that can have its meaning inferred from
> the commit message and implementation never seems that bad, it's
> understandable that this resulted in a total of 11 undocumented
> capability flags, which authors of new perf PMU drivers are expected to
> set correctly for their particular device.
>
> Since I am in the process of becoming such an author of a new perf
> driver, it feels proper to pay it forward by documenting all
> PERF_PMU_CAP_ constants, so that no future person has to go through an
> hour or two of git blame + reading perf core code to figure out which
> capability flags are right for them.
>
> Add comments in kernel-doc format that describe each flag. This follows
> the somewhat verbose "Object-like macro documentation" format, and can
> be verified with
>
> ./scripts/kernel-doc -v -none include/linux/perf_event.h
>
> The current in-tree kernel documentation does not include a page on the
> perf subsystem, but once it does, these comments should render as proper
> documentation annotation. Until then, they'll also be quite useful for
> anyone looking at the header file.
>
Reviewed-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> There may be more perf documentation patches in the future, but right
> now I'm focused on getting a minimally viable driver going for the
> hardware I'm working on. Documenting these seemed to have a fairly good
> effort-to-future-payoff ratio though.
>
> I Cc'd Corbet in case he has any input on the verbosity of the
> kernel-doc syntax here. Maybe I'm missing something and all of these
> could be in a single /* comment */, but the form used in this patch
> doesn't seem too awful to me either.
> ---
> include/linux/perf_event.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index ec9d96025683958e909bb2463439dc69634f4ceb..7d749fd5225be12543df6e475277563bf16c05b1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -294,16 +294,90 @@ struct perf_event_pmu_context;
> /**
> * pmu::capabilities flags
> */
> +
> +/**
> + * define PERF_PMU_CAP_NO_INTERRUPT - \
> + * PMU is incapable of generating hardware interrupts
> + */
> #define PERF_PMU_CAP_NO_INTERRUPT 0x0001
> +/**
> + * define PERF_PMU_CAP_NO_NMI - \
> + * PMU is guaranteed to not generate non-maskable interrupts
> + */
> #define PERF_PMU_CAP_NO_NMI 0x0002
> +/**
> + * define PERF_PMU_CAP_AUX_NO_SG - \
> + * PMU does not support using scatter-gather as the output
> + *
> + * The PERF_PMU_CAP_AUX_NO_SG flag indicates that the PMU does not support
> + * scatter-gather for its output buffer, and needs a larger contiguous buffer
> + * to output to.
> + */
> #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> +/**
> + * define PERF_PMU_CAP_EXTENDED_REGS - \
> + * PMU is capable of sampling extended registers
> + *
> + * Some architectures have a concept of extended registers, e.g. XMM0 on x86
> + * or VG on arm64. If the PMU is capable of sampling these registers, then the
> + * flag PERF_PMU_CAP_EXTENDED_REGS should be set.
> + */
> #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> +/**
> + * define PERF_PMU_CAP_EXCLUSIVE - \
> + * PMU can only have one scheduled event at a time
> + *
> + * Certain PMU hardware cannot track several events at the same time. Such
> + * hardware must set PERF_PMU_CAP_EXCLUSIVE in order to avoid conflicts.
> + */
> #define PERF_PMU_CAP_EXCLUSIVE 0x0010
> +/**
> + * define PERF_PMU_CAP_ITRACE - PMU traces instructions
> + *
> + * Some PMU hardware does instruction tracing, in that it traces execution of
> + * each instruction. Setting this capability flag makes the perf core generate
> + * a %PERF_RECORD_ITRACE_START event, recording the profiled task's PID and TID,
> + * to allow tools to properly decode such traces.
> + */
> #define PERF_PMU_CAP_ITRACE 0x0020
> +/**
> + * define PERF_PMU_CAP_NO_EXCLUDE - \
> + * PMU is incapable of excluding events based on context
> + *
> + * Some PMU hardware will count events regardless of context, including e.g.
> + * idle, kernel and guest. Drivers for such hardware should set the
> + * PERF_PMU_CAP_NO_EXCLUDE flag to explicitly advertise that they're unable to
> + * help themselves, so that the perf core can reject requests to exclude events
> + * based on context.
> + */
> #define PERF_PMU_CAP_NO_EXCLUDE 0x0040
> +/**
> + * define PERF_PMU_CAP_AUX_OUTPUT - PMU non-AUX events generate AUX data
> + *
> + * Drivers for PMU hardware that supports non-AUX events which generate data for
> + * AUX events should set PERF_PMU_CAP_AUX_OUTPUT. This flag tells the perf core
> + * to schedule non-AUX events together with AUX events, so that this data isn't
> + * lost.
> + */
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> +/**
> + * define PERF_PMU_CAP_EXTENDED_HW_TYPE - \
> + * PMU supports PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
> + */
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> +/**
> + * define PERF_PMU_CAP_AUX_PAUSE - \
> + * PMU can pause and resume AUX area traces based on events
> + */
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +/**
> + * define PERF_PMU_CAP_AUX_PREFER_LARGE - PMU prefers contiguous output buffers
> + *
> + * The PERF_PMU_CAP_AUX_PREFER_LARGE capability flag is a less strict variant of
> + * %PERF_PMU_CAP_AUX_NO_SG. PMU drivers for hardware that doesn't strictly
> + * require contiguous output buffers, but find the benefits outweigh the
> + * downside of increased memory fragmentation, may set this capability flag.
> + */
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
> /**
>
> ---
> base-commit: 31d56636e10e92ced06ead14b7541867f955e41d
> change-id: 20250618-perf-pmu-cap-docs-a13e4ae939ac
>
> Best regards,