* [PATCH 00/19] perf: Rework event_init checks
@ 2025-08-13 17:00 Robin Murphy
2025-08-13 17:00 ` [PATCH 01/19] perf/arm-cmn: Fix event validation Robin Murphy
` (18 more replies)
0 siblings, 19 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
Hi all,
[ Note I'm only CC'ing lists for now to avoid spamming nearly 100
individual maintainers/reviewers while we work out the basics ]
Reviving my idea from a few years back, the aim here is to minimise
the amount of event_init boilerplate that most new drivers have to
implement (and so many get wrong), while also trying to establish
some more consistent and easy-to-follow patterns for the things that
drivers should still care about (mostly group validation).
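For reference, the group-validation counting pattern the later diffs converge on can be sketched outside the kernel with stand-in types — `example_validate_group` and the simplified `next_sibling` chain below are illustrative only, not the real perf core structures or API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the perf core types; names here are illustrative only. */
struct pmu { int id; };

struct perf_event {
	struct pmu *pmu;
	struct perf_event *group_leader;
	struct perf_event *next_sibling;	/* simplified sibling chain */
};

/*
 * Sketch of the pattern: count the new event itself, then count the
 * leader and each sibling only if they belong to this PMU (ignoring
 * software events and other PMUs entirely), and skip the sibling walk
 * when the event is its own leader, since that list may still change.
 */
static bool example_validate_group(struct perf_event *event, int num_counters)
{
	struct perf_event *leader = event->group_leader;
	int counters = 1;	/* the new event itself */

	if (leader == event)
		return true;

	if (leader->pmu == event->pmu)
		counters++;

	for (struct perf_event *s = leader->next_sibling; s; s = s->next_sibling)
		if (s->pmu == event->pmu)
			counters++;

	return counters <= num_counters;
}
```

The point of the shape is that mixed-PMU rejection is left to perf core, so the driver only has to answer "do *my* events fit on *my* counters".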
It's ended up somewhat big and ugly, so to start with I've tried to
optimise for ease of review - based on the typical "fixes, cleanup,
new development" order the split of the current patches is like so:
* Group validation rework (patches #1-#15)
- Specific drivers with functional issues by inspection (#1-#7)
- Specific drivers where cleanup changes were non-trivial (#8-#11)
- Common patterns across remaining drivers (#12-#15)
* Capabilities rework (patches #16-#18)
* Giant bonfire of remaining boilerplate! (patch #19)
If the overall idea is acceptable then a more relaxed merge strategy
might be to look at landing the common parts first (#16-#18 and maybe
#13), then rearrange the rest into per-driver patches, but I'm sure
nobody wants a ~70-patch series out of the gate :)
Thanks,
Robin.
Robin Murphy (19):
perf/arm-cmn: Fix event validation
perf/hisilicon: Fix group validation
perf/imx8_ddr: Fix group validation
perf/starfive: Fix group validation
iommu/vt-d: Fix perfmon group validation
ARM: l2x0: Fix group validation
ARM: imx: Fix MMDC PMU group validation
perf/arm_smmu_v3: Improve group validation
perf/qcom: Improve group validation
perf/arm-ni: Improve event validation
perf/arm-cci: Tidy up event validation
perf: Ignore event state for group validation
perf: Add helper for checking grouped events
perf: Clean up redundant group validation
perf: Simplify group validation
perf: Introduce positive capability for sampling
perf: Retire PERF_PMU_CAP_NO_INTERRUPT
perf: Introduce positive capability for raw events
perf: Garbage-collect event_init checks
arch/alpha/kernel/perf_event.c | 5 +-
arch/arc/kernel/perf_event.c | 4 +-
arch/arm/mach-imx/mmdc.c | 29 ++----
arch/arm/mm/cache-l2x0-pmu.c | 19 +---
arch/csky/kernel/perf_event.c | 3 +-
arch/loongarch/kernel/perf_event.c | 1 +
arch/mips/kernel/perf_event_mipsxx.c | 1 +
arch/powerpc/perf/8xx-pmu.c | 3 +-
arch/powerpc/perf/core-book3s.c | 4 +-
arch/powerpc/perf/core-fsl-emb.c | 4 +-
arch/powerpc/perf/hv-24x7.c | 11 ---
arch/powerpc/perf/hv-gpci.c | 11 ---
arch/powerpc/perf/imc-pmu.c | 31 +-----
arch/powerpc/perf/kvm-hv-pmu.c | 5 +-
arch/powerpc/perf/vpa-pmu.c | 13 +--
arch/powerpc/platforms/pseries/papr_scm.c | 18 +---
arch/s390/kernel/perf_cpum_cf.c | 8 +-
arch/s390/kernel/perf_cpum_sf.c | 2 +
arch/s390/kernel/perf_pai_crypto.c | 1 +
arch/s390/kernel/perf_pai_ext.c | 1 +
arch/sh/kernel/perf_event.c | 1 -
arch/sparc/kernel/perf_event.c | 4 +-
arch/x86/events/amd/ibs.c | 32 ++-----
arch/x86/events/amd/iommu.c | 15 ---
arch/x86/events/amd/power.c | 7 --
arch/x86/events/amd/uncore.c | 12 +--
arch/x86/events/core.c | 7 +-
arch/x86/events/intel/bts.c | 3 -
arch/x86/events/intel/cstate.c | 16 +---
arch/x86/events/intel/pt.c | 3 -
arch/x86/events/intel/uncore.c | 16 +---
arch/x86/events/intel/uncore_snb.c | 18 ----
arch/x86/events/msr.c | 8 +-
arch/x86/events/rapl.c | 11 ---
arch/xtensa/kernel/perf_event.c | 1 +
drivers/devfreq/event/rockchip-dfi.c | 13 +--
drivers/dma/idxd/perfmon.c | 17 +---
drivers/fpga/dfl-fme-perf.c | 18 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 4 -
drivers/gpu/drm/i915/i915_pmu.c | 13 ---
drivers/gpu/drm/xe/xe_pmu.c | 13 ---
.../hwtracing/coresight/coresight-etm-perf.c | 5 -
drivers/hwtracing/ptt/hisi_ptt.c | 8 --
drivers/iommu/intel/perfmon.c | 28 +++---
drivers/perf/alibaba_uncore_drw_pmu.c | 28 +-----
drivers/perf/amlogic/meson_ddr_pmu_core.c | 9 --
drivers/perf/arm-cci.c | 56 +++--------
drivers/perf/arm-ccn.c | 34 -------
drivers/perf/arm-cmn.c | 15 +--
drivers/perf/arm-ni.c | 35 +++----
drivers/perf/arm_cspmu/arm_cspmu.c | 34 +------
drivers/perf/arm_dmc620_pmu.c | 28 +-----
drivers/perf/arm_dsu_pmu.c | 26 +----
drivers/perf/arm_pmu.c | 19 +---
drivers/perf/arm_pmu_platform.c | 2 +-
drivers/perf/arm_smmuv3_pmu.c | 35 ++-----
drivers/perf/arm_spe_pmu.c | 7 +-
drivers/perf/cxl_pmu.c | 6 --
drivers/perf/dwc_pcie_pmu.c | 21 +---
drivers/perf/fsl_imx8_ddr_perf.c | 32 +------
drivers/perf/fsl_imx9_ddr_perf.c | 27 ------
drivers/perf/hisilicon/hisi_pcie_pmu.c | 25 ++---
drivers/perf/hisilicon/hisi_uncore_pmu.c | 41 ++------
drivers/perf/hisilicon/hns3_pmu.c | 24 ++---
drivers/perf/marvell_cn10k_ddr_pmu.c | 18 ----
drivers/perf/marvell_cn10k_tad_pmu.c | 12 +--
drivers/perf/marvell_pem_pmu.c | 22 +----
drivers/perf/qcom_l2_pmu.c | 96 ++++++-------------
drivers/perf/qcom_l3_pmu.c | 33 ++-----
drivers/perf/riscv_pmu_legacy.c | 1 -
drivers/perf/riscv_pmu_sbi.c | 3 +-
drivers/perf/starfive_starlink_pmu.c | 32 ++-----
drivers/perf/thunderx2_pmu.c | 45 ++-------
drivers/perf/xgene_pmu.c | 29 ------
drivers/powercap/intel_rapl_common.c | 9 +-
include/linux/perf_event.h | 10 +-
kernel/events/core.c | 35 +++++--
kernel/events/hw_breakpoint.c | 1 +
78 files changed, 244 insertions(+), 1053 deletions(-)
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 01/19] perf/arm-cmn: Fix event validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
In the hypothetical case where a CMN event is opened with a software
group leader that already has some other hardware sibling, currently
arm_cmn_val_add_event() could try to interpret the other event's data
as an arm_cmn_hw_event, which is not great since we dereference a
pointer from there... Thankfully the way to be more robust is to be
less clever - stop trying to special-case software events and simply
skip any event that isn't for our PMU.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm-cmn.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 11fb2234b10f..f8c9be9fa6c0 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1652,7 +1652,7 @@ static void arm_cmn_val_add_event(struct arm_cmn *cmn, struct arm_cmn_val *val,
enum cmn_node_type type;
int i;
- if (is_software_event(event))
+ if (event->pmu != &cmn->pmu)
return;
type = CMN_EVENT_TYPE(event);
@@ -1693,9 +1693,6 @@ static int arm_cmn_validate_group(struct arm_cmn *cmn, struct perf_event *event)
if (leader == event)
return 0;
- if (event->pmu != leader->pmu && !is_software_event(leader))
- return -EINVAL;
-
val = kzalloc(sizeof(*val), GFP_KERNEL);
if (!val)
return -ENOMEM;
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 02/19] perf/hisilicon: Fix group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
a bit off, in that given a software group leader, it will consider that
event *in place of* the actual new event being opened. At worst this
could theoretically allow an unschedulable group if the software event
config happens to look like one of the hardware siblings.
The uncore framework avoids that particular issue, but all 3 also share
the common issue of not preventing racy access to the sibling list, along
with some redundant checks which can be cleaned up.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/hisilicon/hisi_pcie_pmu.c | 17 ++++++-----------
drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
drivers/perf/hisilicon/hns3_pmu.c | 17 ++++++-----------
3 files changed, 19 insertions(+), 38 deletions(-)
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index c5394d007b61..3b0b2f7197d0 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
int counters = 1;
int num;
- event_group[0] = leader;
- if (!is_software_event(leader)) {
- if (leader->pmu != event->pmu)
- return false;
+ if (leader == event)
+ return true;
- if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
- event_group[counters++] = event;
- }
+ event_group[0] = event;
+ if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
+ event_group[counters++] = leader;
for_each_sibling_event(sibling, event->group_leader) {
- if (is_software_event(sibling))
- continue;
-
if (sibling->pmu != event->pmu)
- return false;
+ continue;
for (num = 0; num < counters; num++) {
/*
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index a449651f79c9..3c531b36cf25 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
/* Include count for the event */
int counters = 1;
- if (!is_software_event(leader)) {
- /*
- * We must NOT create groups containing mixed PMUs, although
- * software events are acceptable
- */
- if (leader->pmu != event->pmu)
- return false;
+ if (leader == event)
+ return true;
- /* Increment counter for the leader */
- if (leader != event)
- counters++;
- }
+ /* Increment counter for the leader */
+ if (leader->pmu == event->pmu)
+ counters++;
for_each_sibling_event(sibling, event->group_leader) {
- if (is_software_event(sibling))
- continue;
- if (sibling->pmu != event->pmu)
- return false;
/* Increment counter for each sibling */
- counters++;
+ if (sibling->pmu == event->pmu)
+ counters++;
}
/* The group can not count events more than the counters in the HW */
diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index c157f3572cae..382e469257f9 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
int counters = 1;
int num;
- event_group[0] = leader;
- if (!is_software_event(leader)) {
- if (leader->pmu != event->pmu)
- return false;
+ if (leader == event)
+ return true;
- if (leader != event && !hns3_pmu_cmp_event(leader, event))
- event_group[counters++] = event;
- }
+ event_group[0] = event;
+ if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
+ event_group[counters++] = leader;
for_each_sibling_event(sibling, event->group_leader) {
- if (is_software_event(sibling))
- continue;
-
if (sibling->pmu != event->pmu)
- return false;
+ continue;
for (num = 0; num < counters; num++) {
/*
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 03/19] perf/imx8_ddr: Fix group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation here erroneously inspects software events, as well
as siblings from other hardware PMUs, whose PMU is only checked *after*
they've already been misinterpreted. Once again, just ignore events
which don't belong to our PMU, and don't duplicate what
perf_event_open() will already check for us.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/fsl_imx8_ddr_perf.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index b989ffa95d69..56fe281974d2 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -331,6 +331,9 @@ static u32 ddr_perf_filter_val(struct perf_event *event)
static bool ddr_perf_filters_compatible(struct perf_event *a,
struct perf_event *b)
{
+ /* Ignore grouped events that aren't ours */
+ if (a->pmu != b->pmu)
+ return true;
if (!ddr_perf_is_filtered(a))
return true;
if (!ddr_perf_is_filtered(b))
@@ -409,16 +412,8 @@ static int ddr_perf_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- /*
- * We must NOT create groups containing mixed PMUs, although software
- * events are acceptable (for example to create a CCN group
- * periodically read when a hrtimer aka cpu-clock leader triggers).
- */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return -EINVAL;
-
- if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
+ if (event != event->group_leader &&
+ pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
if (!ddr_perf_filters_compatible(event, event->group_leader))
return -EINVAL;
for_each_sibling_event(sibling, event->group_leader) {
@@ -427,12 +422,6 @@ static int ddr_perf_event_init(struct perf_event *event)
}
}
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling))
- return -EINVAL;
- }
-
event->cpu = pmu->cpu;
hwc->idx = -1;
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 04/19] perf/starfive: Fix group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation code here is superficially the right shape, but
is failing to count the group leader, while also erroneously counting
software siblings. Just correctly count the events which belong to our
PMU, and let perf core worry about the rest.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/starfive_starlink_pmu.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/perf/starfive_starlink_pmu.c b/drivers/perf/starfive_starlink_pmu.c
index 5e5a672b4229..e185f307e639 100644
--- a/drivers/perf/starfive_starlink_pmu.c
+++ b/drivers/perf/starfive_starlink_pmu.c
@@ -347,19 +347,15 @@ static bool starlink_pmu_validate_event_group(struct perf_event *event)
struct perf_event *sibling;
int counter = 1;
- /*
- * Ensure hardware events in the group are on the same PMU,
- * software events are acceptable.
- */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return false;
+ if (leader == event)
+ return true;
+
+ if (leader->pmu == event->pmu)
+ counter++;
for_each_sibling_event(sibling, leader) {
- if (sibling->pmu != event->pmu && !is_software_event(sibling))
- return false;
-
- counter++;
+ if (sibling->pmu == event->pmu)
+ counter++;
}
return counter <= STARLINK_PMU_NUM_COUNTERS;
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 05/19] iommu/vt-d: Fix perfmon group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation here has a few issues to fix: firstly, failing to
count the group leader or the event being opened itself. Secondly, it
appears wrong not to count disabled sibling events given that they could
be enabled later. Finally there's the subtlety that we should avoid racy
access to the sibling list when the event is its own group leader.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/iommu/intel/perfmon.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index 75f493bcb353..c3a1ac14cb2b 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -258,21 +258,25 @@ static int iommu_pmu_validate_group(struct perf_event *event)
{
struct iommu_pmu *iommu_pmu = iommu_event_to_pmu(event);
struct perf_event *sibling;
- int nr = 0;
+ int nr = 1;
+ if (event == event->group_leader)
+ return 0;
/*
* All events in a group must be scheduled simultaneously.
* Check whether there is enough counters for all the events.
*/
- for_each_sibling_event(sibling, event->group_leader) {
- if (!is_iommu_pmu_event(iommu_pmu, sibling) ||
- sibling->state <= PERF_EVENT_STATE_OFF)
- continue;
+ if (is_iommu_pmu_event(iommu_pmu, event->group_leader))
+ ++nr;
- if (++nr > iommu_pmu->num_cntr)
- return -EINVAL;
+ for_each_sibling_event(sibling, event->group_leader) {
+ if (is_iommu_pmu_event(iommu_pmu, sibling))
+ ++nr;
}
+ if (nr > iommu_pmu->num_cntr)
+ return -EINVAL;
+
return 0;
}
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 06/19] ARM: l2x0: Fix group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation here is almost right, but fails to count the new
event itself. While we fix that, also adopt the standard pattern to
avoid racy access to the sibling list and drop checks that are redundant
with core code.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/arm/mm/cache-l2x0-pmu.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 93ef0502b7ff..6fc1171031a8 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -274,18 +274,17 @@ static bool l2x0_pmu_group_is_valid(struct perf_event *event)
struct pmu *pmu = event->pmu;
struct perf_event *leader = event->group_leader;
struct perf_event *sibling;
- int num_hw = 0;
+ int num_hw = 1;
+
+ if (leader == event)
+ return true;
if (leader->pmu == pmu)
num_hw++;
- else if (!is_software_event(leader))
- return false;
for_each_sibling_event(sibling, leader) {
if (sibling->pmu == pmu)
num_hw++;
- else if (!is_software_event(sibling))
- return false;
}
return num_hw <= PMU_NR_COUNTERS;
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 07/19] ARM: imx: Fix MMDC PMU group validation
From: Robin Murphy @ 2025-08-13 17:00 UTC (permalink / raw)
The group validation here gets the event and its group leader mixed up,
such that if the group leader belongs to a different PMU, the set_bit()
may go wildly out of bounds. While we fix that, also adopt the standard
pattern to avoid racy access to the sibling list and drop checks that are
redundant with core code.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/arm/mach-imx/mmdc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 94e4f4a2f73f..f9d432b385a2 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -238,11 +238,8 @@ static bool mmdc_pmu_group_event_is_valid(struct perf_event *event,
{
int cfg = event->attr.config;
- if (is_software_event(event))
- return true;
-
if (event->pmu != pmu)
- return false;
+ return true;
return !test_and_set_bit(cfg, used_counters);
}
@@ -260,12 +257,12 @@ static bool mmdc_pmu_group_is_valid(struct perf_event *event)
struct perf_event *sibling;
unsigned long counter_mask = 0;
- set_bit(leader->attr.config, &counter_mask);
+ if (event == leader)
+ return true;
- if (event != leader) {
- if (!mmdc_pmu_group_event_is_valid(event, pmu, &counter_mask))
- return false;
- }
+ set_bit(event->attr.config, &counter_mask);
+ if (!mmdc_pmu_group_event_is_valid(leader, pmu, &counter_mask))
+ return false;
for_each_sibling_event(sibling, leader) {
if (!mmdc_pmu_group_event_is_valid(sibling, pmu, &counter_mask))
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 08/19] perf/arm_smmu_v3: Improve group validation
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
The group validation here is OK, except for the benign issue that it
will double-count an event that is its own group leader. Even though
it's highly unlikely we'd ever have PMCG hardware with only one counter,
let's sort that out, cleaning up some redundant checks in the process.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm_smmuv3_pmu.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 621f02a7f43b..7cac380a3528 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -377,9 +377,6 @@ static int smmu_pmu_get_event_idx(struct smmu_pmu *smmu_pmu,
static bool smmu_pmu_events_compatible(struct perf_event *curr,
struct perf_event *new)
{
- if (new->pmu != curr->pmu)
- return false;
-
if (to_smmu_pmu(new->pmu)->global_filter &&
!smmu_pmu_check_global_filter(curr, new))
return false;
@@ -422,15 +419,6 @@ static int smmu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
- /* Don't allow groups with mixed PMUs, except for s/w events */
- if (!is_software_event(event->group_leader)) {
- if (!smmu_pmu_events_compatible(event->group_leader, event))
- return -EINVAL;
-
- if (++group_num_events > smmu_pmu->num_counters)
- return -EINVAL;
- }
-
/*
* Ensure all events are on the same cpu so all events are in the
* same cpu context, to avoid races on pmu_enable etc.
@@ -442,8 +430,16 @@ static int smmu_pmu_event_init(struct perf_event *event)
if (event->group_leader == event)
return 0;
+ if (event->group_leader->pmu == event->pmu) {
+ if (!smmu_pmu_events_compatible(event->group_leader, event))
+ return -EINVAL;
+
+ if (++group_num_events > smmu_pmu->num_counters)
+ return -EINVAL;
+ }
+
for_each_sibling_event(sibling, event->group_leader) {
- if (is_software_event(sibling))
+ if (sibling->pmu != event->pmu)
continue;
if (!smmu_pmu_events_compatible(sibling, event))
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 09/19] perf/qcom: Improve group validation
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
The L3 driver's group validation is almost right, except for erroneously
counting a software group leader - which is benign other than
artificially limiting the maximum size of such a group to one less than
it could be. Correct that with the now-established pattern of simply
ignoring all events which do not belong to our PMU.
The L2 driver gets a cleanup of some slightly suspicious logic, and both
can have the same overall simplification to not duplicate things that perf
core will already do, and avoid racy access to the sibling list of group
leader events.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/qcom_l2_pmu.c | 81 +++++++++++++++-----------------------
drivers/perf/qcom_l3_pmu.c | 14 +++----
2 files changed, 37 insertions(+), 58 deletions(-)
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index ea8c85729937..9c4e1d89718d 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -468,23 +468,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EINVAL;
}
- /* Don't allow groups with mixed PMUs, except for s/w events */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader)) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Can't create mixed PMU group\n");
- return -EINVAL;
- }
-
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling)) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Can't create mixed PMU group\n");
- return -EINVAL;
- }
- }
-
cluster = get_cluster_pmu(l2cache_pmu, event->cpu);
if (!cluster) {
/* CPU has not been initialised */
@@ -493,39 +476,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EINVAL;
}
- /* Ensure all events in a group are on the same cpu */
- if ((event->group_leader != event) &&
- (cluster->on_cpu != event->group_leader->cpu)) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Can't create group on CPUs %d and %d",
- event->cpu, event->group_leader->cpu);
- return -EINVAL;
- }
-
- if ((event != event->group_leader) &&
- !is_software_event(event->group_leader) &&
- (L2_EVT_GROUP(event->group_leader->attr.config) ==
- L2_EVT_GROUP(event->attr.config))) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Column exclusion: conflicting events %llx %llx\n",
- event->group_leader->attr.config,
- event->attr.config);
- return -EINVAL;
- }
-
- for_each_sibling_event(sibling, event->group_leader) {
- if ((sibling != event) &&
- !is_software_event(sibling) &&
- (L2_EVT_GROUP(sibling->attr.config) ==
- L2_EVT_GROUP(event->attr.config))) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Column exclusion: conflicting events %llx %llx\n",
- sibling->attr.config,
- event->attr.config);
- return -EINVAL;
- }
- }
-
hwc->idx = -1;
hwc->config_base = event->attr.config;
@@ -534,6 +484,37 @@ static int l2_cache_event_init(struct perf_event *event)
* same cpu context, to avoid races on pmu_enable etc.
*/
event->cpu = cluster->on_cpu;
+ if (event->cpu != event->group_leader->cpu) {
+ dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
+ "Can't create group on CPUs %d and %d",
+ event->cpu, event->group_leader->cpu);
+ return -EINVAL;
+ }
+
+ if (event == event->group_leader)
+ return 0;
+
+ if ((event->group_leader->pmu == event->pmu) &&
+ (L2_EVT_GROUP(event->group_leader->attr.config) ==
+ L2_EVT_GROUP(event->attr.config))) {
+ dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
+ "Column exclusion: conflicting events %llx %llx\n",
+ event->group_leader->attr.config,
+ event->attr.config);
+ return -EINVAL;
+ }
+
+ for_each_sibling_event(sibling, event->group_leader) {
+ if ((sibling->pmu == event->pmu) &&
+ (L2_EVT_GROUP(sibling->attr.config) ==
+ L2_EVT_GROUP(event->attr.config))) {
+ dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
+ "Column exclusion: conflicting events %llx %llx\n",
+ sibling->attr.config,
+ event->attr.config);
+ return -EINVAL;
+ }
+ }
return 0;
}
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 66e6cabd6fff..f0cf6c33418d 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -454,18 +454,16 @@ static bool qcom_l3_cache__validate_event_group(struct perf_event *event)
struct perf_event *sibling;
int counters = 0;
- if (leader->pmu != event->pmu && !is_software_event(leader))
- return false;
+ if (leader == event)
+ return true;
counters = event_num_counters(event);
- counters += event_num_counters(leader);
+ if (leader->pmu == event->pmu)
+ counters += event_num_counters(leader);
for_each_sibling_event(sibling, leader) {
- if (is_software_event(sibling))
- continue;
- if (sibling->pmu != event->pmu)
- return false;
- counters += event_num_counters(sibling);
+ if (sibling->pmu == event->pmu)
+ counters += event_num_counters(sibling);
}
/*
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 10/19] perf/arm-ni: Improve event validation
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
18 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
Although it is entirely benign for arm_ni_val_count_event() to count
any old hardware leader/sibling as an NI event (perf core will still
ultimately reject the cross-PMU group), it would still be nicer if it
didn't. Stop trying to special-case software events and simply skip any
event which doesn't belong to our PMU. Similarly drop the early return
paths since they can almost never actually return early.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm-ni.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
index 1615a0564031..d6b683a0264e 100644
--- a/drivers/perf/arm-ni.c
+++ b/drivers/perf/arm-ni.c
@@ -271,40 +271,37 @@ static void arm_ni_pmu_disable(struct pmu *pmu)
}
struct arm_ni_val {
+ const struct pmu *pmu;
unsigned int evcnt;
unsigned int ccnt;
};
-static bool arm_ni_val_count_event(struct perf_event *evt, struct arm_ni_val *val)
+static void arm_ni_val_count_event(struct perf_event *evt, struct arm_ni_val *val)
{
- if (is_software_event(evt))
- return true;
-
- if (NI_EVENT_TYPE(evt) == NI_PMU) {
- val->ccnt++;
- return val->ccnt <= 1;
+ if (evt->pmu == val->pmu) {
+ if (NI_EVENT_TYPE(evt) == NI_PMU)
+ val->ccnt++;
+ else
+ val->evcnt++;
}
-
- val->evcnt++;
- return val->evcnt <= NI_NUM_COUNTERS;
}
static int arm_ni_validate_group(struct perf_event *event)
{
struct perf_event *sibling, *leader = event->group_leader;
- struct arm_ni_val val = { 0 };
+ struct arm_ni_val val = { .pmu = event->pmu };
if (leader == event)
return 0;
arm_ni_val_count_event(event, &val);
- if (!arm_ni_val_count_event(leader, &val))
+ arm_ni_val_count_event(leader, &val);
+ for_each_sibling_event(sibling, leader)
+ arm_ni_val_count_event(sibling, &val);
+
+ if (val.evcnt > NI_NUM_COUNTERS || val.ccnt > 1)
return -EINVAL;
- for_each_sibling_event(sibling, leader) {
- if (!arm_ni_val_count_event(sibling, &val))
- return -EINVAL;
- }
return 0;
}
--
2.39.2.101.g768bb238c484.dirty
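As an aside for readers, the restructured validation pattern above can be sketched outside the kernel. This is an illustrative stand-alone model, not the driver's real code: the struct names, NUM_COUNTERS value, and the cycle-counter flag are mocked-up stand-ins. The point is the shape of the change: count every group member that belongs to our PMU first, then apply the hardware limits once at the end, instead of returning early from inside the counting helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mocked-up stand-ins for the kernel types; illustration only */
struct pmu { int id; };
struct mock_event {
	const struct pmu *pmu;
	bool is_cycle_counter;	/* stands in for NI_EVENT_TYPE(evt) == NI_PMU */
};

#define NUM_COUNTERS 4		/* stands in for NI_NUM_COUNTERS */

struct val {
	const struct pmu *pmu;
	unsigned int evcnt;
	unsigned int ccnt;
};

/* Count an event only if it belongs to our PMU, as in the patched helper */
static void count_event(const struct mock_event *evt, struct val *val)
{
	if (evt->pmu != val->pmu)
		return;
	if (evt->is_cycle_counter)
		val->ccnt++;
	else
		val->evcnt++;
}

/* Tally the whole group first, then apply the limits in one place */
static bool group_fits(const struct mock_event *events, size_t n,
		       const struct pmu *pmu)
{
	struct val val = { .pmu = pmu };

	for (size_t i = 0; i < n; i++)
		count_event(&events[i], &val);
	return val.evcnt <= NUM_COUNTERS && val.ccnt <= 1;
}
```

Events from other PMUs are simply not counted, mirroring how the patch stops special-casing software events and skips anything that isn't ours.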
* [PATCH 11/19] perf/arm-cci: Tidy up event validation
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
18 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
The CCI driver only accepts events of its own type, so it is pointless
to re-check the event type again further into validation. Conversely, if
an event *is* for CCI but has a nonsense config, we should not return
-ENOENT to potentially offer it to other PMUs. Finally, it seems wrong
not to count disabled events which may be enabled later.
These are all artefacts left over from the original attempt to fit CCI
into the arm_pmu framework; clean them up, along with the now-redundant
checks for cross-PMU groups which core code will already handle (albeit
not quite as the out-of-date comment says).
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm-cci.c | 47 +++++++++++-------------------------------
1 file changed, 12 insertions(+), 35 deletions(-)
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1cc3214d6b6d..086d4363fcc8 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -333,7 +333,7 @@ static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_ev
int if_type;
if (hw_event & ~CCI400_PMU_EVENT_MASK)
- return -ENOENT;
+ return -EINVAL;
if (hw_event == CCI400_PMU_CYCLES)
return hw_event;
@@ -354,14 +354,14 @@ static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_ev
if_type = CCI_IF_MASTER;
break;
default:
- return -ENOENT;
+ return -EINVAL;
}
if (ev_code >= cci_pmu->model->event_ranges[if_type].min &&
ev_code <= cci_pmu->model->event_ranges[if_type].max)
return hw_event;
- return -ENOENT;
+ return -EINVAL;
}
static int probe_cci400_revision(struct cci_pmu *cci_pmu)
@@ -541,7 +541,7 @@ static int cci500_validate_hw_event(struct cci_pmu *cci_pmu,
int if_type;
if (hw_event & ~CCI5xx_PMU_EVENT_MASK)
- return -ENOENT;
+ return -EINVAL;
switch (ev_source) {
case CCI5xx_PORT_S0:
@@ -565,14 +565,14 @@ static int cci500_validate_hw_event(struct cci_pmu *cci_pmu,
if_type = CCI_IF_GLOBAL;
break;
default:
- return -ENOENT;
+ return -EINVAL;
}
if (ev_code >= cci_pmu->model->event_ranges[if_type].min &&
ev_code <= cci_pmu->model->event_ranges[if_type].max)
return hw_event;
- return -ENOENT;
+ return -EINVAL;
}
/*
@@ -592,7 +592,7 @@ static int cci550_validate_hw_event(struct cci_pmu *cci_pmu,
int if_type;
if (hw_event & ~CCI5xx_PMU_EVENT_MASK)
- return -ENOENT;
+ return -EINVAL;
switch (ev_source) {
case CCI5xx_PORT_S0:
@@ -617,14 +617,14 @@ static int cci550_validate_hw_event(struct cci_pmu *cci_pmu,
if_type = CCI_IF_GLOBAL;
break;
default:
- return -ENOENT;
+ return -EINVAL;
}
if (ev_code >= cci_pmu->model->event_ranges[if_type].min &&
ev_code <= cci_pmu->model->event_ranges[if_type].max)
return hw_event;
- return -ENOENT;
+ return -EINVAL;
}
#endif /* CONFIG_ARM_CCI5xx_PMU */
@@ -801,17 +801,6 @@ static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *ev
return -EAGAIN;
}
-static int pmu_map_event(struct perf_event *event)
-{
- struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
-
- if (event->attr.type < PERF_TYPE_MAX ||
- !cci_pmu->model->validate_hw_event)
- return -ENOENT;
-
- return cci_pmu->model->validate_hw_event(cci_pmu, event->attr.config);
-}
-
static int pmu_request_irq(struct cci_pmu *cci_pmu, irq_handler_t handler)
{
int i;
@@ -1216,21 +1205,8 @@ static int validate_event(struct pmu *cci_pmu,
struct cci_pmu_hw_events *hw_events,
struct perf_event *event)
{
- if (is_software_event(event))
- return 1;
-
- /*
- * Reject groups spanning multiple HW PMUs (e.g. CPU + CCI). The
- * core perf code won't check that the pmu->ctx == leader->ctx
- * until after pmu->event_init(event).
- */
+ /* Ignore grouped events that aren't ours */
if (event->pmu != cci_pmu)
- return 0;
-
- if (event->state < PERF_EVENT_STATE_OFF)
- return 1;
-
- if (event->state == PERF_EVENT_STATE_OFF && !event->attr.enable_on_exec)
return 1;
return pmu_get_event_idx(hw_events, event) >= 0;
@@ -1266,10 +1242,11 @@ static int validate_group(struct perf_event *event)
static int __hw_perf_event_init(struct perf_event *event)
{
+ struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int mapping;
- mapping = pmu_map_event(event);
+ mapping = cci_pmu->model->validate_hw_event(cci_pmu, event->attr.config);
if (mapping < 0) {
pr_debug("event %x:%llx not supported\n", event->attr.type,
--
2.39.2.101.g768bb238c484.dirty
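The -ENOENT/-EINVAL distinction the patch above corrects can be sketched as follows. All names and values here are mocked up for illustration and are not the CCI driver's real definitions: the convention is that returning -ENOENT from event_init tells perf core "not my event, try another PMU", whereas -EINVAL means "this event is mine, but its configuration is invalid", which is what a nonsense config on a correctly-typed event should produce.

```c
#include <errno.h>

#define MY_PMU_TYPE	42		/* hypothetical dynamic PMU type */
#define CONFIG_MASK	0xffULL		/* hypothetical valid config bits */

static int mock_event_init(int attr_type, unsigned long long config)
{
	/* Wrong type entirely: offer the event to other PMUs */
	if (attr_type != MY_PMU_TYPE)
		return -ENOENT;

	/*
	 * Ours, but a nonsense config: reject outright rather than
	 * letting the core go on probing other PMUs with it
	 */
	if (config & ~CONFIG_MASK)
		return -EINVAL;

	return 0;
}
```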
* [PATCH 12/19] perf: Ignore event state for group validation
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-26 13:03 ` Peter Zijlstra
18 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
It may have been different long ago, but today it seems wrong for these
drivers to skip counting disabled sibling events in group validation,
given that perf_event_enable() could make them schedulable again, and
thus increase the effective size of the group later. Conversely, if a
sibling event is truly dead then it stands to reason that the whole
group is dead, so it's not worth going to any special effort to try to
squeeze in a new event that's never going to run anyway. Thus, we can
simply remove all these checks.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/alpha/kernel/perf_event.c | 2 +-
arch/powerpc/perf/core-book3s.c | 3 +--
arch/powerpc/perf/core-fsl-emb.c | 3 +--
arch/sparc/kernel/perf_event.c | 3 +--
arch/x86/events/core.c | 2 +-
arch/x86/events/intel/uncore.c | 3 +--
drivers/dma/idxd/perfmon.c | 3 +--
drivers/perf/arm_pmu.c | 6 ------
8 files changed, 7 insertions(+), 18 deletions(-)
diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index a3eaab094ece..8557165e64c0 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -352,7 +352,7 @@ static int collect_events(struct perf_event *group, int max_count,
current_idx[n++] = PMC_NO_INDEX;
}
for_each_sibling_event(pe, group) {
- if (!is_software_event(pe) && pe->state != PERF_EVENT_STATE_OFF) {
+ if (!is_software_event(pe)) {
if (n >= max_count)
return -1;
event[n] = pe;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 8b0081441f85..d67f7d511f13 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1602,8 +1602,7 @@ static int collect_events(struct perf_event *group, int max_count,
events[n++] = group->hw.config;
}
for_each_sibling_event(event, group) {
- if (event->pmu->task_ctx_nr == perf_hw_context &&
- event->state != PERF_EVENT_STATE_OFF) {
+ if (event->pmu->task_ctx_nr == perf_hw_context) {
if (n >= max_count)
return -1;
ctrs[n] = event;
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 7120ab20cbfe..509932b91b75 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -261,8 +261,7 @@ static int collect_events(struct perf_event *group, int max_count,
n++;
}
for_each_sibling_event(event, group) {
- if (!is_software_event(event) &&
- event->state != PERF_EVENT_STATE_OFF) {
+ if (!is_software_event(event)) {
if (n >= max_count)
return -1;
ctrs[n] = event;
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index cae4d33002a5..706127749c66 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1357,8 +1357,7 @@ static int collect_events(struct perf_event *group, int max_count,
current_idx[n++] = PIC_NO_INDEX;
}
for_each_sibling_event(event, group) {
- if (!is_software_event(event) &&
- event->state != PERF_EVENT_STATE_OFF) {
+ if (!is_software_event(event)) {
if (n >= max_count)
return -1;
evts[n] = event;
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..eca5bb49aa85 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1211,7 +1211,7 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
return n;
for_each_sibling_event(event, leader) {
- if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
+ if (!is_x86_event(event))
continue;
if (collect_event(cpuc, event, max_count, n))
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index a762f7f5b161..297ff5adb667 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -406,8 +406,7 @@ uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader,
return n;
for_each_sibling_event(event, leader) {
- if (!is_box_event(box, event) ||
- event->state <= PERF_EVENT_STATE_OFF)
+ if (!is_box_event(box, event))
continue;
if (n >= max_count)
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index 4b6af2f15d8a..8c539e1f11da 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -75,8 +75,7 @@ static int perfmon_collect_events(struct idxd_pmu *idxd_pmu,
return n;
for_each_sibling_event(event, leader) {
- if (!is_idxd_event(idxd_pmu, event) ||
- event->state <= PERF_EVENT_STATE_OFF)
+ if (!is_idxd_event(idxd_pmu, event))
continue;
if (n >= max_count)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 5c310e803dd7..e8a3c8e99da0 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -386,12 +386,6 @@ validate_event(struct pmu *pmu, struct pmu_hw_events *hw_events,
if (event->pmu != pmu)
return 0;
- if (event->state < PERF_EVENT_STATE_OFF)
- return 1;
-
- if (event->state == PERF_EVENT_STATE_OFF && !event->attr.enable_on_exec)
- return 1;
-
armpmu = to_arm_pmu(event->pmu);
return armpmu->get_event_idx(hw_events, event) >= 0;
}
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 13/19] perf: Add helper for checking grouped events
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-14 5:43 ` kernel test robot
18 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
Several drivers cannot support groups, but enforce this inconsistently
(including not at all) in their event_init routines. Add a helper so
that such drivers can simply and robustly check for the acceptable
conditions that their event is either standalone, or the first one
being added to a software-only group.
In particular it took a while to see that marvell_cn10k_tad_pmu was
seemingly trying to rely on the empirical behaviour of perf tool
creating group leader events with disabled=1 and subsequent siblings
with disabled=0. Down with this sort of thing!
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/x86/events/amd/ibs.c | 30 ++++++---------------------
drivers/devfreq/event/rockchip-dfi.c | 3 +++
drivers/perf/alibaba_uncore_drw_pmu.c | 11 +---------
drivers/perf/arm_dmc620_pmu.c | 12 +----------
drivers/perf/dwc_pcie_pmu.c | 10 ++-------
drivers/perf/marvell_cn10k_tad_pmu.c | 6 ++----
drivers/perf/marvell_pem_pmu.c | 11 ++--------
include/linux/perf_event.h | 7 +++++++
8 files changed, 24 insertions(+), 66 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 112f43b23ebf..95de309fc7d5 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -248,27 +248,6 @@ int forward_event_to_ibs(struct perf_event *event)
return -ENOENT;
}
-/*
- * Grouping of IBS events is not possible since IBS can have only
- * one event active at any point in time.
- */
-static int validate_group(struct perf_event *event)
-{
- struct perf_event *sibling;
-
- if (event->group_leader == event)
- return 0;
-
- if (event->group_leader->pmu == event->pmu)
- return -EINVAL;
-
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu == event->pmu)
- return -EINVAL;
- }
- return 0;
-}
-
static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
struct perf_event *event)
{
@@ -309,9 +288,12 @@ static int perf_ibs_init(struct perf_event *event)
event->attr.exclude_hv))
return -EINVAL;
- ret = validate_group(event);
- if (ret)
- return ret;
+ /*
+ * Grouping of IBS events is not possible since IBS can have only
+ * one event active at any point in time.
+ */
+ if (in_hardware_group(event))
+ return -EINVAL;
if (hwc->sample_period) {
if (config & perf_ibs->cnt_mask)
diff --git a/drivers/devfreq/event/rockchip-dfi.c b/drivers/devfreq/event/rockchip-dfi.c
index 0470d7c175f4..88a9ecbe96ce 100644
--- a/drivers/devfreq/event/rockchip-dfi.c
+++ b/drivers/devfreq/event/rockchip-dfi.c
@@ -413,6 +413,9 @@ static int rockchip_ddr_perf_event_init(struct perf_event *event)
dev_warn(dfi->dev, "Can't provide per-task data!\n");
return -EINVAL;
}
+ /* Disallow groups since we can't start/stop/read multiple counters at once */
+ if (in_hardware_group(event))
+ return -EINVAL;
return 0;
}
diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 99a0ef9817e0..0081618741c3 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -526,7 +526,6 @@ static int ali_drw_pmu_event_init(struct perf_event *event)
{
struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- struct perf_event *sibling;
struct device *dev = drw_pmu->pmu.dev;
if (event->attr.type != event->pmu->type)
@@ -548,19 +547,11 @@ static int ali_drw_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- if (event->group_leader != event &&
- !is_software_event(event->group_leader)) {
+ if (in_hardware_group(event)) {
dev_err(dev, "driveway only allow one event!\n");
return -EINVAL;
}
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling != event && !is_software_event(sibling)) {
- dev_err(dev, "driveway event not allowed!\n");
- return -EINVAL;
- }
- }
-
/* reset all the pmu counters */
writel(ALI_DRW_PMU_CNT_RST, drw_pmu->cfg_base + ALI_DRW_PMU_CNT_CTRL);
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 619cf937602f..24308de80246 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -513,7 +513,6 @@ static int dmc620_pmu_event_init(struct perf_event *event)
{
struct dmc620_pmu *dmc620_pmu = to_dmc620_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- struct perf_event *sibling;
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -544,22 +543,13 @@ static int dmc620_pmu_event_init(struct perf_event *event)
hwc->idx = -1;
- if (event->group_leader == event)
- return 0;
-
/*
* We can't atomically disable all HW counters so only one event allowed,
* although software events are acceptable.
*/
- if (!is_software_event(event->group_leader))
+ if (in_hardware_group(event))
return -EINVAL;
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling != event &&
- !is_software_event(sibling))
- return -EINVAL;
- }
-
return 0;
}
diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
index 146ff57813fb..78c522658d84 100644
--- a/drivers/perf/dwc_pcie_pmu.c
+++ b/drivers/perf/dwc_pcie_pmu.c
@@ -353,7 +353,6 @@ static int dwc_pcie_pmu_event_init(struct perf_event *event)
{
struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
- struct perf_event *sibling;
u32 lane;
if (event->attr.type != event->pmu->type)
@@ -367,15 +366,10 @@ static int dwc_pcie_pmu_event_init(struct perf_event *event)
if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- if (event->group_leader != event &&
- !is_software_event(event->group_leader))
+ /* Disallow groups since we can't start/stop/read multiple counters at once */
+ if (in_hardware_group(event))
return -EINVAL;
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu && !is_software_event(sibling))
- return -EINVAL;
- }
-
if (type < 0 || type >= DWC_PCIE_EVENT_TYPE_MAX)
return -EINVAL;
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 51ccb0befa05..ee6505cb01a7 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -152,10 +152,8 @@ static int tad_pmu_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
- if (!event->attr.disabled)
- return -EINVAL;
-
- if (event->state != PERF_EVENT_STATE_OFF)
+ /* Disallow groups since we can't start/stop/read multiple counters at once */
+ if (in_hardware_group(event))
return -EINVAL;
event->cpu = tad_pmu->cpu;
diff --git a/drivers/perf/marvell_pem_pmu.c b/drivers/perf/marvell_pem_pmu.c
index 29fbcd1848e4..53a35a5de7f8 100644
--- a/drivers/perf/marvell_pem_pmu.c
+++ b/drivers/perf/marvell_pem_pmu.c
@@ -190,7 +190,6 @@ static int pem_perf_event_init(struct perf_event *event)
{
struct pem_pmu *pmu = to_pem_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- struct perf_event *sibling;
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -206,16 +205,10 @@ static int pem_perf_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EOPNOTSUPP;
- /* We must NOT create groups containing mixed PMUs */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
+ /* Disallow groups since we can't start/stop/read multiple counters at once */
+ if (in_hardware_group(event))
return -EINVAL;
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling))
- return -EINVAL;
- }
/*
* Set ownership of event to one CPU, same event can not be observed
* on multiple cpus at same time.
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec9d96025683..4d439c24c901 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1556,6 +1556,13 @@ static inline int in_software_context(struct perf_event *event)
return event->pmu_ctx->pmu->task_ctx_nr == perf_sw_context;
}
+/* True if the event has (or would have) any non-software siblings */
+static inline bool in_hardware_group(const struct perf_event *event)
+{
+ return event != event->group_leader &&
+ !in_software_context(event->group_leader);
+}
+
static inline int is_exclusive_pmu(struct pmu *pmu)
{
return pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE;
--
2.39.2.101.g768bb238c484.dirty
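As a rough sketch of the new helper's semantics, using mocked-up stand-ins for struct pmu and struct perf_event rather than the kernel types: in_hardware_group() is true only for a non-leader event whose group leader sits in a hardware context. A standalone event, a group leader, or a sibling under a software-only leader all pass, so a driver that can only schedule one counter rejects exactly the groups it cannot support with a single check.

```c
#include <stdbool.h>

/* Mocked-up stand-ins for the kernel types; illustration only */
struct mock_pmu { bool is_software; };
struct mock_event {
	struct mock_event *group_leader;	/* points to itself for a leader */
	const struct mock_pmu *pmu;
};

/*
 * Models the new helper: true if the event has (or, once more siblings
 * are added, would have) any non-software siblings
 */
static bool mock_in_hardware_group(const struct mock_event *event)
{
	return event != event->group_leader &&
	       !event->group_leader->pmu->is_software;
}
```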
* [PATCH 14/19] perf: Clean up redundant group validation
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
18 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
None of these drivers are doing anything that perf_event_open() doesn't
inherently do as of commit bf480f938566 ("perf/core: Don't allow
grouping events from different hw pmus"). While it's quite possible
that they should be doing some actual validation of the schedulability
of their own events within the given group, for now at least removing
this redundant code makes it even clearer that they are not.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm-ccn.c | 16 ----------------
drivers/perf/fsl_imx9_ddr_perf.c | 16 ----------------
drivers/perf/marvell_cn10k_ddr_pmu.c | 5 -----
drivers/perf/xgene_pmu.c | 15 ---------------
4 files changed, 52 deletions(-)
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 1a0d0e1a2263..63549aad3b99 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -708,7 +708,6 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
u32 node_xp, type, event_id;
int valid;
int i;
- struct perf_event *sibling;
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -814,21 +813,6 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
node_xp, type, port);
}
- /*
- * We must NOT create groups containing mixed PMUs, although software
- * events are acceptable (for example to create a CCN group
- * periodically read when a hrtimer aka cpu-clock leader triggers).
- */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return -EINVAL;
-
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling))
- return -EINVAL;
- }
-
return 0;
}
diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c
index 267754fdf581..85874ec5ecd0 100644
--- a/drivers/perf/fsl_imx9_ddr_perf.c
+++ b/drivers/perf/fsl_imx9_ddr_perf.c
@@ -552,7 +552,6 @@ static int ddr_perf_event_init(struct perf_event *event)
{
struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- struct perf_event *sibling;
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -565,21 +564,6 @@ static int ddr_perf_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- /*
- * We must NOT create groups containing mixed PMUs, although software
- * events are acceptable (for example to create a CCN group
- * periodically read when a hrtimer aka cpu-clock leader triggers).
- */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return -EINVAL;
-
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling))
- return -EINVAL;
- }
-
event->cpu = pmu->cpu;
hwc->idx = -1;
diff --git a/drivers/perf/marvell_cn10k_ddr_pmu.c b/drivers/perf/marvell_cn10k_ddr_pmu.c
index 72ac17efd846..54e3fd206d39 100644
--- a/drivers/perf/marvell_cn10k_ddr_pmu.c
+++ b/drivers/perf/marvell_cn10k_ddr_pmu.c
@@ -487,11 +487,6 @@ static int cn10k_ddr_perf_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- /* We must NOT create groups containing mixed PMUs */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return -EINVAL;
-
/* Set ownership of event to one CPU, same event can not be observed
* on multiple cpus at same time.
*/
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 33b5497bdc06..5e80ae0e692d 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -877,7 +877,6 @@ static int xgene_perf_event_init(struct perf_event *event)
{
struct xgene_pmu_dev *pmu_dev = to_pmu_dev(event->pmu);
struct hw_perf_event *hw = &event->hw;
- struct perf_event *sibling;
/* Test the event attr type check for PMU enumeration */
if (event->attr.type != event->pmu->type)
@@ -913,20 +912,6 @@ static int xgene_perf_event_init(struct perf_event *event)
*/
hw->config_base = event->attr.config1;
- /*
- * We must NOT create groups containing mixed PMUs, although software
- * events are acceptable
- */
- if (event->group_leader->pmu != event->pmu &&
- !is_software_event(event->group_leader))
- return -EINVAL;
-
- for_each_sibling_event(sibling, event->group_leader) {
- if (sibling->pmu != event->pmu &&
- !is_software_event(sibling))
- return -EINVAL;
- }
-
return 0;
}
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 15/19] perf: Simplify group validation
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
18 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
All of these drivers copy a pattern of actively policing cross-PMU
groups, which has been redundant since commit bf480f938566 ("perf/core:
Don't allow grouping events from different hw pmus"). Clean up these
checks to simplify matters, especially for thunderx2, which reduces
right down to trivial counting.
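The resulting pattern boils down to counting only the group members that
belong to this PMU and ignoring everything else. A standalone sketch of
that idea (simplified stand-in types and a flat sibling list, not real
kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (illustrative only). */
struct pmu { int id; };
struct perf_event {
	struct pmu *pmu;
	struct perf_event *group_leader;
	struct perf_event *next_sibling; /* simplified sibling list */
};

/*
 * Count how many group members belong to this PMU. The core already
 * rejects mixed hardware-PMU groups, so any foreign event seen here
 * can only be a software event and is simply skipped.
 */
static bool group_fits(struct perf_event *event, int max_counters)
{
	struct perf_event *leader = event->group_leader;
	int counters = 1; /* the event under validation itself */

	if (leader == event)
		return true;

	if (leader->pmu == event->pmu)
		++counters;

	for (struct perf_event *s = leader->next_sibling; s; s = s->next_sibling)
		if (s->pmu == event->pmu)
			++counters;

	return counters <= max_counters;
}
```

This mirrors what the thunderx2 validation collapses to once the
cross-PMU rejection is dropped.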
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm_cspmu/arm_cspmu.c | 7 ++-----
drivers/perf/arm_dsu_pmu.c | 6 ++----
drivers/perf/arm_pmu.c | 11 ++---------
drivers/perf/thunderx2_pmu.c | 30 +++++++-----------------------
4 files changed, 13 insertions(+), 41 deletions(-)
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index efa9b229e701..7f5ea749b85c 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -561,12 +561,9 @@ static bool arm_cspmu_validate_event(struct pmu *pmu,
struct arm_cspmu_hw_events *hw_events,
struct perf_event *event)
{
- if (is_software_event(event))
- return true;
-
- /* Reject groups spanning multiple HW PMUs. */
+ /* Ignore grouped events that aren't ours */
if (event->pmu != pmu)
- return false;
+ return true;
return (arm_cspmu_get_event_idx(hw_events, event) >= 0);
}
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index cb4fb59fe04b..7480fd6fe377 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -492,11 +492,9 @@ static bool dsu_pmu_validate_event(struct pmu *pmu,
struct dsu_hw_events *hw_events,
struct perf_event *event)
{
- if (is_software_event(event))
- return true;
- /* Reject groups spanning multiple HW PMUs. */
+ /* Ignore grouped events that aren't ours */
if (event->pmu != pmu)
- return false;
+ return true;
return dsu_pmu_get_event_idx(hw_events, event) >= 0;
}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index e8a3c8e99da0..2c1af3a0207c 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -375,16 +375,9 @@ validate_event(struct pmu *pmu, struct pmu_hw_events *hw_events,
{
struct arm_pmu *armpmu;
- if (is_software_event(event))
- return 1;
-
- /*
- * Reject groups spanning multiple HW PMUs (e.g. CPU + CCI). The
- * core perf code won't check that the pmu->ctx == leader->ctx
- * until after pmu->event_init(event).
- */
+ /* Ignore grouped events that aren't ours */
if (event->pmu != pmu)
- return 0;
+ return 1;
armpmu = to_arm_pmu(event->pmu);
return armpmu->get_event_idx(hw_events, event) >= 0;
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 6ed4707bd6bb..472eb4494fd1 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -519,19 +519,6 @@ static enum tx2_uncore_type get_tx2_pmu_type(struct acpi_device *adev)
return (enum tx2_uncore_type)id->driver_data;
}
-static bool tx2_uncore_validate_event(struct pmu *pmu,
- struct perf_event *event, int *counters)
-{
- if (is_software_event(event))
- return true;
- /* Reject groups spanning multiple HW PMUs. */
- if (event->pmu != pmu)
- return false;
-
- *counters = *counters + 1;
- return true;
-}
-
/*
* Make sure the group of events can be scheduled at once
* on the PMU.
@@ -539,23 +526,20 @@ static bool tx2_uncore_validate_event(struct pmu *pmu,
static bool tx2_uncore_validate_event_group(struct perf_event *event,
int max_counters)
{
- struct perf_event *sibling, *leader = event->group_leader;
- int counters = 0;
+ struct perf_event *sibling;
+ int counters = 1;
if (event->group_leader == event)
return true;
- if (!tx2_uncore_validate_event(event->pmu, leader, &counters))
- return false;
+ if (event->group_leader->pmu == event->pmu)
+ ++counters;
- for_each_sibling_event(sibling, leader) {
- if (!tx2_uncore_validate_event(event->pmu, sibling, &counters))
- return false;
+ for_each_sibling_event(sibling, event->group_leader) {
+ if (sibling->pmu == event->pmu)
+ ++counters;
}
- if (!tx2_uncore_validate_event(event->pmu, event, &counters))
- return false;
-
/*
* If the group requires more counters than the HW has,
* it cannot ever be scheduled.
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
` (14 preceding siblings ...)
2025-08-13 17:01 ` [PATCH 15/19] perf: Simplify " Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-26 13:08 ` Peter Zijlstra
2025-08-26 13:11 ` Leo Yan
2025-08-13 17:01 ` [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT Robin Murphy
` (2 subsequent siblings)
18 siblings, 2 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
Sampling is inherently a feature for CPU PMUs, given that the thing
to be sampled is a CPU context. These days, we have many more
uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
assume sampling support by default and force the ever-growing majority
of drivers to opt out of it (or erroneously fail to). Instead, let's
introduce a positive opt-in capability that's more obvious and easier to
maintain.
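The shape of the central check this enables can be sketched in
isolation: with a positive capability, the default of an all-zero
capabilities word rejects sampling, and only PMUs that explicitly opt in
accept it. Toy types below, with the -EOPNOTSUPP value hard-coded for
illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Capability bit as introduced by this patch. */
#define PERF_PMU_CAP_SAMPLING	0x0001

struct pmu { unsigned int capabilities; };
struct perf_event {
	struct pmu *pmu;
	unsigned long long sample_period;
};

static bool is_sampling_event(const struct perf_event *e)
{
	return e->sample_period != 0;
}

/*
 * Central check, as in perf_event_open(): a sampling event is only
 * allowed when its PMU explicitly advertises sampling support.
 */
static int check_sampling(const struct perf_event *event)
{
	if (is_sampling_event(event) &&
	    !(event->pmu->capabilities & PERF_PMU_CAP_SAMPLING))
		return -95; /* -EOPNOTSUPP */
	return 0;
}
```

The point of the inversion is that a newly written uncore driver which
forgets to think about sampling now gets the safe behaviour for free.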
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/alpha/kernel/perf_event.c | 3 ++-
arch/arc/kernel/perf_event.c | 2 ++
arch/csky/kernel/perf_event.c | 2 ++
arch/loongarch/kernel/perf_event.c | 1 +
arch/mips/kernel/perf_event_mipsxx.c | 1 +
arch/powerpc/perf/core-book3s.c | 1 +
arch/powerpc/perf/core-fsl-emb.c | 1 +
arch/powerpc/perf/imc-pmu.c | 1 +
arch/s390/kernel/perf_cpum_cf.c | 1 +
arch/s390/kernel/perf_cpum_sf.c | 2 ++
arch/s390/kernel/perf_pai_crypto.c | 1 +
arch/s390/kernel/perf_pai_ext.c | 1 +
arch/sparc/kernel/perf_event.c | 1 +
arch/x86/events/amd/ibs.c | 2 ++
arch/x86/events/core.c | 4 +++-
arch/xtensa/kernel/perf_event.c | 1 +
drivers/perf/arm_pmu.c | 3 ++-
drivers/perf/arm_pmu_platform.c | 1 +
drivers/perf/arm_spe_pmu.c | 3 ++-
drivers/perf/riscv_pmu_sbi.c | 2 ++
include/linux/perf_event.h | 3 ++-
kernel/events/core.c | 20 +++++++++++---------
kernel/events/hw_breakpoint.c | 1 +
23 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 8557165e64c0..4de1802d249f 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -761,7 +761,8 @@ static struct pmu pmu = {
.start = alpha_pmu_start,
.stop = alpha_pmu_stop,
.read = alpha_pmu_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_SAMPLING |
+ PERF_PMU_CAP_NO_EXCLUDE,
};
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index ed6d4f0cd621..1b99b0215027 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -818,6 +818,8 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
if (irq == -1)
arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+ else
+ arc_pmu->pmu.capabilities |= PERF_PMU_CAP_SAMPLING;
/*
* perf parser doesn't really like '-' symbol in events name, so let's
diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index e0a36acd265b..c5ba6e235a6f 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -1204,6 +1204,7 @@ int init_hw_perf_events(void)
}
csky_pmu.pmu = (struct pmu) {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = csky_pmu_enable,
.pmu_disable = csky_pmu_disable,
.event_init = csky_pmu_event_init,
@@ -1314,6 +1315,7 @@ int csky_pmu_device_probe(struct platform_device *pdev,
ret = csky_pmu_request_irq(csky_pmu_handle_irq);
if (ret) {
+ csky_pmu.pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pr_notice("[perf] PMU request irq fail!\n");
}
diff --git a/arch/loongarch/kernel/perf_event.c b/arch/loongarch/kernel/perf_event.c
index 8ad098703488..341b17bedd0e 100644
--- a/arch/loongarch/kernel/perf_event.c
+++ b/arch/loongarch/kernel/perf_event.c
@@ -571,6 +571,7 @@ static int loongarch_pmu_event_init(struct perf_event *event)
}
static struct pmu pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = loongarch_pmu_enable,
.pmu_disable = loongarch_pmu_disable,
.event_init = loongarch_pmu_event_init,
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 196a070349b0..4c5d64d1158e 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -687,6 +687,7 @@ static int mipspmu_event_init(struct perf_event *event)
}
static struct pmu pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = mipspmu_enable,
.pmu_disable = mipspmu_disable,
.event_init = mipspmu_event_init,
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d67f7d511f13..cfe7d3c120e1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2207,6 +2207,7 @@ ssize_t power_events_sysfs_show(struct device *dev,
}
static struct pmu power_pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = power_pmu_enable,
.pmu_disable = power_pmu_disable,
.event_init = power_pmu_event_init,
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 509932b91b75..62038ff3663f 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -570,6 +570,7 @@ static int fsl_emb_pmu_event_init(struct perf_event *event)
}
static struct pmu fsl_emb_pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = fsl_emb_pmu_enable,
.pmu_disable = fsl_emb_pmu_disable,
.event_init = fsl_emb_pmu_event_init,
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 8664a7d297ad..f352dda3baf9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1507,6 +1507,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
break;
case IMC_DOMAIN_TRACE:
+ pmu->pmu.capabilities |= PERF_PMU_CAP_SAMPLING;
pmu->pmu.event_init = trace_imc_event_init;
pmu->pmu.add = trace_imc_event_add;
pmu->pmu.del = trace_imc_event_del;
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 4d09954ebf49..7d10842d54f0 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -1861,6 +1861,7 @@ static const struct attribute_group *cfdiag_attr_groups[] = {
*/
static struct pmu cf_diag = {
.task_ctx_nr = perf_sw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = cfdiag_event_init,
.pmu_enable = cpumf_pmu_enable,
.pmu_disable = cpumf_pmu_disable,
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index f432869f8921..3d2c400f0aaa 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1892,6 +1892,8 @@ static const struct attribute_group *cpumsf_pmu_attr_groups[] = {
};
static struct pmu cpumf_sampling = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
+
.pmu_enable = cpumsf_pmu_enable,
.pmu_disable = cpumsf_pmu_disable,
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
index f373a1009c45..a64b6b056a21 100644
--- a/arch/s390/kernel/perf_pai_crypto.c
+++ b/arch/s390/kernel/perf_pai_crypto.c
@@ -569,6 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
/* Performance monitoring unit for mapped counters */
static struct pmu paicrypt = {
.task_ctx_nr = perf_hw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = paicrypt_event_init,
.add = paicrypt_add,
.del = paicrypt_del,
diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
index d827473e7f87..1261f80c6d52 100644
--- a/arch/s390/kernel/perf_pai_ext.c
+++ b/arch/s390/kernel/perf_pai_ext.c
@@ -595,6 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
/* Performance monitoring unit for mapped counters */
static struct pmu paiext = {
.task_ctx_nr = perf_hw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = paiext_event_init,
.add = paiext_add,
.del = paiext_del,
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 706127749c66..6ecea8e7b592 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1573,6 +1573,7 @@ static int sparc_pmu_commit_txn(struct pmu *pmu)
}
static struct pmu pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = sparc_pmu_enable,
.pmu_disable = sparc_pmu_disable,
.event_init = sparc_pmu_event_init,
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 95de309fc7d5..ed07d80b6fe0 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -768,6 +768,7 @@ static struct perf_ibs perf_ibs_fetch = {
.pmu = {
.task_ctx_nr = perf_hw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = perf_ibs_init,
.add = perf_ibs_add,
.del = perf_ibs_del,
@@ -793,6 +794,7 @@ static struct perf_ibs perf_ibs_op = {
.pmu = {
.task_ctx_nr = perf_hw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = perf_ibs_init,
.add = perf_ibs_add,
.del = perf_ibs_del,
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index eca5bb49aa85..72a4c43951ee 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1837,7 +1837,7 @@ static void __init pmu_check_apic(void)
* sample via a hrtimer based software event):
*/
pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
-
+ pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
}
static struct attribute_group x86_pmu_format_group __ro_after_init = {
@@ -2698,6 +2698,8 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
}
static struct pmu pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
+
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
diff --git a/arch/xtensa/kernel/perf_event.c b/arch/xtensa/kernel/perf_event.c
index 223f1d452310..b03a2feb0f92 100644
--- a/arch/xtensa/kernel/perf_event.c
+++ b/arch/xtensa/kernel/perf_event.c
@@ -397,6 +397,7 @@ irqreturn_t xtensa_pmu_irq_handler(int irq, void *dev_id)
}
static struct pmu xtensa_pmu = {
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.pmu_enable = xtensa_pmu_enable,
.pmu_disable = xtensa_pmu_disable,
.event_init = xtensa_pmu_event_init,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 2c1af3a0207c..72d8f38d0aa5 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -876,7 +876,8 @@ struct arm_pmu *armpmu_alloc(void)
* PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
* specific PMU.
*/
- .capabilities = PERF_PMU_CAP_EXTENDED_REGS |
+ .capabilities = PERF_PMU_CAP_SAMPLING |
+ PERF_PMU_CAP_EXTENDED_REGS |
PERF_PMU_CAP_EXTENDED_HW_TYPE,
};
diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 118170a5cede..ab7a802cd0d6 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -109,6 +109,7 @@ static int pmu_parse_irqs(struct arm_pmu *pmu)
*/
if (num_irqs == 0) {
dev_warn(dev, "no irqs for PMU, sampling events not supported\n");
+ pmu->pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
cpumask_setall(&pmu->supported_cpus);
return 0;
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 369e77ad5f13..dbd52851f5c6 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -955,7 +955,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
spe_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
.parent = &spe_pmu->pdev->dev,
- .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
+ .capabilities = PERF_PMU_CAP_SAMPLING |
+ PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
.attr_groups = arm_spe_pmu_attr_groups,
/*
* We hitch a ride on the software context here, so that
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 698de8ddf895..d185ea8c47ba 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -1361,6 +1361,8 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
pr_info("Perf sampling/filtering is not supported as sscof extension is not available\n");
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
+ } else {
+ pmu->pmu.capabilities |= PERF_PMU_CAP_SAMPLING;
}
pmu->pmu.attr_groups = riscv_pmu_attr_groups;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4d439c24c901..bf2cfbeabba2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -294,7 +294,7 @@ struct perf_event_pmu_context;
/**
* pmu::capabilities flags
*/
-#define PERF_PMU_CAP_NO_INTERRUPT 0x0001
+#define PERF_PMU_CAP_SAMPLING 0x0001
#define PERF_PMU_CAP_NO_NMI 0x0002
#define PERF_PMU_CAP_AUX_NO_SG 0x0004
#define PERF_PMU_CAP_EXTENDED_REGS 0x0008
@@ -305,6 +305,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
+#define PERF_PMU_CAP_NO_INTERRUPT 0x0800
/**
* pmu::scope
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8060c2857bb2..71b2a6730705 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4359,7 +4359,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
continue;
if (!perf_pmu_ctx_is_active(pmu_ctx))
continue;
- if (pmu_ctx->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT)
+ if (!(pmu_ctx->pmu->capabilities & PERF_PMU_CAP_SAMPLING))
continue;
perf_pmu_disable(pmu_ctx->pmu);
@@ -10819,7 +10819,7 @@ static int perf_swevent_init(struct perf_event *event)
static struct pmu perf_swevent = {
.task_ctx_nr = perf_sw_context,
- .capabilities = PERF_PMU_CAP_NO_NMI,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_NO_NMI,
.event_init = perf_swevent_init,
.add = perf_swevent_add,
@@ -10861,6 +10861,7 @@ static int perf_tp_event_init(struct perf_event *event)
static struct pmu perf_tracepoint = {
.task_ctx_nr = perf_sw_context,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = perf_tp_event_init,
.add = perf_trace_add,
.del = perf_trace_del,
@@ -11066,6 +11067,7 @@ static struct pmu perf_kprobe = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,
.attr_groups = kprobe_attr_groups,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
};
static int perf_kprobe_event_init(struct perf_event *event)
@@ -11125,6 +11127,7 @@ static struct pmu perf_uprobe = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,
.attr_groups = uprobe_attr_groups,
+ .capabilities = PERF_PMU_CAP_SAMPLING,
};
static int perf_uprobe_event_init(struct perf_event *event)
@@ -11899,7 +11902,7 @@ static int cpu_clock_event_init(struct perf_event *event)
static struct pmu perf_cpu_clock = {
.task_ctx_nr = perf_sw_context,
- .capabilities = PERF_PMU_CAP_NO_NMI,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_NO_NMI,
.dev = PMU_NULL_DEV,
.event_init = cpu_clock_event_init,
@@ -11982,7 +11985,7 @@ static int task_clock_event_init(struct perf_event *event)
static struct pmu perf_task_clock = {
.task_ctx_nr = perf_sw_context,
- .capabilities = PERF_PMU_CAP_NO_NMI,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_NO_NMI,
.dev = PMU_NULL_DEV,
.event_init = task_clock_event_init,
@@ -13476,11 +13479,10 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_task;
}
- if (is_sampling_event(event)) {
- if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
- err = -EOPNOTSUPP;
- goto err_alloc;
- }
+ if (is_sampling_event(event) &&
+ !(event->pmu->capabilities & PERF_PMU_CAP_SAMPLING)) {
+ err = -EOPNOTSUPP;
+ goto err_alloc;
}
/*
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 8ec2cb688903..604be7d7aecf 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -996,6 +996,7 @@ static void hw_breakpoint_stop(struct perf_event *bp, int flags)
static struct pmu perf_breakpoint = {
.task_ctx_nr = perf_sw_context, /* could eventually get its own */
+ .capabilities = PERF_PMU_CAP_SAMPLING,
.event_init = hw_breakpoint_event_init,
.add = hw_breakpoint_add,
.del = hw_breakpoint_del,
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
` (15 preceding siblings ...)
2025-08-13 17:01 ` [PATCH 16/19] perf: Introduce positive capability for sampling Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-26 13:08 ` Peter Zijlstra
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
18 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
Now that we have a well-defined capability for sampling support, clean
up the remains of the mildly unintuitive and inconsistently-applied
PERF_PMU_CAP_NO_INTERRUPT. This also removes the now-obvious redundancy
of some of these drivers still checking for sampling in event_init too.
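From a driver's point of view the net effect is that the negative flag
and the open-coded event_init rejection both disappear; the core's
single check against the positive capability covers every PMU. A hedged
sketch of that invariant (toy types, not a real driver):

```c
#include <assert.h>
#include <stdbool.h>

#define PERF_PMU_CAP_SAMPLING	0x0001

struct pmu { unsigned int capabilities; };

/*
 * After this patch, "does this PMU support sampling?" has exactly one
 * answer site: the presence of PERF_PMU_CAP_SAMPLING. A counting-only
 * PMU leaves its capabilities word alone and needs no NO_INTERRUPT
 * flag and no is_sampling_event() check of its own.
 */
static bool core_allows_sampling(const struct pmu *pmu)
{
	return pmu->capabilities & PERF_PMU_CAP_SAMPLING;
}
```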
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/arc/kernel/perf_event.c | 4 +---
arch/csky/kernel/perf_event.c | 1 -
arch/powerpc/perf/8xx-pmu.c | 3 +--
arch/powerpc/perf/hv-24x7.c | 3 ---
arch/powerpc/perf/hv-gpci.c | 3 ---
arch/powerpc/perf/kvm-hv-pmu.c | 2 +-
arch/powerpc/perf/vpa-pmu.c | 6 +-----
arch/powerpc/platforms/pseries/papr_scm.c | 7 +------
arch/s390/kernel/perf_cpum_cf.c | 3 ---
arch/sh/kernel/perf_event.c | 1 -
arch/x86/events/amd/uncore.c | 6 +++---
arch/x86/events/core.c | 1 -
arch/x86/events/intel/cstate.c | 9 +++------
arch/x86/events/msr.c | 5 +----
drivers/fpga/dfl-fme-perf.c | 6 ++----
drivers/perf/arm_cspmu/arm_cspmu.c | 14 ++------------
drivers/perf/arm_pmu_platform.c | 1 -
drivers/perf/marvell_cn10k_tad_pmu.c | 3 +--
drivers/perf/riscv_pmu_legacy.c | 1 -
drivers/perf/riscv_pmu_sbi.c | 1 -
drivers/powercap/intel_rapl_common.c | 2 +-
include/linux/perf_event.h | 1 -
22 files changed, 18 insertions(+), 65 deletions(-)
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 1b99b0215027..7e154f6f0abd 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -816,9 +816,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
}
- if (irq == -1)
- arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
- else
+ if (irq != -1)
arc_pmu->pmu.capabilities |= PERF_PMU_CAP_SAMPLING;
/*
diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index c5ba6e235a6f..ecf4b2863f78 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -1316,7 +1316,6 @@ int csky_pmu_device_probe(struct platform_device *pdev,
ret = csky_pmu_request_irq(csky_pmu_handle_irq);
if (ret) {
csky_pmu.pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
- csky_pmu.pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pr_notice("[perf] PMU request irq fail!\n");
}
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 1d2972229e3a..71c35bd72eae 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -181,8 +181,7 @@ static struct pmu mpc8xx_pmu = {
.add = mpc8xx_pmu_add,
.del = mpc8xx_pmu_del,
.read = mpc8xx_pmu_read,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT |
- PERF_PMU_CAP_NO_NMI,
+ .capabilities = PERF_PMU_CAP_NO_NMI,
};
static int init_mpc8xx_pmu(void)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index e42677cc254a..ab906616e570 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1726,9 +1726,6 @@ static int hv_24x7_init(void)
if (!hv_page_cache)
return -ENOMEM;
- /* sampling not supported */
- h_24x7_pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
-
r = create_events_from_catalog(&event_group.attrs,
&event_desc_group.attrs,
&event_long_desc_group.attrs);
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 241551d1282f..1726690396ec 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -1008,9 +1008,6 @@ static int hv_gpci_init(void)
if (r)
return r;
- /* sampling not supported */
- h_gpci_pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
-
arg = (void *)get_cpu_var(hv_gpci_reqb);
memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
diff --git a/arch/powerpc/perf/kvm-hv-pmu.c b/arch/powerpc/perf/kvm-hv-pmu.c
index ae264c9080ef..1c6bc65c986c 100644
--- a/arch/powerpc/perf/kvm-hv-pmu.c
+++ b/arch/powerpc/perf/kvm-hv-pmu.c
@@ -391,7 +391,7 @@ static struct pmu kvmppc_pmu = {
.attr_groups = kvmppc_pmu_attr_groups,
.type = -1,
.scope = PERF_PMU_SCOPE_SYS_WIDE,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static int __init kvmppc_register_pmu(void)
diff --git a/arch/powerpc/perf/vpa-pmu.c b/arch/powerpc/perf/vpa-pmu.c
index 840733468959..1d360b5bf67c 100644
--- a/arch/powerpc/perf/vpa-pmu.c
+++ b/arch/powerpc/perf/vpa-pmu.c
@@ -75,10 +75,6 @@ static int vpa_pmu_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
- /* it does not support event sampling mode */
- if (is_sampling_event(event))
- return -EOPNOTSUPP;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -164,7 +160,7 @@ static struct pmu vpa_pmu = {
.del = vpa_pmu_del,
.read = vpa_pmu_read,
.attr_groups = vpa_pmu_attr_groups,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static int __init pseries_vpa_pmu_init(void)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f7c9271bda58..d752cdaf8422 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -379,10 +379,6 @@ static int papr_scm_pmu_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
- /* it does not support event sampling mode */
- if (is_sampling_event(event))
- return -EOPNOTSUPP;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -463,8 +459,7 @@ static void papr_scm_pmu_register(struct papr_scm_priv *p)
nd_pmu->pmu.add = papr_scm_pmu_add;
nd_pmu->pmu.del = papr_scm_pmu_del;
- nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
- PERF_PMU_CAP_NO_EXCLUDE;
+ nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
/*updating the cpumask variable */
nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 7d10842d54f0..1a94e0944bc5 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -760,8 +760,6 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type)
break;
case PERF_TYPE_HARDWARE:
- if (is_sampling_event(event)) /* No sampling support */
- return -ENOENT;
ev = attr->config;
if (!attr->exclude_user && attr->exclude_kernel) {
/*
@@ -1056,7 +1054,6 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
/* Performance monitoring unit for s390x */
static struct pmu cpumf_pmu = {
.task_ctx_nr = perf_sw_context,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT,
.pmu_enable = cpumf_pmu_enable,
.pmu_disable = cpumf_pmu_disable,
.event_init = cpumf_pmu_event_init,
diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
index 1d2507f22437..d1b534538524 100644
--- a/arch/sh/kernel/perf_event.c
+++ b/arch/sh/kernel/perf_event.c
@@ -352,7 +352,6 @@ int register_sh_pmu(struct sh_pmu *_pmu)
* no interrupts, and are therefore unable to do sampling without
* further work and timer assistance.
*/
- pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
WARN_ON(_pmu->num_events > MAX_HWEVENTS);
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index e8b6af199c73..050a5567291a 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -767,7 +767,7 @@ int amd_uncore_df_ctx_init(struct amd_uncore *uncore, unsigned int cpu)
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
};
@@ -903,7 +903,7 @@ int amd_uncore_l3_ctx_init(struct amd_uncore *uncore, unsigned int cpu)
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
};
@@ -1068,7 +1068,7 @@ int amd_uncore_umc_ctx_init(struct amd_uncore *uncore, unsigned int cpu)
.start = amd_uncore_umc_start,
.stop = amd_uncore_stop,
.read = amd_uncore_umc_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
};
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 72a4c43951ee..789dfca2fa67 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1836,7 +1836,6 @@ static void __init pmu_check_apic(void)
* events (user-space has to fall back and
* sample via a hrtimer based software event):
*/
- pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
}
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index ec753e39b007..2a79717b898f 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -281,9 +281,6 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -397,7 +394,7 @@ static struct pmu cstate_core_pmu = {
.start = cstate_pmu_event_start,
.stop = cstate_pmu_event_stop,
.read = cstate_pmu_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.scope = PERF_PMU_SCOPE_CORE,
.module = THIS_MODULE,
};
@@ -413,7 +410,7 @@ static struct pmu cstate_pkg_pmu = {
.start = cstate_pmu_event_start,
.stop = cstate_pmu_event_stop,
.read = cstate_pmu_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.scope = PERF_PMU_SCOPE_PKG,
.module = THIS_MODULE,
};
@@ -429,7 +426,7 @@ static struct pmu cstate_module_pmu = {
.start = cstate_pmu_event_start,
.stop = cstate_pmu_event_stop,
.read = cstate_pmu_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.scope = PERF_PMU_SCOPE_CLUSTER,
.module = THIS_MODULE,
};
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 7f5007a4752a..3285c1f3bb90 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -210,9 +210,6 @@ static int msr_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
if (cfg >= PERF_MSR_EVENT_MAX)
return -EINVAL;
@@ -298,7 +295,7 @@ static struct pmu pmu_msr = {
.start = msr_event_start,
.stop = msr_event_stop,
.read = msr_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.attr_update = attr_update,
};
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 7422d2bc6f37..a1e2e7f28a3a 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -806,9 +806,8 @@ static int fme_perf_event_init(struct perf_event *event)
/*
* fme counters are shared across all cores.
* Therefore, it does not support per-process mode.
- * Also, it does not support event sampling mode.
*/
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+ if (event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
if (event->cpu < 0)
@@ -921,8 +920,7 @@ static int fme_perf_pmu_register(struct platform_device *pdev,
pmu->start = fme_perf_event_start;
pmu->stop = fme_perf_event_stop;
pmu->read = fme_perf_event_read;
- pmu->capabilities = PERF_PMU_CAP_NO_INTERRUPT |
- PERF_PMU_CAP_NO_EXCLUDE;
+ pmu->capabilities = PERF_PMU_CAP_NO_EXCLUDE;
name = devm_kasprintf(priv->dev, GFP_KERNEL, "dfl_fme%d", pdev->id);
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 7f5ea749b85c..761b438db231 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -608,12 +608,6 @@ static int arm_cspmu_event_init(struct perf_event *event)
* Following other "uncore" PMUs, we do not support sampling mode or
* attach to a task (per-process mode).
*/
- if (is_sampling_event(event)) {
- dev_dbg(cspmu->pmu.dev,
- "Can't support sampling events\n");
- return -EOPNOTSUPP;
- }
-
if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
dev_dbg(cspmu->pmu.dev,
"Can't support per-task counters\n");
@@ -1128,7 +1122,7 @@ static int arm_cspmu_get_cpus(struct arm_cspmu *cspmu)
static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
{
- int ret, capabilities;
+ int ret;
ret = arm_cspmu_alloc_attr_groups(cspmu);
if (ret)
@@ -1139,10 +1133,6 @@ static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
if (ret)
return ret;
- capabilities = PERF_PMU_CAP_NO_EXCLUDE;
- if (cspmu->irq == 0)
- capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
-
cspmu->pmu = (struct pmu){
.task_ctx_nr = perf_invalid_context,
.module = cspmu->impl.module,
@@ -1156,7 +1146,7 @@ static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
.stop = arm_cspmu_stop,
.read = arm_cspmu_read,
.attr_groups = cspmu->attr_groups,
- .capabilities = capabilities,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
/* Hardware counter init */
diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index ab7a802cd0d6..754dba9e4528 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -110,7 +110,6 @@ static int pmu_parse_irqs(struct arm_pmu *pmu)
if (num_irqs == 0) {
dev_warn(dev, "no irqs for PMU, sampling events not supported\n");
pmu->pmu.capabilities &= ~PERF_PMU_CAP_SAMPLING;
- pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
cpumask_setall(&pmu->supported_cpus);
return 0;
}
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index ee6505cb01a7..a162e707a639 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -360,8 +360,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
tad_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE |
- PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.task_ctx_nr = perf_invalid_context,
.event_init = tad_pmu_event_init,
diff --git a/drivers/perf/riscv_pmu_legacy.c b/drivers/perf/riscv_pmu_legacy.c
index 93c8e0fdb589..40140e457454 100644
--- a/drivers/perf/riscv_pmu_legacy.c
+++ b/drivers/perf/riscv_pmu_legacy.c
@@ -123,7 +123,6 @@ static void pmu_legacy_init(struct riscv_pmu *pmu)
pmu->event_mapped = pmu_legacy_event_mapped;
pmu->event_unmapped = pmu_legacy_event_unmapped;
pmu->csr_index = pmu_legacy_csr_index;
- pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
perf_pmu_register(&pmu->pmu, "cpu", PERF_TYPE_RAW);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index d185ea8c47ba..4fb1aab0b547 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -1359,7 +1359,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
ret = pmu_sbi_setup_irqs(pmu, pdev);
if (ret < 0) {
pr_info("Perf sampling/filtering is not supported as sscof extension is not available\n");
- pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
} else {
pmu->pmu.capabilities |= PERF_PMU_CAP_SAMPLING;
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index c7e7f9bf5313..38470351217b 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -2014,7 +2014,7 @@ static int rapl_pmu_update(struct rapl_package *rp)
rapl_pmu.pmu.stop = rapl_pmu_event_stop;
rapl_pmu.pmu.read = rapl_pmu_event_read;
rapl_pmu.pmu.module = THIS_MODULE;
- rapl_pmu.pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT;
+ rapl_pmu.pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
ret = perf_pmu_register(&rapl_pmu.pmu, "power", -1);
if (ret) {
pr_info("Failed to register PMU\n");
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bf2cfbeabba2..183b7c48b329 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,7 +305,6 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
-#define PERF_PMU_CAP_NO_INTERRUPT 0x0800
/**
* pmu::scope
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
` (16 preceding siblings ...)
2025-08-13 17:01 ` [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-19 13:15 ` Robin Murphy
` (2 more replies)
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
18 siblings, 3 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
events without registering themselves as PERF_TYPE_RAW in the first
place. Add an explicit opt-in for these special cases, so that we can
make life easier for every other driver (and probably also speed up the
slow-path search) by having perf_try_init_event() do the basic type
checking to cover the majority of cases.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
A further possibility is to automatically add the cap to PERF_TYPE_RAW
PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
undecided...
---
arch/s390/kernel/perf_cpum_cf.c | 1 +
arch/s390/kernel/perf_pai_crypto.c | 2 +-
arch/s390/kernel/perf_pai_ext.c | 2 +-
arch/x86/events/core.c | 2 +-
drivers/perf/arm_pmu.c | 1 +
include/linux/perf_event.h | 1 +
kernel/events/core.c | 15 +++++++++++++++
7 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 1a94e0944bc5..782ab755ddd4 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
/* Performance monitoring unit for s390x */
static struct pmu cpumf_pmu = {
.task_ctx_nr = perf_sw_context,
+ .capabilities = PERF_PMU_CAP_RAW_EVENTS,
.pmu_enable = cpumf_pmu_enable,
.pmu_disable = cpumf_pmu_disable,
.event_init = cpumf_pmu_event_init,
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
index a64b6b056a21..b5b6d8b5d943 100644
--- a/arch/s390/kernel/perf_pai_crypto.c
+++ b/arch/s390/kernel/perf_pai_crypto.c
@@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
/* Performance monitoring unit for mapped counters */
static struct pmu paicrypt = {
.task_ctx_nr = perf_hw_context,
- .capabilities = PERF_PMU_CAP_SAMPLING,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
.event_init = paicrypt_event_init,
.add = paicrypt_add,
.del = paicrypt_del,
diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
index 1261f80c6d52..bcd28c38da70 100644
--- a/arch/s390/kernel/perf_pai_ext.c
+++ b/arch/s390/kernel/perf_pai_ext.c
@@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
/* Performance monitoring unit for mapped counters */
static struct pmu paiext = {
.task_ctx_nr = perf_hw_context,
- .capabilities = PERF_PMU_CAP_SAMPLING,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
.event_init = paiext_event_init,
.add = paiext_add,
.del = paiext_del,
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 789dfca2fa67..764728bb80ae 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
}
static struct pmu pmu = {
- .capabilities = PERF_PMU_CAP_SAMPLING,
+ .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 72d8f38d0aa5..bc772a3bf411 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
* specific PMU.
*/
.capabilities = PERF_PMU_CAP_SAMPLING |
+ PERF_PMU_CAP_RAW_EVENTS |
PERF_PMU_CAP_EXTENDED_REGS |
PERF_PMU_CAP_EXTENDED_HW_TYPE,
};
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 183b7c48b329..c6ad036c0037 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,6 +305,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
+#define PERF_PMU_CAP_RAW_EVENTS 0x0800
/**
* pmu::scope
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 71b2a6730705..2ecee76d2ae2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
(event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
}
+static bool is_raw_pmu(const struct pmu *pmu)
+{
+ return pmu->type == PERF_TYPE_RAW ||
+ pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
+}
+
static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
{
struct perf_event_context *ctx = NULL;
int ret;
+ /*
+ * Before touching anything, we can safely skip:
+ * - any event for a specific PMU which is not this one
+ * - any common event if this PMU doesn't support them
+ */
+ if (event->attr.type != pmu->type &&
+ (event->attr.type >= PERF_TYPE_MAX || is_raw_pmu(pmu)))
+ return -ENOENT;
+
if (!try_module_get(pmu->module))
return -ENODEV;
--
2.39.2.101.g768bb238c484.dirty
* [PATCH 19/19] perf: Garbage-collect event_init checks
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
` (17 preceding siblings ...)
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
@ 2025-08-13 17:01 ` Robin Murphy
2025-08-14 8:04 ` kernel test robot
` (2 more replies)
18 siblings, 3 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-13 17:01 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
All these boilerplate event_init checks are now redundant. Of course
many of them were already redundant, or done in the wrong order so as to
be pointless, and what we don't see here is all the ones which were
missing, but have now been implicitly gained thanks to some of these new
core code behaviours. In summary:
- event->attr.type
Now only relevant to PERF_TYPE_RAW PMUs or those advertising
PERF_PMU_CAP_RAW_EVENTS.
- event->cpu < 0
Already rejected by perf_event_alloc() unless a task is passed,
in which case it will also set PERF_ATTACH_TASK prior to reaching
perf_init_event(), so this check is always redundant with...
- PERF_ATTACH_TASK
Since at least commit bd2756811766 ("perf: Rewrite core context
handling"), only relevant to PMUs using perf_hw_context or
perf_sw_context; for uncore PMUs this is covered by
perf_event_alloc() again, right after perf_init_event() returns,
by virtue of the same non-NULL task which caused attach_state to
be set in the first place.
- is_sampling_event() (and variations)
Now only relevant to PMUs advertising PERF_PMU_CAP_SAMPLING.
- has_branch_stack()
Now doubly-illogical for PMUs which never supported sampling
anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/arm/mach-imx/mmdc.c | 14 ---------
arch/arm/mm/cache-l2x0-pmu.c | 10 -------
arch/powerpc/perf/hv-24x7.c | 8 -----
arch/powerpc/perf/hv-gpci.c | 8 -----
arch/powerpc/perf/imc-pmu.c | 30 -------------------
arch/powerpc/perf/kvm-hv-pmu.c | 3 --
arch/powerpc/perf/vpa-pmu.c | 7 -----
arch/powerpc/platforms/pseries/papr_scm.c | 11 -------
arch/s390/kernel/perf_cpum_cf.c | 3 +-
arch/x86/events/amd/iommu.c | 15 ----------
arch/x86/events/amd/power.c | 7 -----
arch/x86/events/amd/uncore.c | 6 ----
arch/x86/events/intel/bts.c | 3 --
arch/x86/events/intel/cstate.c | 7 -----
arch/x86/events/intel/pt.c | 3 --
arch/x86/events/intel/uncore.c | 13 --------
arch/x86/events/intel/uncore_snb.c | 18 -----------
arch/x86/events/msr.c | 3 --
arch/x86/events/rapl.c | 11 -------
drivers/devfreq/event/rockchip-dfi.c | 12 --------
drivers/dma/idxd/perfmon.c | 14 ---------
drivers/fpga/dfl-fme-perf.c | 14 ---------
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 4 ---
drivers/gpu/drm/i915/i915_pmu.c | 13 --------
drivers/gpu/drm/xe/xe_pmu.c | 13 --------
.../hwtracing/coresight/coresight-etm-perf.c | 5 ----
drivers/hwtracing/ptt/hisi_ptt.c | 8 -----
drivers/iommu/intel/perfmon.c | 10 -------
drivers/perf/alibaba_uncore_drw_pmu.c | 17 -----------
drivers/perf/amlogic/meson_ddr_pmu_core.c | 9 ------
drivers/perf/arm-cci.c | 9 ------
drivers/perf/arm-ccn.c | 18 -----------
drivers/perf/arm-cmn.c | 10 -------
drivers/perf/arm-ni.c | 6 ----
drivers/perf/arm_cspmu/arm_cspmu.c | 13 --------
drivers/perf/arm_dmc620_pmu.c | 16 ----------
drivers/perf/arm_dsu_pmu.c | 20 -------------
drivers/perf/arm_smmuv3_pmu.c | 13 --------
drivers/perf/arm_spe_pmu.c | 4 ---
drivers/perf/cxl_pmu.c | 6 ----
drivers/perf/dwc_pcie_pmu.c | 11 -------
drivers/perf/fsl_imx8_ddr_perf.c | 11 -------
drivers/perf/fsl_imx9_ddr_perf.c | 11 -------
drivers/perf/hisilicon/hisi_pcie_pmu.c | 8 -----
drivers/perf/hisilicon/hisi_uncore_pmu.c | 18 -----------
drivers/perf/hisilicon/hns3_pmu.c | 7 -----
drivers/perf/marvell_cn10k_ddr_pmu.c | 13 --------
drivers/perf/marvell_cn10k_tad_pmu.c | 3 --
drivers/perf/marvell_pem_pmu.c | 11 -------
drivers/perf/qcom_l2_pmu.c | 15 ----------
drivers/perf/qcom_l3_pmu.c | 19 ------------
drivers/perf/starfive_starlink_pmu.c | 14 ---------
drivers/perf/thunderx2_pmu.c | 15 ----------
drivers/perf/xgene_pmu.c | 14 ---------
drivers/powercap/intel_rapl_common.c | 7 -----
55 files changed, 1 insertion(+), 590 deletions(-)
diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index f9d432b385a2..9e3734e249a2 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -277,20 +277,6 @@ static int mmdc_pmu_event_init(struct perf_event *event)
struct mmdc_pmu *pmu_mmdc = to_mmdc_pmu(event->pmu);
int cfg = event->attr.config;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
- if (event->cpu < 0) {
- dev_warn(pmu_mmdc->dev, "Can't provide per-task data!\n");
- return -EOPNOTSUPP;
- }
-
- if (event->attr.sample_period)
- return -EINVAL;
-
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
return -EINVAL;
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 6fc1171031a8..b8753463c1c4 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -294,16 +294,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
{
struct hw_perf_event *hw = &event->hw;
- if (event->attr.type != l2x0_pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) ||
- event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
if (event->attr.config & ~L2X0_EVENT_CNT_CFG_SRC_MASK)
return -EINVAL;
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index ab906616e570..5b03d6b34999 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1379,10 +1379,6 @@ static int h_24x7_event_init(struct perf_event *event)
unsigned long hret;
u64 ct;
- /* Not our event */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* Unused areas must be 0 */
if (event_get_reserved1(event) ||
event_get_reserved2(event) ||
@@ -1397,10 +1393,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
- /* no branch sampling */
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
/* offset must be 8 byte aligned */
if (event_get_offset(event) % 8) {
pr_devel("bad alignment\n");
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 1726690396ec..9663aa18bc45 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -775,20 +775,12 @@ static int h_gpci_event_init(struct perf_event *event)
u8 length;
unsigned long ret;
- /* Not our event */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* config2 is unused */
if (event->attr.config2) {
pr_devel("config2 set when reserved\n");
return -EINVAL;
}
- /* no branch sampling */
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
length = event_get_length(event);
if (length < 1 || length > 8) {
pr_devel("length invalid\n");
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f352dda3baf9..cee6390986dc 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -517,16 +517,6 @@ static int nest_imc_event_init(struct perf_event *event)
struct imc_pmu_ref *ref;
bool flag = false;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* Sampling not supported */
- if (event->hw.sample_period)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
pmu = imc_event_to_pmu(event);
/* Sanity check for config (event offset) */
@@ -819,16 +809,6 @@ static int core_imc_event_init(struct perf_event *event)
struct imc_pmu *pmu;
struct imc_pmu_ref *ref;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* Sampling not supported */
- if (event->hw.sample_period)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
event->hw.idx = -1;
pmu = imc_event_to_pmu(event);
@@ -983,16 +963,9 @@ static int thread_imc_event_init(struct perf_event *event)
struct task_struct *target;
struct imc_pmu *pmu;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
if (!perfmon_capable())
return -EACCES;
- /* Sampling not supported */
- if (event->hw.sample_period)
- return -EINVAL;
-
event->hw.idx = -1;
pmu = imc_event_to_pmu(event);
@@ -1436,9 +1409,6 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
static int trace_imc_event_init(struct perf_event *event)
{
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
if (!perfmon_capable())
return -EACCES;
diff --git a/arch/powerpc/perf/kvm-hv-pmu.c b/arch/powerpc/perf/kvm-hv-pmu.c
index 1c6bc65c986c..513f5b172ba6 100644
--- a/arch/powerpc/perf/kvm-hv-pmu.c
+++ b/arch/powerpc/perf/kvm-hv-pmu.c
@@ -180,9 +180,6 @@ static int kvmppc_pmu_event_init(struct perf_event *event)
__func__, event, event->id, event->cpu,
event->oncpu, config);
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
if (config >= KVMPPC_EVENT_MAX)
return -EINVAL;
diff --git a/arch/powerpc/perf/vpa-pmu.c b/arch/powerpc/perf/vpa-pmu.c
index 1d360b5bf67c..35883a071360 100644
--- a/arch/powerpc/perf/vpa-pmu.c
+++ b/arch/powerpc/perf/vpa-pmu.c
@@ -72,13 +72,6 @@ static const struct attribute_group *vpa_pmu_attr_groups[] = {
static int vpa_pmu_event_init(struct perf_event *event)
{
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* no branch sampling */
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
/* Invalid event code */
if ((event->attr.config <= 0) || (event->attr.config > 3))
return -EINVAL;
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index d752cdaf8422..e6474ee0c140 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -372,17 +372,6 @@ static int papr_scm_pmu_event_init(struct perf_event *event)
struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
struct papr_scm_priv *p;
- if (!nd_pmu)
- return -EINVAL;
-
- /* test the event attr type for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* no branch sampling */
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
if (!p)
return -EINVAL;
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 782ab755ddd4..fa732e94f6e4 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -1788,8 +1788,7 @@ static int cfdiag_event_init(struct perf_event *event)
struct perf_event_attr *attr = &event->attr;
int err = -ENOENT;
- if (event->attr.config != PERF_EVENT_CPUM_CF_DIAG ||
- event->attr.type != event->pmu->type)
+ if (event->attr.config != PERF_EVENT_CPUM_CF_DIAG)
goto out;
/* Raw events are used to access counters directly,
diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index a721da9987dd..8053bec14dec 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -209,21 +209,6 @@ static int perf_iommu_event_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
- /* test the event attr type check for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * IOMMU counters are shared across all cores.
- * Therefore, it does not support per-process mode.
- * Also, it does not support event sampling mode.
- */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
/* update the hw_perf_event struct with the iommu config data */
hwc->conf = event->attr.config;
hwc->conf1 = event->attr.config1;
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index dad42790cf7d..a5e42ee2464a 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -125,14 +125,7 @@ static int pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config & AMD_POWER_EVENT_MASK;
- /* Only look at AMD power events. */
- if (event->attr.type != pmu_class.type)
- return -ENOENT;
-
/* Unsupported modes and filters. */
- if (event->attr.sample_period)
- return -EINVAL;
-
if (cfg != AMD_POWER_EVENTSEL_PKG)
return -EINVAL;
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 050a5567291a..76f58c7b4c19 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -270,12 +270,6 @@ static int amd_uncore_event_init(struct perf_event *event)
struct amd_uncore_ctx *ctx;
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (event->cpu < 0)
- return -EINVAL;
-
pmu = event_to_amd_uncore_pmu(event);
ctx = *per_cpu_ptr(pmu->ctx, event->cpu);
if (!ctx)
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 61da6b8a3d51..27e23153ba6f 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -565,9 +565,6 @@ static int bts_event_init(struct perf_event *event)
{
int ret;
- if (event->attr.type != bts_pmu.type)
- return -ENOENT;
-
/*
* BTS leaks kernel addresses even when CPL0 tracing is
* disabled, so disallow intel_bts driver for unprivileged
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 2a79717b898f..90a884d77864 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -277,13 +277,6 @@ static int cstate_pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* unsupported modes and filters */
- if (event->cpu < 0)
- return -EINVAL;
-
if (event->pmu == &cstate_core_pmu) {
if (cfg >= PERF_CSTATE_CORE_EVENT_MAX)
return -EINVAL;
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index e8cf29d2b10c..a5004dd7632b 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1795,9 +1795,6 @@ static void pt_event_destroy(struct perf_event *event)
static int pt_event_init(struct perf_event *event)
{
- if (event->attr.type != pt_pmu.pmu.type)
- return -ENOENT;
-
if (!pt_event_valid(event))
return -EINVAL;
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 297ff5adb667..98ffab403bb4 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -731,24 +731,11 @@ static int uncore_pmu_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
int ret;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
pmu = uncore_event_to_pmu(event);
/* no device found for this pmu */
if (!pmu->registered)
return -ENOENT;
- /* Sampling not supported yet */
- if (hwc->sample_period)
- return -EINVAL;
-
- /*
- * Place all uncore events for a particular physical package
- * onto a single cpu
- */
- if (event->cpu < 0)
- return -EINVAL;
box = uncore_pmu_to_box(pmu, event->cpu);
if (!box || box->cpu < 0)
return -EINVAL;
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 807e582b8f17..8537f61bb093 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -906,29 +906,11 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
u64 cfg = event->attr.config & SNB_UNCORE_PCI_IMC_EVENT_MASK;
int idx, base;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
pmu = uncore_event_to_pmu(event);
/* no device found for this pmu */
if (!pmu->registered)
return -ENOENT;
- /* Sampling not supported yet */
- if (hwc->sample_period)
- return -EINVAL;
-
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
- /*
- * Place all uncore events for a particular physical package
- * onto a single cpu
- */
- if (event->cpu < 0)
- return -EINVAL;
-
/* check only supported bits are set */
if (event->attr.config & ~SNB_UNCORE_PCI_IMC_EVENT_MASK)
return -EINVAL;
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 3285c1f3bb90..cf6214849a25 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -206,9 +206,6 @@ static int msr_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* unsupported modes and filters */
if (cfg >= PERF_MSR_EVENT_MAX)
return -EINVAL;
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index defd86137f12..5d298e371b28 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -370,21 +370,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
unsigned int rapl_pmu_idx;
struct rapl_pmus *rapl_pmus;
- /* only look at RAPL events */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
/* check only supported bits are set */
if (event->attr.config & ~RAPL_EVENT_MASK)
return -EINVAL;
- if (event->cpu < 0)
- return -EINVAL;
-
rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
if (!rapl_pmus)
return -EINVAL;
diff --git a/drivers/devfreq/event/rockchip-dfi.c b/drivers/devfreq/event/rockchip-dfi.c
index 88a9ecbe96ce..87ec7bc965bd 100644
--- a/drivers/devfreq/event/rockchip-dfi.c
+++ b/drivers/devfreq/event/rockchip-dfi.c
@@ -401,18 +401,6 @@ static const struct attribute_group *attr_groups[] = {
static int rockchip_ddr_perf_event_init(struct perf_event *event)
{
- struct rockchip_dfi *dfi = container_of(event->pmu, struct rockchip_dfi, pmu);
-
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0) {
- dev_warn(dfi->dev, "Can't provide per-task data!\n");
- return -EINVAL;
- }
/* Disallow groups since we can't start/stop/read multiple counters at once */
if (in_hardware_group(event))
return -EINVAL;
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index 8c539e1f11da..4d6f1fc47685 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -171,20 +171,6 @@ static int perfmon_pmu_event_init(struct perf_event *event)
idxd = event_to_idxd(event);
event->hw.idx = -1;
-
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* sampling not supported */
- if (event->attr.sample_period)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
- if (event->pmu != &idxd->idxd_pmu->pmu)
- return -EINVAL;
-
event->hw.event_base = ioread64(PERFMON_TABLE_OFFSET(idxd));
event->hw.config = event->attr.config;
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index a1e2e7f28a3a..0cc9538e0898 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -799,20 +799,6 @@ static int fme_perf_event_init(struct perf_event *event)
struct fme_perf_event_ops *ops;
u32 eventid, evtype, portid;
- /* test the event attr type check for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * fme counters are shared across all cores.
- * Therefore, it does not support per-process mode.
- */
- if (event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
if (event->cpu != priv->cpu)
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 6e91ea1de5aa..294a7aea9aaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -210,10 +210,6 @@ static int amdgpu_perf_event_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
- /* test the event attr type check for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* update the hw_perf_event struct with config data */
hwc->config = event->attr.config;
hwc->config_base = AMDGPU_PMU_PERF_TYPE_NONE;
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 5bc696bfbb0f..193e96976782 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -626,19 +626,6 @@ static int i915_pmu_event_init(struct perf_event *event)
if (!pmu->registered)
return -ENODEV;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
- if (event->cpu < 0)
- return -EINVAL;
-
if (is_engine_event(event))
ret = engine_event_init(event);
else
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index cab51d826345..084e26728c35 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -238,24 +238,11 @@ static int xe_pmu_event_init(struct perf_event *event)
if (!pmu->registered)
return -ENODEV;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
gt = config_to_gt_id(event->attr.config);
id = config_to_event_id(event->attr.config);
if (!event_supported(pmu, gt, id))
return -ENOENT;
- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
if (!event_param_valid(event))
return -ENOENT;
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index f1551c08ecb2..fd98eb6a1942 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -178,11 +178,6 @@ static int etm_event_init(struct perf_event *event)
{
int ret = 0;
- if (event->attr.type != etm_pmu.type) {
- ret = -ENOENT;
- goto out;
- }
-
ret = etm_addr_filters_alloc(event);
if (ret)
goto out;
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 3090479a2979..470226defa14 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -998,14 +998,6 @@ static int hisi_ptt_pmu_event_init(struct perf_event *event)
int ret;
u32 val;
- if (event->attr.type != hisi_ptt->hisi_ptt_pmu.type)
- return -ENOENT;
-
- if (event->cpu < 0) {
- dev_dbg(event->pmu->dev, "Per-task mode not supported\n");
- return -EOPNOTSUPP;
- }
-
if (event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index c3a1ac14cb2b..2b9bb89e1fd3 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -284,16 +284,6 @@ static int iommu_pmu_event_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* sampling not supported */
- if (event->attr.sample_period)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
if (iommu_pmu_validate_event(event))
return -EINVAL;
diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 0081618741c3..2404333ff902 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -528,24 +528,7 @@ static int ali_drw_pmu_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct device *dev = drw_pmu->pmu.dev;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event)) {
- dev_err(dev, "Sampling not supported!\n");
- return -EOPNOTSUPP;
- }
-
- if (event->attach_state & PERF_ATTACH_TASK) {
- dev_err(dev, "Per-task counter cannot allocate!\n");
- return -EOPNOTSUPP;
- }
-
event->cpu = drw_pmu->cpu;
- if (event->cpu < 0) {
- dev_err(dev, "Per-task mode not supported!\n");
- return -EOPNOTSUPP;
- }
if (in_hardware_group(event)) {
dev_err(dev, "driveway only allow one event!\n");
diff --git a/drivers/perf/amlogic/meson_ddr_pmu_core.c b/drivers/perf/amlogic/meson_ddr_pmu_core.c
index c1e755c356a3..8f46cf835fb5 100644
--- a/drivers/perf/amlogic/meson_ddr_pmu_core.c
+++ b/drivers/perf/amlogic/meson_ddr_pmu_core.c
@@ -121,15 +121,6 @@ static int meson_ddr_perf_event_init(struct perf_event *event)
u64 config1 = event->attr.config1;
u64 config2 = event->attr.config2;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
- if (event->cpu < 0)
- return -EOPNOTSUPP;
-
/* check if the number of parameters is too much */
if (event->attr.config != ALL_CHAN_COUNTER_ID &&
hweight64(config1) + hweight64(config2) > MAX_AXI_PORTS_OF_CHANNEL)
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 086d4363fcc8..84ba97389c65 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1283,13 +1283,6 @@ static int cci_pmu_event_init(struct perf_event *event)
atomic_t *active_events = &cci_pmu->active_events;
int err = 0;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* Shared by all CPUs, no meaningful state to sample */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
/*
* Following the example set by other "uncore" PMUs, we accept any CPU
* and rewrite its affinity dynamically rather than having perf core
@@ -1299,8 +1292,6 @@ static int cci_pmu_event_init(struct perf_event *event)
* the event being installed into its context, so the PMU's CPU can't
* change under our feet.
*/
- if (event->cpu < 0)
- return -EINVAL;
event->cpu = cci_pmu->cpu;
event->destroy = hw_perf_event_destroy;
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 63549aad3b99..6ec4cb9417e7 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -704,30 +704,12 @@ static void arm_ccn_pmu_event_release(struct perf_event *event)
static int arm_ccn_pmu_event_init(struct perf_event *event)
{
struct arm_ccn *ccn;
- struct hw_perf_event *hw = &event->hw;
u32 node_xp, type, event_id;
int valid;
int i;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
ccn = pmu_to_arm_ccn(event->pmu);
- if (hw->sample_period) {
- dev_dbg(ccn->dev, "Sampling not supported!\n");
- return -EOPNOTSUPP;
- }
-
- if (has_branch_stack(event)) {
- dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
- return -EINVAL;
- }
-
- if (event->cpu < 0) {
- dev_dbg(ccn->dev, "Can't provide per-task data!\n");
- return -EOPNOTSUPP;
- }
/*
* Many perf core operations (eg. events rotation) operate on a
* single CPU context. This is obvious for CPU PMUs, where one
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f8c9be9fa6c0..0f65d28c1b7a 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1765,16 +1765,6 @@ static int arm_cmn_event_init(struct perf_event *event)
bool bynodeid;
u16 nodeid, eventid;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- event->cpu = cmn->cpu;
- if (event->cpu < 0)
- return -EINVAL;
-
type = CMN_EVENT_TYPE(event);
/* DTC events (i.e. cycles) already have everything they need */
if (type == CMN_TYPE_DTC)
diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
index d6b683a0264e..c48c82097412 100644
--- a/drivers/perf/arm-ni.c
+++ b/drivers/perf/arm-ni.c
@@ -309,12 +309,6 @@ static int arm_ni_event_init(struct perf_event *event)
{
struct arm_ni_cd *cd = pmu_to_cd(event->pmu);
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event))
- return -EINVAL;
-
event->cpu = cd_to_ni(cd)->cpu;
if (NI_EVENT_TYPE(event) == NI_PMU)
return arm_ni_validate_group(event);
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 761b438db231..47d207a97bfc 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -601,19 +601,6 @@ static int arm_cspmu_event_init(struct perf_event *event)
cspmu = to_arm_cspmu(event->pmu);
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * Following other "uncore" PMUs, we do not support sampling mode or
- * attach to a task (per-process mode).
- */
- if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
- dev_dbg(cspmu->pmu.dev,
- "Can't support per-task counters\n");
- return -EINVAL;
- }
-
/*
* Make sure the CPU assignment is on one of the CPUs associated with
* this PMU.
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 24308de80246..751a06ba5319 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -514,20 +514,6 @@ static int dmc620_pmu_event_init(struct perf_event *event)
struct dmc620_pmu *dmc620_pmu = to_dmc620_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * DMC 620 PMUs are shared across all cpus and cannot
- * support task bound and sampling events.
- */
- if (is_sampling_event(event) ||
- event->attach_state & PERF_ATTACH_TASK) {
- dev_dbg(dmc620_pmu->pmu.dev,
- "Can't support per-task counters\n");
- return -EOPNOTSUPP;
- }
-
/*
* Many perf core operations (eg. events rotation) operate on a
* single CPU context. This is obvious for CPU PMUs, where one
@@ -538,8 +524,6 @@ static int dmc620_pmu_event_init(struct perf_event *event)
* processor.
*/
event->cpu = dmc620_pmu->irq->cpu;
- if (event->cpu < 0)
- return -EINVAL;
hwc->idx = -1;
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 7480fd6fe377..eacbe1864794 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -524,26 +524,6 @@ static int dsu_pmu_event_init(struct perf_event *event)
{
struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* We don't support sampling */
- if (is_sampling_event(event)) {
- dev_dbg(dsu_pmu->pmu.dev, "Can't support sampling events\n");
- return -EOPNOTSUPP;
- }
-
- /* We cannot support task bound events */
- if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
- dev_dbg(dsu_pmu->pmu.dev, "Can't support per-task counters\n");
- return -EINVAL;
- }
-
- if (has_branch_stack(event)) {
- dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
- return -EINVAL;
- }
-
if (!cpumask_test_cpu(event->cpu, &dsu_pmu->associated_cpus)) {
dev_dbg(dsu_pmu->pmu.dev,
"Requested cpu is not associated with the DSU\n");
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 7cac380a3528..d534a4eb457a 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -398,19 +398,6 @@ static int smmu_pmu_event_init(struct perf_event *event)
int group_num_events = 1;
u16 event_id;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (hwc->sample_period) {
- dev_dbg(dev, "Sampling not supported\n");
- return -EOPNOTSUPP;
- }
-
- if (event->cpu < 0) {
- dev_dbg(dev, "Per-task mode not supported\n");
- return -EOPNOTSUPP;
- }
-
/* Verify specified event is supported on this PMU */
event_id = get_event(event);
if (event_id < SMMU_PMCG_ARCH_MAX_EVENTS &&
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index dbd52851f5c6..89001d2ceabf 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -718,10 +718,6 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
struct perf_event_attr *attr = &event->attr;
struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
- /* This is, of course, deeply driver-specific */
- if (attr->type != event->pmu->type)
- return -ENOENT;
-
if (event->cpu >= 0 &&
!cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
return -ENOENT;
diff --git a/drivers/perf/cxl_pmu.c b/drivers/perf/cxl_pmu.c
index d094030220bf..c4f8d5ae45a1 100644
--- a/drivers/perf/cxl_pmu.c
+++ b/drivers/perf/cxl_pmu.c
@@ -563,12 +563,6 @@ static int cxl_pmu_event_init(struct perf_event *event)
struct cxl_pmu_info *info = pmu_to_cxl_pmu_info(event->pmu);
int rc;
- /* Top level type sanity check - is this a Hardware Event being requested */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
/* TODO: Validation of any filter */
/*
diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
index 78c522658d84..a0eb72c38fdb 100644
--- a/drivers/perf/dwc_pcie_pmu.c
+++ b/drivers/perf/dwc_pcie_pmu.c
@@ -355,17 +355,6 @@ static int dwc_pcie_pmu_event_init(struct perf_event *event)
enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
u32 lane;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* We don't support sampling */
- if (is_sampling_event(event))
- return -EINVAL;
-
- /* We cannot support task bound events */
- if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
/* Disallow groups since we can't start/stop/read multiple counters at once */
if (in_hardware_group(event))
return -EINVAL;
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 56fe281974d2..d63d5d4d9084 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -401,17 +401,6 @@ static int ddr_perf_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct perf_event *sibling;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
- if (event->cpu < 0) {
- dev_warn(pmu->dev, "Can't provide per-task data!\n");
- return -EOPNOTSUPP;
- }
-
if (event != event->group_leader &&
pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
if (!ddr_perf_filters_compatible(event, event->group_leader))
diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c
index 85874ec5ecd0..9e0b2a969481 100644
--- a/drivers/perf/fsl_imx9_ddr_perf.c
+++ b/drivers/perf/fsl_imx9_ddr_perf.c
@@ -553,17 +553,6 @@ static int ddr_perf_event_init(struct perf_event *event)
struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
- if (event->cpu < 0) {
- dev_warn(pmu->dev, "Can't provide per-task data!\n");
- return -EOPNOTSUPP;
- }
-
event->cpu = pmu->cpu;
hwc->idx = -1;
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index 3b0b2f7197d0..b0b736af82e3 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -378,19 +378,11 @@ static int hisi_pcie_pmu_event_init(struct perf_event *event)
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- /* Check the type first before going on, otherwise it's not our event */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
if (EXT_COUNTER_IS_USED(hisi_pcie_get_event(event)))
hwc->event_base = HISI_PCIE_EXT_CNT;
else
hwc->event_base = HISI_PCIE_CNT;
- /* Sampling is not supported. */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
if (!hisi_pcie_pmu_valid_filter(event, pcie_pmu))
return -EINVAL;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 3c531b36cf25..67d64d664b4f 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -199,24 +199,6 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct hisi_pmu *hisi_pmu;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * We do not support sampling as the counters are all
- * shared by all CPU cores in a CPU die(SCCL). Also we
- * do not support attach to a task(per-process mode)
- */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
- /*
- * The uncore counters not specific to any CPU, so cannot
- * support per-task
- */
- if (event->cpu < 0)
- return -EINVAL;
-
/*
* Validate if the events in group does not exceed the
* available counters in hardware.
diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index 382e469257f9..f6996eafea5a 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1233,13 +1233,6 @@ static int hns3_pmu_event_init(struct perf_event *event)
int idx;
int ret;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /* Sampling is not supported */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
event->cpu = hns3_pmu->on_cpu;
idx = hns3_pmu_get_event_idx(hns3_pmu);
diff --git a/drivers/perf/marvell_cn10k_ddr_pmu.c b/drivers/perf/marvell_cn10k_ddr_pmu.c
index 54e3fd206d39..26ad83cdb735 100644
--- a/drivers/perf/marvell_cn10k_ddr_pmu.c
+++ b/drivers/perf/marvell_cn10k_ddr_pmu.c
@@ -474,19 +474,6 @@ static int cn10k_ddr_perf_event_init(struct perf_event *event)
struct cn10k_ddr_pmu *pmu = to_cn10k_ddr_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- if (is_sampling_event(event)) {
- dev_info(pmu->dev, "Sampling not supported!\n");
- return -EOPNOTSUPP;
- }
-
- if (event->cpu < 0) {
- dev_warn(pmu->dev, "Can't provide per-task data!\n");
- return -EOPNOTSUPP;
- }
-
/* Set ownership of event to one CPU, same event can not be observed
* on multiple cpus at same time.
*/
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index a162e707a639..6ed30a649ed3 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -149,9 +149,6 @@ static int tad_pmu_event_init(struct perf_event *event)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* Disallow groups since we can't start/stop/read multiple counters at once */
if (in_hardware_group(event))
return -EINVAL;
diff --git a/drivers/perf/marvell_pem_pmu.c b/drivers/perf/marvell_pem_pmu.c
index 53a35a5de7f8..5c7abae77c12 100644
--- a/drivers/perf/marvell_pem_pmu.c
+++ b/drivers/perf/marvell_pem_pmu.c
@@ -191,20 +191,9 @@ static int pem_perf_event_init(struct perf_event *event)
struct pem_pmu *pmu = to_pem_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
if (event->attr.config >= PEM_EVENTIDS_MAX)
return -EINVAL;
- if (is_sampling_event(event) ||
- event->attach_state & PERF_ATTACH_TASK) {
- return -EOPNOTSUPP;
- }
-
- if (event->cpu < 0)
- return -EOPNOTSUPP;
-
/* Disallow groups since we can't start/stop/read multiple counters at once */
if (in_hardware_group(event))
return -EINVAL;
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 9c4e1d89718d..eba9a7e40293 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -442,23 +442,8 @@ static int l2_cache_event_init(struct perf_event *event)
struct perf_event *sibling;
struct l2cache_pmu *l2cache_pmu;
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
l2cache_pmu = to_l2cache_pmu(event->pmu);
- if (hwc->sample_period) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Sampling not supported\n");
- return -EOPNOTSUPP;
- }
-
- if (event->cpu < 0) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Per-task mode not supported\n");
- return -EOPNOTSUPP;
- }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index f0cf6c33418d..af0ced386fb1 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -478,25 +478,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
struct l3cache_pmu *l3pmu = to_l3cache_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- /*
- * Is the event for this PMU?
- */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * Sampling not supported since these events are not core-attributable.
- */
- if (hwc->sample_period)
- return -EINVAL;
-
- /*
- * Task mode not available, we run the counters as socket counters,
- * not attributable to any CPU and therefore cannot attribute per-task.
- */
- if (event->cpu < 0)
- return -EINVAL;
-
/* Validate the group */
if (!qcom_l3_cache__validate_event_group(event))
return -EINVAL;
diff --git a/drivers/perf/starfive_starlink_pmu.c b/drivers/perf/starfive_starlink_pmu.c
index e185f307e639..ee5216403417 100644
--- a/drivers/perf/starfive_starlink_pmu.c
+++ b/drivers/perf/starfive_starlink_pmu.c
@@ -366,20 +366,6 @@ static int starlink_pmu_event_init(struct perf_event *event)
struct starlink_pmu *starlink_pmu = to_starlink_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- /*
- * Sampling is not supported, as counters are shared
- * by all CPU.
- */
- if (hwc->sample_period)
- return -EOPNOTSUPP;
-
- /*
- * Per-task and attach to a task are not supported,
- * as uncore events are not specific to any CPU.
- */
- if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
- return -EOPNOTSUPP;
-
if (!starlink_pmu_validate_event_group(event))
return -EINVAL;
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 472eb4494fd1..0ef85cb72289 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -553,21 +553,6 @@ static int tx2_uncore_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct tx2_uncore_pmu *tx2_pmu;
- /* Test the event attr type check for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * SOC PMU counters are shared across all cores.
- * Therefore, it does not support per-process mode.
- * Also, it does not support event sampling mode.
- */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
tx2_pmu = pmu_to_tx2_pmu(event->pmu);
if (tx2_pmu->cpu >= nr_cpu_ids)
return -EINVAL;
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 5e80ae0e692d..408e69533e7a 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -878,20 +878,6 @@ static int xgene_perf_event_init(struct perf_event *event)
struct xgene_pmu_dev *pmu_dev = to_pmu_dev(event->pmu);
struct hw_perf_event *hw = &event->hw;
- /* Test the event attr type check for PMU enumeration */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
- /*
- * SOC PMU counters are shared across all cores.
- * Therefore, it does not support per-process mode.
- * Also, it does not support event sampling mode.
- */
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
/*
* Many perf core operations (eg. events rotation) operate on a
* single CPU context. This is obvious for CPU PMUs, where one
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 38470351217b..eff369b02773 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -1791,17 +1791,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int domain, idx;
- /* Only look at RAPL events */
- if (event->attr.type != event->pmu->type)
- return -ENOENT;
-
/* Check for supported events only */
if (!cfg || cfg >= PERF_RAPL_MAX)
return -EINVAL;
- if (event->cpu < 0)
- return -EINVAL;
-
/* Find out which Package the event belongs to */
list_for_each_entry(pos, &rapl_packages, plist) {
if (is_rp_pmu_cpu(pos, event->cpu)) {
--
2.39.2.101.g768bb238c484.dirty
* Re: [PATCH 13/19] perf: Add helper for checking grouped events
2025-08-13 17:01 ` [PATCH 13/19] perf: Add helper for checking grouped events Robin Murphy
@ 2025-08-14 5:43 ` kernel test robot
0 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2025-08-14 5:43 UTC (permalink / raw)
To: Robin Murphy, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: llvm, oe-kbuild-all, linux-perf-users, linux-kernel, linux-alpha,
linux-snps-arc, linux-arm-kernel, imx, linux-csky, loongarch,
linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
linux-pm, linux-rockchip, dmaengine, linux-fpga, amd-gfx,
dri-devel
Hi Robin,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.17-rc1 next-20250814]
[cannot apply to perf-tools-next/perf-tools-next tip/perf/core perf-tools/perf-tools acme/perf/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Robin-Murphy/perf-arm-cmn-Fix-event-validation/20250814-010626
base: linus/master
patch link: https://lore.kernel.org/r/b05607c3ce0d3ce52de1784823ef9f6de324283c.1755096883.git.robin.murphy%40arm.com
patch subject: [PATCH 13/19] perf: Add helper for checking grouped events
config: i386-randconfig-003-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141353.JZWHsrYP-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141353.JZWHsrYP-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141353.JZWHsrYP-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/x86/events/amd/ibs.c:264:6: warning: unused variable 'ret' [-Wunused-variable]
264 | int ret;
| ^~~
1 warning generated.
vim +/ret +264 arch/x86/events/amd/ibs.c
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 258
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 259 static int perf_ibs_init(struct perf_event *event)
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 260 {
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 261 struct hw_perf_event *hwc = &event->hw;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 262 struct perf_ibs *perf_ibs;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 263 u64 config;
7c2128235eff99 arch/x86/events/amd/ibs.c Ravi Bangoria 2023-06-20 @264 int ret;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 265
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 266 perf_ibs = get_ibs_pmu(event->attr.type);
2fad201fe38ff9 arch/x86/events/amd/ibs.c Ravi Bangoria 2023-05-04 267 if (!perf_ibs)
2fad201fe38ff9 arch/x86/events/amd/ibs.c Ravi Bangoria 2023-05-04 268 return -ENOENT;
2fad201fe38ff9 arch/x86/events/amd/ibs.c Ravi Bangoria 2023-05-04 269
450bbd493d436f arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-03-12 270 config = event->attr.config;
450bbd493d436f arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-03-12 271
450bbd493d436f arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-03-12 272 if (event->pmu != &perf_ibs->pmu)
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 273 return -ENOENT;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 274
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 275 if (config & ~perf_ibs->config_mask)
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 276 return -EINVAL;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 277
0f9e0d7928d8e8 arch/x86/events/amd/ibs.c Namhyung Kim 2023-11-30 278 if (has_branch_stack(event))
0f9e0d7928d8e8 arch/x86/events/amd/ibs.c Namhyung Kim 2023-11-30 279 return -EOPNOTSUPP;
0f9e0d7928d8e8 arch/x86/events/amd/ibs.c Namhyung Kim 2023-11-30 280
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 281 /* handle exclude_{user,kernel} in the IRQ handler */
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 282 if (event->attr.exclude_host || event->attr.exclude_guest ||
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 283 event->attr.exclude_idle)
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 284 return -EINVAL;
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 285
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 286 if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 287 (event->attr.exclude_kernel || event->attr.exclude_user ||
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 288 event->attr.exclude_hv))
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 289 return -EINVAL;
d29e744c71673a arch/x86/events/amd/ibs.c Namhyung Kim 2024-12-03 290
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 291 /*
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 292 * Grouping of IBS events is not possible since IBS can have only
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 293 * one event active at any point in time.
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 294 */
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 295 if (in_hardware_group(event))
ccec93f5de464b arch/x86/events/amd/ibs.c Robin Murphy 2025-08-13 296 return -EINVAL;
7c2128235eff99 arch/x86/events/amd/ibs.c Ravi Bangoria 2023-06-20 297
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 298 if (hwc->sample_period) {
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 299 if (config & perf_ibs->cnt_mask)
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 300 /* raw max_cnt may not be set */
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 301 return -EINVAL;
88c7bcad71c83f arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 302
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 303 if (event->attr.freq) {
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 304 hwc->sample_period = perf_ibs->min_period;
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 305 } else {
88c7bcad71c83f arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 306 /* Silently mask off lower nibble. IBS hw mandates it. */
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 307 hwc->sample_period &= ~0x0FULL;
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 308 if (hwc->sample_period < perf_ibs->min_period)
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 309 return -EINVAL;
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 310 }
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 311 } else {
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 312 u64 period = 0;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 313
e1e7844ced88f9 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 314 if (event->attr.freq)
e1e7844ced88f9 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 315 return -EINVAL;
e1e7844ced88f9 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 316
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 317 if (perf_ibs == &perf_ibs_op) {
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 318 period = (config & IBS_OP_MAX_CNT) << 4;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 319 if (ibs_caps & IBS_CAPS_OPCNTEXT)
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 320 period |= config & IBS_OP_MAX_CNT_EXT_MASK;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 321 } else {
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 322 period = (config & IBS_FETCH_MAX_CNT) << 4;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 323 }
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 324
db98c5faf8cb35 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 325 config &= ~perf_ibs->cnt_mask;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 326 event->attr.sample_period = period;
598bdf4fefff5a arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 327 hwc->sample_period = period;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 328
b2fc7b282bf7c1 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-01-15 329 if (hwc->sample_period < perf_ibs->min_period)
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 330 return -EINVAL;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 331 }
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 332
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 333 if (perf_ibs_ldlat_event(perf_ibs, event)) {
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 334 u64 ldlat = event->attr.config1 & 0xFFF;
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 335
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 336 if (ldlat < 128 || ldlat > 2048)
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 337 return -EINVAL;
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 338 ldlat >>= 7;
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 339
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 340 config |= (ldlat - 1) << 59;
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 341 config |= IBS_OP_L3MISSONLY | IBS_OP_LDLAT_EN;
d20610c19b4a22 arch/x86/events/amd/ibs.c Ravi Bangoria 2025-02-05 342 }
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 343
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 344 /*
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 345 * If we modify hwc->sample_period, we also need to update
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 346 * hwc->last_period and hwc->period_left.
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 347 */
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 348 hwc->last_period = hwc->sample_period;
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 349 local64_set(&hwc->period_left, hwc->sample_period);
6accb9cf760804 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2012-04-02 350
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 351 hwc->config_base = perf_ibs->msr;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 352 hwc->config = config;
510419435c6948 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-12-15 353
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 354 return 0;
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 355 }
b716916679e720 arch/x86/kernel/cpu/perf_event_amd_ibs.c Robert Richter 2011-09-21 356
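For reference, the period and load-latency rules the IBS code above enforces are compact enough to sketch as standalone helpers. Names and error sentinels below are illustrative, not the kernel's (the kernel returns -EINVAL in both failure cases):

```c
#include <assert.h>
#include <stdint.h>

/* IBS hardware ignores the low 4 bits of the sample period; periods
 * below the per-PMU minimum are rejected (0 stands in for -EINVAL). */
uint64_t ibs_adjust_period(uint64_t period, uint64_t min_period)
{
	period &= ~0x0FULL;	/* silently mask off lower nibble */
	if (period < min_period)
		return 0;
	return period;
}

/* Load-latency threshold: valid range 128..2048, encoded as
 * ((ldlat >> 7) - 1) into config bits 59..62, per the hunk above
 * (~0 stands in for -EINVAL). */
uint64_t ibs_ldlat_bits(uint64_t ldlat)
{
	if (ldlat < 128 || ldlat > 2048)
		return ~0ULL;
	return ((ldlat >> 7) - 1) << 59;
}
```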
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 19/19] perf: Garbage-collect event_init checks
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
@ 2025-08-14 8:04 ` kernel test robot
2025-08-19 2:44 ` kernel test robot
2025-08-19 13:25 ` Robin Murphy
2 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2025-08-14 8:04 UTC (permalink / raw)
To: Robin Murphy, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: llvm, oe-kbuild-all, linux-perf-users, linux-kernel, linux-alpha,
linux-snps-arc, linux-arm-kernel, imx, linux-csky, loongarch,
linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
linux-pm, linux-rockchip, dmaengine, linux-fpga, amd-gfx,
dri-devel
Hi Robin,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.17-rc1 next-20250814]
[cannot apply to perf-tools-next/perf-tools-next tip/perf/core perf-tools/perf-tools acme/perf/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Robin-Murphy/perf-arm-cmn-Fix-event-validation/20250814-010626
base: linus/master
patch link: https://lore.kernel.org/r/ace3532a8a438a96338bf349a27636d8294c7111.1755096883.git.robin.murphy%40arm.com
patch subject: [PATCH 19/19] perf: Garbage-collect event_init checks
config: i386-randconfig-003-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141524.QVgoOKMD-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141524.QVgoOKMD-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141524.QVgoOKMD-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/x86/events/intel/uncore_snb.c:905:24: warning: unused variable 'hwc' [-Wunused-variable]
905 | struct hw_perf_event *hwc = &event->hw;
| ^~~
1 warning generated.
vim +/hwc +905 arch/x86/events/intel/uncore_snb.c
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 896
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 897 /*
9aae1780e7e81e arch/x86/events/intel/uncore_snb.c Kan Liang 2018-05-03 898 * Keep the custom event_init() function compatible with old event
9aae1780e7e81e arch/x86/events/intel/uncore_snb.c Kan Liang 2018-05-03 899 * encoding for free running counters.
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 900 */
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 901 static int snb_uncore_imc_event_init(struct perf_event *event)
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 902 {
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 903 struct intel_uncore_pmu *pmu;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 904 struct intel_uncore_box *box;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 @905 struct hw_perf_event *hwc = &event->hw;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 906 u64 cfg = event->attr.config & SNB_UNCORE_PCI_IMC_EVENT_MASK;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 907 int idx, base;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 908
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 909 pmu = uncore_event_to_pmu(event);
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 910 /* no device found for this pmu */
3f710be02ea648 arch/x86/events/intel/uncore_snb.c Kan Liang 2025-01-08 911 if (!pmu->registered)
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 912 return -ENOENT;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 913
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 914 /* check only supported bits are set */
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 915 if (event->attr.config & ~SNB_UNCORE_PCI_IMC_EVENT_MASK)
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 916 return -EINVAL;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 917
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 918 box = uncore_pmu_to_box(pmu, event->cpu);
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 919 if (!box || box->cpu < 0)
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 920 return -EINVAL;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 921
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 922 event->cpu = box->cpu;
1f2569fac6c6dd arch/x86/events/intel/uncore_snb.c Thomas Gleixner 2016-02-22 923 event->pmu_private = box;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 924
e64cd6f73ff5a7 arch/x86/events/intel/uncore_snb.c David Carrillo-Cisneros 2016-08-17 925 event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
e64cd6f73ff5a7 arch/x86/events/intel/uncore_snb.c David Carrillo-Cisneros 2016-08-17 926
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 927 event->hw.idx = -1;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 928 event->hw.last_tag = ~0ULL;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 929 event->hw.extra_reg.idx = EXTRA_REG_NONE;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 930 event->hw.branch_reg.idx = EXTRA_REG_NONE;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 931 /*
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 932 * check event is known (whitelist, determines counter)
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 933 */
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 934 switch (cfg) {
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 935 case SNB_UNCORE_PCI_IMC_DATA_READS:
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 936 base = SNB_UNCORE_PCI_IMC_DATA_READS_BASE;
9aae1780e7e81e arch/x86/events/intel/uncore_snb.c Kan Liang 2018-05-03 937 idx = UNCORE_PMC_IDX_FREERUNNING;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 938 break;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 939 case SNB_UNCORE_PCI_IMC_DATA_WRITES:
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 940 base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
9aae1780e7e81e arch/x86/events/intel/uncore_snb.c Kan Liang 2018-05-03 941 idx = UNCORE_PMC_IDX_FREERUNNING;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 942 break;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 943 case SNB_UNCORE_PCI_IMC_GT_REQUESTS:
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 944 base = SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 945 idx = UNCORE_PMC_IDX_FREERUNNING;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 946 break;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 947 case SNB_UNCORE_PCI_IMC_IA_REQUESTS:
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 948 base = SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 949 idx = UNCORE_PMC_IDX_FREERUNNING;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 950 break;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 951 case SNB_UNCORE_PCI_IMC_IO_REQUESTS:
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 952 base = SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 953 idx = UNCORE_PMC_IDX_FREERUNNING;
24633d901ea44f arch/x86/events/intel/uncore_snb.c Vaibhav Shankar 2020-08-13 954 break;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 955 default:
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 956 return -EINVAL;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 957 }
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 958
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 959 /* must be done before validate_group */
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 960 event->hw.event_base = base;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 961 event->hw.idx = idx;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 962
8041ffd36f42d8 arch/x86/events/intel/uncore_snb.c Kan Liang 2019-02-27 963 /* Convert to standard encoding format for freerunning counters */
8041ffd36f42d8 arch/x86/events/intel/uncore_snb.c Kan Liang 2019-02-27 964 event->hw.config = ((cfg - 1) << 8) | 0x10ff;
8041ffd36f42d8 arch/x86/events/intel/uncore_snb.c Kan Liang 2019-02-27 965
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 966 /* no group validation needed, we have free running counters */
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 967
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 968 return 0;
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 969 }
92807ffdf32c38 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c Yan, Zheng 2014-07-30 970
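The "standard encoding" conversion near the end of the function is simple enough to check by hand. A sketch (constant copied from the hunk above; function name is mine): the free-running IMC counters have no real event-select field, so the driver synthesizes one by placing (cfg - 1) in bits 8 and up over a fixed 0x10ff pattern:

```c
#include <assert.h>

/* Synthesize the standard encoding for a free-running counter from
 * its whitelist config value, mirroring the hunk above. */
unsigned int snb_freerun_config(unsigned int cfg)
{
	return ((cfg - 1) << 8) | 0x10ff;
}
```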
* Re: [PATCH 19/19] perf: Garbage-collect event_init checks
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
2025-08-14 8:04 ` kernel test robot
@ 2025-08-19 2:44 ` kernel test robot
2025-08-19 17:49 ` Robin Murphy
2025-08-19 13:25 ` Robin Murphy
2 siblings, 1 reply; 52+ messages in thread
From: kernel test robot @ 2025-08-19 2:44 UTC (permalink / raw)
To: Robin Murphy
Cc: oe-lkp, lkp, linux-arm-kernel, linuxppc-dev, linux-s390,
linux-perf-users, linux-kernel, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-pm, peterz, mingo,
will, mark.rutland, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-alpha, linux-snps-arc,
imx, linux-csky, loongarch, linux-mips, linux-sh, sparclinux,
dri-devel, linux-riscv, oliver.sang
Hello,
kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
commit: 1ba20479196e5af3ebbedf9321de6b26f2a0cdd3 ("[PATCH 19/19] perf: Garbage-collect event_init checks")
url: https://github.com/intel-lab-lkp/linux/commits/Robin-Murphy/perf-arm-cmn-Fix-event-validation/20250814-010626
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 91325f31afc1026de28665cf1a7b6e157fa4d39d
patch link: https://lore.kernel.org/all/ace3532a8a438a96338bf349a27636d8294c7111.1755096883.git.robin.murphy@arm.com/
patch subject: [PATCH 19/19] perf: Garbage-collect event_init checks
in testcase: perf-sanity-tests
version:
with following parameters:
perf_compiler: clang
group: group-02
config: x86_64-rhel-9.4-bpf
compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508190403.33c83ece-lkp@intel.com
[ 307.132412][ T7614] BUG: unable to handle page fault for address: ffffffff8674015c
[ 307.140048][ T7614] #PF: supervisor read access in kernel mode
[ 307.145926][ T7614] #PF: error_code(0x0000) - not-present page
[ 307.151801][ T7614] PGD 819477067 P4D 819477067 PUD 819478063 PMD 1002c3063 PTE 800ffff7e48bf062
[ 307.160663][ T7614] Oops: Oops: 0000 [#1] SMP KASAN PTI
[ 307.165931][ T7614] CPU: 0 UID: 0 PID: 7614 Comm: perf Tainted: G I 6.17.0-rc1-00048-g1ba20479196e #1 PREEMPT(voluntary)
[ 307.178456][ T7614] Tainted: [I]=FIRMWARE_WORKAROUND
[ 307.183459][ T7614] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 307.191609][ T7614] RIP: 0010:uncore_pmu_event_init (arch/x86/events/intel/uncore.c:141 arch/x86/events/intel/uncore.c:739) intel_uncore
[ 307.198867][ T7614] Code: c1 4c 63 ab 0c 03 00 00 4a 8d 3c ed a0 3e c8 83 e8 17 de 3a c1 4e 03 24 ed a0 3e c8 83 49 8d bc 24 fc 00 00 00 e8 a2 dc 3a c1 <45> 8b a4 24 fc 00 00 00 44 3b 25 03 3d 35 00 0f 83 5b 04 00 00 48
All code
========
0: c1 4c 63 ab 0c rorl $0xc,-0x55(%rbx,%riz,2)
5: 03 00 add (%rax),%eax
7: 00 4a 8d add %cl,-0x73(%rdx)
a: 3c ed cmp $0xed,%al
c: a0 3e c8 83 e8 17 de movabs 0xc13ade17e883c83e,%al
13: 3a c1
15: 4e 03 24 ed a0 3e c8 add -0x7c37c160(,%r13,8),%r12
1c: 83
1d: 49 8d bc 24 fc 00 00 lea 0xfc(%r12),%rdi
24: 00
25: e8 a2 dc 3a c1 call 0xffffffffc13adccc
2a:* 45 8b a4 24 fc 00 00 mov 0xfc(%r12),%r12d <-- trapping instruction
31: 00
32: 44 3b 25 03 3d 35 00 cmp 0x353d03(%rip),%r12d # 0x353d3c
39: 0f 83 5b 04 00 00 jae 0x49a
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 45 8b a4 24 fc 00 00 mov 0xfc(%r12),%r12d
7: 00
8: 44 3b 25 03 3d 35 00 cmp 0x353d03(%rip),%r12d # 0x353d12
f: 0f 83 5b 04 00 00 jae 0x470
15: 48 rex.W
[ 307.218475][ T7614] RSP: 0018:ffff8881b30ef8d8 EFLAGS: 00010246
[ 307.224450][ T7614] RAX: 0000000000000000 RBX: ffff8881193547b8 RCX: dffffc0000000000
[ 307.232353][ T7614] RDX: 0000000000000007 RSI: ffffffffc05230ae RDI: ffffffff8674015c
[ 307.240255][ T7614] RBP: ffff88810468d000 R08: 0000000000000000 R09: fffffbfff0ae31b4
[ 307.248151][ T7614] R10: ffffffff85718da7 R11: 0000000067e9e64c R12: ffffffff86740060
[ 307.256042][ T7614] R13: ffffffffffffffff R14: ffff888119354890 R15: ffffffff81727da9
[ 307.263933][ T7614] FS: 00007f54bdb88880(0000) GS:ffff8887a24e8000(0000) knlGS:0000000000000000
[ 307.272787][ T7614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 307.279279][ T7614] CR2: ffffffff8674015c CR3: 00000002e3e06003 CR4: 00000000003726f0
[ 307.287168][ T7614] Call Trace:
[ 307.290337][ T7614] <TASK>
[ 307.293157][ T7614] ? perf_init_event (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 include/linux/rcupdate.h:1155 kernel/events/core.c:12690)
[ 307.298005][ T7614] perf_try_init_event (kernel/events/core.c:12579)
[ 307.303538][ T7614] ? perf_init_event (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 include/linux/rcupdate.h:1155 kernel/events/core.c:12690)
[ 307.308370][ T7614] perf_init_event (kernel/events/core.c:12697)
[ 307.313031][ T7614] perf_event_alloc (kernel/events/core.c:12972)
[ 307.317862][ T7614] ? __pfx_perf_event_output_forward (kernel/events/core.c:8496)
[ 307.323919][ T7614] ? __lock_release+0x5d/0x160
[ 307.329194][ T7614] __do_sys_perf_event_open (kernel/events/core.c:13492)
[ 307.334732][ T7614] ? __pfx___do_sys_perf_event_open (kernel/events/core.c:13374)
[ 307.340702][ T7614] ? trace_contention_end (include/trace/events/lock.h:122 (discriminator 21))
[ 307.345808][ T7614] ? lock_acquire (kernel/locking/lockdep.c:470 kernel/locking/lockdep.c:5870 kernel/locking/lockdep.c:5825)
[ 307.350379][ T7614] ? find_held_lock (kernel/locking/lockdep.c:5350)
[ 307.354947][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.359623][ T7614] do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[ 307.364020][ T7614] ? __do_sys_perf_event_open (include/linux/srcu.h:167 include/linux/srcu.h:375 include/linux/srcu.h:479 kernel/events/core.c:13454)
[ 307.369726][ T7614] ? __lock_release+0x5d/0x160
[ 307.375006][ T7614] ? __do_sys_perf_event_open (include/linux/srcu.h:167 include/linux/srcu.h:375 include/linux/srcu.h:479 kernel/events/core.c:13454)
[ 307.380713][ T7614] ? lock_release (kernel/locking/lockdep.c:470 kernel/locking/lockdep.c:5891)
[ 307.385194][ T7614] ? __srcu_read_unlock (kernel/rcu/srcutree.c:770)
[ 307.390112][ T7614] ? __do_sys_perf_event_open (include/linux/srcu.h:377 include/linux/srcu.h:479 kernel/events/core.c:13454)
[ 307.395823][ T7614] ? __pfx___do_sys_perf_event_open (kernel/events/core.c:13374)
[ 307.401798][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.406455][ T7614] ? trace_irq_enable+0xac/0xe0
[ 307.412248][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.416904][ T7614] ? trace_irq_enable+0xac/0xe0
[ 307.422698][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.427355][ T7614] ? trace_irq_enable+0xac/0xe0
[ 307.433149][ T7614] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
[ 307.437808][ T7614] ? handle_mm_fault (include/linux/rcupdate.h:341 include/linux/rcupdate.h:871 include/linux/memcontrol.h:981 include/linux/memcontrol.h:987 mm/memory.c:6229 mm/memory.c:6390)
[ 307.442652][ T7614] ? __lock_release+0x5d/0x160
[ 307.447923][ T7614] ? find_held_lock (kernel/locking/lockdep.c:5350)
[ 307.452491][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.457151][ T7614] ? trace_irq_enable+0xac/0xe0
[ 307.462954][ T7614] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
[ 307.467631][ T7614] ? lock_release (kernel/locking/lockdep.c:470 kernel/locking/lockdep.c:5891)
[ 307.472122][ T7614] ? do_user_addr_fault (arch/x86/include/asm/atomic.h:93 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:389 include/linux/refcount.h:432 include/linux/mmap_lock.h:143 include/linux/mmap_lock.h:267 arch/x86/mm/fault.c:1338)
[ 307.477225][ T7614] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
[ 307.481892][ T7614] ? trace_irq_enable+0xac/0xe0
[ 307.487692][ T7614] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4351 kernel/locking/lockdep.c:4410)
[ 307.493487][ T7614] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 307.499281][ T7614] RIP: 0033:0x7f54c9b4d719
[ 307.503585][ T7614] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
All code
========
0: 08 89 e8 5b 5d c3 or %cl,-0x3ca2a418(%rcx)
6: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
d: 00 00 00
10: 90 nop
11: 48 89 f8 mov %rdi,%rax
14: 48 89 f7 mov %rsi,%rdi
17: 48 89 d6 mov %rdx,%rsi
1a: 48 89 ca mov %rcx,%rdx
1d: 4d 89 c2 mov %r8,%r10
20: 4d 89 c8 mov %r9,%r8
23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d b7 06 0d 00 mov 0xd06b7(%rip),%rcx # 0xd06f1
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d b7 06 0d 00 mov 0xd06b7(%rip),%rcx # 0xd06c7
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250819/202508190403.33c83ece-lkp@intel.com
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
@ 2025-08-19 13:15 ` Robin Murphy
2025-08-20 8:09 ` Thomas Richter
2025-08-21 2:53 ` kernel test robot
2025-08-26 13:43 ` Mark Rutland
2 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-19 13:15 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On 13/08/2025 6:01 pm, Robin Murphy wrote:
> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
> events without registering themselves as PERF_TYPE_RAW in the first
> place. Add an explicit opt-in for these special cases, so that we can
> make life easier for every other driver (and probably also speed up the
> slow-path search) by having perf_try_init_event() do the basic type
> checking to cover the majority of cases.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> A further possibility is to automatically add the cap to PERF_TYPE_RAW
> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
> undecided...
> ---
> arch/s390/kernel/perf_cpum_cf.c | 1 +
> arch/s390/kernel/perf_pai_crypto.c | 2 +-
> arch/s390/kernel/perf_pai_ext.c | 2 +-
> arch/x86/events/core.c | 2 +-
> drivers/perf/arm_pmu.c | 1 +
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 15 +++++++++++++++
> 7 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
> index 1a94e0944bc5..782ab755ddd4 100644
> --- a/arch/s390/kernel/perf_cpum_cf.c
> +++ b/arch/s390/kernel/perf_cpum_cf.c
> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
> /* Performance monitoring unit for s390x */
> static struct pmu cpumf_pmu = {
> .task_ctx_nr = perf_sw_context,
> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
> .pmu_enable = cpumf_pmu_enable,
> .pmu_disable = cpumf_pmu_disable,
> .event_init = cpumf_pmu_event_init,
> diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
> index a64b6b056a21..b5b6d8b5d943 100644
> --- a/arch/s390/kernel/perf_pai_crypto.c
> +++ b/arch/s390/kernel/perf_pai_crypto.c
> @@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
> /* Performance monitoring unit for mapped counters */
> static struct pmu paicrypt = {
> .task_ctx_nr = perf_hw_context,
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
> .event_init = paicrypt_event_init,
> .add = paicrypt_add,
> .del = paicrypt_del,
> diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
> index 1261f80c6d52..bcd28c38da70 100644
> --- a/arch/s390/kernel/perf_pai_ext.c
> +++ b/arch/s390/kernel/perf_pai_ext.c
> @@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
> /* Performance monitoring unit for mapped counters */
> static struct pmu paiext = {
> .task_ctx_nr = perf_hw_context,
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
> .event_init = paiext_event_init,
> .add = paiext_add,
> .del = paiext_del,
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 789dfca2fa67..764728bb80ae 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
> }
>
> static struct pmu pmu = {
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 72d8f38d0aa5..bc772a3bf411 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
> * specific PMU.
> */
> .capabilities = PERF_PMU_CAP_SAMPLING |
> + PERF_PMU_CAP_RAW_EVENTS |
> PERF_PMU_CAP_EXTENDED_REGS |
> PERF_PMU_CAP_EXTENDED_HW_TYPE,
> };
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 183b7c48b329..c6ad036c0037 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
> +#define PERF_PMU_CAP_RAW_EVENTS 0x0800
>
> /**
> * pmu::scope
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 71b2a6730705..2ecee76d2ae2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
> (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
> }
>
> +static bool is_raw_pmu(const struct pmu *pmu)
> +{
> + return pmu->type == PERF_TYPE_RAW ||
> + pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
> +}
> +
> static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
> {
> struct perf_event_context *ctx = NULL;
> int ret;
>
> + /*
> + * Before touching anything, we can safely skip:
> + * - any event for a specific PMU which is not this one
> + * - any common event if this PMU doesn't support them
> + */
> + if (event->attr.type != pmu->type &&
> + (event->attr.type >= PERF_TYPE_MAX || is_raw_pmu(pmu)))
Ah, that should be "!is_raw_pmu(pmu)" there (although it's not entirely
the cause of the LKP report on the final patch).
Thanks,
Robin.
> + return -ENOENT;
> +
> if (!try_module_get(pmu->module))
> return -ENODEV;
>
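With that fix folded in, the intended fast-path test reduces to a small predicate. A sketch with the correction applied (the PERF_TYPE_MAX value and all names here are illustrative, not lifted from the patch):

```c
#include <assert.h>
#include <stdbool.h>

#define PERF_TYPE_MAX 6	/* illustrative: count of common event types */

/* Corrected skip logic: skip (return -ENOENT) for events targeting a
 * different specific PMU, and for common events when this PMU does
 * not accept raw/hardware events. */
bool should_skip_event(unsigned int attr_type, unsigned int pmu_type,
		       bool raw_pmu)
{
	return attr_type != pmu_type &&
	       (attr_type >= PERF_TYPE_MAX || !raw_pmu);
}
```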
* Re: [PATCH 19/19] perf: Garbage-collect event_init checks
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
2025-08-14 8:04 ` kernel test robot
2025-08-19 2:44 ` kernel test robot
@ 2025-08-19 13:25 ` Robin Murphy
2 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-19 13:25 UTC (permalink / raw)
To: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On 13/08/2025 6:01 pm, Robin Murphy wrote:
[...]
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 297ff5adb667..98ffab403bb4 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -731,24 +731,11 @@ static int uncore_pmu_event_init(struct perf_event *event)
> struct hw_perf_event *hwc = &event->hw;
> int ret;
>
> - if (event->attr.type != event->pmu->type)
> - return -ENOENT;
> -
> pmu = uncore_event_to_pmu(event);
> /* no device found for this pmu */
> if (!pmu->registered)
> return -ENOENT;
>
> - /* Sampling not supported yet */
> - if (hwc->sample_period)
> - return -EINVAL;
> -
> - /*
> - * Place all uncore events for a particular physical package
> - * onto a single cpu
> - */
> - if (event->cpu < 0)
> - return -EINVAL;
Oopsie, I missed that this isn't just the usual boilerplate as the
comment kind of implies, but is also necessary to prevent the
uncore_pmu_to_box() lookup going wrong (since the core code won't reject
a task-bound event until later). I'll put this back with an updated
comment for v2 (and double-check everything else again...), thanks LKP!
Robin.
> box = uncore_pmu_to_box(pmu, event->cpu);
> if (!box || box->cpu < 0)
> return -EINVAL;
* Re: [PATCH 19/19] perf: Garbage-collect event_init checks
2025-08-19 2:44 ` kernel test robot
@ 2025-08-19 17:49 ` Robin Murphy
0 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-19 17:49 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-arm-kernel, linuxppc-dev, linux-s390,
linux-perf-users, linux-kernel, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-pm, peterz, mingo,
will, mark.rutland, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-alpha, linux-snps-arc,
imx, linux-csky, loongarch, linux-mips, linux-sh, sparclinux,
dri-devel, linux-riscv
On 19/08/2025 3:44 am, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
>
> commit: 1ba20479196e5af3ebbedf9321de6b26f2a0cdd3 ("[PATCH 19/19] perf: Garbage-collect event_init checks")
> url: https://github.com/intel-lab-lkp/linux/commits/Robin-Murphy/perf-arm-cmn-Fix-event-validation/20250814-010626
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 91325f31afc1026de28665cf1a7b6e157fa4d39d
> patch link: https://lore.kernel.org/all/ace3532a8a438a96338bf349a27636d8294c7111.1755096883.git.robin.murphy@arm.com/
> patch subject: [PATCH 19/19] perf: Garbage-collect event_init checks
OK, after looking a bit more deeply at x86 and PowerPC, I think it
probably is nicest to solve this commonly too. Below is what I've cooked
up for a v2 (I'll save reposting the whole series this soon...)
Thanks,
Robin.
----->8-----
Subject: [PATCH 18.5/19] perf: Add common uncore-CPU check
Many uncore drivers depend on event->cpu being valid in order to look
up various data in their event_init call. Since we've now factored out
common PMU identification, we can factor out this check in the correct
order too. While it might technically be possible to hoist the general
task/cgroup check up here now, that would be horribly messy, so for
clarity let's keep these as distinct (albeit related) concerns.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508190403.33c83ece-lkp@intel.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
kernel/events/core.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5f7eb526d87c..ddf045ad4d83 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12562,6 +12562,11 @@ static bool is_raw_pmu(const struct pmu *pmu)
pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
}
+static bool is_uncore_pmu(const struct pmu *pmu)
+{
+ return pmu->task_ctx_nr == perf_invalid_context;
+}
+
static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
{
struct perf_event_context *ctx = NULL;
@@ -12571,11 +12576,16 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
* Before touching anything, we can safely skip:
* - any event for a specific PMU which is not this one
* - any common event if this PMU doesn't support them
+ * - non-CPU-bound uncore events (so drivers can assume event->cpu is
+ * valid; we'll check the actual task/cgroup attach state later)
*/
if (event->attr.type != pmu->type &&
(event->attr.type >= PERF_TYPE_MAX || !is_raw_pmu(pmu)))
return -ENOENT;
+ if (is_uncore_pmu(pmu) && event->cpu < 0)
+ return -EINVAL;
+
if (!try_module_get(pmu->module))
return -ENODEV;
@@ -12990,7 +13000,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
* events (they don't make sense as the cgroup will be different
* on other CPUs in the uncore mask).
*/
- if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1))
+ if (is_uncore_pmu(pmu) && (task || cgroup_fd != -1))
return ERR_PTR(-EINVAL);
if (event->attr.aux_output &&
--
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-19 13:15 ` Robin Murphy
@ 2025-08-20 8:09 ` Thomas Richter
2025-08-20 11:39 ` Robin Murphy
0 siblings, 1 reply; 52+ messages in thread
From: Thomas Richter @ 2025-08-20 8:09 UTC (permalink / raw)
To: Robin Murphy, peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On 8/19/25 15:15, Robin Murphy wrote:
> On 13/08/2025 6:01 pm, Robin Murphy wrote:
>> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
>> events without registering themselves as PERF_TYPE_RAW in the first
>> place. Add an explicit opt-in for these special cases, so that we can
>> make life easier for every other driver (and probably also speed up the
>> slow-path search) by having perf_try_init_event() do the basic type
>> checking to cover the majority of cases.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> A further possibility is to automatically add the cap to PERF_TYPE_RAW
>> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
>> undecided...
>> ---
>> arch/s390/kernel/perf_cpum_cf.c | 1 +
>> arch/s390/kernel/perf_pai_crypto.c | 2 +-
>> arch/s390/kernel/perf_pai_ext.c | 2 +-
>> arch/x86/events/core.c | 2 +-
>> drivers/perf/arm_pmu.c | 1 +
>> include/linux/perf_event.h | 1 +
>> kernel/events/core.c | 15 +++++++++++++++
>> 7 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
>> index 1a94e0944bc5..782ab755ddd4 100644
>> --- a/arch/s390/kernel/perf_cpum_cf.c
>> +++ b/arch/s390/kernel/perf_cpum_cf.c
>> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
>> /* Performance monitoring unit for s390x */
>> static struct pmu cpumf_pmu = {
>> .task_ctx_nr = perf_sw_context,
>> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
>> .pmu_enable = cpumf_pmu_enable,
>> .pmu_disable = cpumf_pmu_disable,
>> .event_init = cpumf_pmu_event_init,
>> diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
>> index a64b6b056a21..b5b6d8b5d943 100644
>> --- a/arch/s390/kernel/perf_pai_crypto.c
>> +++ b/arch/s390/kernel/perf_pai_crypto.c
>> @@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
>> /* Performance monitoring unit for mapped counters */
>> static struct pmu paicrypt = {
>> .task_ctx_nr = perf_hw_context,
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>> .event_init = paicrypt_event_init,
>> .add = paicrypt_add,
>> .del = paicrypt_del,
>> diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
>> index 1261f80c6d52..bcd28c38da70 100644
>> --- a/arch/s390/kernel/perf_pai_ext.c
>> +++ b/arch/s390/kernel/perf_pai_ext.c
>> @@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
>> /* Performance monitoring unit for mapped counters */
>> static struct pmu paiext = {
>> .task_ctx_nr = perf_hw_context,
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>> .event_init = paiext_event_init,
>> .add = paiext_add,
>> .del = paiext_del,
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 789dfca2fa67..764728bb80ae 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
>> }
>> static struct pmu pmu = {
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>> .pmu_enable = x86_pmu_enable,
>> .pmu_disable = x86_pmu_disable,
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 72d8f38d0aa5..bc772a3bf411 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
>> * specific PMU.
>> */
>> .capabilities = PERF_PMU_CAP_SAMPLING |
>> + PERF_PMU_CAP_RAW_EVENTS |
>> PERF_PMU_CAP_EXTENDED_REGS |
>> PERF_PMU_CAP_EXTENDED_HW_TYPE,
>> };
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 183b7c48b329..c6ad036c0037 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
>> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
>> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>> +#define PERF_PMU_CAP_RAW_EVENTS 0x0800
>> /**
>> * pmu::scope
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 71b2a6730705..2ecee76d2ae2 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
>> (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
>> }
>> +static bool is_raw_pmu(const struct pmu *pmu)
>> +{
>> + return pmu->type == PERF_TYPE_RAW ||
>> + pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
>> +}
>> +
>> static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>> {
>> struct perf_event_context *ctx = NULL;
>> int ret;
>> + /*
>> + * Before touching anything, we can safely skip:
>> + * - any event for a specific PMU which is not this one
>> + * - any common event if this PMU doesn't support them
>> + */
>> + if (event->attr.type != pmu->type &&
>> + (event->attr.type >= PERF_TYPE_MAX || is_raw_pmu(pmu)))
>
> Ah, that should be "!is_raw_pmu(pmu)" there (although it's not entirely the cause of the LKP report on the final patch.)
>
> Thanks,
> Robin.
>
>> + return -ENOENT;
>> +
>> if (!try_module_get(pmu->module))
>> return -ENODEV;
>>
>
>
Hi Robin,
What is the intention of that patch?
Can you explain that a bit more?
Thanks.
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-20 8:09 ` Thomas Richter
@ 2025-08-20 11:39 ` Robin Murphy
0 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-20 11:39 UTC (permalink / raw)
To: Thomas Richter, peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Cc: linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
Hi Thomas,
On 2025-08-20 9:09 am, Thomas Richter wrote:
> On 8/19/25 15:15, Robin Murphy wrote:
>> On 13/08/2025 6:01 pm, Robin Murphy wrote:
>>> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
>>> events without registering themselves as PERF_TYPE_RAW in the first
>>> place. Add an explicit opt-in for these special cases, so that we can
>>> make life easier for every other driver (and probably also speed up the
>>> slow-path search) by having perf_try_init_event() do the basic type
>>> checking to cover the majority of cases.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>
>>> A further possibility is to automatically add the cap to PERF_TYPE_RAW
>>> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
>>> undecided...
>>> ---
>>> arch/s390/kernel/perf_cpum_cf.c | 1 +
>>> arch/s390/kernel/perf_pai_crypto.c | 2 +-
>>> arch/s390/kernel/perf_pai_ext.c | 2 +-
>>> arch/x86/events/core.c | 2 +-
>>> drivers/perf/arm_pmu.c | 1 +
>>> include/linux/perf_event.h | 1 +
>>> kernel/events/core.c | 15 +++++++++++++++
>>> 7 files changed, 21 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
>>> index 1a94e0944bc5..782ab755ddd4 100644
>>> --- a/arch/s390/kernel/perf_cpum_cf.c
>>> +++ b/arch/s390/kernel/perf_cpum_cf.c
>>> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
>>> /* Performance monitoring unit for s390x */
>>> static struct pmu cpumf_pmu = {
>>> .task_ctx_nr = perf_sw_context,
>>> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
>>> .pmu_enable = cpumf_pmu_enable,
>>> .pmu_disable = cpumf_pmu_disable,
>>> .event_init = cpumf_pmu_event_init,
>>> diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
>>> index a64b6b056a21..b5b6d8b5d943 100644
>>> --- a/arch/s390/kernel/perf_pai_crypto.c
>>> +++ b/arch/s390/kernel/perf_pai_crypto.c
>>> @@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
>>> /* Performance monitoring unit for mapped counters */
>>> static struct pmu paicrypt = {
>>> .task_ctx_nr = perf_hw_context,
>>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>>> .event_init = paicrypt_event_init,
>>> .add = paicrypt_add,
>>> .del = paicrypt_del,
>>> diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
>>> index 1261f80c6d52..bcd28c38da70 100644
>>> --- a/arch/s390/kernel/perf_pai_ext.c
>>> +++ b/arch/s390/kernel/perf_pai_ext.c
>>> @@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
>>> /* Performance monitoring unit for mapped counters */
>>> static struct pmu paiext = {
>>> .task_ctx_nr = perf_hw_context,
>>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>>> .event_init = paiext_event_init,
>>> .add = paiext_add,
>>> .del = paiext_del,
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 789dfca2fa67..764728bb80ae 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
>>> }
>>> static struct pmu pmu = {
>>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>>> .pmu_enable = x86_pmu_enable,
>>> .pmu_disable = x86_pmu_disable,
>>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>>> index 72d8f38d0aa5..bc772a3bf411 100644
>>> --- a/drivers/perf/arm_pmu.c
>>> +++ b/drivers/perf/arm_pmu.c
>>> @@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
>>> * specific PMU.
>>> */
>>> .capabilities = PERF_PMU_CAP_SAMPLING |
>>> + PERF_PMU_CAP_RAW_EVENTS |
>>> PERF_PMU_CAP_EXTENDED_REGS |
>>> PERF_PMU_CAP_EXTENDED_HW_TYPE,
>>> };
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index 183b7c48b329..c6ad036c0037 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
>>> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>>> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
>>> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>>> +#define PERF_PMU_CAP_RAW_EVENTS 0x0800
>>> /**
>>> * pmu::scope
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 71b2a6730705..2ecee76d2ae2 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
>>> (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
>>> }
>>> +static bool is_raw_pmu(const struct pmu *pmu)
>>> +{
>>> + return pmu->type == PERF_TYPE_RAW ||
>>> + pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
>>> +}
>>> +
>>> static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>>> {
>>> struct perf_event_context *ctx = NULL;
>>> int ret;
>>> + /*
>>> + * Before touching anything, we can safely skip:
>>> + * - any event for a specific PMU which is not this one
>>> + * - any common event if this PMU doesn't support them
>>> + */
>>> + if (event->attr.type != pmu->type &&
>>> + (event->attr.type >= PERF_TYPE_MAX || is_raw_pmu(pmu)))
>>
>> Ah, that should be "!is_raw_pmu(pmu)" there (although it's not entirely the cause of the LKP report on the final patch.)
>>
>> Thanks,
>> Robin.
>>
>>> + return -ENOENT;
>>> +
>>> if (!try_module_get(pmu->module))
>>> return -ENODEV;
>>>
>>
>>
>
> Hi Robin,
>
> What is the intention of that patch?
> Can you explain that a bit more?
The background here is that, in this context, we essentially have 3
distinct categories of PMU driver:
- Older/simpler CPU PMUs which register as PERF_TYPE_RAW and accept
raw/hardware events
- Newer/heterogeneous CPU PMUs which register as a dynamic type, and
accept both raw/hardware events and events of their own type
- Other (mostly uncore) PMUs which only accept events of their own type
These days that third one is by far the majority, so it seems
increasingly unreasonable and inefficient to always offer every kind of
event to every driver, and so force nearly all of them to have the same
boilerplate code to refuse events they don't want. The core code is
already in a position to be able to assume that a PERF_TYPE_RAW PMU
wants "raw" events and a typed PMU wants its own events, so the only
actual new thing we need is a way to discern the 5 drivers in the middle
category - where s390 dominates :) - from the rest in the third.
The way the check itself ends up structured is that the only time we'll
now offer an event to a driver of a different type is if it's a "raw"
event and the driver has asked to be offered them (either by registering
as PERF_TYPE_RAW or with the new cap). Otherwise we can safely assume
that this PMU won't want this event, and so skip straight to trying the
next one. We can get away with the single PERF_TYPE_MAX check for all
"raw" events, since the drivers which do handle them already have to
consider the exact type to discern between RAW/HARDWARE/HW_CACHE, and
thus must reject SOFTWARE/TRACEPOINT/BREAKPOINT events anyway, but I
could of course make that more specific if people prefer. Conversely,
since the actual software/tracepoint/breakpoint PMUs won't pass the
is_raw_pmu() check either, and thus will only be given their own events,
I could remove the type checking from their event_init routines as well,
but I thought that might be perhaps a little too subtle as-is.
BTW if the s390 drivers are intended to coexist then I'm not sure they
actually handle sharing PERF_TYPE_RAW events very well - what happens to
any particular event seems ultimately largely dependent on the order in
which the drivers happen to register - but that's a pre-existing issue
and this series shouldn't change anything in that respect. (As it
similarly shouldn't affect the trick of the first matching driver
rewriting the event type to "forward" it to another driver later in the
list.)
Thanks,
Robin.
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
2025-08-19 13:15 ` Robin Murphy
@ 2025-08-21 2:53 ` kernel test robot
2025-08-26 13:43 ` Mark Rutland
2 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2025-08-21 2:53 UTC (permalink / raw)
To: Robin Murphy
Cc: oe-lkp, lkp, linux-s390, linux-perf-users, linux-kernel,
linux-arm-kernel, peterz, mingo, will, mark.rutland, acme,
namhyung, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, linux-alpha, linux-snps-arc, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-sh, sparclinux,
linux-pm, linux-rockchip, dmaengine, linux-fpga, amd-gfx,
dri-devel, intel-gfx, intel-xe, coresight, iommu, linux-amlogic,
linux-cxl, linux-arm-msm, linux-riscv, oliver.sang
Hello,
kernel test robot noticed "perf-sanity-tests.Event_groups.fail" on:
commit: a704f7a13544a408baee6fa78f0f24fa05bfa406 ("[PATCH 18/19] perf: Introduce positive capability for raw events")
url: https://github.com/intel-lab-lkp/linux/commits/Robin-Murphy/perf-arm-cmn-Fix-event-validation/20250814-010626
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 91325f31afc1026de28665cf1a7b6e157fa4d39d
patch link: https://lore.kernel.org/all/542787fd188ea15ef41c53d557989c962ed44771.1755096883.git.robin.murphy@arm.com/
patch subject: [PATCH 18/19] perf: Introduce positive capability for raw events
in testcase: perf-sanity-tests
version:
with following parameters:
perf_compiler: gcc
group: group-01
config: x86_64-rhel-9.4-bpf
compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508211037.3f897218-lkp@intel.com
Besides Event_groups, we also noticed that perf_stat_JSON_output_linter and
perf_stat_STD_output_linter started failing on this commit but pass on the parent.
  0129bbf0ee6f109a  a704f7a13544a408baee6fa78f0
  ----------------  ---------------------------
         fail:runs  %reproduction    fail:runs
             |              |            |
              :38           16%          6:6   perf-sanity-tests.Event_groups.fail
              :38           16%          6:6   perf-sanity-tests.perf_stat_JSON_output_linter.fail
              :38           16%          6:6   perf-sanity-tests.perf_stat_STD_output_linter.fail
2025-08-18 13:20:21 sudo /usr/src/linux-perf-x86_64-rhel-9.4-bpf-a704f7a13544a408baee6fa78f0f24fa05bfa406/tools/perf/perf test 66 -v
66: Event groups : Running (1 active)
--- start ---
test child forked, pid 9619
Using CPUID GenuineIntel-6-55-B
Using uncore_imc_0 for uncore pmu event
0x0 0x0, 0x0 0x0, 0x0 0x1: Fail
0x0 0x0, 0x0 0x0, 0x1 0x3: Fail
0x0 0x0, 0x0 0x0, 0xf 0x1: Fail
0x0 0x0, 0x1 0x3, 0x0 0x0: Fail
0x0 0x0, 0x1 0x3, 0x1 0x3: Fail
0x0 0x0, 0x1 0x3, 0xf 0x1: Fail
0x0 0x0, 0xf 0x1, 0x0 0x0: Fail
0x0 0x0, 0xf 0x1, 0x1 0x3: Fail
0x0 0x0, 0xf 0x1, 0xf 0x1: Fail
0x1 0x3, 0x0 0x0, 0x0 0x0: Fail
0x1 0x3, 0x0 0x0, 0x1 0x3: Fail
0x1 0x3, 0x0 0x0, 0xf 0x1: Pass
0x1 0x3, 0x1 0x3, 0x0 0x0: Fail
0x1 0x3, 0x1 0x3, 0x1 0x3: Pass
0x1 0x3, 0x1 0x3, 0xf 0x1: Pass
0x1 0x3, 0xf 0x1, 0x0 0x0: Pass
0x1 0x3, 0xf 0x1, 0x1 0x3: Pass
0x1 0x3, 0xf 0x1, 0xf 0x1: Pass
0xf 0x1, 0x0 0x0, 0x0 0x0: Pass
0xf 0x1, 0x0 0x0, 0x1 0x3: Pass
0xf 0x1, 0x0 0x0, 0xf 0x1: Pass
0xf 0x1, 0x1 0x3, 0x0 0x0: Pass
0xf 0x1, 0x1 0x3, 0x1 0x3: Pass
0xf 0x1, 0x1 0x3, 0xf 0x1: Pass
0xf 0x1, 0xf 0x1, 0x0 0x0: Pass
0xf 0x1, 0xf 0x1, 0x1 0x3: Pass
0xf 0x1, 0xf 0x1, 0xf 0x1: Pass
---- end(-1) ----
66: Event groups : FAILED!
...
2025-08-18 13:29:36 sudo /usr/src/linux-perf-x86_64-rhel-9.4-bpf-a704f7a13544a408baee6fa78f0f24fa05bfa406/tools/perf/perf test 97 -v
97: perf stat JSON output linter : Running (1 active)
--- start ---
test child forked, pid 20715
Checking json output: no args [Success]
Checking json output: system wide [Success]
Checking json output: interval [Success]
Checking json output: event [Success]
Checking json output: per thread [Success]
Checking json output: per node [Success]
Checking json output: metric only Test failed for input:
{"metric-value" : "none"}
Traceback (most recent call last):
File "/usr/src/perf_selftests-x86_64-rhel-9.4-bpf-a704f7a13544a408baee6fa78f0f24fa05bfa406/tools/perf/tests/shell/lib/perf_json_output_lint.py", line 108, in <module>
check_json_output(expected_items)
File "/usr/src/perf_selftests-x86_64-rhel-9.4-bpf-a704f7a13544a408baee6fa78f0f24fa05bfa406/tools/perf/tests/shell/lib/perf_json_output_lint.py", line 93, in check_json_output
raise RuntimeError(f'Check failed for: key={key} value={value}')
RuntimeError: Check failed for: key=metric-value value=none
---- end(-1) ----
97: perf stat JSON output linter : FAILED!
...
2025-08-18 13:29:46 sudo /usr/src/linux-perf-x86_64-rhel-9.4-bpf-a704f7a13544a408baee6fa78f0f24fa05bfa406/tools/perf/perf test 99 -v
99: perf stat STD output linter : Running (1 active)
--- start ---
test child forked, pid 20818
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread [Success]
Checking STD output: per node [Success]
Checking STD output: metric only ---- end(-1) ----
99: perf stat STD output linter : FAILED!
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250821/202508211037.3f897218-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 01/19] perf/arm-cmn: Fix event validation
2025-08-13 17:00 ` [PATCH 01/19] perf/arm-cmn: Fix event validation Robin Murphy
@ 2025-08-26 10:46 ` Mark Rutland
0 siblings, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 10:46 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
Hi Robin,
On Wed, Aug 13, 2025 at 06:00:53PM +0100, Robin Murphy wrote:
> In the hypothetical case where a CMN event is opened with a software
> group leader that already has some other hardware sibling, currently
> arm_cmn_val_add_event() could try to interpret the other event's data
> as an arm_cmn_hw_event, which is not great since we dereference a
> pointer from there... Thankfully the way to be more robust is to be
> less clever - stop trying to special-case software events and simply
> skip any event that isn't for our PMU.
I think this is missing some important context w.r.t. how the core perf
code behaves (and hence why this change doesn't cause other problems).
I'd suggest that we give the first few patches a common preamble:
| When opening a new perf event, the core perf code calls
| pmu::event_init() before checking whether the new event would cause an
| event group to span multiple hardware PMUs. Considering this:
|
| (1) Any pmu::event_init() callback needs to be robust to cases where
| a non-software group_leader or sibling event targets a distinct
| PMU.
|
| (2) Any pmu::event_init() callback doesn't need to explicitly reject
| groups that span multiple hardware PMUs, as the core code will
| reject this later.
... and then spell out the specific issues in the driver, e.g.
| The logic in arm_cmn_validate_group() doesn't account for cases where
| a non-software sibling event targets a distinct PMU. In such cases,
| arm_cmn_val_add_event() will erroneously interpret the sibling's
| event::hw as a struct arm_cmn_hw_event, including dereferencing
| pointers from potentially user-controlled fields.
|
| Fix this by skipping any events for distinct PMUs, and leaving it to
| the core code to reject event groups that span multiple hardware PMUs.
With that context, the patch itself looks good to me.
This will need a Cc stable. I'm not sure what Fixes tag is necessary;
has this been broken since its introduction?
Mark.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> drivers/perf/arm-cmn.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 11fb2234b10f..f8c9be9fa6c0 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1652,7 +1652,7 @@ static void arm_cmn_val_add_event(struct arm_cmn *cmn, struct arm_cmn_val *val,
> enum cmn_node_type type;
> int i;
>
> - if (is_software_event(event))
> + if (event->pmu != &cmn->pmu)
> return;
>
> type = CMN_EVENT_TYPE(event);
> @@ -1693,9 +1693,6 @@ static int arm_cmn_validate_group(struct arm_cmn *cmn, struct perf_event *event)
> if (leader == event)
> return 0;
>
> - if (event->pmu != leader->pmu && !is_software_event(leader))
> - return -EINVAL;
> -
> val = kzalloc(sizeof(*val), GFP_KERNEL);
> if (!val)
> return -ENOMEM;
> --
> 2.39.2.101.g768bb238c484.dirty
>
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-13 17:00 ` [PATCH 02/19] perf/hisilicon: Fix group validation Robin Murphy
@ 2025-08-26 11:15 ` Mark Rutland
2025-08-26 13:18 ` Mark Rutland
2025-08-26 14:35 ` Robin Murphy
0 siblings, 2 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 11:15 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> a bit off, in that given a software group leader, it will consider that
> event *in place of* the actual new event being opened. At worst this
> could theoretically allow an unschedulable group if the software event
> config happens to look like one of the hardware siblings.
>
> The uncore framework avoids that particular issue,
What is "the uncore framework"? I'm not sure exactly what you're
referring to, nor how that composes with the problem described above.
> but all 3 also share the common issue of not preventing racy access to
> the sibling list,
Can you please elaborate on this racy access to the silbing list? I'm
not sure exactly what you're referring to.
> and some redundant checks which can be cleaned up.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> drivers/perf/hisilicon/hisi_pcie_pmu.c | 17 ++++++-----------
> drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
> drivers/perf/hisilicon/hns3_pmu.c | 17 ++++++-----------
> 3 files changed, 19 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> index c5394d007b61..3b0b2f7197d0 100644
> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> int counters = 1;
> int num;
>
> - event_group[0] = leader;
> - if (!is_software_event(leader)) {
> - if (leader->pmu != event->pmu)
> - return false;
> + if (leader == event)
> + return true;
>
> - if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> - event_group[counters++] = event;
> - }
> + event_group[0] = event;
> + if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> + event_group[counters++] = leader;
Looking at this, the existing logic to share counters (which
hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
that the start/stop callbacks will reprogram the HW counters (and hence
can fight with one another).
I suspect that can be removed *entirely*, and this can be simplified
down to allocating N counters, without a quadratic event comparison. We
don't try to share counters in other PMU drivers, and there was no
rationale for trying to do this when this was introduced in commit:
8404b0fbc7fbd42e ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
The 'Link' tag in that commit goes to v13, which doesn't link to prior
postings, so I'm not going to dig further.
Mark.
>
> for_each_sibling_event(sibling, event->group_leader) {
> - if (is_software_event(sibling))
> - continue;
> -
> if (sibling->pmu != event->pmu)
> - return false;
> + continue;
>
> for (num = 0; num < counters; num++) {
> /*
> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> index a449651f79c9..3c531b36cf25 100644
> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
> /* Include count for the event */
> int counters = 1;
>
> - if (!is_software_event(leader)) {
> - /*
> - * We must NOT create groups containing mixed PMUs, although
> - * software events are acceptable
> - */
> - if (leader->pmu != event->pmu)
> - return false;
> + if (leader == event)
> + return true;
>
> - /* Increment counter for the leader */
> - if (leader != event)
> - counters++;
> - }
> + /* Increment counter for the leader */
> + if (leader->pmu == event->pmu)
> + counters++;
>
> for_each_sibling_event(sibling, event->group_leader) {
> - if (is_software_event(sibling))
> - continue;
> - if (sibling->pmu != event->pmu)
> - return false;
> /* Increment counter for each sibling */
> - counters++;
> + if (sibling->pmu == event->pmu)
> + counters++;
> }
>
> /* The group can not count events more than the counters in the HW */
> diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
> index c157f3572cae..382e469257f9 100644
> --- a/drivers/perf/hisilicon/hns3_pmu.c
> +++ b/drivers/perf/hisilicon/hns3_pmu.c
> @@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
> int counters = 1;
> int num;
>
> - event_group[0] = leader;
> - if (!is_software_event(leader)) {
> - if (leader->pmu != event->pmu)
> - return false;
> + if (leader == event)
> + return true;
>
> - if (leader != event && !hns3_pmu_cmp_event(leader, event))
> - event_group[counters++] = event;
> - }
> + event_group[0] = event;
> + if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
> + event_group[counters++] = leader;
>
> for_each_sibling_event(sibling, event->group_leader) {
> - if (is_software_event(sibling))
> - continue;
> -
> if (sibling->pmu != event->pmu)
> - return false;
> + continue;
>
> for (num = 0; num < counters; num++) {
> /*
> --
> 2.39.2.101.g768bb238c484.dirty
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 12/19] perf: Ignore event state for group validation
2025-08-13 17:01 ` [PATCH 12/19] perf: Ignore event state for group validation Robin Murphy
@ 2025-08-26 13:03 ` Peter Zijlstra
2025-08-26 15:32 ` Robin Murphy
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-08-26 13:03 UTC (permalink / raw)
To: Robin Murphy
Cc: mingo, will, mark.rutland, acme, namhyung, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, linux-perf-users,
linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel, imx,
linux-csky, loongarch, linux-mips, linuxppc-dev, linux-s390,
linux-sh, sparclinux, linux-pm, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, dri-devel, intel-gfx, intel-xe, coresight,
iommu, linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> It may have been different long ago, but today it seems wrong for these
> drivers to skip counting disabled sibling events in group validation,
> given that perf_event_enable() could make them schedulable again, and
> thus increase the effective size of the group later. Conversely, if a
> sibling event is truly dead then it stands to reason that the whole
> group is dead, so it's not worth going to any special effort to try to
> squeeze in a new event that's never going to run anyway. Thus, we can
> simply remove all these checks.
So currently you can do sort of a manual event rotation inside an
over-sized group and have it work.
I'm not sure if anybody actually does this, but it's possible.
Eg. on a PMU that supports only 4 counters, create a group of 5 and
periodically cycle which of the 5 events is off.
So I'm not against changing this, but changing stuff like this always
makes me a little fearful -- it wouldn't be the first time that when it
finally trickles down to some 'enterprise' user in 5 years someone comes
and finally says, oh hey, you broke my shit :-(
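To make the scenario above concrete, here is a standalone toy model (hypothetical code, not the kernel implementation; all names invented) contrasting a state-aware check, which admits an over-sized group, with the proposed state-agnostic one:

```c
#include <stdbool.h>

#define HW_COUNTERS 4

/* Invented stand-in for perf event state; EV_OFF models a disabled sibling. */
enum ev_state { EV_ACTIVE, EV_OFF };

/*
 * Old-style validation: disabled events are not counted, so a group of
 * five fits on four counters as long as one member stays disabled.
 */
static bool group_fits_old(const enum ev_state *group, int nr_events)
{
	int counters = 0;

	for (int i = 0; i < nr_events; i++)
		if (group[i] != EV_OFF)
			counters++;
	return counters <= HW_COUNTERS;
}

/* Proposed validation: count every event regardless of its state. */
static bool group_fits_new(const enum ev_state *group, int nr_events)
{
	(void)group;			/* event state no longer matters */
	return nr_events <= HW_COUNTERS;
}
```

With four counters and a group of five where one event is off, the old-style check accepts the group, leaving userspace free to rotate which member is disabled; counting every event up front rejects the group at open time, which is the behavioural change being weighed here.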
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-13 17:01 ` [PATCH 16/19] perf: Introduce positive capability for sampling Robin Murphy
@ 2025-08-26 13:08 ` Peter Zijlstra
2025-08-26 13:28 ` Mark Rutland
2025-08-26 13:11 ` Leo Yan
1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-08-26 13:08 UTC (permalink / raw)
To: Robin Murphy
Cc: mingo, will, mark.rutland, acme, namhyung, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, linux-perf-users,
linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel, imx,
linux-csky, loongarch, linux-mips, linuxppc-dev, linux-s390,
linux-sh, sparclinux, linux-pm, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, dri-devel, intel-gfx, intel-xe, coresight,
iommu, linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:01:08PM +0100, Robin Murphy wrote:
> Sampling is inherently a feature for CPU PMUs, given that the thing
> to be sampled is a CPU context. These days, we have many more
> uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
> assume sampling support by default and force the ever-growing majority
> of drivers to opt out of it (or erroneously fail to). Instead, let's
> introduce a positive opt-in capability that's more obvious and easier to
> maintain.
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 4d439c24c901..bf2cfbeabba2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -294,7 +294,7 @@ struct perf_event_pmu_context;
> /**
> * pmu::capabilities flags
> */
> -#define PERF_PMU_CAP_NO_INTERRUPT 0x0001
> +#define PERF_PMU_CAP_SAMPLING 0x0001
> #define PERF_PMU_CAP_NO_NMI 0x0002
> #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
> +#define PERF_PMU_CAP_NO_INTERRUPT 0x0800
So NO_INTERRUPT was supposed to be the negative of your new SAMPLING
(and I agree with your reasoning).
What I'm confused/curious about is why we retain NO_INTERRUPT?
* Re: [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT
2025-08-13 17:01 ` [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT Robin Murphy
@ 2025-08-26 13:08 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-08-26 13:08 UTC (permalink / raw)
To: Robin Murphy
Cc: mingo, will, mark.rutland, acme, namhyung, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, linux-perf-users,
linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel, imx,
linux-csky, loongarch, linux-mips, linuxppc-dev, linux-s390,
linux-sh, sparclinux, linux-pm, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, dri-devel, intel-gfx, intel-xe, coresight,
iommu, linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:01:09PM +0100, Robin Murphy wrote:
> Now that we have a well-defined cap for sampling support, clean up the
> remains of the mildly unintuitive and inconsistently-applied
> PERF_PMU_CAP_NO_INTERRUPT. Not to mention the obvious redundancy of
> some of these drivers still checking for sampling in event_init too.
Ah, clearly I should've read the next patch... n/m.
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-13 17:01 ` [PATCH 16/19] perf: Introduce positive capability for sampling Robin Murphy
2025-08-26 13:08 ` Peter Zijlstra
@ 2025-08-26 13:11 ` Leo Yan
2025-08-26 15:53 ` Robin Murphy
1 sibling, 1 reply; 52+ messages in thread
From: Leo Yan @ 2025-08-26 13:11 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:01:08PM +0100, Robin Murphy wrote:
> Sampling is inherently a feature for CPU PMUs, given that the thing
> to be sampled is a CPU context. These days, we have many more
> uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
> assume sampling support by default and force the ever-growing majority
> of drivers to opt out of it (or erroneously fail to). Instead, let's
> introduce a positive opt-in capability that's more obvious and easier to
> maintain.
[...]
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 369e77ad5f13..dbd52851f5c6 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -955,7 +955,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
> spe_pmu->pmu = (struct pmu) {
> .module = THIS_MODULE,
> .parent = &spe_pmu->pdev->dev,
> - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
> + .capabilities = PERF_PMU_CAP_SAMPLING |
> + PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
> .attr_groups = arm_spe_pmu_attr_groups,
> /*
> * We hitch a ride on the software context here, so that
The change in the Arm SPE driver looks good to me.
I noticed you did not set the flag for other AUX events, like Arm
CoreSight, Intel PT and bts. The drivers are located in:
drivers/hwtracing/coresight/coresight-etm-perf.c
arch/x86/events/intel/bts.c
arch/x86/events/intel/pt.c
Generally, AUX events generate interrupts based on the AUX ring buffer
watermark rather than the period. It seems to me that it is correct to
set the PERF_PMU_CAP_SAMPLING flag for them.
A special case is that Arm CoreSight legacy sinks (like ETR/ETB, etc.)
don't have an interrupt. We might need to set or clear the flag on the
fly based on the sink type:
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index f1551c08ecb2..404edc94c198 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -433,6 +433,11 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
if (!sink)
goto err;
+ if (coresight_is_percpu_sink(sink))
+ event->pmu->capabilities |= PERF_PMU_CAP_SAMPLING;
+ else
+ event->pmu->capabilities &= ~PERF_PMU_CAP_SAMPLING;
+
Thanks,
Leo
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-26 11:15 ` Mark Rutland
@ 2025-08-26 13:18 ` Mark Rutland
2025-08-26 14:35 ` Robin Murphy
1 sibling, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 13:18 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 12:15:23PM +0100, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> > a bit off, in that given a software group leader, it will consider that
> > event *in place of* the actual new event being opened. At worst this
> > could theoretically allow an unschedulable group if the software event
> > config happens to look like one of the hardware siblings.
> >
> > The uncore framework avoids that particular issue,
>
> What is "the uncore framework"? I'm not sure exactly what you're
> referring to, nor how that composes with the problem described above.
>
> > but all 3 also share the common issue of not preventing racy access to
> > the sibling list,
>
> Can you please elaborate on this racy access to the sibling list? I'm
> not sure exactly what you're referring to.
Ah, I think you're referring to the issue in:
https://lore.kernel.org/linux-arm-kernel/Zg0l642PgQ7T3a8Z@FVFF77S0Q05N/
... where when creating a new event which is its own group leader,
lockdep_assert_event_ctx(event) fires in for_each_sibling_event(),
because the new event's context isn't locked...
> > diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > index a449651f79c9..3c531b36cf25 100644
> > --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
> > /* Include count for the event */
> > int counters = 1;
> >
> > - if (!is_software_event(leader)) {
> > - /*
> > - * We must NOT create groups containing mixed PMUs, although
> > - * software events are acceptable
> > - */
> > - if (leader->pmu != event->pmu)
> > - return false;
> > + if (leader == event)
> > + return true;
... and hence bailing out here avoids that?
It's not strictly "racy access to the sibling list", because there's
nothing else accessing the list; it's just that this is the simplest way
to appease lockdep while avoiding false negatives.
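As a standalone illustration of that point (toy structures with invented names, not the real perf API), the leader == event early return skips the sibling walk exactly in the case where the new event's context is not yet locked:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for perf internals; names are invented for illustration. */
struct toy_event {
	struct toy_event *leader;	/* group leader; points to itself for a leader */
	bool ctx_locked;		/* stand-in for holding ctx->mutex */
};

/* Models lockdep_assert_event_ctx(): walking siblings requires the lock. */
static bool sibling_walk_allowed(const struct toy_event *event)
{
	return event->leader->ctx_locked;
}

/*
 * Models the reworked validation: a brand-new event that is its own group
 * leader has no siblings yet (userspace has no fd through which to add
 * any), so it is trivially schedulable and the unlocked walk is skipped.
 */
static bool validate_group(const struct toy_event *event)
{
	if (event->leader == event)
		return true;		/* new leader: nothing to walk */

	return sibling_walk_allowed(event);
}
```

A self-led event passes validation without ever touching the (unlocked) sibling list, which is why the early return appeases lockdep without introducing false negatives.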
It'd probably be better to say something like "the common issue of
calling for_each_sibling_event() when initialising a new group leader",
and maybe to spell that out a bit.
Mark.
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-26 13:08 ` Peter Zijlstra
@ 2025-08-26 13:28 ` Mark Rutland
2025-08-26 16:35 ` Robin Murphy
0 siblings, 1 reply; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 13:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robin Murphy, mingo, will, acme, namhyung, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, linux-perf-users,
linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel, imx,
linux-csky, loongarch, linux-mips, linuxppc-dev, linux-s390,
linux-sh, sparclinux, linux-pm, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, dri-devel, intel-gfx, intel-xe, coresight,
iommu, linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 03:08:06PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2025 at 06:01:08PM +0100, Robin Murphy wrote:
> > Sampling is inherently a feature for CPU PMUs, given that the thing
> > to be sampled is a CPU context. These days, we have many more
> > uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
> > assume sampling support by default and force the ever-growing majority
> > of drivers to opt out of it (or erroneously fail to). Instead, let's
> > introduce a positive opt-in capability that's more obvious and easier to
> > maintain.
>
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 4d439c24c901..bf2cfbeabba2 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -294,7 +294,7 @@ struct perf_event_pmu_context;
> > /**
> > * pmu::capabilities flags
> > */
> > -#define PERF_PMU_CAP_NO_INTERRUPT 0x0001
> > +#define PERF_PMU_CAP_SAMPLING 0x0001
> > #define PERF_PMU_CAP_NO_NMI 0x0002
> > #define PERF_PMU_CAP_AUX_NO_SG 0x0004
> > #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
> > @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
> > #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> > #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> > #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
> > +#define PERF_PMU_CAP_NO_INTERRUPT 0x0800
>
> So NO_INTERRUPT was supposed to be the negative of your new SAMPLING
> (and I agree with your reasoning).
>
> What I'm confused/curious about is why we retain NO_INTERRUPT?
I see from your other reply that you spotted the next patch does that.
For the sake of other reviewers or anyone digging through the git
history it's probably worth adding a line to this commit message to say:
| A subsequent patch will remove PERF_PMU_CAP_NO_INTERRUPT as this
| requires some additional cleanup.
Mark.
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
2025-08-19 13:15 ` Robin Murphy
2025-08-21 2:53 ` kernel test robot
@ 2025-08-26 13:43 ` Mark Rutland
2025-08-26 22:46 ` Robin Murphy
2025-08-27 5:27 ` Thomas Richter
2 siblings, 2 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 13:43 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Wed, Aug 13, 2025 at 06:01:10PM +0100, Robin Murphy wrote:
> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
> events without registering themselves as PERF_TYPE_RAW in the first
> place. Add an explicit opt-in for these special cases, so that we can
> make life easier for every other driver (and probably also speed up the
> slow-path search) by having perf_try_init_event() do the basic type
> checking to cover the majority of cases.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
To bikeshed a little here, I'm not keen on the PERF_PMU_CAP_RAW_EVENTS
name, because it's not clear what "RAW" really means, and people will
definitely read that to mean something else.
Could we go with something like PERF_PMU_CAP_COMMON_CPU_EVENTS, to make
it clear that this is about opting into CPU-PMU specific event types (of
which PERF_TYPE_RAW is one)?
Likewise, s/is_raw_pmu()/pmu_supports_common_cpu_events()/.
> ---
>
> A further possibility is to automatically add the cap to PERF_TYPE_RAW
> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
> undecided...
I reckon we don't need to automagically do that, but I reckon that
is_raw_pmu()/pmu_supports_common_cpu_events() should only check the cap,
and we don't read anything special into any of
PERF_TYPE_{RAW,HARDWARE,HW_CACHE}.
> ---
> arch/s390/kernel/perf_cpum_cf.c | 1 +
> arch/s390/kernel/perf_pai_crypto.c | 2 +-
> arch/s390/kernel/perf_pai_ext.c | 2 +-
> arch/x86/events/core.c | 2 +-
> drivers/perf/arm_pmu.c | 1 +
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 15 +++++++++++++++
> 7 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
> index 1a94e0944bc5..782ab755ddd4 100644
> --- a/arch/s390/kernel/perf_cpum_cf.c
> +++ b/arch/s390/kernel/perf_cpum_cf.c
> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
> /* Performance monitoring unit for s390x */
> static struct pmu cpumf_pmu = {
> .task_ctx_nr = perf_sw_context,
> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
> .pmu_enable = cpumf_pmu_enable,
> .pmu_disable = cpumf_pmu_disable,
> .event_init = cpumf_pmu_event_init,
Tangential, but use of perf_sw_context here looks bogus.
> diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
> index a64b6b056a21..b5b6d8b5d943 100644
> --- a/arch/s390/kernel/perf_pai_crypto.c
> +++ b/arch/s390/kernel/perf_pai_crypto.c
> @@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
> /* Performance monitoring unit for mapped counters */
> static struct pmu paicrypt = {
> .task_ctx_nr = perf_hw_context,
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
> .event_init = paicrypt_event_init,
> .add = paicrypt_add,
> .del = paicrypt_del,
> diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
> index 1261f80c6d52..bcd28c38da70 100644
> --- a/arch/s390/kernel/perf_pai_ext.c
> +++ b/arch/s390/kernel/perf_pai_ext.c
> @@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
> /* Performance monitoring unit for mapped counters */
> static struct pmu paiext = {
> .task_ctx_nr = perf_hw_context,
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
> .event_init = paiext_event_init,
> .add = paiext_add,
> .del = paiext_del,
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 789dfca2fa67..764728bb80ae 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
> }
>
> static struct pmu pmu = {
> - .capabilities = PERF_PMU_CAP_SAMPLING,
> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 72d8f38d0aa5..bc772a3bf411 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
> * specific PMU.
> */
> .capabilities = PERF_PMU_CAP_SAMPLING |
> + PERF_PMU_CAP_RAW_EVENTS |
> PERF_PMU_CAP_EXTENDED_REGS |
> PERF_PMU_CAP_EXTENDED_HW_TYPE,
> };
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 183b7c48b329..c6ad036c0037 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
> +#define PERF_PMU_CAP_RAW_EVENTS 0x0800
>
> /**
> * pmu::scope
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 71b2a6730705..2ecee76d2ae2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
> (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
> }
>
> +static bool is_raw_pmu(const struct pmu *pmu)
> +{
> + return pmu->type == PERF_TYPE_RAW ||
> + pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
> +}
As above, I reckon we should make this:
static bool pmu_supports_common_cpu_events(const struct pmu *pmu)
{
return pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
}
Other than the above, this looks good to me.
Mark.
> +
> static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
> {
> struct perf_event_context *ctx = NULL;
> int ret;
>
> + /*
> + * Before touching anything, we can safely skip:
> + * - any event for a specific PMU which is not this one
> + * - any common event if this PMU doesn't support them
> + */
> + if (event->attr.type != pmu->type &&
> + (event->attr.type >= PERF_TYPE_MAX || !is_raw_pmu(pmu)))
> + return -ENOENT;
> +
> if (!try_module_get(pmu->module))
> return -ENODEV;
>
> --
> 2.39.2.101.g768bb238c484.dirty
>
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-26 11:15 ` Mark Rutland
2025-08-26 13:18 ` Mark Rutland
@ 2025-08-26 14:35 ` Robin Murphy
2025-08-26 15:31 ` Mark Rutland
1 sibling, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-26 14:35 UTC (permalink / raw)
To: Mark Rutland
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On 2025-08-26 12:15 pm, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
>> The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
>> a bit off, in that given a software group leader, it will consider that
>> event *in place of* the actual new event being opened. At worst this
>> could theoretically allow an unschedulable group if the software event
>> config happens to look like one of the hardware siblings.
>>
>> The uncore framework avoids that particular issue,
>
> What is "the uncore framework"? I'm not sure exactly what you're
> referring to, nor how that composes with the problem described above.
Literally that hisi_uncore_pmu.c is actually a framework for half a
dozen individual sub-drivers rather than a "driver" itself per se, but I
suppose that detail doesn't strictly matter at this level.
>> but all 3 also share the common issue of not preventing racy access to
>> the sibling list,
>
> Can you please elaborate on this racy access to the silbing list? I'm
> not sure exactly what you're referring to.
Hmm, yes, I guess an actual race is probably impossible since if we're
still in the middle of opening the group leader event then we haven't
yet allocated the fd that userspace would need to start adding siblings,
even if it tried to guess. I leaned on "racy" as a concise way to imply
"when it isn't locked (even though the reasons for that are more
subtle)" repeatedly over several patches - after all, the overall theme
of this series is that I dislike repetitive boilerplate :)
I'll dedicate some time for polishing commit messages for v2, especially
the common context for these "part 1" patches per your feedback on patch #1.
>> and some redundant checks which can be cleaned up.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>> drivers/perf/hisilicon/hisi_pcie_pmu.c | 17 ++++++-----------
>> drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
>> drivers/perf/hisilicon/hns3_pmu.c | 17 ++++++-----------
>> 3 files changed, 19 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> index c5394d007b61..3b0b2f7197d0 100644
>> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
>> int counters = 1;
>> int num;
>>
>> - event_group[0] = leader;
>> - if (!is_software_event(leader)) {
>> - if (leader->pmu != event->pmu)
>> - return false;
>> + if (leader == event)
>> + return true;
>>
>> - if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
>> - event_group[counters++] = event;
>> - }
>> + event_group[0] = event;
>> + if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
>> + event_group[counters++] = leader;
>
> Looking at this, the existing logic to share counters (which
> hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> that the start/stop callbacks will reprogram the HW counters (and hence
> can fight with one another).
Yeah, this had a dodgy smell when I first came across it, but after
doing all the digging I think it does actually work out - the trick
seems to be the group_leader check in hisi_pcie_pmu_get_event_idx(),
with the implication the PMU is going to be stopped while scheduling
in/out the whole group, so assuming hisi_pcie_pmu_del() doesn't clear
the counter value in hardware (even though the first call nukes the rest
of the event configuration), then the events should stay in sync.
It does seem somewhat nonsensical to have multiple copies of the same
event in the same group, but I imagine it could happen with some sort of
scripted combination of metrics, and supporting it at this level saves
needing explicit deduplication further up. So even though my initial
instinct was to rip it out too, in the end I concluded that that doesn't
seem justified.
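A rough standalone model of that lifecycle (invented names, not the actual driver code): two identical events in a group share one counter slot, and teardown clears the event configuration but deliberately leaves the accumulated counter value, so the duplicates keep reading the same count:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_HW_COUNTERS 4

/* Hypothetical model of one shared hardware counter slot. */
struct hw_counter {
	uint64_t value;		/* survives event teardown */
	bool configured;	/* event routing/config, cleared on del */
	int refcnt;		/* how many identical events share this slot */
};

/* A duplicate event within the group reuses the existing slot. */
static int get_event_idx(struct hw_counter *ctrs, int existing_idx)
{
	if (existing_idx >= 0) {
		ctrs[existing_idx].refcnt++;
		return existing_idx;
	}
	for (int i = 0; i < NR_HW_COUNTERS; i++) {
		if (ctrs[i].refcnt == 0) {
			ctrs[i].refcnt = 1;
			ctrs[i].configured = true;
			return i;
		}
	}
	return -1;		/* group over-subscribes the hardware */
}

/* del() drops the config once all sharers are gone; the value is NOT cleared. */
static void del_event(struct hw_counter *ctrs, int idx)
{
	if (--ctrs[idx].refcnt == 0)
		ctrs[idx].configured = false;
}
```

Under the assumption that the whole group is scheduled in and out with the PMU stopped, the first del() nuking only the configuration while the hardware count persists is what keeps the shared events in sync.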
Thanks,
Robin.
> I suspect that can be removed *entirely*, and this can be simplified
> down to allocating N counters, without a quadratic event comparison. We
> don't try to share counters in other PMU drivers, and there was no
> rationale for trying to do this when this was introduced in commit:
>
> 8404b0fbc7fbd42e ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
>
> The 'Link' tag in that commit goes to v13, which doesn't link to prior
> postings, so I'm not going to dig further.
>
> Mark.
>
>>
>> for_each_sibling_event(sibling, event->group_leader) {
>> - if (is_software_event(sibling))
>> - continue;
>> -
>> if (sibling->pmu != event->pmu)
>> - return false;
>> + continue;
>>
>> for (num = 0; num < counters; num++) {
>> /*
>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> index a449651f79c9..3c531b36cf25 100644
>> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
>> /* Include count for the event */
>> int counters = 1;
>>
>> - if (!is_software_event(leader)) {
>> - /*
>> - * We must NOT create groups containing mixed PMUs, although
>> - * software events are acceptable
>> - */
>> - if (leader->pmu != event->pmu)
>> - return false;
>> + if (leader == event)
>> + return true;
>>
>> - /* Increment counter for the leader */
>> - if (leader != event)
>> - counters++;
>> - }
>> + /* Increment counter for the leader */
>> + if (leader->pmu == event->pmu)
>> + counters++;
>>
>> for_each_sibling_event(sibling, event->group_leader) {
>> - if (is_software_event(sibling))
>> - continue;
>> - if (sibling->pmu != event->pmu)
>> - return false;
>> /* Increment counter for each sibling */
>> - counters++;
>> + if (sibling->pmu == event->pmu)
>> + counters++;
>> }
>>
>> /* The group can not count events more than the counters in the HW */
>> diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
>> index c157f3572cae..382e469257f9 100644
>> --- a/drivers/perf/hisilicon/hns3_pmu.c
>> +++ b/drivers/perf/hisilicon/hns3_pmu.c
>> @@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
>> int counters = 1;
>> int num;
>>
>> - event_group[0] = leader;
>> - if (!is_software_event(leader)) {
>> - if (leader->pmu != event->pmu)
>> - return false;
>> + if (leader == event)
>> + return true;
>>
>> - if (leader != event && !hns3_pmu_cmp_event(leader, event))
>> - event_group[counters++] = event;
>> - }
>> + event_group[0] = event;
>> + if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
>> + event_group[counters++] = leader;
>>
>> for_each_sibling_event(sibling, event->group_leader) {
>> - if (is_software_event(sibling))
>> - continue;
>> -
>> if (sibling->pmu != event->pmu)
>> - return false;
>> + continue;
>>
>> for (num = 0; num < counters; num++) {
>> /*
>> --
>> 2.39.2.101.g768bb238c484.dirty
>>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-26 14:35 ` Robin Murphy
@ 2025-08-26 15:31 ` Mark Rutland
2025-08-26 15:55 ` Mark Rutland
2025-08-27 14:03 ` Mark Rutland
0 siblings, 2 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 15:31 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> > > a bit off, in that given a software group leader, it will consider that
> > > event *in place of* the actual new event being opened. At worst this
> > > could theoretically allow an unschedulable group if the software event
> > > config happens to look like one of the hardware siblings.
> > >
> > > The uncore framework avoids that particular issue,
> >
> > What is "the uncore framework"? I'm not sure exactly what you're
> > referring to, nor how that composes with the problem described above.
>
> Literally that hisi_uncore_pmu.c is actually a framework for half a dozen
> individual sub-drivers rather than a "driver" itself per se, but I suppose
> that detail doesn't strictly matter at this level.
I see. My concern was just that I couldn't figure out what "the uncore
framework" was, since it sounded more generic. If you say something like
"the shared code in hisi_uncore_pmu.c", I think that would be clearer.
> > > but all 3 also share the common issue of not preventing racy access to
> > > the sibling list,
> >
> > Can you please elaborate on this racy access to the silbing list? I'm
> > not sure exactly what you're referring to.
>
> Hmm, yes, I guess an actual race is probably impossible since if we're still
> in the middle of opening the group leader event then we haven't yet
> allocated the fd that userspace would need to start adding siblings, even if
> it tried to guess. I leaned on "racy" as a concise way to infer "when it
> isn't locked (even though the reasons for that are more subtle)" repeatedly
> over several patches - after all, the overall theme of this series is that I
> dislike repetitive boilerplate :)
>
> I'll dedicate some time for polishing commit messages for v2, especially the
> common context for these "part 1" patches per your feedback on patch #1.
Thanks!
> > > and some redundant checks which can be cleaned up.
> > >
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > > drivers/perf/hisilicon/hisi_pcie_pmu.c | 17 ++++++-----------
> > > drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
> > > drivers/perf/hisilicon/hns3_pmu.c | 17 ++++++-----------
> > > 3 files changed, 19 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > index c5394d007b61..3b0b2f7197d0 100644
> > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > int counters = 1;
> > > int num;
> > > - event_group[0] = leader;
> > > - if (!is_software_event(leader)) {
> > > - if (leader->pmu != event->pmu)
> > > - return false;
> > > + if (leader == event)
> > > + return true;
> > > - if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > - event_group[counters++] = event;
> > > - }
> > > + event_group[0] = event;
> > > + if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > + event_group[counters++] = leader;
> >
> > Looking at this, the existing logic to share counters (which
> > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > that the start/stop callbacks will reprogram the HW counters (and hence
> > can fight with one another).
>
> Yeah, this had a dodgy smell when I first came across it, but after doing
> all the digging I think it does actually work out - the trick seems to be
> the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> implication the PMU is going to be stopped while scheduling in/out the whole
> group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> hardware (even though the first call nukes the rest of the event
> configuration), then the events should stay in sync.
I don't think that's sufficient. If nothing else, overflow is handled
per-event, and for a group of two identical events, upon overflow
hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
the first event, and the second event will see an arbitrary
discontinuity. Maybe no-one has spotted that due to the 2^63 counter
period that we program, but this is clearly bogus.
In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
groups aren't handled atomically, and snapshots of the counters won't be
atomic.
> It does seem somewhat nonsensical to have multiple copies of the same event
> in the same group, but I imagine it could happen with some sort of scripted
> combination of metrics, and supporting it at this level saves needing
> explicit deduplication further up. So even though my initial instinct was to
> rip it out too, in the end I concluded that that doesn't seem justified.
As above, I think it's clearly bogus. I don't think we should have
merged it as-is and it's not something I'd like to see others copy.
Other PMUs don't do this sort of event deduplication, and in general it
should be up to the user or userspace software to do that rather than
doing that badly in the kernel.
Given it was implemented with no rationale I think we should rip it out.
If that breaks someone's scripting, then we can consider implementing
something that actually works.
Mark.
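[ For readers following along outside the kernel tree, the reworked
validation pattern being discussed above can be sketched as a standalone
program. The stub types below (fake_pmu, fake_event, the flattened
siblings array) are hypothetical stand-ins for struct pmu and struct
perf_event, and the counter budget is passed explicitly; this is an
illustrative sketch of the counting logic, not the actual driver code. ]

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pmu / struct perf_event. */
struct fake_pmu { int id; };
struct fake_event {
	struct fake_pmu *pmu;
	struct fake_event *group_leader;
	struct fake_event *siblings[8];	/* flattened sibling list */
	int nr_siblings;
};

/*
 * Count how many events in the candidate group would occupy one of our
 * hardware counters, mirroring the reworked hisi logic: events from
 * other PMUs (including software events) are simply skipped rather
 * than rejected, and the new event itself is always counted.
 */
static bool fake_validate_group(struct fake_event *event, int hw_counters)
{
	struct fake_event *leader = event->group_leader;
	int counters = 1;	/* the new event itself */

	if (leader == event)
		return true;	/* a lone leader always fits */

	if (leader->pmu == event->pmu)
		counters++;

	for (int i = 0; i < leader->nr_siblings; i++)
		if (leader->siblings[i]->pmu == event->pmu)
			counters++;

	return counters <= hw_counters;
}
```

[ The key behavioural change is visible in the loop: events from other
PMUs are skipped rather than causing outright rejection, and the new
event itself, not the leader, is the one unconditionally counted. ]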
* Re: [PATCH 12/19] perf: Ignore event state for group validation
2025-08-26 13:03 ` Peter Zijlstra
@ 2025-08-26 15:32 ` Robin Murphy
2025-08-26 18:48 ` Ian Rogers
0 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-26 15:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, will, mark.rutland, acme, namhyung, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, linux-perf-users,
linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel, imx,
linux-csky, loongarch, linux-mips, linuxppc-dev, linux-s390,
linux-sh, sparclinux, linux-pm, linux-rockchip, dmaengine,
linux-fpga, amd-gfx, dri-devel, intel-gfx, intel-xe, coresight,
iommu, linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
>> It may have been different long ago, but today it seems wrong for these
>> drivers to skip counting disabled sibling events in group validation,
>> given that perf_event_enable() could make them schedulable again, and
>> thus increase the effective size of the group later. Conversely, if a
>> sibling event is truly dead then it stands to reason that the whole
>> group is dead, so it's not worth going to any special effort to try to
>> squeeze in a new event that's never going to run anyway. Thus, we can
>> simply remove all these checks.
>
> So currently you can do sort of a manual event rotation inside an
> over-sized group and have it work.
>
> I'm not sure if anybody actually does this, but its possible.
>
> Eg. on a PMU that supports only 4 counters, create a group of 5 and
> periodically cycle which of the 5 events is off.
>
> So I'm not against changing this, but changing stuff like this always
> makes me a little fearful -- it wouldn't be the first time that when it
> finally trickles down to some 'enterprise' user in 5 years someone comes
> and finally says, oh hey, you broke my shit :-(
Eww, I see what you mean... and I guess that's probably lower-overhead
than actually deleting and recreating the sibling event(s) each time,
and potentially less bother than wrangling multiple groups for different
combinations of subsets when one simply must still approximate a complex
metric that requires more counters than the hardware offers.
I'm also not keen to break anything that wasn't already somewhat broken,
especially since this patch is only intended as cleanup, so either we
could just drop it altogether, or perhaps I can wrap the existing
behaviour in a helper that can at least document this assumption and
discourage new drivers from copying it. Am I right that only
PERF_EVENT_STATE_{OFF,ERROR} would matter for this, though, and my
reasoning for state <= PERF_EVENT_STATE_EXIT should still stand? As for
the fiddly discrepancy with enable_on_exec between arm_pmu and others
I'm not really sure what to think...
Thanks,
Robin.
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-26 13:11 ` Leo Yan
@ 2025-08-26 15:53 ` Robin Murphy
2025-08-27 8:06 ` Leo Yan
0 siblings, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-26 15:53 UTC (permalink / raw)
To: Leo Yan
Cc: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On 2025-08-26 2:11 pm, Leo Yan wrote:
> On Wed, Aug 13, 2025 at 06:01:08PM +0100, Robin Murphy wrote:
>> Sampling is inherently a feature for CPU PMUs, given that the thing
>> to be sampled is a CPU context. These days, we have many more
>> uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
>> assume sampling support by default and force the ever-growing majority
>> of drivers to opt out of it (or erroneously fail to). Instead, let's
>> introduce a positive opt-in capability that's more obvious and easier to
>> maintain.
>
> [...]
>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 369e77ad5f13..dbd52851f5c6 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -955,7 +955,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
>> spe_pmu->pmu = (struct pmu) {
>> .module = THIS_MODULE,
>> .parent = &spe_pmu->pdev->dev,
>> - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
>> + .capabilities = PERF_PMU_CAP_SAMPLING |
>> + PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
>> .attr_groups = arm_spe_pmu_attr_groups,
>> /*
>> * We hitch a ride on the software context here, so that
>
> The change in Arm SPE driver looks good to me.
>
> I noticed you did not set the flag for other AUX events, like Arm
> CoreSight, Intel PT and bts. The drivers locate in:
>
> drivers/hwtracing/coresight/coresight-etm-perf.c
> arch/x86/events/intel/bts.c
> arch/x86/events/intel/pt.c
>
> Generally, AUX events generate interrupts based on the AUX ring buffer
> watermark rather than the period. It seems to me that it is correct to
> set the PERF_PMU_CAP_SAMPLING flag for them.
This cap is given to drivers which handle event->attr.sample_period and
call perf_event_overflow() - or in a few rare cases,
perf_output_sample() directly - to do something meaningful with it,
since the intent is to convey "I properly handle events for which
is_sampling_event() is true". My understanding is that aux events are
something else entirely, but I'm happy to be corrected.
Otherwise, perhaps this suggests it deserves to be named a little more
specifically for clarity, maybe PERF_CAP_SAMPLING_EVENTS?
Thanks,
Robin.
> A special case is that Arm CoreSight legacy sinks (like ETR/ETB, etc.)
> don't have an interrupt. We might need to set or clear the flag on the
> fly based on sink type:
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index f1551c08ecb2..404edc94c198 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -433,6 +433,11 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> if (!sink)
> goto err;
>
> + if (coresight_is_percpu_sink(sink))
> + event->pmu->capabilities |= PERF_PMU_CAP_SAMPLING;
> + else
> + event->pmu->capabilities &= ~PERF_PMU_CAP_SAMPLING;
> +
>
> Thanks,
> Leo
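[ The positive opt-in model under discussion can be illustrated with a
small standalone sketch. The flag value, struct names, and helper below
are hypothetical stand-ins (the real definitions live in
include/linux/perf_event.h, and the real check would sit in the perf
core rather than in each driver's event_init). ]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; stand-in for PERF_PMU_CAP_SAMPLING. */
#define FAKE_PMU_CAP_SAMPLING	0x0001

struct fake_pmu { uint32_t capabilities; };
struct fake_attr { uint64_t sample_period; };

/* Stand-in for is_sampling_event(): a non-zero period means sampling. */
static bool fake_is_sampling_event(const struct fake_attr *attr)
{
	return attr->sample_period != 0;
}

/*
 * With a positive capability, the core can reject sampling events for
 * any PMU that did not opt in, instead of the ever-growing majority of
 * uncore drivers each having to open-code the rejection in event_init.
 */
static int fake_event_init_check(const struct fake_pmu *pmu,
				 const struct fake_attr *attr)
{
	if (fake_is_sampling_event(attr) &&
	    !(pmu->capabilities & FAKE_PMU_CAP_SAMPLING))
		return -22;	/* -EINVAL */
	return 0;
}
```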
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-26 15:31 ` Mark Rutland
@ 2025-08-26 15:55 ` Mark Rutland
2025-08-27 14:03 ` Mark Rutland
1 sibling, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-26 15:55 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > > int counters = 1;
> > > > int num;
> > > > - event_group[0] = leader;
> > > > - if (!is_software_event(leader)) {
> > > > - if (leader->pmu != event->pmu)
> > > > - return false;
> > > > + if (leader == event)
> > > > + return true;
> > > > - if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > - event_group[counters++] = event;
> > > > - }
> > > > + event_group[0] = event;
> > > > + if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > + event_group[counters++] = leader;
> > >
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).
> >
> > Yeah, this had a dodgy smell when I first came across it, but after doing
> > all the digging I think it does actually work out - the trick seems to be
> > the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> > implication the PMU is going to be stopped while scheduling in/out the whole
> > group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> > hardware (even though the first call nukes the rest of the event
> > configuration), then the events should stay in sync.
>
> I don't think that's sufficient. If nothing else, overflow is handled
> per-event, and for a group of two identical events, upon overflow
> hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
> the first event, and the second event will see an arbitrary
> discontinuity. Maybe no-one has spotted that due to the 2^63 counter
> period that we program, but this is clearly bogus.
>
> In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
> groups aren't handled atomically, and snapshots of the counters won't be
> atomic.
>
> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
>
[...]
> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
>
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.
FWIW, I'm happy to go do that as a follow-up, so if that's a pain, feel
free to leave that as-is for now.
Mark.
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-26 13:28 ` Mark Rutland
@ 2025-08-26 16:35 ` Robin Murphy
0 siblings, 0 replies; 52+ messages in thread
From: Robin Murphy @ 2025-08-26 16:35 UTC (permalink / raw)
To: Mark Rutland, Peter Zijlstra
Cc: mingo, will, acme, namhyung, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On 2025-08-26 2:28 pm, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:08:06PM +0200, Peter Zijlstra wrote:
>> On Wed, Aug 13, 2025 at 06:01:08PM +0100, Robin Murphy wrote:
>>> Sampling is inherently a feature for CPU PMUs, given that the thing
>>> to be sampled is a CPU context. These days, we have many more
>>> uncore/system PMUs than CPU PMUs, so it no longer makes much sense to
>>> assume sampling support by default and force the ever-growing majority
>>> of drivers to opt out of it (or erroneously fail to). Instead, let's
>>> introduce a positive opt-in capability that's more obvious and easier to
>>> maintain.
>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index 4d439c24c901..bf2cfbeabba2 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -294,7 +294,7 @@ struct perf_event_pmu_context;
>>> /**
>>> * pmu::capabilities flags
>>> */
>>> -#define PERF_PMU_CAP_NO_INTERRUPT 0x0001
>>> +#define PERF_PMU_CAP_SAMPLING 0x0001
>>> #define PERF_PMU_CAP_NO_NMI 0x0002
>>> #define PERF_PMU_CAP_AUX_NO_SG 0x0004
>>> #define PERF_PMU_CAP_EXTENDED_REGS 0x0008
>>> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
>>> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>>> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
>>> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>>> +#define PERF_PMU_CAP_NO_INTERRUPT 0x0800
>>
>> So NO_INTERRUPT was supposed to be the negative of your new SAMPLING
>> (and I agree with your reasoning).
>>
>> What I'm confused/curious about is why we retain NO_INTERRUPT?
>
> I see from your other reply that you spotted the next patch does that.
>
> For the sake of other reviewers or anyone digging through the git
> history it's probably worth adding a line to this commit message to say:
>
> | A subsequent patch will remove PERF_PMU_CAP_NO_INTERRUPT as this
> | requires some additional cleanup.
Yup, the main reason is that the set of drivers getting the new cap is
smaller than the set of drivers currently not rejecting sampling events,
so I wanted it to be clearly visible in the patch. Indeed I shall
clarify the relationship to NO_INTERRUPT in the commit message.
Thanks,
Robin.
* Re: [PATCH 12/19] perf: Ignore event state for group validation
2025-08-26 15:32 ` Robin Murphy
@ 2025-08-26 18:48 ` Ian Rogers
2025-08-27 8:18 ` Mark Rutland
0 siblings, 1 reply; 52+ messages in thread
From: Ian Rogers @ 2025-08-26 18:48 UTC (permalink / raw)
To: Robin Murphy
Cc: Peter Zijlstra, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> >> It may have been different long ago, but today it seems wrong for these
> >> drivers to skip counting disabled sibling events in group validation,
> >> given that perf_event_enable() could make them schedulable again, and
> >> thus increase the effective size of the group later. Conversely, if a
> >> sibling event is truly dead then it stands to reason that the whole
> >> group is dead, so it's not worth going to any special effort to try to
> >> squeeze in a new event that's never going to run anyway. Thus, we can
> >> simply remove all these checks.
> >
> > So currently you can do sort of a manual event rotation inside an
> > over-sized group and have it work.
> >
> > I'm not sure if anybody actually does this, but its possible.
> >
> > Eg. on a PMU that supports only 4 counters, create a group of 5 and
> > periodically cycle which of the 5 events is off.
I'm not sure this is true; I thought this would fail in the
perf_event_open when adding the 5th event and there being insufficient
counters for the group. Not all PMUs validate that a group will fit on
the counters, but I thought at least Intel's core PMU would validate
this and not allow it. FWIW, the metric code relies on this behavior,
as by default all events are placed into a weak group:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next#n631
Weak groups are really just groups that, when the perf_event_open
fails, are retried with the grouping removed. PMUs that don't fail the
perf_event_open are problematic as the reads just report "not counted"
and the metric doesn't work. Sometimes the PMU can't help it due to
errata. There are a bunch of workarounds for those cases carried in
the perf tool, but in general weak groups working is relied upon:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/pmu-events.h?h=perf-tools-next#n16
Thanks,
Ian
> > So I'm not against changing this, but changing stuff like this always
> > makes me a little fearful -- it wouldn't be the first time that when it
> > finally trickles down to some 'enterprise' user in 5 years someone comes
> > and finally says, oh hey, you broke my shit :-(
>
> Eww, I see what you mean... and I guess that's probably lower-overhead
> than actually deleting and recreating the sibling event(s) each time,
> and potentially less bother than wrangling multiple groups for different
> combinations of subsets when one simply must still approximate a complex
> metric that requires more counters than the hardware offers.
>
> I'm also not keen to break anything that wasn't already somewhat broken,
> especially since this patch is only intended as cleanup, so either we
> could just drop it altogether, or perhaps I can wrap the existing
> behaviour in a helper that can at least document this assumption and
> discourage new drivers from copying it. Am I right that only
> PERF_EVENT_STATE_{OFF,ERROR} would matter for this, though, and my
> reasoning for state <= PERF_EVENT_STATE_EXIT should still stand? As for
> the fiddly discrepancy with enable_on_exec between arm_pmu and others
> I'm not really sure what to think...
>
> Thanks,
> Robin.
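[ The weak-group fallback Ian describes can be sketched abstractly;
fake_open_group() below is a hypothetical stand-in for a
perf_event_open() loop, reduced to a single capacity check, since the
real logic lives in the perf tool's event-parsing code. ]

```c
#include <stdbool.h>

/*
 * Stand-in for opening a whole group via perf_event_open(): succeeds
 * only when the group fits the PMU's counters (assuming the PMU
 * actually validates this, which is the property relied upon here).
 */
static bool fake_open_group(int group_size, int hw_counters)
{
	return group_size <= hw_counters;
}

/*
 * Sketch of the weak-group fallback: try to open the events as one
 * group; if that fails, retry them ungrouped (i.e. as singletons).
 * Returns the effective group size that was opened.
 */
static int fake_open_weak_group(int group_size, int hw_counters)
{
	if (fake_open_group(group_size, hw_counters))
		return group_size;	/* grouped open succeeded */
	return 1;			/* fall back to ungrouped events */
}
```

[ This is why PMUs that silently accept over-sized groups are a problem
for the tool: the grouped open "succeeds", the fallback never triggers,
and the reads come back as "not counted". ]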
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-26 13:43 ` Mark Rutland
@ 2025-08-26 22:46 ` Robin Murphy
2025-08-27 8:04 ` Mark Rutland
2025-08-27 5:27 ` Thomas Richter
1 sibling, 1 reply; 52+ messages in thread
From: Robin Murphy @ 2025-08-26 22:46 UTC (permalink / raw)
To: Mark Rutland
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On 2025-08-26 2:43 pm, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:01:10PM +0100, Robin Murphy wrote:
>> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
>> events without registering themselves as PERF_TYPE_RAW in the first
>> place. Add an explicit opt-in for these special cases, so that we can
>> make life easier for every other driver (and probably also speed up the
>> slow-path search) by having perf_try_init_event() do the basic type
>> checking to cover the majority of cases.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>
>
> To bikeshed a little here, I'm not keen on the PERF_PMU_CAP_RAW_EVENTS
> name, because it's not clear what "RAW" really means, and people will
> definitely read that to mean something else.
>
> Could we go with something like PERF_PMU_CAP_COMMON_CPU_EVENTS, to make
> it clear that this is about opting into CPU-PMU specific event types (of
> which PERF_TYPE_RAW is one of)?
Indeed I started with that very intention after our previous discussion,
but soon realised that in fact nowhere in the code is there any
definition or even established notion of what "common" means in this
context, so it's hardly immune to misinterpretation either. Furthermore
the semantics of the cap as it ended up are specifically that the PMU
wants the same behaviour as if it had registered as PERF_TYPE_RAW, so
having "raw" in the name started to look like the more intuitive option
after all (plus being nice and short helps.)
If anything, it's "events" that carries the implication that's proving
hard to capture precisely and concisely here, so maybe the answer to
avoid ambiguity is to lean away from a "what it represents" naming
towards a "what it actually does" naming - PERF_PMU_CAP_TYPE_RAW, anyone?
> Likewise, s/is_raw_pmu()/pmu_supports_common_cpu_events()/.
Case in point: is it any more logical and expected that supporting
common CPU events implies a PMU should be offered software or breakpoint
events as well? Because that's what such a mere rename would currently
mean :/
>> ---
>>
>> A further possibility is to automatically add the cap to PERF_TYPE_RAW
>> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
>> undecided...
>
> I reckon we don't need to automagically do that, but I reckon that
> is_raw_pmu()/pmu_supports_common_cpu_events() should only check the cap,
> and we don't read anything special into any of
> PERF_TYPE_{RAW,HARDWARE,HW_CACHE}.
OK, but that would then necessitate explicitly adding the cap to
all 15-odd other drivers which register as PERF_TYPE_RAW as well, at
which point it starts to look like a more general "I am a CPU PMU in
terms of most typical assumptions you might want to make about that" flag...
To clarify (and perhaps something for a v2 commit message), we currently
have 3 categories of PMU driver:
1: (Older/simpler CPUs) Registers as PERF_TYPE_RAW, wants
PERF_TYPE_RAW/HARDWARE/HW_CACHE events
2: (Heterogeneous CPUs) Registers as dynamic type, wants
PERF_TYPE_RAW/HARDWARE/HW_CACHE events plus events of its own type
3: (Mostly uncore) Registers as dynamic type, only wants events of its
own type
My vested interest is in making category 3 the default behaviour, given
that the growing majority of new drivers are uncore (and I keep having
to write them...) However unclear the type overlaps in category 1 might
be, it's been like that for 15 years, so I didn't feel compelled to
churn fossils like Alpha more than reasonably necessary. Category 2 is
only these 5 drivers, so a relatively small tweak to distinguish them
from category 3 and let them retain the effective category 1 behaviour
(which remains the current one of potentially still being offered
software etc. events too) seemed like the neatest way to make progress.
I'm not saying I'm necessarily against a general overhaul of CPU PMUs
being attempted too, just that it seems more like a whole other
side-quest, and I'd really like to slay the uncore-boilerplate dragon first.
>> ---
>> arch/s390/kernel/perf_cpum_cf.c | 1 +
>> arch/s390/kernel/perf_pai_crypto.c | 2 +-
>> arch/s390/kernel/perf_pai_ext.c | 2 +-
>> arch/x86/events/core.c | 2 +-
>> drivers/perf/arm_pmu.c | 1 +
>> include/linux/perf_event.h | 1 +
>> kernel/events/core.c | 15 +++++++++++++++
>> 7 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
>> index 1a94e0944bc5..782ab755ddd4 100644
>> --- a/arch/s390/kernel/perf_cpum_cf.c
>> +++ b/arch/s390/kernel/perf_cpum_cf.c
>> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
>> /* Performance monitoring unit for s390x */
>> static struct pmu cpumf_pmu = {
>> .task_ctx_nr = perf_sw_context,
>> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
>> .pmu_enable = cpumf_pmu_enable,
>> .pmu_disable = cpumf_pmu_disable,
>> .event_init = cpumf_pmu_event_init,
>
> Tangential, but use of perf_sw_context here looks bogus.
Indeed, according to the history it was intentional, but perhaps that no
longer applies since the big context redesign? FWIW there seem to be a
fair few instances of this, including Arm SPE.
Thanks,
Robin.
>> diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c
>> index a64b6b056a21..b5b6d8b5d943 100644
>> --- a/arch/s390/kernel/perf_pai_crypto.c
>> +++ b/arch/s390/kernel/perf_pai_crypto.c
>> @@ -569,7 +569,7 @@ static const struct attribute_group *paicrypt_attr_groups[] = {
>> /* Performance monitoring unit for mapped counters */
>> static struct pmu paicrypt = {
>> .task_ctx_nr = perf_hw_context,
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>> .event_init = paicrypt_event_init,
>> .add = paicrypt_add,
>> .del = paicrypt_del,
>> diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c
>> index 1261f80c6d52..bcd28c38da70 100644
>> --- a/arch/s390/kernel/perf_pai_ext.c
>> +++ b/arch/s390/kernel/perf_pai_ext.c
>> @@ -595,7 +595,7 @@ static const struct attribute_group *paiext_attr_groups[] = {
>> /* Performance monitoring unit for mapped counters */
>> static struct pmu paiext = {
>> .task_ctx_nr = perf_hw_context,
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>> .event_init = paiext_event_init,
>> .add = paiext_add,
>> .del = paiext_del,
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 789dfca2fa67..764728bb80ae 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -2697,7 +2697,7 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
>> }
>>
>> static struct pmu pmu = {
>> - .capabilities = PERF_PMU_CAP_SAMPLING,
>> + .capabilities = PERF_PMU_CAP_SAMPLING | PERF_PMU_CAP_RAW_EVENTS,
>>
>> .pmu_enable = x86_pmu_enable,
>> .pmu_disable = x86_pmu_disable,
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 72d8f38d0aa5..bc772a3bf411 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -877,6 +877,7 @@ struct arm_pmu *armpmu_alloc(void)
>> * specific PMU.
>> */
>> .capabilities = PERF_PMU_CAP_SAMPLING |
>> + PERF_PMU_CAP_RAW_EVENTS |
>> PERF_PMU_CAP_EXTENDED_REGS |
>> PERF_PMU_CAP_EXTENDED_HW_TYPE,
>> };
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 183b7c48b329..c6ad036c0037 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -305,6 +305,7 @@ struct perf_event_pmu_context;
>> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
>> #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>> +#define PERF_PMU_CAP_RAW_EVENTS 0x0800
>>
>> /**
>> * pmu::scope
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 71b2a6730705..2ecee76d2ae2 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -12556,11 +12556,26 @@ static inline bool has_extended_regs(struct perf_event *event)
>> (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
>> }
>>
>> +static bool is_raw_pmu(const struct pmu *pmu)
>> +{
>> + return pmu->type == PERF_TYPE_RAW ||
>> + pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
>> +}
>
> As above, I reckon we should make this:
>
> static bool pmu_supports_common_cpu_events(const struct pmu *pmu)
> {
> return pmu->capabilities & PERF_PMU_CAP_RAW_EVENTS;
> }
>
> Other than the above, this looks good to me.
>
> Mark.
>
>> +
>> static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>> {
>> struct perf_event_context *ctx = NULL;
>> int ret;
>>
>> + /*
>> + * Before touching anything, we can safely skip:
>> + * - any event for a specific PMU which is not this one
>> + * - any common event if this PMU doesn't support them
>> + */
>> + if (event->attr.type != pmu->type &&
>> + (event->attr.type >= PERF_TYPE_MAX || !is_raw_pmu(pmu)))
>> + return -ENOENT;
>> +
>> if (!try_module_get(pmu->module))
>> return -ENODEV;
>>
>> --
>> 2.39.2.101.g768bb238c484.dirty
>>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-26 13:43 ` Mark Rutland
2025-08-26 22:46 ` Robin Murphy
@ 2025-08-27 5:27 ` Thomas Richter
1 sibling, 0 replies; 52+ messages in thread
From: Thomas Richter @ 2025-08-27 5:27 UTC (permalink / raw)
To: Mark Rutland, Robin Murphy, Sumanth Korikkar, Jan Polensky
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On 8/26/25 15:43, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:01:10PM +0100, Robin Murphy wrote:
>> Only a handful of CPU PMUs accept PERF_TYPE_{RAW,HARDWARE,HW_CACHE}
>> events without registering themselves as PERF_TYPE_RAW in the first
>> place. Add an explicit opt-in for these special cases, so that we can
>> make life easier for every other driver (and probably also speed up the
>> slow-path search) by having perf_try_init_event() do the basic type
>> checking to cover the majority of cases.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>
>
> To bikeshed a little here, I'm not keen on the PERF_PMU_CAP_RAW_EVENTS
> name, because it's not clear what "RAW" really means, and people will
> definitely read that to mean something else.
>
> Could we go with something like PERF_PMU_CAP_COMMON_CPU_EVENTS, to make
> it clear that this is about opting into CPU-PMU specific event types (of
> which PERF_TYPE_RAW is one)?
>
> Likewise, s/is_raw_pmu()/pmu_supports_common_cpu_events()/.
>
>> ---
>>
>> A further possibility is to automatically add the cap to PERF_TYPE_RAW
>> PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
>> undecided...
>
> I reckon we don't need to automagically do that, but I reckon that
> is_raw_pmu()/pmu_supports_common_cpu_events() should only check the cap,
> and we don't read anything special into any of
> PERF_TYPE_{RAW,HARDWARE,HW_CACHE}.
>
>> ---
>> arch/s390/kernel/perf_cpum_cf.c | 1 +
>> arch/s390/kernel/perf_pai_crypto.c | 2 +-
>> arch/s390/kernel/perf_pai_ext.c | 2 +-
>> arch/x86/events/core.c | 2 +-
>> drivers/perf/arm_pmu.c | 1 +
>> include/linux/perf_event.h | 1 +
>> kernel/events/core.c | 15 +++++++++++++++
>> 7 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
>> index 1a94e0944bc5..782ab755ddd4 100644
>> --- a/arch/s390/kernel/perf_cpum_cf.c
>> +++ b/arch/s390/kernel/perf_cpum_cf.c
>> @@ -1054,6 +1054,7 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
>> /* Performance monitoring unit for s390x */
>> static struct pmu cpumf_pmu = {
>> .task_ctx_nr = perf_sw_context,
>> + .capabilities = PERF_PMU_CAP_RAW_EVENTS,
>> .pmu_enable = cpumf_pmu_enable,
>> .pmu_disable = cpumf_pmu_disable,
>> .event_init = cpumf_pmu_event_init,
>
> Tangential, but use of perf_sw_context here looks bogus.
>
It might look strange, but it was done on purpose. For details see
commit 9254e70c4ef1 ("s390/cpum_cf: use perf software context for hardware counters")
Background was a WARN_ON() statement which fired because several PMU device drivers
existed in parallel on the s390x platform.
Not sure if this condition is still true after all these years...
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
* Re: [PATCH 18/19] perf: Introduce positive capability for raw events
2025-08-26 22:46 ` Robin Murphy
@ 2025-08-27 8:04 ` Mark Rutland
0 siblings, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-27 8:04 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 11:46:02PM +0100, Robin Murphy wrote:
> On 2025-08-26 2:43 pm, Mark Rutland wrote:
> > On Wed, Aug 13, 2025 at 06:01:10PM +0100, Robin Murphy wrote:
> > To bikeshed a little here, I'm not keen on the PERF_PMU_CAP_RAW_EVENTS
> > name, because it's not clear what "RAW" really means, and people will
> > definitely read that to mean something else.
> >
> > Could we go with something like PERF_PMU_CAP_COMMON_CPU_EVENTS, to make
> > it clear that this is about opting into CPU-PMU specific event types (of
> > which PERF_TYPE_RAW is one)?
>
> Indeed I started with that very intention after our previous discussion, but
> soon realised that in fact nowhere in the code is there any definition or
> even established notion of what "common" means in this context, so it's
> hardly immune to misinterpretation either.
We can document that; it's everything less than PERF_TYPE_MAX:
enum perf_type_id {
PERF_TYPE_HARDWARE = 0,
PERF_TYPE_SOFTWARE = 1,
PERF_TYPE_TRACEPOINT = 2,
PERF_TYPE_HW_CACHE = 3,
PERF_TYPE_RAW = 4,
PERF_TYPE_BREAKPOINT = 5,
PERF_TYPE_MAX, /* non-ABI */
};
... and maybe you could use "PERF_PMU_CAP_ABI_TYPES" to align with that
comment?
> Furthermore the semantics of the cap as it ended up are specifically
> that the PMU wants the same behaviour as if it had registered as
> PERF_TYPE_RAW, so having "raw" in the name started to look like the
> more intuitive option after all (plus being nice and short helps.)
I appreciate the shortness, but I think it's misleading to tie this to
"RAW" specifically, when really this is a capability to say "please let
me try to init any events for non-dynamic types, in addition to whatever
specific type I am registered with".
> If anything, it's "events" that carries the implication that's proving hard
> to capture precisely and concisely here, so maybe the answer to avoid
> ambiguity is to lean further away from a "what it represents" to a "what it
> actually does" naming - PERF_PMU_CAP_TYPE_RAW, anyone?
I'm happy with TYPE in the name; it's just RAW specifically that I'm
objecting to.
> > Likewise, s/is_raw_pmu()/pmu_supports_common_cpu_events()/.
>
> Case in point: is it any more logical and expected that supporting common
> CPU events implies a PMU should be offered software or breakpoint events as
> well? Because that's what such a mere rename would currently mean :/
Yes, I think it is.
> > > ---
> > >
> > > A further possibility is to automatically add the cap to PERF_TYPE_RAW
> > > PMUs in perf_pmu_register() to have a single point-of-use condition; I'm
> > > undecided...
> >
> > I reckon we don't need to automagically do that, but I reckon that
> > is_raw_pmu()/pmu_supports_common_cpu_events() should only check the cap,
> > and we don't read anything special into any of
> > PERF_TYPE_{RAW,HARDWARE,HW_CACHE}.
>
> OK, but that would then necessitate having to explicitly add the cap to all
> 15-odd other drivers which register as PERF_TYPE_RAW as well, at which point
> it starts to look like a more general "I am a CPU PMU in terms of most
> typical assumptions you might want to make about that" flag...
>
> To clarify (and perhaps something for a v2 commit message), we currently
> have 3 categories of PMU driver:
>
> 1: (Older/simpler CPUs) Registers as PERF_TYPE_RAW, wants
> PERF_TYPE_RAW/HARDWARE/HW_CACHE events
> 2: (Heterogeneous CPUs) Registers as dynamic type, wants
> PERF_TYPE_RAW/HARDWARE/HW_CACHE events plus events of its own type
> 3: (Mostly uncore) Registers as dynamic type, only wants events of its own
> type
Sure, but I think that separating 1 and 2 is an artificial distinction,
and what we really have is:
(a) Wants to handle (some of) the non-dynamic/common/ABI types (in
addition to whatever specific type it was registered with). Needs to
have a switch statement somewhere in pmu::event_init().
(b) Only wants to handle the specific type the PMU was registered with.
> My vested interest is in making category 3 the default behaviour, given that
> the growing majority of new drivers are uncore (and I keep having to write
> them...)
Yes, we're aligned on that.
> However unclear the type overlaps in category 1 might be, it's been
> like that for 15 years, so I didn't feel compelled to churn fossils like
> Alpha more than reasonably necessary. Category 2 is only these 5 drivers, so
> a relatively small tweak to distinguish them from category 3 and let them
> retain the effective category 1 behaviour (which remains the current one of
> potentially still being offered software etc. events too) seemed like the
> neatest way to make progress.
I just think we should combine 1 and 2 (into category a as above), since
that removes the need to treat RAW specially going forwards.
> I'm not saying I'm necessarily against a general overhaul of CPU PMUs being
> attempted too, just that it seems more like a whole other side-quest, and
> I'd really like to slay the uncore-boilerplate dragon first.
I think that adding the cap to those 15 PMUs would take less time than
it has taken me to write this email, so I do not understand the
objection.
Mark.
* Re: [PATCH 16/19] perf: Introduce positive capability for sampling
2025-08-26 15:53 ` Robin Murphy
@ 2025-08-27 8:06 ` Leo Yan
0 siblings, 0 replies; 52+ messages in thread
From: Leo Yan @ 2025-08-27 8:06 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, mark.rutland, acme, namhyung,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 04:53:51PM +0100, Robin Murphy wrote:
[...]
> > Generally, AUX events generate interrupts based on the AUX ring buffer
> > watermark rather than the period. Seems to me, it is correct to set the
> > PERF_PMU_CAP_SAMPLING flag for them.
>
> This cap is given to drivers which handle event->attr.sample_period and call
> perf_event_overflow() - or in a few rare cases, perf_output_sample()
> directly - to do something meaningful with it, since the intent is to convey
> "I properly handle events for which is_sampling_event() is true". My
> understanding is that aux events are something else entirely, but I'm happy
> to be corrected.
If the discussion is based only on this patch, my understanding is
that the PERF_PMU_CAP_SAMPLING flag replaces the
PERF_PMU_CAP_NO_INTERRUPT flag for checking whether a PMU event needs
to be re-enabled in perf_adjust_freq_unthr_context().
AUX events can trigger a large number of interrupts under certain
conditions (e.g., if we set a very small watermark). This is why I
conclude that we need to set the PERF_PMU_CAP_SAMPLING flag to ensure
that AUX events are re-enabled properly after throttling (see
perf_adjust_freq_unthr_events()).
> Otherwise, perhaps this suggests it deserves to be named a little more
> specifically for clarity, maybe PERF_CAP_SAMPLING_EVENTS?
Seems to me, the naming is not critical. Without setting the
PERF_PMU_CAP_SAMPLING flag, AUX events might lose the chance to be
re-enabled after throttling.
Thanks,
Leo
* Re: [PATCH 12/19] perf: Ignore event state for group validation
2025-08-26 18:48 ` Ian Rogers
@ 2025-08-27 8:18 ` Mark Rutland
2025-08-27 15:15 ` Ian Rogers
0 siblings, 1 reply; 52+ messages in thread
From: Mark Rutland @ 2025-08-27 8:18 UTC (permalink / raw)
To: Ian Rogers
Cc: Robin Murphy, Peter Zijlstra, mingo, will, acme, namhyung,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 11:48:48AM -0700, Ian Rogers wrote:
> On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> > > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> > >> It may have been different long ago, but today it seems wrong for these
> > >> drivers to skip counting disabled sibling events in group validation,
> > >> given that perf_event_enable() could make them schedulable again, and
> > >> thus increase the effective size of the group later. Conversely, if a
> > >> sibling event is truly dead then it stands to reason that the whole
> > >> group is dead, so it's not worth going to any special effort to try to
> > >> squeeze in a new event that's never going to run anyway. Thus, we can
> > >> simply remove all these checks.
> > >
> > > So currently you can do sort of a manual event rotation inside an
> > > over-sized group and have it work.
> > >
> > > I'm not sure if anybody actually does this, but its possible.
> > >
> > > Eg. on a PMU that supports only 4 counters, create a group of 5 and
> > > periodically cycle which of the 5 events is off.
>
> I'm not sure this is true, I thought this would fail in the
> perf_event_open when adding the 5th event and there being insufficient
> counters for the group.
We're talking specifically about cases where the logic in a pmu's
pmu::event_init() callback doesn't count events in specific states, and
hence the 5th event doesn't get rejected when it is initialised.
For example, in arch/x86/events/core.c, validate_group() uses
collect_events(), which has:
for_each_sibling_event(event, leader) {
if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
continue;
if (collect_event(cpuc, event, max_count, n))
return -EINVAL;
n++;
}
... and so where an event's state is <= PERF_EVENT_STATE_OFF at init
time, that event is not counted to see if it fits into HW counters.
Mark.
* Re: [PATCH 02/19] perf/hisilicon: Fix group validation
2025-08-26 15:31 ` Mark Rutland
2025-08-26 15:55 ` Mark Rutland
@ 2025-08-27 14:03 ` Mark Rutland
1 sibling, 0 replies; 52+ messages in thread
From: Mark Rutland @ 2025-08-27 14:03 UTC (permalink / raw)
To: Robin Murphy
Cc: peterz, mingo, will, acme, namhyung, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
linux-alpha, linux-snps-arc, linux-arm-kernel, imx, linux-csky,
loongarch, linux-mips, linuxppc-dev, linux-s390, linux-sh,
sparclinux, linux-pm, linux-rockchip, dmaengine, linux-fpga,
amd-gfx, dri-devel, intel-gfx, intel-xe, coresight, iommu,
linux-amlogic, linux-cxl, linux-arm-msm, linux-riscv
On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > > int counters = 1;
> > > > int num;
> > > > - event_group[0] = leader;
> > > > - if (!is_software_event(leader)) {
> > > > - if (leader->pmu != event->pmu)
> > > > - return false;
> > > > + if (leader == event)
> > > > + return true;
> > > > - if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > - event_group[counters++] = event;
> > > > - }
> > > > + event_group[0] = event;
> > > > + if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > + event_group[counters++] = leader;
> > >
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).
> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
>
> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
>
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.
Having dug some more, I see that this was intended to handle the way
the hardware shares a single config register between pairs of counter
and counter_ext registers, with the idea being that two related events
could be allocated into the same counter pair (but would only occupy a
single counter each).
I still think the code is wrong, but it is more complex than I made it
out to be, and you're right that we should leave it as-is for now. I can
follow up after we've got this series in.
Mark.
* Re: [PATCH 12/19] perf: Ignore event state for group validation
2025-08-27 8:18 ` Mark Rutland
@ 2025-08-27 15:15 ` Ian Rogers
0 siblings, 0 replies; 52+ messages in thread
From: Ian Rogers @ 2025-08-27 15:15 UTC (permalink / raw)
To: Mark Rutland
Cc: Robin Murphy, Peter Zijlstra, mingo, will, acme, namhyung,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, linux-alpha, linux-snps-arc,
linux-arm-kernel, imx, linux-csky, loongarch, linux-mips,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-pm,
linux-rockchip, dmaengine, linux-fpga, amd-gfx, dri-devel,
intel-gfx, intel-xe, coresight, iommu, linux-amlogic, linux-cxl,
linux-arm-msm, linux-riscv
On Wed, Aug 27, 2025 at 1:18 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Aug 26, 2025 at 11:48:48AM -0700, Ian Rogers wrote:
> > On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> > > > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> > > >> It may have been different long ago, but today it seems wrong for these
> > > >> drivers to skip counting disabled sibling events in group validation,
> > > >> given that perf_event_enable() could make them schedulable again, and
> > > >> thus increase the effective size of the group later. Conversely, if a
> > > >> sibling event is truly dead then it stands to reason that the whole
> > > >> group is dead, so it's not worth going to any special effort to try to
> > > >> squeeze in a new event that's never going to run anyway. Thus, we can
> > > >> simply remove all these checks.
> > > >
> > > > So currently you can do sort of a manual event rotation inside an
> > > > over-sized group and have it work.
> > > >
> > > > I'm not sure if anybody actually does this, but its possible.
> > > >
> > > > Eg. on a PMU that supports only 4 counters, create a group of 5 and
> > > > periodically cycle which of the 5 events is off.
> >
> > I'm not sure this is true, I thought this would fail in the
> > perf_event_open when adding the 5th event and there being insufficient
> > counters for the group.
>
> We're talking specifically about cases where the logic in a pmu's
> pmu::event_init() callback doesn't count events in specific states, and
> hence the 5th even doesn't get rejected when it is initialised.
>
> For example, in arch/x86/events/core.c, validate_group() uses
> collect_events(), which has:
>
> for_each_sibling_event(event, leader) {
> if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
> continue;
>
> if (collect_event(cpuc, event, max_count, n))
> return -EINVAL;
>
> n++;
> }
>
> ... and so where an event's state is <= PERF_EVENT_STATE_OFF at init
> time, that event is not counted to see if it fits into HW counters.
Hmm.. Thinking out loud. So it looked like perf with weak groups could
be broken then:
```
$ sudo perf stat -vv -e '{instructions,cycles}:W' true
...
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000001
(cpu_core/PERF_COUNT_HW_INSTRUCTIONS/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 3337764 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000000
(cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 3337764 cpu -1 group_fd 5 flags 0x8 = 7
...
```
Note, the group leader (instructions) is disabled because of:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat.c?h=perf-tools-next#n761
```
/*
* Disabling all counters initially, they will be enabled
* either manually by us or by kernel via enable_on_exec
* set later.
*/
if (evsel__is_group_leader(evsel)) {
attr->disabled = 1;
```
but the checking of being disabled (PERF_EVENT_STATE_OFF) is only done
on siblings in the code you show above. So yes, you can disable the
group events to allow the perf_event_open to succeed but not on the
leader which is always checked (no PERF_EVENT_STATE_OFF check):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/arch/x86/events/core.c?h=perf-tools-next#n1204
```
if (is_x86_event(leader)) {
if (collect_event(cpuc, leader, max_count, n))
return -EINVAL;
```
Thanks,
Ian
end of thread, other threads:[~2025-08-27 15:15 UTC | newest]
Thread overview: 52+ messages
2025-08-13 17:00 [PATCH 00/19] perf: Rework event_init checks Robin Murphy
2025-08-13 17:00 ` [PATCH 01/19] perf/arm-cmn: Fix event validation Robin Murphy
2025-08-26 10:46 ` Mark Rutland
2025-08-13 17:00 ` [PATCH 02/19] perf/hisilicon: Fix group validation Robin Murphy
2025-08-26 11:15 ` Mark Rutland
2025-08-26 13:18 ` Mark Rutland
2025-08-26 14:35 ` Robin Murphy
2025-08-26 15:31 ` Mark Rutland
2025-08-26 15:55 ` Mark Rutland
2025-08-27 14:03 ` Mark Rutland
2025-08-13 17:00 ` [PATCH 03/19] perf/imx8_ddr: " Robin Murphy
2025-08-13 17:00 ` [PATCH 04/19] perf/starfive: " Robin Murphy
2025-08-13 17:00 ` [PATCH 05/19] iommu/vt-d: Fix perfmon " Robin Murphy
2025-08-13 17:00 ` [PATCH 06/19] ARM: l2x0: Fix " Robin Murphy
2025-08-13 17:00 ` [PATCH 07/19] ARM: imx: Fix MMDC PMU " Robin Murphy
2025-08-13 17:01 ` [PATCH 08/19] perf/arm_smmu_v3: Improve " Robin Murphy
2025-08-13 17:01 ` [PATCH 09/19] perf/qcom: " Robin Murphy
2025-08-13 17:01 ` [PATCH 10/19] perf/arm-ni: Improve event validation Robin Murphy
2025-08-13 17:01 ` [PATCH 11/19] perf/arm-cci: Tidy up " Robin Murphy
2025-08-13 17:01 ` [PATCH 12/19] perf: Ignore event state for group validation Robin Murphy
2025-08-26 13:03 ` Peter Zijlstra
2025-08-26 15:32 ` Robin Murphy
2025-08-26 18:48 ` Ian Rogers
2025-08-27 8:18 ` Mark Rutland
2025-08-27 15:15 ` Ian Rogers
2025-08-13 17:01 ` [PATCH 13/19] perf: Add helper for checking grouped events Robin Murphy
2025-08-14 5:43 ` kernel test robot
2025-08-13 17:01 ` [PATCH 14/19] perf: Clean up redundant group validation Robin Murphy
2025-08-13 17:01 ` [PATCH 15/19] perf: Simplify " Robin Murphy
2025-08-13 17:01 ` [PATCH 16/19] perf: Introduce positive capability for sampling Robin Murphy
2025-08-26 13:08 ` Peter Zijlstra
2025-08-26 13:28 ` Mark Rutland
2025-08-26 16:35 ` Robin Murphy
2025-08-26 13:11 ` Leo Yan
2025-08-26 15:53 ` Robin Murphy
2025-08-27 8:06 ` Leo Yan
2025-08-13 17:01 ` [PATCH 17/19] perf: Retire PERF_PMU_CAP_NO_INTERRUPT Robin Murphy
2025-08-26 13:08 ` Peter Zijlstra
2025-08-13 17:01 ` [PATCH 18/19] perf: Introduce positive capability for raw events Robin Murphy
2025-08-19 13:15 ` Robin Murphy
2025-08-20 8:09 ` Thomas Richter
2025-08-20 11:39 ` Robin Murphy
2025-08-21 2:53 ` kernel test robot
2025-08-26 13:43 ` Mark Rutland
2025-08-26 22:46 ` Robin Murphy
2025-08-27 8:04 ` Mark Rutland
2025-08-27 5:27 ` Thomas Richter
2025-08-13 17:01 ` [PATCH 19/19] perf: Garbage-collect event_init checks Robin Murphy
2025-08-14 8:04 ` kernel test robot
2025-08-19 2:44 ` kernel test robot
2025-08-19 17:49 ` Robin Murphy
2025-08-19 13:25 ` Robin Murphy