* [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
@ 2026-05-03 15:51 Aviv Bakal
2026-05-03 16:20 ` sashiko-bot
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
0 siblings, 2 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-03 15:51 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the XP's logical ID, since the node info
register's logical ID field is also zeroed.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f5305c8fdca4..368fe1a86bfb 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -31,7 +31,8 @@
#define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
#define CMN_CHILD_NODE_EXTERNAL BIT(31)
-#define CMN_MAX_DIMENSION 12
+/* Some implementations use a mesh larger than the architectural max of 12 */
+#define CMN_MAX_DIMENSION 14
#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
#define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
@@ -214,6 +215,8 @@ enum cmn_part {
PART_CMN700 = 0x43c,
PART_CI700 = 0x43a,
PART_CMN_S3 = 0x43e,
+ /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
+ PART_GRAVITON5 = 0xa5,
};
/* CMN-600 r0px shouldn't exist in silicon, thankfully */
@@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
}
+static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
+{
+ unsigned int x = (xp_id >> 7) & 0xf;
+ unsigned int y = (xp_id >> 3) & 0xf;
+
+ /*
+ * The unit info register reads as zero; derive the DTC domain from
+ * the XP's mesh coordinates over the 10x14 mesh.
+ */
+ return (x / 5) + (y / 7) * 2;
+}
+
static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
{
int level;
@@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
u64 reg;
int i, j;
size_t sz;
+ bool graviton5_workaround = false;
arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
if (cfg.type != CMN_TYPE_CFG)
@@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+
+ /* Graviton5 has a customised CMN-S3 which needs some fixups */
+ if (cmn->part == PART_GRAVITON5) {
+ cmn->part = PART_CMN_S3;
+ graviton5_workaround = true;
+ }
+
/* 600AE is close enough that it's not really worth more complexity */
if (part == PART_CMN600AE)
part = PART_CMN600;
@@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
if (cmn->part == PART_CMN600)
xp->dtc = -1;
+ else if (graviton5_workaround)
+ xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
else
xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
@@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
switch (dn->type) {
case CMN_TYPE_DTC:
+ if (graviton5_workaround) {
+ /* Node info logical ID is zeroed; use the XP's */
+ dn->logid = xp->logid;
+ }
cmn->num_dtcs++;
dn++;
break;
@@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
{ "ARMHC650" },
{ "ARMHC700" },
{ "ARMHC003" },
+ { "AMZN0070", PART_GRAVITON5 },
{}
};
MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-05-03 16:20 ` sashiko-bot
2026-05-04 13:37 ` Aviv Bakal
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
1 sibling, 1 reply; 28+ messages in thread
From: sashiko-bot @ 2026-05-03 16:20 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incorrect DTC logical ID assignment cross-wires DTC domains.
--
commit 621fea5b4bb35e497f17f2f20187c68b18f49751
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
This commit introduces workarounds for a customised CMN-S3 implementation on
Graviton5, where certain discovery registers report zeroed fields. It adds a
dedicated ACPI HID, derives the DTC domain from the XP node ID, and sets the
DTC logical ID from the XP's logical ID.
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca4..368fe1a86bfb 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
[ ... ]
> @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
Will this cross-wire the DTC domains?
In arm_cmn_init_dtcs(), the cmn->dns array is sorted by logid, and the
cmn->dtc array is populated sequentially based on this sorted order.
During PMU event programming, the driver uses the domain ID (dn->dtc) as an
index into the cmn->dtc array.
If the physical layout of the DTC XPs causes their xp->logid values (typically
y * mesh_x + x) to sort differently than their computed domain IDs, wouldn't
the cmn->dtc array be misaligned?
For example, a Domain 1 DTC at (5,0) has a smaller logid (5) than a Domain 0
DTC at (0,6) (60), meaning Domain 1 would incorrectly occupy cmn->dtc[0]. This
cross-wiring could cause PMU events to program the wrong physical DTC hardware
and map IRQs to the wrong domains, resulting in incorrect performance
statistics and unhandled interrupts.
Would it be better to assign the computed domain ID to the logical ID here
using dn->logid = xp->dtc?
> + }
> cmn->num_dtcs++;
> dn++;
> break;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260503155118.16191-1-avivb@amazon.com?part=1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-03 16:20 ` sashiko-bot
@ 2026-05-04 13:37 ` Aviv Bakal
0 siblings, 0 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-04 13:37 UTC (permalink / raw)
To: sashiko-bot
Cc: robin.murphy, will, mark.rutland, linux-arm-kernel,
linux-perf-users, linux-kernel, zeev, blakgeof
> Will this cross-wire the DTC domains?
> In arm_cmn_init_dtcs(), the cmn->dns array is sorted by logid, and the
> cmn->dtc array is populated sequentially based on this sorted order.
> During PMU event programming, the driver uses the domain ID (dn->dtc) as an
> index into the cmn->dtc array.
> If the physical layout of the DTC XPs causes their xp->logid values (typically
> y * mesh_x + x) to sort differently than their computed domain IDs, wouldn't
> the cmn->dtc array be misaligned?
> For example, a Domain 1 DTC at (5,0) has a smaller logid (5) than a Domain 0
> DTC at (0,6) (60), meaning Domain 1 would incorrectly occupy cmn->dtc[0]. This
> cross-wiring could cause PMU events to program the wrong physical DTC hardware
> and map IRQs to the wrong domains, resulting in incorrect performance
> statistics and unhandled interrupts.
> Would it be better to assign the computed domain ID to the logical ID here
> using dn->logid = xp->dtc?
Good catch on the cleanliness — I'll switch to dn->logid = xp->dtc.
That said, the two approaches are equivalent on this hardware. The four
DTCs discovered on Graviton5 are:
pmu_base xp->dtc xp->logid
c030d900 0 0
e430d900 1 9
c370d900 2 130
e770d900 3 139
The xp->logid values (0, 9, 130, 139) sort in the same order as the
domain IDs (0, 1, 2, 3), so arm_cmn_node_cmp produces the same sorted
sequence either way, and dtc_idx++ assigns the same indices.
This follows from the topology: arm_cmn_graviton5_dtc_domain() computes
domain = (x/5) + (y/7)*2 over the 10x14 mesh, which increases
monotonically with the row-major XP position that logid encodes. So the
sort orders can't diverge for this mesh geometry.
Still, using xp->dtc is cleaner and more obviously correct — the logid
directly reflects the domain ID rather than relying on sort-order
equivalence. I'll update the patch.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-03 16:20 ` sashiko-bot
@ 2026-05-04 13:39 ` Aviv Bakal
2026-05-04 19:38 ` sashiko-bot
` (3 more replies)
1 sibling, 4 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-04 13:39 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the computed domain ID, since the node
info register's logical ID field is also zeroed.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
v2:
- Use computed domain ID (xp->dtc) instead of XP logical ID for DTC
logical ID assignment.
drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f5305c8fdca4..8ee3f8638602 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -31,7 +31,8 @@
#define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
#define CMN_CHILD_NODE_EXTERNAL BIT(31)
-#define CMN_MAX_DIMENSION 12
+/* Some implementations use a mesh larger than the architectural max of 12 */
+#define CMN_MAX_DIMENSION 14
#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
#define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
@@ -214,6 +215,8 @@ enum cmn_part {
PART_CMN700 = 0x43c,
PART_CI700 = 0x43a,
PART_CMN_S3 = 0x43e,
+ /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
+ PART_GRAVITON5 = 0xa5,
};
/* CMN-600 r0px shouldn't exist in silicon, thankfully */
@@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
}
+static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
+{
+ unsigned int x = (xp_id >> 7) & 0xf;
+ unsigned int y = (xp_id >> 3) & 0xf;
+
+ /*
+ * The unit info register reads as zero; derive the DTC domain from
+ * the XP's mesh coordinates over the 10x14 mesh.
+ */
+ return (x / 5) + (y / 7) * 2;
+}
+
static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
{
int level;
@@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
u64 reg;
int i, j;
size_t sz;
+ bool graviton5_workaround = false;
arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
if (cfg.type != CMN_TYPE_CFG)
@@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+
+ /* Graviton5 has a customised CMN-S3 which needs some fixups */
+ if (cmn->part == PART_GRAVITON5) {
+ cmn->part = PART_CMN_S3;
+ graviton5_workaround = true;
+ }
+
/* 600AE is close enough that it's not really worth more complexity */
if (part == PART_CMN600AE)
part = PART_CMN600;
@@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
if (cmn->part == PART_CMN600)
xp->dtc = -1;
+ else if (graviton5_workaround)
+ xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
else
xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
@@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
switch (dn->type) {
case CMN_TYPE_DTC:
+ if (graviton5_workaround) {
+ /* Node info logical ID is zeroed; use the domain ID */
+ dn->logid = xp->dtc;
+ }
cmn->num_dtcs++;
dn++;
break;
@@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
{ "ARMHC650" },
{ "ARMHC700" },
{ "ARMHC003" },
+ { "AMZN0070", PART_GRAVITON5 },
{}
};
MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
@ 2026-05-04 19:38 ` sashiko-bot
2026-05-05 2:31 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-05-04 19:38 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Buffer overflow in `arm_cmn_event_init` due to `CMN_MAX_DTCS` not scaling with mesh dimension.
--
commit 9d571ad62b9e626c4f8ff883ad8a2627c1e26529
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the computed domain ID, since the node
info register's logical ID field is also zeroed.
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca43..8ee3f86386025 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -31,7 +31,8 @@
> #define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
> #define CMN_CHILD_NODE_EXTERNAL BIT(31)
>
> -#define CMN_MAX_DIMENSION 12
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
> #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
>
[ ... ]
> @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
> +
Does this new dimension limit require an update to CMN_MAX_DTCS?
CMN_MAX_DTCS is currently 4, which sizes the dtc_idx array in struct arm_cmn_hw_event:
struct arm_cmn_hw_event {
struct arm_cmn_node *dn;
u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
s8 dtc_idx[CMN_MAX_DTCS];
...
};
For a 14x10 mesh where X=13 and Y=9, arm_cmn_graviton5_dtc_domain() will return 4.
Will this cause an out-of-bounds write in arm_cmn_event_init()?
memset(hw->dtc_idx, -1, sizeof(hw->dtc_idx));
for (dn = hw->dn; dn->type == type; dn++) {
...
if (dn->dtc < 0)
memset(hw->dtc_idx, 0, cmn->num_dtcs);
else
hw->dtc_idx[dn->dtc] = 0;
...
}
If the larger mesh size exposes more than 4 DTC nodes, could cmn->num_dtcs also
exceed 4, causing the memset() to overflow the dtc_idx[] array?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504133923.23373-1-avivb@amazon.com?part=1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
2026-05-04 19:38 ` sashiko-bot
@ 2026-05-05 2:31 ` kernel test robot
2026-05-21 16:02 ` Robin Murphy
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
3 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2026-05-05 2:31 UTC (permalink / raw)
To: Aviv Bakal, robin.murphy, will, mark.rutland
Cc: llvm, oe-kbuild-all, linux-arm-kernel, linux-perf-users,
linux-kernel, zeev, blakgeof
Hi Aviv,
kernel test robot noticed the following build errors:
[auto build test ERROR on arm-perf/for-next/perf]
[also build test ERROR on soc/for-next linus/master v7.1-rc2 next-20260430]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Aviv-Bakal/perf-arm-cmn-Add-workarounds-for-CMN-S3-on-Graviton5/20260505-011858
base: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
patch link: https://lore.kernel.org/r/20260504133923.23373-1-avivb%40amazon.com
patch subject: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
config: i386-buildonly-randconfig-004-20260505 (https://download.01.org/0day-ci/archive/20260505/202605051052.zOemYJY9-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260505/202605051052.zOemYJY9-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605051052.zOemYJY9-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/perf/arm-cmn.c:617:15: error: static assertion failed due to requirement 'sizeof(struct arm_cmn_hw_event) <= __builtin_offsetof(struct hw_perf_event, target)': sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)
617 | static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target));
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:77:50: note: expanded from macro 'static_assert'
77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: expanded from macro '__static_assert'
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
| ^~~~
drivers/perf/arm-cmn.c:617:47: note: expression evaluates to '104 <= 96'
617 | static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:77:50: note: expanded from macro 'static_assert'
77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:56: note: expanded from macro '__static_assert'
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
| ^~~~
1 error generated.
vim +617 drivers/perf/arm-cmn.c
a88fa6c28b867a Robin Murphy 2021-12-03 600
0ba64770a2f2e5 Robin Murphy 2020-09-18 601 struct arm_cmn_hw_event {
0ba64770a2f2e5 Robin Murphy 2020-09-18 602 struct arm_cmn_node *dn;
359414b33e00ba Robin Murphy 2024-09-02 603 u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
7633ec2c262fab Robin Murphy 2023-10-20 604 s8 dtc_idx[CMN_MAX_DTCS];
0ba64770a2f2e5 Robin Murphy 2020-09-18 605 u8 num_dns;
60d1504070c22c Robin Murphy 2021-12-03 606 u8 dtm_offset;
4a112585ebe8cb Ilkka Koskinen 2024-06-17 607
4a112585ebe8cb Ilkka Koskinen 2024-06-17 608 /*
4a112585ebe8cb Ilkka Koskinen 2024-06-17 609 * WP config registers are divided to UP and DOWN events. We need to
4a112585ebe8cb Ilkka Koskinen 2024-06-17 610 * keep to track only one of them.
4a112585ebe8cb Ilkka Koskinen 2024-06-17 611 */
4a112585ebe8cb Ilkka Koskinen 2024-06-17 612 DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
4a112585ebe8cb Ilkka Koskinen 2024-06-17 613
23760a0144173e Robin Murphy 2022-04-18 614 bool wide_sel;
65adf71398f5af Robin Murphy 2022-04-18 615 enum cmn_filter_select filter_sel;
0ba64770a2f2e5 Robin Murphy 2020-09-18 616 };
ff436cee694ee8 Robin Murphy 2024-09-02 @617 static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target));
0ba64770a2f2e5 Robin Murphy 2020-09-18 618
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
2026-05-04 19:38 ` sashiko-bot
2026-05-05 2:31 ` kernel test robot
@ 2026-05-21 16:02 ` Robin Murphy
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
3 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2026-05-21 16:02 UTC (permalink / raw)
To: Aviv Bakal, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof
On 2026-05-04 2:39 pm, Aviv Bakal wrote:
> Graviton5 uses a customised CMN-S3 implementation where certain
> discovery registers report zeroed fields. Add the following workarounds:
>
> - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
> - Derive the DTC domain from the XP node ID, since the unit info
> register reports it as zero.
> - Set the DTC logical ID from the computed domain ID, since the node
> info register's logical ID field is also zeroed.
>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
> ---
> v2:
> - Use computed domain ID (xp->dtc) instead of XP logical ID for DTC
> logical ID assignment.
>
> drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca4..8ee3f8638602 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -31,7 +31,8 @@
> #define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
> #define CMN_CHILD_NODE_EXTERNAL BIT(31)
>
> -#define CMN_MAX_DIMENSION 12
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
> #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
>
> @@ -214,6 +215,8 @@ enum cmn_part {
> PART_CMN700 = 0x43c,
> PART_CI700 = 0x43a,
> PART_CMN_S3 = 0x43e,
> + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
> + PART_GRAVITON5 = 0xa5,
> };
>
> /* CMN-600 r0px shouldn't exist in silicon, thankfully */
> @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
> +
> static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
> {
> int level;
> @@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> u64 reg;
> int i, j;
> size_t sz;
> + bool graviton5_workaround = false;
>
> arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
> if (cfg.type != CMN_TYPE_CFG)
> @@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
> part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
> part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
> +
> + /* Graviton5 has a customised CMN-S3 which needs some fixups */
> + if (cmn->part == PART_GRAVITON5) {
> + cmn->part = PART_CMN_S3;
> + graviton5_workaround = true;
> + }
> +
> /* 600AE is close enough that it's not really worth more complexity */
> if (part == PART_CMN600AE)
> part = PART_CMN600;
> @@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> if (cmn->part == PART_CMN600)
> xp->dtc = -1;
> + else if (graviton5_workaround)
> + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
> else
> xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
>
> @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the domain ID */
> + dn->logid = xp->dtc;
No, this really should be xp->logid - other than DTC0 whose logical ID
is always forced to 0, but if it naturally lines up that way anyway then
all the better - since that is consistent with how the tooling generates
a regular hardware configuration. The cmn->dtc array ends up sorted by
logical ID as that is a guaranteed stable order for all CMN versions (as
the domain numbers themselves aren't always known), and it is the
interrupt order defined by the firmware bindings which we need to match.
If it's not guaranteed that the actual domain numbers are in the same
order then we have an existing bug in general (I'll have to check...)
Otherwise, this looks OK to me - in fact surprisingly pleasant and
unobtrusive given that it's a pretty horrible hardware issue to work
around. I guess we get lucky that it's an easy topology to compute.
Thanks,
Robin.
> + }
> cmn->num_dtcs++;
> dn++;
> break;
> @@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
> { "ARMHC650" },
> { "ARMHC700" },
> { "ARMHC003" },
> + { "AMZN0070", PART_GRAVITON5 },
> {}
> };
> MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v3 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
` (2 preceding siblings ...)
2026-05-21 16:02 ` Robin Murphy
@ 2026-05-24 15:38 ` Aviv Bakal
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
` (2 more replies)
3 siblings, 3 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof
This series adds support for Graviton5's customised CMN-S3 which has
zeroed discovery registers.
Robin, I understand moving driver state into the core perf header isn't
ideal, but I couldn't find another way to grow the struct. The v2
submission failed the kernel test robot build on i386 (COMPILE_TEST) due
to arm_cmn_hw_event exceeding the static_assert against the 'target'
field offset when CMN_MAX_DIMENSION is increased beyond 12.
Patch 1 moves struct arm_cmn_hw_event into the hw_perf_event union to
resolve this. I'd appreciate your feedback on this approach, or any
alternative you'd suggest.
Patch 2 adds the Graviton5 workarounds themselves (unchanged from v2
except for the DTC logid fix below).
Changes since v2:
- Revert DTC logical ID assignment back to xp->logid (per Robin's
review)
- Add patch 1/2 to move arm_cmn_hw_event into hw_perf_event union
to resolve 32-bit build failure
Aviv Bakal (2):
perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
drivers/perf/arm-cmn.c | 55 +++++++++++++++++++++-----------------
include/linux/perf_event.h | 22 +++++++++++++++
2 files changed, 52 insertions(+), 25 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
@ 2026-05-24 15:38 ` Aviv Bakal
2026-05-24 16:13 ` sashiko-bot
2026-05-29 16:44 ` Robin Murphy
2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
2 siblings, 2 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof
In order to increase CMN_MAX_DIMENSION beyond 12 (required for meshes
larger than 12x12, such as Graviton5), the arm_cmn_hw_event struct must
grow. Since it is overlaid on the beginning of hw_perf_event via an
unsafe cast, increasing its size would violate the static_assert that
guards against overflowing into the 'target' field.
Resolve this by moving struct arm_cmn_hw_event into the hw_perf_event
union as a proper named member, eliminating the cast in to_cmn_hw() and
making the size reservation explicit. Set CMN_MAX_DIMENSION to 14 to
accommodate larger mesh topologies.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
drivers/perf/arm-cmn.c | 26 +-------------------------
include/linux/perf_event.h | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 25 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f5305c8fdca4..3443b819afed 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -31,13 +31,8 @@
#define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
#define CMN_CHILD_NODE_EXTERNAL BIT(31)
-#define CMN_MAX_DIMENSION 12
-#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
#define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
-/* Currently XPs are the node type we can have most of; others top out at 128 */
-#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS
-
/* The CFG node has various info besides the discovery tree */
#define CMN_CFGM_PERIPH_ID_01 0x0008
#define CMN_CFGM_PID0_PART_0 GENMASK_ULL(7, 0)
@@ -148,7 +143,6 @@
#define CMN_DT_PMSRR_SS_REQ BIT(0)
#define CMN_DT_NUM_COUNTERS 8
-#define CMN_MAX_DTCS 4
/*
* Even in the worst case a DTC counter can't wrap in fewer than 2^42 cycles,
@@ -595,24 +589,6 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id)
static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
#endif
-struct arm_cmn_hw_event {
- struct arm_cmn_node *dn;
- u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
- s8 dtc_idx[CMN_MAX_DTCS];
- u8 num_dns;
- u8 dtm_offset;
-
- /*
- * WP config registers are divided to UP and DOWN events. We need to
- * keep to track only one of them.
- */
- DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
-
- bool wide_sel;
- enum cmn_filter_select filter_sel;
-};
-static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target));
-
#define for_each_hw_dn(hw, dn, i) \
for (i = 0, dn = hw->dn; i < hw->num_dns; i++, dn++)
@@ -622,7 +598,7 @@ static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event,
static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
{
- return (struct arm_cmn_hw_event *)&event->hw;
+ return &event->hw.cmn;
}
static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 48d851fbd8ea..c38576a8e338 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -119,6 +119,7 @@ struct perf_branch_stack {
};
struct task_struct;
+struct arm_cmn_node;
/*
* extra PMU register associated with an event
@@ -200,6 +201,27 @@ struct hw_perf_event {
u64 conf;
u64 conf1;
};
+#ifdef CONFIG_ARM_CMN
+/* Some implementations use a mesh larger than the architectural max of 12 */
+#define CMN_MAX_DIMENSION 14
+#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
+#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS
+#define CMN_MAX_DTCS 4
+ struct arm_cmn_hw_event { /* arm_cmn */
+ /*
+ * CMN PMU event state overlaid on hw_perf_event.
+ * Must fit before the 'target' field.
+ */
+ struct arm_cmn_node *dn;
+ u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
+ s8 dtc_idx[CMN_MAX_DTCS];
+ u8 num_dns;
+ u8 dtm_offset;
+ DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
+ bool wide_sel;
+ int filter_sel;
+ } cmn;
+#endif
};
/*
* If the event is a per task event, this will point to the task in
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
@ 2026-05-24 15:38 ` Aviv Bakal
2026-05-24 16:40 ` sashiko-bot
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
2 siblings, 1 reply; 28+ messages in thread
From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the XP's logical ID, since the node info
register's logical ID field is also zeroed.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
drivers/perf/arm-cmn.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 3443b819afed..0184e598777a 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -208,6 +208,8 @@ enum cmn_part {
PART_CMN700 = 0x43c,
PART_CI700 = 0x43a,
PART_CMN_S3 = 0x43e,
+ /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
+ PART_GRAVITON5 = 0xa5,
};
/* CMN-600 r0px shouldn't exist in silicon, thankfully */
@@ -2197,6 +2199,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
}
+static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
+{
+ unsigned int x = (xp_id >> 7) & 0xf;
+ unsigned int y = (xp_id >> 3) & 0xf;
+
+ /*
+ * The unit info register reads as zero; derive the DTC domain from
+ * the XP's mesh coordinates over the 10x14 mesh.
+ */
+ return (x / 5) + (y / 7) * 2;
+}
+
static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
{
int level;
@@ -2242,6 +2256,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
u64 reg;
int i, j;
size_t sz;
+ bool graviton5_workaround = false;
arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
if (cfg.type != CMN_TYPE_CFG)
@@ -2252,6 +2267,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+
+ /* Graviton5 has a customised CMN-S3 which needs some fixups */
+ if (cmn->part == PART_GRAVITON5) {
+ cmn->part = PART_CMN_S3;
+ graviton5_workaround = true;
+ }
+
/* 600AE is close enough that it's not really worth more complexity */
if (part == PART_CMN600AE)
part = PART_CMN600;
@@ -2341,6 +2363,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
if (cmn->part == PART_CMN600)
xp->dtc = -1;
+ else if (graviton5_workaround)
+ xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
else
xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
@@ -2419,6 +2443,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
switch (dn->type) {
case CMN_TYPE_DTC:
+ if (graviton5_workaround) {
+ /* Node info logical ID is zeroed; use the XP's */
+ dn->logid = xp->logid;
+ }
cmn->num_dtcs++;
dn++;
break;
@@ -2634,6 +2662,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
{ "ARMHC650" },
{ "ARMHC700" },
{ "ARMHC003" },
+ { "AMZN0070", PART_GRAVITON5 },
{}
};
MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
@ 2026-05-24 16:13 ` sashiko-bot
2026-05-29 16:44 ` Robin Murphy
1 sibling, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-05-24 16:13 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [Low] The patch breaks the build when `arm-cmn` is configured as a module due to an incorrect `#ifdef` guard in `include/linux/perf_event.h`.
Pre-existing issues:
- [Critical] Integer truncation of `cmn->num_dns` leads to an undersized heap allocation followed by an out-of-bounds write.
- [High] Unbounded `cmn->num_dtcs` leads to an out-of-bounds memset in `arm_cmn_event_init()`.
- [High] Unbounded counting of `hw->num_dns` leads to an out-of-bounds write in the `hw->dtm_idx` array.
- [High] Unbounded DTM pointer advancement leads to an out-of-bounds write in `arm_cmn_init_dtm()`.
--
commit 783e16d4bed9e2f619820aa17251b0f781f0cd6e
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event
This patch moves the arm_cmn_hw_event structure out of the driver file and
directly into the hw_perf_event union to allow expanding its size for larger
mesh topologies without violating static bounds.
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 48d851fbd8ea5..c38576a8e3383 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -200,6 +201,27 @@ struct hw_perf_event {
> u64 conf;
> u64 conf1;
> };
> +#ifdef CONFIG_ARM_CMN
[Severity: Low]
Will this break the build when the arm-cmn driver is compiled as a module?
When the driver is built as a module, Kbuild defines CONFIG_ARM_CMN_MODULE
instead of CONFIG_ARM_CMN. If the ifdef block is skipped, the cmn field is
omitted from struct hw_perf_event, resulting in compiler errors when
drivers/perf/arm-cmn.c attempts to access it.
Does this need to use IS_ENABLED(CONFIG_ARM_CMN)?
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
[Severity: High]
This is a pre-existing issue, but is the DTM initialization logic vulnerable
to an out-of-bounds heap write with larger meshes?
When multi_dtm is enabled, the driver allocates 2 * num_xps + 1 DTMs.
However, if multiple XPs have more than 4 connected ports, the logic in
arm_cmn_init_dtm() can initialize up to 3 DTMs per XP.
For meshes containing multiple such XPs, the total number of initialized
DTMs could exceed the allocated capacity, overwriting adjacent heap memory.
> +#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS
[Severity: Critical]
This isn't a bug introduced by this patch, but does the discovery phase
have an integer truncation regression when accumulating the total node count?
In struct arm_cmn, num_dns is declared as a 16-bit integer (u16). During
arm_cmn_discover(), accumulating child_count across all XPs can wrap around
this 16-bit boundary if an unexpectedly large number of children are reported.
Because devm_kcalloc() uses this truncated value to allocate the dn array,
the subsequent pass that advances the dn pointer based on the true number
of children could write past the end of the undersized heap allocation.
> +#define CMN_MAX_DTCS 4
> + struct arm_cmn_hw_event { /* arm_cmn */
> + /*
> + * CMN PMU event state overlaid on hw_perf_event.
> + * Must fit before the 'target' field.
> + */
> + struct arm_cmn_node *dn;
> + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
[Severity: High]
This isn't a bug introduced by this patch, but what prevents hw->num_dns
from exceeding the capacity of the dtm_idx array?
While the driver restricts individual dn->logid values, it does not limit
the total number of nodes of a specific type. If firmware declares more than
224 nodes of the same type, hw->num_dns will exceed the space allocated for
dtm_idx.
During event addition, the driver calls arm_cmn_set_index() for values of i
up to hw->num_dns - 1. Does this result in an out-of-bounds bitwise write
that corrupts nearby structure members?
> + s8 dtc_idx[CMN_MAX_DTCS];
[Severity: High]
This is a pre-existing issue, but is there a missing bounds check for
cmn->num_dtcs during discovery?
In arm_cmn_discover(), the driver increments cmn->num_dtcs for every
CMN_TYPE_DTC node found. If the firmware reports more than CMN_MAX_DTCS (4)
nodes, arm_cmn_event_init() will perform a memset() using the unbounded count:
memset(hw->dtc_idx, 0, cmn->num_dtcs);
Since dtc_idx has a fixed capacity of 4, could this lead to an out-of-bounds
write that corrupts adjacent perf_event fields?
> + u8 num_dns;
> + u8 dtm_offset;
> + DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
> + bool wide_sel;
> + int filter_sel;
> + } cmn;
> +#endif
> };
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260524153848.16334-1-avivb@amazon.com?part=1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-05-24 16:40 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-05-24 16:40 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Setting the DTC node's logical ID to the parent XP's logical ID breaks the DTC array indexing, leading to incorrect PMU event programming and IRQ routing.
--
commit 20d5a81cc2d5eb87a70576532c9a783cb6e011c3
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
This commit introduces workarounds for a custom CMN-S3 implementation on
Graviton5 where certain discovery registers report zeroed fields. It adds
a dedicated ACPI HID and derives the DTC domain and logical ID from the
XP node.
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
[ ... ]
> @@ -2419,6 +2443,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
[Severity: High]
Will setting the DTC's logical ID to the parent XP's logical ID break the
DTC array indexing?
For standard CMN hardware, a DTC's logid is its Domain ID. Nodes are sorted
by type and logid, which means DTCs naturally sort by Domain ID.
By using the XP's logid instead, the DTCs will be sorted by their physical
XP order rather than their Domain ID. The cmn->dtc array and ACPI IRQs are
populated sequentially based on this sorted order.
Later, when programming PMU events, the driver indexes into the cmn->dtc
array using the Domain ID:
d = max_t(int, dn->dtc, 0);
If the physical XP sort order doesn't match the Domain ID order, will this
cause PMU events to be programmed into the wrong hardware blocks and
misroute IRQs?
Could this be assigned the Domain ID directly using dn->logid = xp->dtc?
> + }
> cmn->num_dtcs++;
> dn++;
> break;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260524153848.16334-1-avivb@amazon.com?part=2
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
2026-05-24 16:13 ` sashiko-bot
@ 2026-05-29 16:44 ` Robin Murphy
1 sibling, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2026-05-29 16:44 UTC (permalink / raw)
To: Aviv Bakal, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof,
Ilkka Koskinen
On 24/05/2026 4:38 pm, Aviv Bakal wrote:
> In order to increase CMN_MAX_DIMENSION beyond 12 (required for meshes
> larger than 12x12, such as Graviton5), the arm_cmn_hw_event struct must
> grow. Since it is overlaid on the beginning of hw_perf_event via an
> unsafe cast, increasing its size would violate the static_assert that
> guards against overflowing into the 'target' field.
>
> Resolve this by moving struct arm_cmn_hw_event into the hw_perf_event
> union as a proper named member, eliminating the cast in to_cmn_hw() and
> making the size reservation explicit. Set CMN_MAX_DIMENSION to 14 to
> accommodate larger mesh topologies.
>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
> ---
> drivers/perf/arm-cmn.c | 26 +-------------------------
> include/linux/perf_event.h | 22 ++++++++++++++++++++++
> 2 files changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca4..3443b819afed 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -31,13 +31,8 @@
> #define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
> #define CMN_CHILD_NODE_EXTERNAL BIT(31)
>
> -#define CMN_MAX_DIMENSION 12
> -#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
>
> -/* Currently XPs are the node type we can have most of; others top out at 128 */
> -#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS
> -
> /* The CFG node has various info besides the discovery tree */
> #define CMN_CFGM_PERIPH_ID_01 0x0008
> #define CMN_CFGM_PID0_PART_0 GENMASK_ULL(7, 0)
> @@ -148,7 +143,6 @@
> #define CMN_DT_PMSRR_SS_REQ BIT(0)
>
> #define CMN_DT_NUM_COUNTERS 8
> -#define CMN_MAX_DTCS 4
>
> /*
> * Even in the worst case a DTC counter can't wrap in fewer than 2^42 cycles,
> @@ -595,24 +589,6 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id)
> static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
> #endif
>
> -struct arm_cmn_hw_event {
> - struct arm_cmn_node *dn;
> - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
> - s8 dtc_idx[CMN_MAX_DTCS];
> - u8 num_dns;
> - u8 dtm_offset;
> -
> - /*
> - * WP config registers are divided to UP and DOWN events. We need to
> - * keep to track only one of them.
> - */
> - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
> -
> - bool wide_sel;
> - enum cmn_filter_select filter_sel;
> -};
> -static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target));
> -
> #define for_each_hw_dn(hw, dn, i) \
> for (i = 0, dn = hw->dn; i < hw->num_dns; i++, dn++)
>
> @@ -622,7 +598,7 @@ static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event,
>
> static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
> {
> - return (struct arm_cmn_hw_event *)&event->hw;
> + return &event->hw.cmn;
> }
>
> static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 48d851fbd8ea..c38576a8e338 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
$ scripts/get_maintainer.pl include/linux/perf_event.h
Peter Zijlstra <peterz@infradead.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM)
Ingo Molnar <mingo@redhat.com> (maintainer:PERFORMANCE EVENTS SUBSYSTEM)
Arnaldo Carvalho de Melo <acme@kernel.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM)
Namhyung Kim <namhyung@kernel.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM)
Mark Rutland <mark.rutland@arm.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
Alexander Shishkin <alexander.shishkin@linux.intel.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
Jiri Olsa <jolsa@kernel.org> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
Ian Rogers <irogers@google.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
Adrian Hunter <adrian.hunter@intel.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
James Clark <james.clark@linaro.org> (reviewer:PERFORMANCE EVENTS SUBSYSTEM)
linux-perf-users@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM)
linux-kernel@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM)
> @@ -119,6 +119,7 @@ struct perf_branch_stack {
> };
>
> struct task_struct;
> +struct arm_cmn_node;
>
> /*
> * extra PMU register associated with an event
> @@ -200,6 +201,27 @@ struct hw_perf_event {
> u64 conf;
> u64 conf1;
> };
> +#ifdef CONFIG_ARM_CMN
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
> +#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS
> +#define CMN_MAX_DTCS 4
I doubt the maintainers would be too happy to have this degree of random
driver specifics in a core header. Certainly I wouldn't consider it a
great idea myself...
> + struct arm_cmn_hw_event { /* arm_cmn */
> + /*
> + * CMN PMU event state overlaid on hw_perf_event.
> + * Must fit before the 'target' field.
> + */
> + struct arm_cmn_node *dn;
> + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
> + s8 dtc_idx[CMN_MAX_DTCS];
> + u8 num_dns;
> + u8 dtm_offset;
> + DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
> + bool wide_sel;
> + int filter_sel;
> + } cmn;
> +#endif
> };
> /*
> * If the event is a per task event, this will point to the task in
TBH I'd be tempted to just disable 32-bit compile-testing for now, but
if we're going to do anything "proper" then it's probably about time to
do this:
Thanks,
Robin.
----->8-----
Subject: [PATCH] perf/arm-cmn: Move DTM index data out of hw_perf_event
The amount of data we need to store all the per-DTM counter and
watchpoint allocations is already testing the limits of hw_perf_event,
and future CMNs are only likely to keep growing larger, so move these
arrays out to separate memory allocations. As part of that we can use
an explicit union for allocating cycle counters to dtc_cycles events,
which is arguably nicer anyway.
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++--------------
1 file changed, 59 insertions(+), 30 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 6e5cc4086a9e..470f6a82502e 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -598,17 +598,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
struct arm_cmn_hw_event {
struct arm_cmn_node *dn;
- u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
+ union {
+ u64 *dtm_idx;
+ int cc_idx;
+ };
+ unsigned long *wp_idx;
s8 dtc_idx[CMN_MAX_DTCS];
u8 num_dns;
u8 dtm_offset;
-
- /*
- * WP config registers are divided to UP and DOWN events. We need to
- * keep to track only one of them.
- */
- DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
-
bool wide_sel;
enum cmn_filter_select filter_sel;
};
@@ -626,25 +623,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
return (struct arm_cmn_hw_event *)&event->hw;
}
-static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
+static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val)
{
- x[pos / 32] |= (u64)val << ((pos % 32) * 2);
+ hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2);
}
-static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos)
+static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
{
- return (x[pos / 32] >> ((pos % 32) * 2)) & 3;
+ return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3;
}
-static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val)
+static u64 *arm_cmn_alloc_dtm_idx(void)
+{
+ return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64), GFP_KERNEL);
+}
+
+static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val)
{
if (val)
- set_bit(pos, wp_idx);
+ set_bit(pos, hw->wp_idx);
}
-static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos)
+static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
{
- return test_bit(pos, wp_idx);
+ return test_bit(pos, hw->wp_idx);
+}
+
+static unsigned long *arm_cmn_alloc_wp_idx(void)
+{
+ return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL);
+}
+
+static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw)
+{
+ memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64));
+ if (hw->wp_idx)
+ memset(hw->wp_idx, 0, CMN_MAX_XPS);
}
struct arm_cmn_event_attr {
@@ -1377,7 +1391,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event,
struct arm_cmn_hw_event *hw,
unsigned int pos)
{
- return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos);
+ return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos);
}
static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
@@ -1388,7 +1402,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
struct arm_cmn_hw_event *hw = to_cmn_hw(event);
dtm->wp_event[wp_idx] = hw->dtc_idx[dtc];
- arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event));
+ arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event));
}
static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx)
@@ -1459,7 +1473,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw,
dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
reg = readq_relaxed(dtm->base + offset);
}
- dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ dtm_idx = arm_cmn_get_dtm_idx(hw, i);
count += (u16)(reg >> (dtm_idx * 16));
}
return count;
@@ -1506,7 +1520,7 @@ static void arm_cmn_event_read(struct perf_event *event)
unsigned long flags;
if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) {
- delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]);
+ delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx);
local64_add(delta, &event->count);
return;
}
@@ -1573,7 +1587,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE,
dtc->base + CMN_DT_DTC_CTL);
@@ -1591,7 +1605,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event),
hw->wide_sel);
@@ -1607,7 +1621,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
dtc->cc_active = false;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
@@ -1620,7 +1634,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel);
}
@@ -1764,6 +1778,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
}
+static void arm_cmn_event_destroy(struct perf_event *event)
+{
+ struct arm_cmn_hw_event *hw = to_cmn_hw(event);
+
+ kfree(hw->dtm_idx);
+ bitmap_free(hw->wp_idx);
+}
+
static int arm_cmn_event_init(struct perf_event *event)
{
struct arm_cmn *cmn = to_cmn(event->pmu);
@@ -1788,6 +1810,11 @@ static int arm_cmn_event_init(struct perf_event *event)
if (type == CMN_TYPE_DTC)
return arm_cmn_validate_group(cmn, event);
+ event->destroy = arm_cmn_event_destroy;
+ hw->dtm_idx = arm_cmn_alloc_dtm_idx();
+ if (!hw->dtm_idx)
+ return -ENOMEM;
+
eventid = CMN_EVENT_EVENTID(event);
/* For watchpoints we need the actual XP node here */
if (type == CMN_TYPE_WP) {
@@ -1798,6 +1825,9 @@ static int arm_cmn_event_init(struct perf_event *event)
/* ...but the DTM may depend on which port we're watching */
if (cmn->multi_dtm)
hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
+ hw->wp_idx = arm_cmn_alloc_wp_idx();
+ if (!hw->wp_idx)
+ return -ENOMEM;
} else if (type == CMN_TYPE_XP &&
(cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) {
hw->wide_sel = true;
@@ -1848,7 +1878,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
while (i--) {
struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset;
- unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
if (type == CMN_TYPE_WP) {
int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i);
@@ -1862,8 +1892,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx);
writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG);
}
- memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx));
- memset(hw->wp_idx, 0, sizeof(hw->wp_idx));
+ arm_cmn_clear_idx(hw);
for_each_hw_dtc_idx(hw, j, idx)
cmn->dtc[j].counters[idx] = NULL;
@@ -1883,7 +1912,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
return -ENOSPC;
cmn->dtc[i].cycles = event;
- hw->dtc_idx[0] = i;
+ hw->cc_idx = i;
if (flags & PERF_EF_START)
arm_cmn_event_start(event, 0);
@@ -1948,7 +1977,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
goto free_dtms;
}
- arm_cmn_set_index(hw->dtm_idx, i, dtm_idx);
+ arm_cmn_set_dtm_idx(hw, i, dtm_idx);
dtm->input_sel[dtm_idx] = input_sel;
shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx);
@@ -1981,7 +2010,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags)
arm_cmn_event_stop(event, PERF_EF_UPDATE);
if (type == CMN_TYPE_DTC)
- cmn->dtc[hw->dtc_idx[0]].cycles = NULL;
+ cmn->dtc[hw->cc_idx].cycles = NULL;
else
arm_cmn_event_clear(cmn, event, hw->num_dns);
}
--
2.54.0.dirty
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v4 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-05-31 11:04 ` Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
` (2 more replies)
2 siblings, 3 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
This series adds support for Graviton5's customised CMN-S3 which has
zeroed discovery registers.
Patch 1 is based on Robin's suggestion [1] to move the DTM index and
watchpoint bitmaps out of hw_perf_event into separate allocations,
removing the size constraint that prevented increasing CMN_MAX_DIMENSION.
Two fixes were applied on top: the kzalloc size now accounts for
sizeof(u64), and bitmap_zero() is used instead of a raw memset for
wp_idx.
Patch 2 adds the Graviton5 workarounds.
[1] https://lore.kernel.org/all/c1d27212-8e2b-4820-bee6-ad7e7de81238@arm.com/
Changes since v3:
- Replace patch 1/2 with Robin's approach: dynamically allocate
dtm_idx and wp_idx instead of moving the struct into perf_event.h
Changes since v2:
- Revert DTC logical ID assignment back to xp->logid (per Robin's
review)
Aviv Bakal (2):
perf/arm-cmn: Move DTM index data out of hw_perf_event
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
drivers/perf/arm-cmn.c | 121 ++++++++++++++++++++++++++++++-----------
1 file changed, 90 insertions(+), 31 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
@ 2026-05-31 11:04 ` Aviv Bakal
2026-06-01 18:23 ` Robin Murphy
2026-06-02 5:49 ` Ilkka Koskinen
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal
2 siblings, 2 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
The amount of data we need to store all the per-DTM counter and
watchpoint allocations is already testing the limits of hw_perf_event,
and future CMNs are only likely to keep growing larger, so move these
arrays out to separate memory allocations. As part of that we can use
an explicit union for allocating cycle counters to dtc_cycles events,
which is arguably nicer anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++--------------
1 file changed, 59 insertions(+), 30 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f5305c8fdca4..f1978a53d1c1 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
struct arm_cmn_hw_event {
struct arm_cmn_node *dn;
- u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
+ union {
+ u64 *dtm_idx;
+ int cc_idx;
+ };
+ unsigned long *wp_idx;
s8 dtc_idx[CMN_MAX_DTCS];
u8 num_dns;
u8 dtm_offset;
-
- /*
- * WP config registers are divided to UP and DOWN events. We need to
- * keep to track only one of them.
- */
- DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
-
bool wide_sel;
enum cmn_filter_select filter_sel;
};
@@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
return (struct arm_cmn_hw_event *)&event->hw;
}
-static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
+static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val)
{
- x[pos / 32] |= (u64)val << ((pos % 32) * 2);
+ hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2);
}
-static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos)
+static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
{
- return (x[pos / 32] >> ((pos % 32) * 2)) & 3;
+ return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3;
}
-static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val)
+static u64 *arm_cmn_alloc_dtm_idx(void)
+{
+ return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL);
+}
+
+static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val)
{
if (val)
- set_bit(pos, wp_idx);
+ set_bit(pos, hw->wp_idx);
+}
+
+static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
+{
+ return test_bit(pos, hw->wp_idx);
+}
+
+static unsigned long *arm_cmn_alloc_wp_idx(void)
+{
+ return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL);
}
-static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos)
+static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw)
{
- return test_bit(pos, wp_idx);
+ memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64));
+ if (hw->wp_idx)
+ bitmap_zero(hw->wp_idx, CMN_MAX_XPS);
}
struct arm_cmn_event_attr {
@@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event,
struct arm_cmn_hw_event *hw,
unsigned int pos)
{
- return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos);
+ return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos);
}
static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
@@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
struct arm_cmn_hw_event *hw = to_cmn_hw(event);
dtm->wp_event[wp_idx] = hw->dtc_idx[dtc];
- arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event));
+ arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event));
}
static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx)
@@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw,
dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
reg = readq_relaxed(dtm->base + offset);
}
- dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ dtm_idx = arm_cmn_get_dtm_idx(hw, i);
count += (u16)(reg >> (dtm_idx * 16));
}
return count;
@@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event)
unsigned long flags;
if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) {
- delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]);
+ delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx);
local64_add(delta, &event->count);
return;
}
@@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE,
dtc->base + CMN_DT_DTC_CTL);
@@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event),
hw->wide_sel);
@@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
dtc->cc_active = false;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
@@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel);
}
@@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
}
+static void arm_cmn_event_destroy(struct perf_event *event)
+{
+ struct arm_cmn_hw_event *hw = to_cmn_hw(event);
+
+ kfree(hw->dtm_idx);
+ bitmap_free(hw->wp_idx);
+}
+
static int arm_cmn_event_init(struct perf_event *event)
{
struct arm_cmn *cmn = to_cmn(event->pmu);
@@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event)
if (type == CMN_TYPE_DTC)
return arm_cmn_validate_group(cmn, event);
+ event->destroy = arm_cmn_event_destroy;
+ hw->dtm_idx = arm_cmn_alloc_dtm_idx();
+ if (!hw->dtm_idx)
+ return -ENOMEM;
+
eventid = CMN_EVENT_EVENTID(event);
/* For watchpoints we need the actual XP node here */
if (type == CMN_TYPE_WP) {
@@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event)
/* ...but the DTM may depend on which port we're watching */
if (cmn->multi_dtm)
hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
+ hw->wp_idx = arm_cmn_alloc_wp_idx();
+ if (!hw->wp_idx)
+ return -ENOMEM;
} else if (type == CMN_TYPE_XP &&
(cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) {
hw->wide_sel = true;
@@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
while (i--) {
struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset;
- unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
if (type == CMN_TYPE_WP) {
int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i);
@@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx);
writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG);
}
- memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx));
- memset(hw->wp_idx, 0, sizeof(hw->wp_idx));
+ arm_cmn_clear_idx(hw);
for_each_hw_dtc_idx(hw, j, idx)
cmn->dtc[j].counters[idx] = NULL;
@@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
return -ENOSPC;
cmn->dtc[i].cycles = event;
- hw->dtc_idx[0] = i;
+ hw->cc_idx = i;
if (flags & PERF_EF_START)
arm_cmn_event_start(event, 0);
@@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
goto free_dtms;
}
- arm_cmn_set_index(hw->dtm_idx, i, dtm_idx);
+ arm_cmn_set_dtm_idx(hw, i, dtm_idx);
dtm->input_sel[dtm_idx] = input_sel;
shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx);
@@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags)
arm_cmn_event_stop(event, PERF_EF_UPDATE);
if (type == CMN_TYPE_DTC)
- cmn->dtc[hw->dtc_idx[0]].cycles = NULL;
+ cmn->dtc[hw->cc_idx].cycles = NULL;
else
arm_cmn_event_clear(cmn, event, hw->num_dns);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
@ 2026-05-31 11:04 ` Aviv Bakal
2026-05-31 11:32 ` sashiko-bot
` (2 more replies)
2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal
2 siblings, 3 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the XP's logical ID, since the node info
register's logical ID field is also zeroed.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
---
drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f1978a53d1c1..3fb71d9a57eb 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -31,7 +31,8 @@
#define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
#define CMN_CHILD_NODE_EXTERNAL BIT(31)
-#define CMN_MAX_DIMENSION 12
+/* Some implementations use a mesh larger than the architectural max of 12 */
+#define CMN_MAX_DIMENSION 14
#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
#define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
@@ -214,6 +215,8 @@ enum cmn_part {
PART_CMN700 = 0x43c,
PART_CI700 = 0x43a,
PART_CMN_S3 = 0x43e,
+ /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
+ PART_GRAVITON5 = 0xa5,
};
/* CMN-600 r0px shouldn't exist in silicon, thankfully */
@@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
}
+static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
+{
+ unsigned int x = (xp_id >> 7) & 0xf;
+ unsigned int y = (xp_id >> 3) & 0xf;
+
+ /*
+ * The unit info register reads as zero; derive the DTC domain from
+ * the XP's mesh coordinates over the 10x14 mesh.
+ */
+ return (x / 5) + (y / 7) * 2;
+}
+
static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
{
int level;
@@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
u64 reg;
int i, j;
size_t sz;
+ bool graviton5_workaround = false;
arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
if (cfg.type != CMN_TYPE_CFG)
@@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+
+ /* Graviton5 has a customised CMN-S3 which needs some fixups */
+ if (cmn->part == PART_GRAVITON5) {
+ cmn->part = PART_CMN_S3;
+ graviton5_workaround = true;
+ }
+
/* 600AE is close enough that it's not really worth more complexity */
if (part == PART_CMN600AE)
part = PART_CMN600;
@@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
if (cmn->part == PART_CMN600)
xp->dtc = -1;
+ else if (graviton5_workaround)
+ xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
else
xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
@@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
switch (dn->type) {
case CMN_TYPE_DTC:
+ if (graviton5_workaround) {
+ /* Node info logical ID is zeroed; use the XP's */
+ dn->logid = xp->logid;
+ }
cmn->num_dtcs++;
dn++;
break;
@@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
{ "ARMHC650" },
{ "ARMHC700" },
{ "ARMHC003" },
+ { "AMZN0070", PART_GRAVITON5 },
{}
};
MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-05-31 11:32 ` sashiko-bot
2026-06-01 17:45 ` Robin Murphy
2026-06-01 17:48 ` Robin Murphy
2026-06-02 5:55 ` Ilkka Koskinen
2 siblings, 1 reply; 28+ messages in thread
From: sashiko-bot @ 2026-05-31 11:32 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Setting the DTC node's logical ID to the parent XP's logical ID causes DTC nodes to be sorted out of domain order, breaking PMU event routing.
- [High] The hardcoded bit-shifts in arm_cmn_graviton5_dtc_domain() can produce DTC domains >= 4, leading to out-of-bounds writes in arm_cmn_event_init().
Pre-existing issues:
- [High] Memory allocated for `hw->dtm_idx` and `hw->wp_idx` is leaked if `arm_cmn_event_init` fails.
--
commit c34c59523c8b94745143d1d1114415bb33c12d3e
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
Graviton5 uses a customised CMN-S3 implementation where certain discovery
registers report zeroed fields. This patch adds workarounds to identify the
variant, derive the DTC domain from the XP node ID, and set the DTC logical
ID from the XP's logical ID.
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f1978a53d1c19..3fb71d9a57eb5 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
[Severity: High]
Can the hardcoded bit-shifts extract overlapping bits if the mesh dimensions
are smaller than 10x14?
If the system describes a smaller mesh (e.g., 8x8), arm_cmn_xyidbits
evaluates to 3. This would cause the hardcoded shifts (xp_id >> 7) and
(xp_id >> 3) to extract overlapping bits that inflate y (e.g., y = 15),
yielding a domain of 5.
If a 14x10 mesh is used, the x coordinate can natively reach 13, yielding
a domain index of 4 ((13 / 5) + (9 / 7) * 2).
When an unvalidated domain index >= 4 is assigned to dn->dtc, does it
cause an out-of-bounds write in arm_cmn_event_init()?
drivers/perf/arm-cmn.c:arm_cmn_event_init() {
...
hw->dtc_idx[dn->dtc] = 0;
...
}
Since hw->dtc_idx is a fixed 4-byte array (s8 dtc_idx[CMN_MAX_DTCS]), an
index >= 4 would write past its bounds and corrupt adjacent fields like
hw->num_dns.
[ ... ]
> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
> + }
[Severity: High]
Does overriding the DTC logical ID with the parent XP's logical ID cause
DTC nodes to be sorted out of domain order?
In arm_cmn_init_dtcs(), CMN_TYPE_DTC nodes are sorted by their logical ID
(dn->logid), which determines their index in the dynamically allocated
cmn->dtc array. Standard CMN parts have their logid read from hardware as
the exact domain index (0-3).
Because the domain index here is a 2D block distribution, the physical XP IDs
do not monotonically increase with the domain index. For example, an XP at
x=7, y=0 belongs to domain 1 but has a smaller logid than an XP at
x=2, y=3 which belongs to domain 0.
If cmn->dtc[0] points to the hardware base of domain 1 instead of domain 0,
will events programmed for domain 0 be routed to domain 1's hardware, breaking
PMU event counting?
Would it be better to explicitly set this as:
dn->logid = arm_cmn_graviton5_dtc_domain(xp->id);
[Severity: High]
This is a pre-existing issue, but does arm_cmn_event_init() leak memory
if initialization fails?
drivers/perf/arm-cmn.c:arm_cmn_event_init() {
...
if (arm_cmn_is_watchpoint(event))
if (eventid != CMN_WP_UP && eventid != CMN_WP_DOWN)
return -EINVAL;
...
return arm_cmn_validate_group(cmn, event);
}
The driver dynamically allocates memory using arm_cmn_alloc_dtm_idx() and
arm_cmn_alloc_wp_idx(). If the initialization subsequently encounters an
error, such as an invalid eventid for a watchpoint or a failure in
arm_cmn_validate_group(), it returns an error code without freeing the
allocated memory.
The core perf code perf_try_init_event() does not call the driver's
event->destroy() callback if pmu->event_init() fails, expecting the driver
to clean up its own resources. Could this allow memory leaks if invalid events
are repeatedly created?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260531110447.10095-1-avivb@amazon.com?part=2
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:32 ` sashiko-bot
@ 2026-06-01 17:45 ` Robin Murphy
2026-06-03 14:57 ` Aviv Bakal
0 siblings, 1 reply; 28+ messages in thread
From: Robin Murphy @ 2026-06-01 17:45 UTC (permalink / raw)
To: sashiko-reviews, Aviv Bakal; +Cc: linux-perf-users
For the benefit of any humans...
On 2026-05-31 12:32 pm, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>
> New issues:
> - [High] Setting the DTC node's logical ID to the parent XP's logical ID causes DTC nodes to be sorted out of domain order, breaking PMU event routing.
> - [High] The hardcoded bit-shifts in arm_cmn_graviton5_dtc_domain() can produce DTC domains >= 4, leading to out-of-bounds writes in arm_cmn_event_init().
>
> Pre-existing issues:
> - [High] Memory allocated for `hw->dtm_idx` and `hw->wp_idx` is leaked if `arm_cmn_event_init` fails.
> --
>
> commit c34c59523c8b94745143d1d1114415bb33c12d3e
> Author: Aviv Bakal <avivb@amazon.com>
>
> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
>
> Graviton5 uses a customised CMN-S3 implementation where certain discovery
> registers report zeroed fields. This patch adds workarounds to identify the
> variant, derive the DTC domain from the XP node ID, and set the DTC logical
> ID from the XP's logical ID.
>
>> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
>> index f1978a53d1c19..3fb71d9a57eb5 100644
>> --- a/drivers/perf/arm-cmn.c
>> +++ b/drivers/perf/arm-cmn.c
>> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
>> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
>> }
>>
>> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
>> +{
>> + unsigned int x = (xp_id >> 7) & 0xf;
>> + unsigned int y = (xp_id >> 3) & 0xf;
>> +
>> + /*
>> + * The unit info register reads as zero; derive the DTC domain from
>> + * the XP's mesh coordinates over the 10x14 mesh.
>> + */
>> + return (x / 5) + (y / 7) * 2;
>> +}
>
> [Severity: High]
> Can the hardcoded bit-shifts extract overlapping bits if the mesh dimensions
> are smaller than 10x14?
No, because this code path is specific to things which firmware claims
to be the Graviton5 CMN, and as such it hard-codes the known topology of
the Graviton5 CMN. We have to assume we can trust what firmware tells us
about the platform, or we wouldn't even be able to boot.
> If the system describes a smaller mesh (e.g., 8x8), arm_cmn_xyidbits
> evaluates to 3. This would cause the hardcoded shifts (xp_id >> 7) and
> (xp_id >> 3) to extract overlapping bits that inflate y (e.g., y = 15),
> yielding a domain of 5.
>
> If a 14x10 mesh is used, the x coordinate can natively reach 13, yielding
> a domain index of 4 ((13 / 5) + (9 / 7) * 2).
>
> When an unvalidated domain index >= 4 is assigned to dn->dtc, does it
> cause an out-of-bounds write in arm_cmn_event_init()?
No, because there is no such thing. The maximum possible number of DTC
domains is 4. Even when correctly generated, the ID field in the
register is only 2 bits:
https://developer.arm.com/documentation/107858/0205/Programmers-model/Register-summary/MXP-register-summary/por-dtm-unit-info?lang=en
> drivers/perf/arm-cmn.c:arm_cmn_event_init() {
> ...
> hw->dtc_idx[dn->dtc] = 0;
> ...
> }
>
> Since hw->dtc_idx is a fixed 4-byte array (s8 dtc_idx[CMN_MAX_DTCS]), an
> index >= 4 would write past its bounds and corrupt adjacent fields like
> hw->num_dns.
>
> [ ... ]
>
>> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>>
>> switch (dn->type) {
>> case CMN_TYPE_DTC:
>> + if (graviton5_workaround) {
>> + /* Node info logical ID is zeroed; use the XP's */
>> + dn->logid = xp->logid;
>> + }
>
> [Severity: High]
> Does overriding the DTC logical ID with the parent XP's logical ID cause
> DTC nodes to be sorted out of domain order?
>
> In arm_cmn_init_dtcs(), CMN_TYPE_DTC nodes are sorted by their logical ID
> (dn->logid), which determines their index in the dynamically allocated
> cmn->dtc array. Standard CMN parts have their logid read from hardware as
> the exact domain index (0-3).
No, the DTC logical ID is read from the logical_id field in
por_dt_node_info, and while the main HN-D DTC always has an ID of 0,
additional HN-T sub-DTCs should normally inherit the logical ID of their
parent XP.
> Because the domain index here is a 2D block distribution, the physical XP IDs
> do not monotonically increase with the domain index. For example, an XP at
> x=7, y=0 belongs to domain 1 but has a smaller logid than an XP at
> x=2, y=3 which belongs to domain 0.
That is true, but since neither of those XPs has a DTC in this
configuration, it is also entirely irrelevant.
> If cmn->dtc[0] points to the hardware base of domain 1 instead of domain 0,
> will events programmed for domain 0 be routed to domain 1's hardware, breaking
> PMU event counting?
>
> Would it be better to explicitly set this as:
>
> dn->logid = arm_cmn_graviton5_dtc_domain(xp->id);
No, because that would then be inconsistent with how the driver works
normally on non-Graviton5 systems, and as Aviv already confirmed, it
doesn't change the outcome in this case anyway.
> [Severity: High]
> This is a pre-existing issue, but does arm_cmn_event_init() leak memory
> if initialization fails?
>
> drivers/perf/arm-cmn.c:arm_cmn_event_init() {
> ...
> if (arm_cmn_is_watchpoint(event))
> if (eventid != CMN_WP_UP && eventid != CMN_WP_DOWN)
> return -EINVAL;
> ...
> return arm_cmn_validate_group(cmn, event);
> }
>
> The driver dynamically allocates memory using arm_cmn_alloc_dtm_idx() and
> arm_cmn_alloc_wp_idx(). If the initialization subsequently encounters an
> error, such as an invalid eventid for a watchpoint or a failure in
> arm_cmn_validate_group(), it returns an error code without freeing the
> allocated memory.
>
> The core perf code perf_try_init_event() does not call the driver's
> event->destroy() callback if pmu->event_init() fails, expecting the driver
> to clean up its own resources. Could this allow memory leaks if invalid events
> are repeatedly created?
No, __free_event() invokes event->destroy(), and if perf_alloc_event()
could fail without freeing the event then that would be a much greater
memory leak anyway.
Robin.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-31 11:32 ` sashiko-bot
@ 2026-06-01 17:48 ` Robin Murphy
2026-06-02 5:55 ` Ilkka Koskinen
2 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2026-06-01 17:48 UTC (permalink / raw)
To: Aviv Bakal, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof,
ilkka
On 2026-05-31 12:04 pm, Aviv Bakal wrote:
> Graviton5 uses a customised CMN-S3 implementation where certain
> discovery registers report zeroed fields. Add the following workarounds:
>
> - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
> - Derive the DTC domain from the XP node ID, since the unit info
> register reports it as zero.
> - Set the DTC logical ID from the XP's logical ID, since the node info
> register's logical ID field is also zeroed.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
> ---
> drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f1978a53d1c1..3fb71d9a57eb 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -31,7 +31,8 @@
> #define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
> #define CMN_CHILD_NODE_EXTERNAL BIT(31)
>
> -#define CMN_MAX_DIMENSION 12
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
> #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
>
> @@ -214,6 +215,8 @@ enum cmn_part {
> PART_CMN700 = 0x43c,
> PART_CI700 = 0x43a,
> PART_CMN_S3 = 0x43e,
> + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
> + PART_GRAVITON5 = 0xa5,
> };
>
> /* CMN-600 r0px shouldn't exist in silicon, thankfully */
> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
> +
> static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
> {
> int level;
> @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> u64 reg;
> int i, j;
> size_t sz;
> + bool graviton5_workaround = false;
>
> arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
> if (cfg.type != CMN_TYPE_CFG)
> @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
> part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
> part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
> +
> + /* Graviton5 has a customised CMN-S3 which needs some fixups */
> + if (cmn->part == PART_GRAVITON5) {
> + cmn->part = PART_CMN_S3;
> + graviton5_workaround = true;
> + }
> +
> /* 600AE is close enough that it's not really worth more complexity */
> if (part == PART_CMN600AE)
> part = PART_CMN600;
> @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> if (cmn->part == PART_CMN600)
> xp->dtc = -1;
> + else if (graviton5_workaround)
> + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
> else
> xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
>
> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
> + }
> cmn->num_dtcs++;
> dn++;
> break;
> @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
> { "ARMHC650" },
> { "ARMHC700" },
> { "ARMHC003" },
> + { "AMZN0070", PART_GRAVITON5 },
> {}
> };
> MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
@ 2026-06-01 18:23 ` Robin Murphy
2026-06-02 5:49 ` Ilkka Koskinen
1 sibling, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2026-06-01 18:23 UTC (permalink / raw)
To: Aviv Bakal, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof,
ilkka
On 2026-05-31 12:04 pm, Aviv Bakal wrote:
> The amount of data we need to store all the per-DTM counter and
> watchpoint allocations is already testing the limits of hw_perf_event,
> and future CMNs are only likely to keep growing larger, so move these
> arrays out to separate memory allocations. As part of that we can use
> an explicit union for allocating cycle counters to dtc_cycles events,
> which is arguably nicer anyway.
Nit: not that I'm particularly precious about it, but this has lost my
original authorship - I really should leave the technically-redundant
"From:" line in when I post inline patches, as frankly even I prefer to
just copy-paste the patch part rather than export the whole mail with
its original headers for `git am --scissors`...
Thanks,
Robin.
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
> ---
> drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++--------------
> 1 file changed, 59 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca4..f1978a53d1c1 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
>
> struct arm_cmn_hw_event {
> struct arm_cmn_node *dn;
> - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
> + union {
> + u64 *dtm_idx;
> + int cc_idx;
> + };
> + unsigned long *wp_idx;
> s8 dtc_idx[CMN_MAX_DTCS];
> u8 num_dns;
> u8 dtm_offset;
> -
> - /*
> - * WP config registers are divided to UP and DOWN events. We need to
> - * keep to track only one of them.
> - */
> - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
> -
> bool wide_sel;
> enum cmn_filter_select filter_sel;
> };
> @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
> return (struct arm_cmn_hw_event *)&event->hw;
> }
>
> -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
> +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val)
> {
> - x[pos / 32] |= (u64)val << ((pos % 32) * 2);
> + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2);
> }
>
> -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos)
> +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
> {
> - return (x[pos / 32] >> ((pos % 32) * 2)) & 3;
> + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3;
> }
>
> -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val)
> +static u64 *arm_cmn_alloc_dtm_idx(void)
> +{
> + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL);
> +}
> +
> +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val)
> {
> if (val)
> - set_bit(pos, wp_idx);
> + set_bit(pos, hw->wp_idx);
> +}
> +
> +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
> +{
> + return test_bit(pos, hw->wp_idx);
> +}
> +
> +static unsigned long *arm_cmn_alloc_wp_idx(void)
> +{
> + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL);
> }
>
> -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos)
> +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw)
> {
> - return test_bit(pos, wp_idx);
> + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64));
> + if (hw->wp_idx)
> + bitmap_zero(hw->wp_idx, CMN_MAX_XPS);
> }
>
> struct arm_cmn_event_attr {
> @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event,
> struct arm_cmn_hw_event *hw,
> unsigned int pos)
> {
> - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos);
> + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos);
> }
>
> static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
> @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
> struct arm_cmn_hw_event *hw = to_cmn_hw(event);
>
> dtm->wp_event[wp_idx] = hw->dtc_idx[dtc];
> - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event));
> + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event));
> }
>
> static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx)
> @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw,
> dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
> reg = readq_relaxed(dtm->base + offset);
> }
> - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + dtm_idx = arm_cmn_get_dtm_idx(hw, i);
> count += (u16)(reg >> (dtm_idx * 16));
> }
> return count;
> @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event)
> unsigned long flags;
>
> if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) {
> - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]);
> + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx);
> local64_add(delta, &event->count);
> return;
> }
> @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
> int i;
>
> if (type == CMN_TYPE_DTC) {
> - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
> + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
>
> writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE,
> dtc->base + CMN_DT_DTC_CTL);
> @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
> writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx));
> }
> } else for_each_hw_dn(hw, dn, i) {
> - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event),
> hw->wide_sel);
> @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
> int i;
>
> if (type == CMN_TYPE_DTC) {
> - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
> + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
>
> dtc->cc_active = false;
> writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
> @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
> writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx));
> }
> } else for_each_hw_dn(hw, dn, i) {
> - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel);
> }
> @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
> }
>
>
> +static void arm_cmn_event_destroy(struct perf_event *event)
> +{
> + struct arm_cmn_hw_event *hw = to_cmn_hw(event);
> +
> + kfree(hw->dtm_idx);
> + bitmap_free(hw->wp_idx);
> +}
> +
> static int arm_cmn_event_init(struct perf_event *event)
> {
> struct arm_cmn *cmn = to_cmn(event->pmu);
> @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event)
> if (type == CMN_TYPE_DTC)
> return arm_cmn_validate_group(cmn, event);
>
> + event->destroy = arm_cmn_event_destroy;
> + hw->dtm_idx = arm_cmn_alloc_dtm_idx();
> + if (!hw->dtm_idx)
> + return -ENOMEM;
> +
> eventid = CMN_EVENT_EVENTID(event);
> /* For watchpoints we need the actual XP node here */
> if (type == CMN_TYPE_WP) {
> @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event)
> /* ...but the DTM may depend on which port we're watching */
> if (cmn->multi_dtm)
> hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
> + hw->wp_idx = arm_cmn_alloc_wp_idx();
> + if (!hw->wp_idx)
> + return -ENOMEM;
> } else if (type == CMN_TYPE_XP &&
> (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) {
> hw->wide_sel = true;
> @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
>
> while (i--) {
> struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset;
> - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> if (type == CMN_TYPE_WP) {
> int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i);
> @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
> dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx);
> writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG);
> }
> - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx));
> - memset(hw->wp_idx, 0, sizeof(hw->wp_idx));
> + arm_cmn_clear_idx(hw);
>
> for_each_hw_dtc_idx(hw, j, idx)
> cmn->dtc[j].counters[idx] = NULL;
> @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
> return -ENOSPC;
>
> cmn->dtc[i].cycles = event;
> - hw->dtc_idx[0] = i;
> + hw->cc_idx = i;
>
> if (flags & PERF_EF_START)
> arm_cmn_event_start(event, 0);
> @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
> goto free_dtms;
> }
>
> - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx);
> + arm_cmn_set_dtm_idx(hw, i, dtm_idx);
>
> dtm->input_sel[dtm_idx] = input_sel;
> shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx);
> @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags)
> arm_cmn_event_stop(event, PERF_EF_UPDATE);
>
> if (type == CMN_TYPE_DTC)
> - cmn->dtc[hw->dtc_idx[0]].cycles = NULL;
> + cmn->dtc[hw->cc_idx].cycles = NULL;
> else
> arm_cmn_event_clear(cmn, event, hw->num_dns);
> }
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
2026-06-01 18:23 ` Robin Murphy
@ 2026-06-02 5:49 ` Ilkka Koskinen
1 sibling, 0 replies; 28+ messages in thread
From: Ilkka Koskinen @ 2026-06-02 5:49 UTC (permalink / raw)
To: Aviv Bakal
Cc: robin.murphy, will, mark.rutland, linux-arm-kernel,
linux-perf-users, linux-kernel, zeev, blakgeof, ilkka
On Sun, 31 May 2026, Aviv Bakal wrote:
> The amount of data we need to store all the per-DTM counter and
> watchpoint allocations is already testing the limits of hw_perf_event,
> and future CMNs are only likely to keep growing larger, so move these
> arrays out to separate memory allocations. As part of that we can use
> an explicit union for allocating cycle counters to dtc_cycles events,
> which is arguably nicer anyway.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
Thanks for the patch. I was wondering a while ago when the mesh grows too
big to handle it with the old way.
The patch looks good to me,
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cheers, Ilkka
> ---
> drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++--------------
> 1 file changed, 59 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca4..f1978a53d1c1 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
>
> struct arm_cmn_hw_event {
> struct arm_cmn_node *dn;
> - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
> + union {
> + u64 *dtm_idx;
> + int cc_idx;
> + };
> + unsigned long *wp_idx;
> s8 dtc_idx[CMN_MAX_DTCS];
> u8 num_dns;
> u8 dtm_offset;
> -
> - /*
> - * WP config registers are divided to UP and DOWN events. We need to
> - * keep to track only one of them.
> - */
> - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
> -
> bool wide_sel;
> enum cmn_filter_select filter_sel;
> };
> @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
> return (struct arm_cmn_hw_event *)&event->hw;
> }
>
> -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
> +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val)
> {
> - x[pos / 32] |= (u64)val << ((pos % 32) * 2);
> + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2);
> }
>
> -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos)
> +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
> {
> - return (x[pos / 32] >> ((pos % 32) * 2)) & 3;
> + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3;
> }
>
> -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val)
> +static u64 *arm_cmn_alloc_dtm_idx(void)
> +{
> + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL);
> +}
> +
> +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val)
> {
> if (val)
> - set_bit(pos, wp_idx);
> + set_bit(pos, hw->wp_idx);
> +}
> +
> +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
> +{
> + return test_bit(pos, hw->wp_idx);
> +}
> +
> +static unsigned long *arm_cmn_alloc_wp_idx(void)
> +{
> + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL);
> }
>
> -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos)
> +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw)
> {
> - return test_bit(pos, wp_idx);
> + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64));
> + if (hw->wp_idx)
> + bitmap_zero(hw->wp_idx, CMN_MAX_XPS);
> }
>
> struct arm_cmn_event_attr {
> @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event,
> struct arm_cmn_hw_event *hw,
> unsigned int pos)
> {
> - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos);
> + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos);
> }
>
> static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
> @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
> struct arm_cmn_hw_event *hw = to_cmn_hw(event);
>
> dtm->wp_event[wp_idx] = hw->dtc_idx[dtc];
> - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event));
> + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event));
> }
>
> static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx)
> @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw,
> dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
> reg = readq_relaxed(dtm->base + offset);
> }
> - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + dtm_idx = arm_cmn_get_dtm_idx(hw, i);
> count += (u16)(reg >> (dtm_idx * 16));
> }
> return count;
> @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event)
> unsigned long flags;
>
> if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) {
> - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]);
> + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx);
> local64_add(delta, &event->count);
> return;
> }
> @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
> int i;
>
> if (type == CMN_TYPE_DTC) {
> - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
> + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
>
> writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE,
> dtc->base + CMN_DT_DTC_CTL);
> @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
> writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx));
> }
> } else for_each_hw_dn(hw, dn, i) {
> - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event),
> hw->wide_sel);
> @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
> int i;
>
> if (type == CMN_TYPE_DTC) {
> - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
> + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
>
> dtc->cc_active = false;
> writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
> @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
> writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx));
> }
> } else for_each_hw_dn(hw, dn, i) {
> - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel);
> }
> @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
> }
>
>
> +static void arm_cmn_event_destroy(struct perf_event *event)
> +{
> + struct arm_cmn_hw_event *hw = to_cmn_hw(event);
> +
> + kfree(hw->dtm_idx);
> + bitmap_free(hw->wp_idx);
> +}
> +
> static int arm_cmn_event_init(struct perf_event *event)
> {
> struct arm_cmn *cmn = to_cmn(event->pmu);
> @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event)
> if (type == CMN_TYPE_DTC)
> return arm_cmn_validate_group(cmn, event);
>
> + event->destroy = arm_cmn_event_destroy;
> + hw->dtm_idx = arm_cmn_alloc_dtm_idx();
> + if (!hw->dtm_idx)
> + return -ENOMEM;
> +
> eventid = CMN_EVENT_EVENTID(event);
> /* For watchpoints we need the actual XP node here */
> if (type == CMN_TYPE_WP) {
> @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event)
> /* ...but the DTM may depend on which port we're watching */
> if (cmn->multi_dtm)
> hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
> + hw->wp_idx = arm_cmn_alloc_wp_idx();
> + if (!hw->wp_idx)
> + return -ENOMEM;
> } else if (type == CMN_TYPE_XP &&
> (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) {
> hw->wide_sel = true;
> @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
>
> while (i--) {
> struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset;
> - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
> + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
>
> if (type == CMN_TYPE_WP) {
> int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i);
> @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
> dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx);
> writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG);
> }
> - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx));
> - memset(hw->wp_idx, 0, sizeof(hw->wp_idx));
> + arm_cmn_clear_idx(hw);
>
> for_each_hw_dtc_idx(hw, j, idx)
> cmn->dtc[j].counters[idx] = NULL;
> @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
> return -ENOSPC;
>
> cmn->dtc[i].cycles = event;
> - hw->dtc_idx[0] = i;
> + hw->cc_idx = i;
>
> if (flags & PERF_EF_START)
> arm_cmn_event_start(event, 0);
> @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
> goto free_dtms;
> }
>
> - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx);
> + arm_cmn_set_dtm_idx(hw, i, dtm_idx);
>
> dtm->input_sel[dtm_idx] = input_sel;
> shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx);
> @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags)
> arm_cmn_event_stop(event, PERF_EF_UPDATE);
>
> if (type == CMN_TYPE_DTC)
> - cmn->dtc[hw->dtc_idx[0]].cycles = NULL;
> + cmn->dtc[hw->cc_idx].cycles = NULL;
> else
> arm_cmn_event_clear(cmn, event, hw->num_dns);
> }
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-31 11:32 ` sashiko-bot
2026-06-01 17:48 ` Robin Murphy
@ 2026-06-02 5:55 ` Ilkka Koskinen
2 siblings, 0 replies; 28+ messages in thread
From: Ilkka Koskinen @ 2026-06-02 5:55 UTC (permalink / raw)
To: Aviv Bakal
Cc: robin.murphy, will, mark.rutland, linux-arm-kernel,
linux-perf-users, linux-kernel, zeev, blakgeof, ilkka
On Sun, 31 May 2026, Aviv Bakal wrote:
> Graviton5 uses a customised CMN-S3 implementation where certain
> discovery registers report zeroed fields. Add the following workarounds:
>
> - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
> - Derive the DTC domain from the XP node ID, since the unit info
> register reports it as zero.
> - Set the DTC logical ID from the XP's logical ID, since the node info
> register's logical ID field is also zeroed.
>
> Signed-off-by: Aviv Bakal <avivb@amazon.com>
Thanks for the patch. It seems surprisingly clean approach. And thanks to
Robin for answering to Sashiko bot!
The patch looks good to me,
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cheers, Ilkka
> ---
> drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f1978a53d1c1..3fb71d9a57eb 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -31,7 +31,8 @@
> #define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
> #define CMN_CHILD_NODE_EXTERNAL BIT(31)
>
> -#define CMN_MAX_DIMENSION 12
> +/* Some implementations use a mesh larger than the architectural max of 12 */
> +#define CMN_MAX_DIMENSION 14
> #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
> #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
>
> @@ -214,6 +215,8 @@ enum cmn_part {
> PART_CMN700 = 0x43c,
> PART_CI700 = 0x43a,
> PART_CMN_S3 = 0x43e,
> + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
> + PART_GRAVITON5 = 0xa5,
> };
>
> /* CMN-600 r0px shouldn't exist in silicon, thankfully */
> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
> +
> static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
> {
> int level;
> @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> u64 reg;
> int i, j;
> size_t sz;
> + bool graviton5_workaround = false;
>
> arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
> if (cfg.type != CMN_TYPE_CFG)
> @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
> part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
> part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
> +
> + /* Graviton5 has a customised CMN-S3 which needs some fixups */
> + if (cmn->part == PART_GRAVITON5) {
> + cmn->part = PART_CMN_S3;
> + graviton5_workaround = true;
> + }
> +
> /* 600AE is close enough that it's not really worth more complexity */
> if (part == PART_CMN600AE)
> part = PART_CMN600;
> @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> if (cmn->part == PART_CMN600)
> xp->dtc = -1;
> + else if (graviton5_workaround)
> + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
> else
> xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
>
> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
> + }
> cmn->num_dtcs++;
> dn++;
> break;
> @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
> { "ARMHC650" },
> { "ARMHC700" },
> { "ARMHC003" },
> + { "AMZN0070", PART_GRAVITON5 },
> {}
> };
> MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-06-01 17:45 ` Robin Murphy
@ 2026-06-03 14:57 ` Aviv Bakal
0 siblings, 0 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-06-03 14:57 UTC (permalink / raw)
To: robin.murphy; +Cc: avivb, linux-perf-users, sashiko-reviews
Thanks for clarifying these, Robin. Much appreciated.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-06-03 15:00 ` Aviv Bakal
2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2 siblings, 2 replies; 28+ messages in thread
From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
This series adds support for Graviton5's customised CMN-S3 which has
zeroed discovery registers.
Changes since v4:
- Fix patch 1 authorship to correctly attribute Robin as author
- Collect Reviewed-by tags from Robin and Ilkka
Aviv Bakal (1):
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
Robin Murphy (1):
perf/arm-cmn: Move DTM index data out of hw_perf_event
drivers/perf/arm-cmn.c | 121 ++++++++++++++++++++++++++++++-----------
1 file changed, 90 insertions(+), 31 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event
2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal
@ 2026-06-03 15:00 ` Aviv Bakal
2026-06-03 15:21 ` sashiko-bot
2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
1 sibling, 1 reply; 28+ messages in thread
From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
From: Robin Murphy <robin.murphy@arm.com>
The amount of data we need to store all the per-DTM counter and
watchpoint allocations is already testing the limits of hw_perf_event,
and future CMNs are only likely to keep growing larger, so move these
arrays out to separate memory allocations. As part of that we can use
an explicit union for allocating cycle counters to dtc_cycles events,
which is arguably nicer anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Aviv Bakal <avivb@amazon.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
---
drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++--------------
1 file changed, 59 insertions(+), 30 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f5305c8fdca4..f1978a53d1c1 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {}
struct arm_cmn_hw_event {
struct arm_cmn_node *dn;
- u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)];
+ union {
+ u64 *dtm_idx;
+ int cc_idx;
+ };
+ unsigned long *wp_idx;
s8 dtc_idx[CMN_MAX_DTCS];
u8 num_dns;
u8 dtm_offset;
-
- /*
- * WP config registers are divided to UP and DOWN events. We need to
- * keep to track only one of them.
- */
- DECLARE_BITMAP(wp_idx, CMN_MAX_XPS);
-
bool wide_sel;
enum cmn_filter_select filter_sel;
};
@@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event)
return (struct arm_cmn_hw_event *)&event->hw;
}
-static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val)
+static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val)
{
- x[pos / 32] |= (u64)val << ((pos % 32) * 2);
+ hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2);
}
-static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos)
+static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
{
- return (x[pos / 32] >> ((pos % 32) * 2)) & 3;
+ return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3;
}
-static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val)
+static u64 *arm_cmn_alloc_dtm_idx(void)
+{
+ return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL);
+}
+
+static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val)
{
if (val)
- set_bit(pos, wp_idx);
+ set_bit(pos, hw->wp_idx);
+}
+
+static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos)
+{
+ return test_bit(pos, hw->wp_idx);
+}
+
+static unsigned long *arm_cmn_alloc_wp_idx(void)
+{
+ return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL);
}
-static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos)
+static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw)
{
- return test_bit(pos, wp_idx);
+ memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64));
+ if (hw->wp_idx)
+ bitmap_zero(hw->wp_idx, CMN_MAX_XPS);
}
struct arm_cmn_event_attr {
@@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event,
struct arm_cmn_hw_event *hw,
unsigned int pos)
{
- return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos);
+ return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos);
}
static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
@@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm,
struct arm_cmn_hw_event *hw = to_cmn_hw(event);
dtm->wp_event[wp_idx] = hw->dtc_idx[dtc];
- arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event));
+ arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event));
}
static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx)
@@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw,
dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
reg = readq_relaxed(dtm->base + offset);
}
- dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ dtm_idx = arm_cmn_get_dtm_idx(hw, i);
count += (u16)(reg >> (dtm_idx * 16));
}
return count;
@@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event)
unsigned long flags;
if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) {
- delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]);
+ delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx);
local64_add(delta, &event->count);
return;
}
@@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE,
dtc->base + CMN_DT_DTC_CTL);
@@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags)
writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event),
hw->wide_sel);
@@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
int i;
if (type == CMN_TYPE_DTC) {
- struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0];
+ struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx;
dtc->cc_active = false;
writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
@@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags)
writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx));
}
} else for_each_hw_dn(hw, dn, i) {
- int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel);
}
@@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
}
+static void arm_cmn_event_destroy(struct perf_event *event)
+{
+ struct arm_cmn_hw_event *hw = to_cmn_hw(event);
+
+ kfree(hw->dtm_idx);
+ bitmap_free(hw->wp_idx);
+}
+
static int arm_cmn_event_init(struct perf_event *event)
{
struct arm_cmn *cmn = to_cmn(event->pmu);
@@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event)
if (type == CMN_TYPE_DTC)
return arm_cmn_validate_group(cmn, event);
+ event->destroy = arm_cmn_event_destroy;
+ hw->dtm_idx = arm_cmn_alloc_dtm_idx();
+ if (!hw->dtm_idx)
+ return -ENOMEM;
+
eventid = CMN_EVENT_EVENTID(event);
/* For watchpoints we need the actual XP node here */
if (type == CMN_TYPE_WP) {
@@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event)
/* ...but the DTM may depend on which port we're watching */
if (cmn->multi_dtm)
hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
+ hw->wp_idx = arm_cmn_alloc_wp_idx();
+ if (!hw->wp_idx)
+ return -ENOMEM;
} else if (type == CMN_TYPE_XP &&
(cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) {
hw->wide_sel = true;
@@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
while (i--) {
struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset;
- unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i);
+ unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i);
if (type == CMN_TYPE_WP) {
int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i);
@@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event,
dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx);
writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG);
}
- memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx));
- memset(hw->wp_idx, 0, sizeof(hw->wp_idx));
+ arm_cmn_clear_idx(hw);
for_each_hw_dtc_idx(hw, j, idx)
cmn->dtc[j].counters[idx] = NULL;
@@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
return -ENOSPC;
cmn->dtc[i].cycles = event;
- hw->dtc_idx[0] = i;
+ hw->cc_idx = i;
if (flags & PERF_EF_START)
arm_cmn_event_start(event, 0);
@@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
goto free_dtms;
}
- arm_cmn_set_index(hw->dtm_idx, i, dtm_idx);
+ arm_cmn_set_dtm_idx(hw, i, dtm_idx);
dtm->input_sel[dtm_idx] = input_sel;
shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx);
@@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags)
arm_cmn_event_stop(event, PERF_EF_UPDATE);
if (type == CMN_TYPE_DTC)
- cmn->dtc[hw->dtc_idx[0]].cycles = NULL;
+ cmn->dtc[hw->cc_idx].cycles = NULL;
else
arm_cmn_event_clear(cmn, event, hw->num_dns);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal
2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
@ 2026-06-03 15:00 ` Aviv Bakal
2026-06-03 15:53 ` sashiko-bot
1 sibling, 1 reply; 28+ messages in thread
From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw)
To: robin.murphy, will, mark.rutland
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev,
blakgeof, ilkka
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. Add the following workarounds:
- Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant.
- Derive the DTC domain from the XP node ID, since the unit info
register reports it as zero.
- Set the DTC logical ID from the XP's logical ID, since the node info
register's logical ID field is also zeroed.
Signed-off-by: Aviv Bakal <avivb@amazon.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
---
drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f1978a53d1c1..3fb71d9a57eb 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -31,7 +31,8 @@
#define CMN_CHILD_NODE_ADDR GENMASK(29, 0)
#define CMN_CHILD_NODE_EXTERNAL BIT(31)
-#define CMN_MAX_DIMENSION 12
+/* Some implementations use a mesh larger than the architectural max of 12 */
+#define CMN_MAX_DIMENSION 14
#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION)
#define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
@@ -214,6 +215,8 @@ enum cmn_part {
PART_CMN700 = 0x43c,
PART_CI700 = 0x43a,
PART_CMN_S3 = 0x43e,
+ /* Synthetic part number, overridden to PART_CMN_S3 during discovery */
+ PART_GRAVITON5 = 0xa5,
};
/* CMN-600 r0px shouldn't exist in silicon, thankfully */
@@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
}
+static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
+{
+ unsigned int x = (xp_id >> 7) & 0xf;
+ unsigned int y = (xp_id >> 3) & 0xf;
+
+ /*
+ * The unit info register reads as zero; derive the DTC domain from
+ * the XP's mesh coordinates over the 10x14 mesh.
+ */
+ return (x / 5) + (y / 7) * 2;
+}
+
static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node)
{
int level;
@@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
u64 reg;
int i, j;
size_t sz;
+ bool graviton5_workaround = false;
arm_cmn_init_node_info(cmn, rgn_offset, &cfg);
if (cfg.type != CMN_TYPE_CFG)
@@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+
+ /* Graviton5 has a customised CMN-S3 which needs some fixups */
+ if (cmn->part == PART_GRAVITON5) {
+ cmn->part = PART_CMN_S3;
+ graviton5_workaround = true;
+ }
+
/* 600AE is close enough that it's not really worth more complexity */
if (part == PART_CMN600AE)
part = PART_CMN600;
@@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
if (cmn->part == PART_CMN600)
xp->dtc = -1;
+ else if (graviton5_workaround)
+ xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id);
else
xp->dtc = arm_cmn_dtc_domain(cmn, xp_region);
@@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
switch (dn->type) {
case CMN_TYPE_DTC:
+ if (graviton5_workaround) {
+ /* Node info logical ID is zeroed; use the XP's */
+ dn->logid = xp->logid;
+ }
cmn->num_dtcs++;
dn++;
break;
@@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = {
{ "ARMHC650" },
{ "ARMHC700" },
{ "ARMHC003" },
+ { "AMZN0070", PART_GRAVITON5 },
{}
};
MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
--
2.47.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event
2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
@ 2026-06-03 15:21 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 15:21 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Type confusion in `arm_cmn_validate_group` allows arbitrary kernel memory read and out-of-bounds heap write via mixed PMU event groups.
--
commit 5cf0bca45fce2ef5c66d58f875b8d4b3f7c015e0
Author: Robin Murphy <robin.murphy@arm.com>
perf/arm-cmn: Move DTM index data out of hw_perf_event
This commit moves the DTM index arrays and watchpoint tracking out of
hw_perf_event into separate dynamically allocated memory structures. This
change is intended to accommodate growing CMN configurations without exceeding
the limits of hw_perf_event.
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index f5305c8fdca43..f1978a53d1c19 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
[ ... ]
> @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event)
> if (type == CMN_TYPE_DTC)
> return arm_cmn_validate_group(cmn, event);
>
[Severity: Critical]
This isn't a bug introduced by this patch, but could this lead to a type
confusion and out-of-bounds write when validating groups with mixed PMU events?
When arm_cmn_validate_group() iterates over the group's siblings, it doesn't
verify that sibling->pmu == event->pmu. If a sibling is a hardware event from
a different PMU, it passes it to arm_cmn_val_add_event():
drivers/perf/arm-cmn.c:arm_cmn_validate_group() {
...
for_each_sibling_event(sibling, leader)
arm_cmn_val_add_event(cmn, val, sibling);
...
}
Inside arm_cmn_val_add_event(), it casts the sibling's hw_perf_event to an
arm_cmn_hw_event. Since userspace controls attr.config, it can overlap with
the hw->dn pointer, allowing arbitrary kernel memory reads:
drivers/perf/arm-cmn.c:arm_cmn_val_add_event() {
...
for_each_hw_dn(hw, dn, i) {
int wp_idx, dtm = dn->dtm, sel = hw->filter_sel;
val->dtm_count[dtm]++;
...
}
Because the read byte dtm can be up to 255 and the array bounds are
CMN_MAX_DTMS (188), does this enable a controllable out-of-bounds write on
the kernel heap?
> + event->destroy = arm_cmn_event_destroy;
> + hw->dtm_idx = arm_cmn_alloc_dtm_idx();
> + if (!hw->dtm_idx)
> + return -ENOMEM;
> +
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603150025.30980-1-avivb@amazon.com?part=1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
@ 2026-06-03 15:53 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 15:53 UTC (permalink / raw)
To: Aviv Bakal; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] The `arm_cmn_graviton5_dtc_domain` function computes a DTC domain index without upper bounds, which can exceed the `CMN_MAX_DTCS` limit and cause memory corruption.
- [High] The Graviton5 workaround assigns the XP's logical ID to the DTC node's logical ID, breaking the domain-based sorting order and causing DTC resources to be assigned incorrectly.
Pre-existing issues:
- [High] Lack of bounds checking on `cmn->num_dtcs` allows a `memset()` call to overflow the `dtc_idx` array.
--
commit fd0610a07c973430036fe29d5eab5bfbb5c0eb4c
Author: Aviv Bakal <avivb@amazon.com>
perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5
Graviton5 uses a customised CMN-S3 implementation where certain
discovery registers report zeroed fields. This patch introduces a
dedicated ACPI HID, derives the DTC domain from the XP node ID, and
sets the DTC logical ID from the XP's logical ID.
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index f1978a53d1c19..3fb71d9a57eb5 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
[ ... ]
> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg
> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset));
> }
>
> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id)
> +{
> + unsigned int x = (xp_id >> 7) & 0xf;
> + unsigned int y = (xp_id >> 3) & 0xf;
> +
> + /*
> + * The unit info register reads as zero; derive the DTC domain from
> + * the XP's mesh coordinates over the 10x14 mesh.
> + */
> + return (x / 5) + (y / 7) * 2;
> +}
[Severity: High]
Can this calculation exceed the CMN_MAX_DTCS limit?
If a compromised or malformed firmware reports an XP node ID with
coordinate components up to 15 (e.g., x=15, y=15), this calculation
yields a domain index of 7.
The domain index is later inherited by the DTC node (dn->dtc) and used
directly as an array index in arm_cmn_event_init():
hw->dtc_idx[dn->dtc] = 0;
Since the hw->dtc_idx array has a fixed size of CMN_MAX_DTCS (4), an
index of 4 or higher would overwrite adjacent struct fields like num_dns,
leading to memory corruption.
[ ... ]
> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>
> switch (dn->type) {
> case CMN_TYPE_DTC:
> + if (graviton5_workaround) {
> + /* Node info logical ID is zeroed; use the XP's */
> + dn->logid = xp->logid;
> + }
[Severity: High]
Does setting dn->logid to xp->logid break the domain-based sorting
order for DTC resources?
The CMN driver sequentially populates the cmn->dtc array based on the
sorted order of DTC nodes:
drivers/perf/arm-cmn.c:arm_cmn_init_dtcs() {
...
int err = arm_cmn_init_dtc(cmn, dn, dtc_idx++);
...
}
Standard hardware sets a DTC node's logid to its domain ID (0..3),
ensuring the array is populated in domain order. However, XP logical
IDs do not monotonically increase with domain IDs (e.g., an XP in Domain 1
at X=5, Y=0 has a lower logid than an XP in Domain 0 at X=0, Y=1).
Will this cause the DTC nodes to sort out of order, misaligning the
cmn->dtc array and causing PMU events to be programmed into the wrong
hardware domain counters? Would it be safer to use dn->logid = xp->dtc?
> cmn->num_dtcs++;
[Severity: High]
This is a pre-existing issue, but does cmn->num_dtcs lack bounds checking?
During hardware discovery, the driver increments cmn->num_dtcs for every
CMN_TYPE_DTC node reported by the firmware, without capping it at
CMN_MAX_DTCS (4).
Later in arm_cmn_event_init(), the driver uses this unbounded value:
memset(hw->dtc_idx, 0, cmn->num_dtcs);
Since hw->dtc_idx is a 4-byte array, if the firmware provides more than
4 DTC nodes, will the memset overflow the array and corrupt the
arm_cmn_hw_event struct?
> dn++;
> break;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603150025.30980-1-avivb@amazon.com?part=2
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2026-06-03 15:53 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-03 16:20 ` sashiko-bot
2026-05-04 13:37 ` Aviv Bakal
2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal
2026-05-04 19:38 ` sashiko-bot
2026-05-05 2:31 ` kernel test robot
2026-05-21 16:02 ` Robin Murphy
2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal
2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal
2026-05-24 16:13 ` sashiko-bot
2026-05-29 16:44 ` Robin Murphy
2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-24 16:40 ` sashiko-bot
2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal
2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
2026-06-01 18:23 ` Robin Murphy
2026-06-02 5:49 ` Ilkka Koskinen
2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-05-31 11:32 ` sashiko-bot
2026-06-01 17:45 ` Robin Murphy
2026-06-03 14:57 ` Aviv Bakal
2026-06-01 17:48 ` Robin Murphy
2026-06-02 5:55 ` Ilkka Koskinen
2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal
2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal
2026-06-03 15:21 ` sashiko-bot
2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal
2026-06-03 15:53 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox