* [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 @ 2026-05-03 15:51 Aviv Bakal 2026-05-03 16:20 ` sashiko-bot 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal 0 siblings, 2 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-03 15:51 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the XP's logical ID, since the node info register's logical ID field is also zeroed. Signed-off-by: Aviv Bakal <avivb@amazon.com> --- drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f5305c8fdca4..368fe1a86bfb 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -31,7 +31,8 @@ #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) #define CMN_CHILD_NODE_EXTERNAL BIT(31) -#define CMN_MAX_DIMENSION 12 +/* Some implementations use a mesh larger than the architectural max of 12 */ +#define CMN_MAX_DIMENSION 14 #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) @@ -214,6 +215,8 @@ enum cmn_part { PART_CMN700 = 0x43c, PART_CI700 = 0x43a, PART_CMN_S3 = 0x43e, + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ + PART_GRAVITON5 = 0xa5, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); } +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) +{ + unsigned int x = (xp_id >> 7) & 0xf; + unsigned int y = (xp_id >> 3) & 0xf; + + /* + * The unit info register reads as zero; derive the DTC domain from + * the XP's mesh coordinates over the 10x14 mesh. + */ + return (x / 5) + (y / 7) * 2; +} + static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) { int level; @@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) u64 reg; int i, j; size_t sz; + bool graviton5_workaround = false; arm_cmn_init_node_info(cmn, rgn_offset, &cfg); if (cfg.type != CMN_TYPE_CFG) @@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; + + /* Graviton5 has a customised CMN-S3 which needs some fixups */ + if (cmn->part == PART_GRAVITON5) { + cmn->part = PART_CMN_S3; + graviton5_workaround = true; + } + /* 600AE is close enough that it's not really worth more complexity */ if (part == PART_CMN600AE) part = PART_CMN600; @@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) if (cmn->part == PART_CMN600) xp->dtc = -1; + else if (graviton5_workaround) + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); else xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) switch (dn->type) { case CMN_TYPE_DTC: + if (graviton5_workaround) { + /* Node info logical ID is zeroed; use the XP's */ + dn->logid = xp->logid; + } cmn->num_dtcs++; dn++; break; @@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { { "ARMHC650" }, { "ARMHC700" }, { "ARMHC003" }, + { "AMZN0070", PART_GRAVITON5 }, {} }; MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-05-03 16:20 ` sashiko-bot 2026-05-04 13:37 ` Aviv Bakal 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal 1 sibling, 1 reply; 28+ messages in thread From: sashiko-bot @ 2026-05-03 16:20 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [Medium] Incorrect DTC logical ID assignment cross-wires DTC domains. -- commit 621fea5b4bb35e497f17f2f20187c68b18f49751 Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 This commit introduces workarounds for a customised CMN-S3 implementation on Graviton5, where certain discovery registers report zeroed fields. It adds a dedicated ACPI HID, derives the DTC domain from the XP node ID, and sets the DTC logical ID from the XP's logical ID. > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca4..368fe1a86bfb 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c [ ... ] > @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; Will this cross-wire the DTC domains? In arm_cmn_init_dtcs(), the cmn->dns array is sorted by logid, and the cmn->dtc array is populated sequentially based on this sorted order. During PMU event programming, the driver uses the domain ID (dn->dtc) as an index into the cmn->dtc array. If the physical layout of the DTC XPs causes their xp->logid values (typically y * mesh_x + x) to sort differently than their computed domain IDs, wouldn't the cmn->dtc array be misaligned? For example, a Domain 1 DTC at (5,0) has a smaller logid (5) than a Domain 0 DTC at (0,6) (60), meaning Domain 1 would incorrectly occupy cmn->dtc[0]. This cross-wiring could cause PMU events to program the wrong physical DTC hardware and map IRQs to the wrong domains, resulting in incorrect performance statistics and unhandled interrupts. Would it be better to assign the computed domain ID to the logical ID here using dn->logid = xp->dtc? > + } > cmn->num_dtcs++; > dn++; > break; -- Sashiko AI review · https://sashiko.dev/#/patchset/20260503155118.16191-1-avivb@amazon.com?part=1 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-03 16:20 ` sashiko-bot @ 2026-05-04 13:37 ` Aviv Bakal 0 siblings, 0 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-04 13:37 UTC (permalink / raw) To: sashiko-bot Cc: robin.murphy, will, mark.rutland, linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof > Will this cross-wire the DTC domains? > In arm_cmn_init_dtcs(), the cmn->dns array is sorted by logid, and the > cmn->dtc array is populated sequentially based on this sorted order. > During PMU event programming, the driver uses the domain ID (dn->dtc) as an > index into the cmn->dtc array. > If the physical layout of the DTC XPs causes their xp->logid values (typically > y * mesh_x + x) to sort differently than their computed domain IDs, wouldn't > the cmn->dtc array be misaligned? > For example, a Domain 1 DTC at (5,0) has a smaller logid (5) than a Domain 0 > DTC at (0,6) (60), meaning Domain 1 would incorrectly occupy cmn->dtc[0]. This > cross-wiring could cause PMU events to program the wrong physical DTC hardware > and map IRQs to the wrong domains, resulting in incorrect performance > statistics and unhandled interrupts. > Would it be better to assign the computed domain ID to the logical ID here > using dn->logid = xp->dtc? Good catch on the cleanliness — I'll switch to dn->logid = xp->dtc. That said, the two approaches are equivalent on this hardware. The four DTCs discovered on Graviton5 are: pmu_base xp->dtc xp->logid c030d900 0 0 e430d900 1 9 c370d900 2 130 e770d900 3 139 The xp->logid values (0, 9, 130, 139) sort in the same order as the domain IDs (0, 1, 2, 3), so arm_cmn_node_cmp produces the same sorted sequence either way, and dtc_idx++ assigns the same indices. This follows from the topology: arm_cmn_graviton5_dtc_domain() computes domain = (x/5) + (y/7)*2 over the 10x14 mesh, which increases monotonically with the row-major XP position that logid encodes. So the sort orders can't diverge for this mesh geometry. Still, using xp->dtc is cleaner and more obviously correct — the logid directly reflects the domain ID rather than relying on sort-order equivalence. I'll update the patch. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-03 16:20 ` sashiko-bot @ 2026-05-04 13:39 ` Aviv Bakal 2026-05-04 19:38 ` sashiko-bot ` (3 more replies) 1 sibling, 4 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-04 13:39 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the computed domain ID, since the node info register's logical ID field is also zeroed. Signed-off-by: Aviv Bakal <avivb@amazon.com> --- v2: - Use computed domain ID (xp->dtc) instead of XP logical ID for DTC logical ID assignment. drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f5305c8fdca4..8ee3f8638602 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -31,7 +31,8 @@ #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) #define CMN_CHILD_NODE_EXTERNAL BIT(31) -#define CMN_MAX_DIMENSION 12 +/* Some implementations use a mesh larger than the architectural max of 12 */ +#define CMN_MAX_DIMENSION 14 #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) @@ -214,6 +215,8 @@ enum cmn_part { PART_CMN700 = 0x43c, PART_CI700 = 0x43a, PART_CMN_S3 = 0x43e, + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ + PART_GRAVITON5 = 0xa5, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); } +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) +{ + unsigned int x = (xp_id >> 7) & 0xf; + unsigned int y = (xp_id >> 3) & 0xf; + + /* + * The unit info register reads as zero; derive the DTC domain from + * the XP's mesh coordinates over the 10x14 mesh. + */ + return (x / 5) + (y / 7) * 2; +} + static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) { int level; @@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) u64 reg; int i, j; size_t sz; + bool graviton5_workaround = false; arm_cmn_init_node_info(cmn, rgn_offset, &cfg); if (cfg.type != CMN_TYPE_CFG) @@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; + + /* Graviton5 has a customised CMN-S3 which needs some fixups */ + if (cmn->part == PART_GRAVITON5) { + cmn->part = PART_CMN_S3; + graviton5_workaround = true; + } + /* 600AE is close enough that it's not really worth more complexity */ if (part == PART_CMN600AE) part = PART_CMN600; @@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) if (cmn->part == PART_CMN600) xp->dtc = -1; + else if (graviton5_workaround) + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); else xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) switch (dn->type) { case CMN_TYPE_DTC: + if (graviton5_workaround) { + /* Node info logical ID is zeroed; use the domain ID */ + dn->logid = xp->dtc; + } cmn->num_dtcs++; dn++; break; @@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { { "ARMHC650" }, { "ARMHC700" }, { "ARMHC003" }, + { "AMZN0070", PART_GRAVITON5 }, {} }; MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal @ 2026-05-04 19:38 ` sashiko-bot 2026-05-05 2:31 ` kernel test robot ` (2 subsequent siblings) 3 siblings, 0 replies; 28+ messages in thread From: sashiko-bot @ 2026-05-04 19:38 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [High] Buffer overflow in `arm_cmn_event_init` due to `CMN_MAX_DTCS` not scaling with mesh dimension. -- commit 9d571ad62b9e626c4f8ff883ad8a2627c1e26529 Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the computed domain ID, since the node info register's logical ID field is also zeroed. > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca43..8ee3f86386025 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -31,7 +31,8 @@ > #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) > #define CMN_CHILD_NODE_EXTERNAL BIT(31) > > -#define CMN_MAX_DIMENSION 12 > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 > #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) > [ ... ] > @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} > + Does this new dimension limit require an update to CMN_MAX_DTCS? CMN_MAX_DTCS is currently 4, which sizes the dtc_idx array in struct arm_cmn_hw_event: struct arm_cmn_hw_event { struct arm_cmn_node *dn; u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; s8 dtc_idx[CMN_MAX_DTCS]; ... }; For a 14x10 mesh where X=13 and Y=9, arm_cmn_graviton5_dtc_domain() will return 4. Will this cause an out-of-bounds write in arm_cmn_event_init()? memset(hw->dtc_idx, -1, sizeof(hw->dtc_idx)); for (dn = hw->dn; dn->type == type; dn++) { ... if (dn->dtc < 0) memset(hw->dtc_idx, 0, cmn->num_dtcs); else hw->dtc_idx[dn->dtc] = 0; ... } If the larger mesh size exposes more than 4 DTC nodes, could cmn->num_dtcs also exceed 4, causing the memset() to overflow the dtc_idx[] array? -- Sashiko AI review · https://sashiko.dev/#/patchset/20260504133923.23373-1-avivb@amazon.com?part=1 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal 2026-05-04 19:38 ` sashiko-bot @ 2026-05-05 2:31 ` kernel test robot 2026-05-21 16:02 ` Robin Murphy 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal 3 siblings, 0 replies; 28+ messages in thread From: kernel test robot @ 2026-05-05 2:31 UTC (permalink / raw) To: Aviv Bakal, robin.murphy, will, mark.rutland Cc: llvm, oe-kbuild-all, linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof Hi Aviv, kernel test robot noticed the following build errors: [auto build test ERROR on arm-perf/for-next/perf] [also build test ERROR on soc/for-next linus/master v7.1-rc2 next-20260430] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Aviv-Bakal/perf-arm-cmn-Add-workarounds-for-CMN-S3-on-Graviton5/20260505-011858 base: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf patch link: https://lore.kernel.org/r/20260504133923.23373-1-avivb%40amazon.com patch subject: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 config: i386-buildonly-randconfig-004-20260505 (https://download.01.org/0day-ci/archive/20260505/202605051052.zOemYJY9-lkp@intel.com/config) compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260505/202605051052.zOemYJY9-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202605051052.zOemYJY9-lkp@intel.com/ All errors (new ones prefixed by >>): >> drivers/perf/arm-cmn.c:617:15: error: static assertion failed due to requirement 'sizeof(struct arm_cmn_hw_event) <= __builtin_offsetof(struct hw_perf_event, target)': sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target) 617 | static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:77:50: note: expanded from macro 'static_assert' 77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:78:56: note: expanded from macro '__static_assert' 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) | ^~~~ drivers/perf/arm-cmn.c:617:47: note: expression evaluates to '104 <= 96' 617 | static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:77:50: note: expanded from macro 'static_assert' 77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:78:56: note: expanded from macro '__static_assert' 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) | ^~~~ 1 error generated. vim +617 drivers/perf/arm-cmn.c a88fa6c28b867a Robin Murphy 2021-12-03 600 0ba64770a2f2e5 Robin Murphy 2020-09-18 601 struct arm_cmn_hw_event { 0ba64770a2f2e5 Robin Murphy 2020-09-18 602 struct arm_cmn_node *dn; 359414b33e00ba Robin Murphy 2024-09-02 603 u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; 7633ec2c262fab Robin Murphy 2023-10-20 604 s8 dtc_idx[CMN_MAX_DTCS]; 0ba64770a2f2e5 Robin Murphy 2020-09-18 605 u8 num_dns; 60d1504070c22c Robin Murphy 2021-12-03 606 u8 dtm_offset; 4a112585ebe8cb Ilkka Koskinen 2024-06-17 607 4a112585ebe8cb Ilkka Koskinen 2024-06-17 608 /* 4a112585ebe8cb Ilkka Koskinen 2024-06-17 609 * WP config registers are divided to UP and DOWN events. We need to 4a112585ebe8cb Ilkka Koskinen 2024-06-17 610 * keep to track only one of them. 4a112585ebe8cb Ilkka Koskinen 2024-06-17 611 */ 4a112585ebe8cb Ilkka Koskinen 2024-06-17 612 DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); 4a112585ebe8cb Ilkka Koskinen 2024-06-17 613 23760a0144173e Robin Murphy 2022-04-18 614 bool wide_sel; 65adf71398f5af Robin Murphy 2022-04-18 615 enum cmn_filter_select filter_sel; 0ba64770a2f2e5 Robin Murphy 2020-09-18 616 }; ff436cee694ee8 Robin Murphy 2024-09-02 @617 static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); 0ba64770a2f2e5 Robin Murphy 2020-09-18 618 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal 2026-05-04 19:38 ` sashiko-bot 2026-05-05 2:31 ` kernel test robot @ 2026-05-21 16:02 ` Robin Murphy 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal 3 siblings, 0 replies; 28+ messages in thread From: Robin Murphy @ 2026-05-21 16:02 UTC (permalink / raw) To: Aviv Bakal, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof On 2026-05-04 2:39 pm, Aviv Bakal wrote: > Graviton5 uses a customised CMN-S3 implementation where certain > discovery registers report zeroed fields. Add the following workarounds: > > - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. > - Derive the DTC domain from the XP node ID, since the unit info > register reports it as zero. > - Set the DTC logical ID from the computed domain ID, since the node > info register's logical ID field is also zeroed. > > Signed-off-by: Aviv Bakal <avivb@amazon.com> > --- > v2: > - Use computed domain ID (xp->dtc) instead of XP logical ID for DTC > logical ID assignment. > > drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca4..8ee3f8638602 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -31,7 +31,8 @@ > #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) > #define CMN_CHILD_NODE_EXTERNAL BIT(31) > > -#define CMN_MAX_DIMENSION 12 > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 > #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) > > @@ -214,6 +215,8 @@ enum cmn_part { > PART_CMN700 = 0x43c, > PART_CI700 = 0x43a, > PART_CMN_S3 = 0x43e, > + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ > + PART_GRAVITON5 = 0xa5, > }; > > /* CMN-600 r0px shouldn't exist in silicon, thankfully */ > @@ -2221,6 +2224,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} > + > static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) > { > int level; > @@ -2266,6 +2281,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > u64 reg; > int i, j; > size_t sz; > + bool graviton5_workaround = false; > > arm_cmn_init_node_info(cmn, rgn_offset, &cfg); > if (cfg.type != CMN_TYPE_CFG) > @@ -2276,6 +2292,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); > part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); > part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; > + > + /* Graviton5 has a customised CMN-S3 which needs some fixups */ > + if (cmn->part == PART_GRAVITON5) { > + cmn->part = PART_CMN_S3; > + graviton5_workaround = true; > + } > + > /* 600AE is close enough that it's not really worth more complexity */ > if (part == PART_CMN600AE) > part = PART_CMN600; > @@ -2365,6 +2388,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > if (cmn->part == PART_CMN600) > xp->dtc = -1; > + else if (graviton5_workaround) > + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); > else > xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); > > @@ -2443,6 +2468,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the domain ID */ > + dn->logid = xp->dtc; No, this really should be xp->logid - other than DTC0 whose logical ID is always forced to 0, but if it naturally lines up that way anyway then all the better - since that is consistent with how the tooling generates a regular hardware configuration. The cmn->dtc array ends up sorted by logical ID as that is a guaranteed stable order for all CMN versions (as the domain numbers themselves aren't always known), and it is the interrupt order defined by the firmware bindings which we need to match. If it's not guaranteed that the actual domain numbers are in the same order then we have an existing bug in general (I'll have to check...) Otherwise, this looks OK to me - in fact surprisingly pleasant and unobtrusive given that it's a pretty horrible hardware issue to work around. I guess we get lucky that it's an easy topology to compute. Thanks, Robin. > + } > cmn->num_dtcs++; > dn++; > break; > @@ -2658,6 +2687,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { > { "ARMHC650" }, > { "ARMHC700" }, > { "ARMHC003" }, > + { "AMZN0070", PART_GRAVITON5 }, > {} > }; > MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v3 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal ` (2 preceding siblings ...) 2026-05-21 16:02 ` Robin Murphy @ 2026-05-24 15:38 ` Aviv Bakal 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal ` (2 more replies) 3 siblings, 3 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof This series adds support for Graviton5's customised CMN-S3 which has zeroed discovery registers. Robin, I understand moving driver state into the core perf header isn't ideal, but I couldn't find another way to grow the struct. The v2 submission failed the kernel test robot build on i386 (COMPILE_TEST) due to arm_cmn_hw_event exceeding the static_assert against the 'target' field offset when CMN_MAX_DIMENSION is increased beyond 12. Patch 1 moves struct arm_cmn_hw_event into the hw_perf_event union to resolve this. I'd appreciate your feedback on this approach, or any alternative you'd suggest. Patch 2 adds the Graviton5 workarounds themselves (unchanged from v2 except for the DTC logid fix below). Changes since v2: - Revert DTC logical ID assignment back to xp->logid (per Robin's review) - Add patch 1/2 to move arm_cmn_hw_event into hw_perf_event union to resolve 32-bit build failure Aviv Bakal (2): perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 drivers/perf/arm-cmn.c | 55 +++++++++++++++++++++----------------- include/linux/perf_event.h | 22 +++++++++++++++ 2 files changed, 52 insertions(+), 25 deletions(-) -- 2.47.3 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal @ 2026-05-24 15:38 ` Aviv Bakal 2026-05-24 16:13 ` sashiko-bot 2026-05-29 16:44 ` Robin Murphy 2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal 2 siblings, 2 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof In order to increase CMN_MAX_DIMENSION beyond 12 (required for meshes larger than 12x12, such as Graviton5), the arm_cmn_hw_event struct must grow. Since it is overlaid on the beginning of hw_perf_event via an unsafe cast, increasing its size would violate the static_assert that guards against overflowing into the 'target' field. Resolve this by moving struct arm_cmn_hw_event into the hw_perf_event union as a proper named member, eliminating the cast in to_cmn_hw() and making the size reservation explicit. Set CMN_MAX_DIMENSION to 14 to accommodate larger mesh topologies. Signed-off-by: Aviv Bakal <avivb@amazon.com> --- drivers/perf/arm-cmn.c | 26 +------------------------- include/linux/perf_event.h | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+), 25 deletions(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f5305c8fdca4..3443b819afed 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -31,13 +31,8 @@ #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) #define CMN_CHILD_NODE_EXTERNAL BIT(31) -#define CMN_MAX_DIMENSION 12 -#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) -/* Currently XPs are the node type we can have most of; others top out at 128 */ -#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS - /* The CFG node has various info besides the discovery tree */ #define CMN_CFGM_PERIPH_ID_01 0x0008 #define CMN_CFGM_PID0_PART_0 GENMASK_ULL(7, 0) @@ -148,7 +143,6 @@ #define CMN_DT_PMSRR_SS_REQ BIT(0) #define CMN_DT_NUM_COUNTERS 8 -#define CMN_MAX_DTCS 4 /* * Even in the worst case a DTC counter can't wrap in fewer than 2^42 cycles, @@ -595,24 +589,6 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} #endif -struct arm_cmn_hw_event { - struct arm_cmn_node *dn; - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; - s8 dtc_idx[CMN_MAX_DTCS]; - u8 num_dns; - u8 dtm_offset; - - /* - * WP config registers are divided to UP and DOWN events. We need to - * keep to track only one of them. - */ - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); - - bool wide_sel; - enum cmn_filter_select filter_sel; -}; -static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); - #define for_each_hw_dn(hw, dn, i) \ for (i = 0, dn = hw->dn; i < hw->num_dns; i++, dn++) @@ -622,7 +598,7 @@ static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) { - return (struct arm_cmn_hw_event *)&event->hw; + return &event->hw.cmn; } static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 48d851fbd8ea..c38576a8e338 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -119,6 +119,7 @@ struct perf_branch_stack { }; struct task_struct; +struct arm_cmn_node; /* * extra PMU register associated with an event @@ -200,6 +201,27 @@ struct hw_perf_event { u64 conf; u64 conf1; }; +#ifdef CONFIG_ARM_CMN +/* Some implementations use a mesh larger than the architectural max of 12 */ +#define CMN_MAX_DIMENSION 14 +#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS +#define CMN_MAX_DTCS 4 + struct arm_cmn_hw_event { /* arm_cmn */ + /* + * CMN PMU event state overlaid on hw_perf_event. + * Must fit before the 'target' field. + */ + struct arm_cmn_node *dn; + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; + s8 dtc_idx[CMN_MAX_DTCS]; + u8 num_dns; + u8 dtm_offset; + DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); + bool wide_sel; + int filter_sel; + } cmn; +#endif }; /* * If the event is a per task event, this will point to the task in -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal @ 2026-05-24 16:13 ` sashiko-bot 2026-05-29 16:44 ` Robin Murphy 1 sibling, 0 replies; 28+ messages in thread From: sashiko-bot @ 2026-05-24 16:13 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider: New issues: - [Low] The patch breaks the build when `arm-cmn` is configured as a module due to an incorrect `#ifdef` guard in `include/linux/perf_event.h`. Pre-existing issues: - [Critical] Integer truncation of `cmn->num_dns` leads to an undersized heap allocation followed by an out-of-bounds write. - [High] Unbounded `cmn->num_dtcs` leads to an out-of-bounds memset in `arm_cmn_event_init()`. - [High] Unbounded counting of `hw->num_dns` leads to an out-of-bounds write in the `hw->dtm_idx` array. - [High] Unbounded DTM pointer advancement leads to an out-of-bounds write in `arm_cmn_init_dtm()`. -- commit 783e16d4bed9e2f619820aa17251b0f781f0cd6e Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event This patch moves the arm_cmn_hw_event structure out of the driver file and directly into the hw_perf_event union to allow expanding its size for larger mesh topologies without violating static bounds. > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 48d851fbd8ea5..c38576a8e3383 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -200,6 +201,27 @@ struct hw_perf_event { > u64 conf; > u64 conf1; > }; > +#ifdef CONFIG_ARM_CMN [Severity: Low] Will this break the build when the arm-cmn driver is compiled as a module? When the driver is built as a module, Kbuild defines CONFIG_ARM_CMN_MODULE instead of CONFIG_ARM_CMN. If the ifdef block is skipped, the cmn field is omitted from struct hw_perf_event, resulting in compiler errors when drivers/perf/arm-cmn.c attempts to access it. Does this need to use IS_ENABLED(CONFIG_ARM_CMN)? > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 [Severity: High] This is a pre-existing issue, but is the DTM initialization logic vulnerable to an out-of-bounds heap write with larger meshes? When multi_dtm is enabled, the driver allocates 2 * num_xps + 1 DTMs. However, if multiple XPs have more than 4 connected ports, the logic in arm_cmn_init_dtm() can initialize up to 3 DTMs per XP. For meshes containing multiple such XPs, the total number of initialized DTMs could exceed the allocated capacity, overwriting adjacent heap memory. > +#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS [Severity: Critical] This isn't a bug introduced by this patch, but does the discovery phase have an integer truncation regression when accumulating the total node count? In struct arm_cmn, num_dns is declared as a 16-bit integer (u16). During arm_cmn_discover(), accumulating child_count across all XPs can wrap around this 16-bit boundary if an unexpectedly large number of children are reported. Because devm_kcalloc() uses this truncated value to allocate the dn array, the subsequent pass that advances the dn pointer based on the true number of children could write past the end of the undersized heap allocation. > +#define CMN_MAX_DTCS 4 > + struct arm_cmn_hw_event { /* arm_cmn */ > + /* > + * CMN PMU event state overlaid on hw_perf_event. > + * Must fit before the 'target' field. > + */ > + struct arm_cmn_node *dn; > + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; [Severity: High] This isn't a bug introduced by this patch, but what prevents hw->num_dns from exceeding the capacity of the dtm_idx array? While the driver restricts individual dn->logid values, it does not limit the total number of nodes of a specific type. If firmware declares more than 224 nodes of the same type, hw->num_dns will exceed the space allocated for dtm_idx. During event addition, the driver calls arm_cmn_set_index() for values of i up to hw->num_dns - 1. Does this result in an out-of-bounds bitwise write that corrupts nearby structure members? > + s8 dtc_idx[CMN_MAX_DTCS]; [Severity: High] This is a pre-existing issue, but is there a missing bounds check for cmn->num_dtcs during discovery? In arm_cmn_discover(), the driver increments cmn->num_dtcs for every CMN_TYPE_DTC node found. If the firmware reports more than CMN_MAX_DTCS (4) nodes, arm_cmn_event_init() will perform a memset() using the unbounded count: memset(hw->dtc_idx, 0, cmn->num_dtcs); Since dtc_idx has a fixed capacity of 4, could this lead to an out-of-bounds write that corrupts adjacent perf_event fields? > + u8 num_dns; > + u8 dtm_offset; > + DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); > + bool wide_sel; > + int filter_sel; > + } cmn; > +#endif > }; [ ... ] -- Sashiko AI review · https://sashiko.dev/#/patchset/20260524153848.16334-1-avivb@amazon.com?part=1 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal 2026-05-24 16:13 ` sashiko-bot @ 2026-05-29 16:44 ` Robin Murphy 1 sibling, 0 replies; 28+ messages in thread From: Robin Murphy @ 2026-05-29 16:44 UTC (permalink / raw) To: Aviv Bakal, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof, Ilkka Koskinen On 24/05/2026 4:38 pm, Aviv Bakal wrote: > In order to increase CMN_MAX_DIMENSION beyond 12 (required for meshes > larger than 12x12, such as Graviton5), the arm_cmn_hw_event struct must > grow. Since it is overlaid on the beginning of hw_perf_event via an > unsafe cast, increasing its size would violate the static_assert that > guards against overflowing into the 'target' field. > > Resolve this by moving struct arm_cmn_hw_event into the hw_perf_event > union as a proper named member, eliminating the cast in to_cmn_hw() and > making the size reservation explicit. Set CMN_MAX_DIMENSION to 14 to > accommodate larger mesh topologies. > > Signed-off-by: Aviv Bakal <avivb@amazon.com> > --- > drivers/perf/arm-cmn.c | 26 +------------------------- > include/linux/perf_event.h | 22 ++++++++++++++++++++++ > 2 files changed, 23 insertions(+), 25 deletions(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca4..3443b819afed 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -31,13 +31,8 @@ > #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) > #define CMN_CHILD_NODE_EXTERNAL BIT(31) > > -#define CMN_MAX_DIMENSION 12 > -#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) > > -/* Currently XPs are the node type we can have most of; others top out at 128 */ > -#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS > - > /* The CFG node has various info besides the discovery tree */ > #define CMN_CFGM_PERIPH_ID_01 0x0008 > #define CMN_CFGM_PID0_PART_0 GENMASK_ULL(7, 0) > @@ -148,7 +143,6 @@ > #define CMN_DT_PMSRR_SS_REQ BIT(0) > > #define CMN_DT_NUM_COUNTERS 8 > -#define CMN_MAX_DTCS 4 > > /* > * Even in the worst case a DTC counter can't wrap in fewer than 2^42 cycles, > @@ -595,24 +589,6 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) > static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} > #endif > > -struct arm_cmn_hw_event { > - struct arm_cmn_node *dn; > - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; > - s8 dtc_idx[CMN_MAX_DTCS]; > - u8 num_dns; > - u8 dtm_offset; > - > - /* > - * WP config registers are divided to UP and DOWN events. We need to > - * keep to track only one of them. > - */ > - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); > - > - bool wide_sel; > - enum cmn_filter_select filter_sel; > -}; > -static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); > - > #define for_each_hw_dn(hw, dn, i) \ > for (i = 0, dn = hw->dn; i < hw->num_dns; i++, dn++) > > @@ -622,7 +598,7 @@ static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, > > static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) > { > - return (struct arm_cmn_hw_event *)&event->hw; > + return &event->hw.cmn; > } > > static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 48d851fbd8ea..c38576a8e338 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h $ scripts/get_maintainer.pl include/linux/perf_event.h Peter Zijlstra <peterz@infradead.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM) Ingo Molnar <mingo@redhat.com> (maintainer:PERFORMANCE EVENTS SUBSYSTEM) Arnaldo Carvalho de Melo <acme@kernel.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM) Namhyung Kim <namhyung@kernel.org> (maintainer:PERFORMANCE EVENTS SUBSYSTEM) Mark Rutland <mark.rutland@arm.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) Alexander Shishkin <alexander.shishkin@linux.intel.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) Jiri Olsa <jolsa@kernel.org> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) Ian Rogers <irogers@google.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) Adrian Hunter <adrian.hunter@intel.com> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) James Clark <james.clark@linaro.org> (reviewer:PERFORMANCE EVENTS SUBSYSTEM) linux-perf-users@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM) linux-kernel@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM) > @@ -119,6 +119,7 @@ struct perf_branch_stack { > }; > > struct task_struct; > +struct arm_cmn_node; > > /* > * extra PMU register associated with an event > @@ -200,6 +201,27 @@ struct hw_perf_event { > u64 conf; > u64 conf1; > }; > +#ifdef CONFIG_ARM_CMN > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 > +#define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS > +#define CMN_MAX_DTCS 4 I doubt the maintainers would be too happy to have this degree of random driver specifics in a core header. Certainly I wouldn't consider it a great idea myself... > + struct arm_cmn_hw_event { /* arm_cmn */ > + /* > + * CMN PMU event state overlaid on hw_perf_event. > + * Must fit before the 'target' field. > + */ > + struct arm_cmn_node *dn; > + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; > + s8 dtc_idx[CMN_MAX_DTCS]; > + u8 num_dns; > + u8 dtm_offset; > + DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); > + bool wide_sel; > + int filter_sel; > + } cmn; > +#endif > }; > /* > * If the event is a per task event, this will point to the task in TBH I'd be tempted to just disable 32-bit compile-testing for now, but if we're going to do anything "proper" then it's probably about time to do this: Thanks, Robin. ----->8----- Subject: [PATCH] perf/arm-cmn: Move DTM index data out of hw_perf_event The amount of data we need to store all the per-DTM counter and watchpoint allocations is already testing the limits of hw_perf_event, and future CMNs are only likely to keep growing larger, so move these arrays out to separate memory allocations. As part of that we can use an explicit union for allocating cycle counters to dtc_cycles events, which is arguably nicer anyway. Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> --- drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++-------------- 1 file changed, 59 insertions(+), 30 deletions(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index 6e5cc4086a9e..470f6a82502e 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -598,17 +598,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} struct arm_cmn_hw_event { struct arm_cmn_node *dn; - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; + union { + u64 *dtm_idx; + int cc_idx; + }; + unsigned long *wp_idx; s8 dtc_idx[CMN_MAX_DTCS]; u8 num_dns; u8 dtm_offset; - - /* - * WP config registers are divided to UP and DOWN events. We need to - * keep to track only one of them. - */ - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); - bool wide_sel; enum cmn_filter_select filter_sel; }; @@ -626,25 +623,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) return (struct arm_cmn_hw_event *)&event->hw; } -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val) { - x[pos / 32] |= (u64)val << ((pos % 32) * 2); + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2); } -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos) +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos) { - return (x[pos / 32] >> ((pos % 32) * 2)) & 3; + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3; } -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val) +static u64 *arm_cmn_alloc_dtm_idx(void) +{ + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64), GFP_KERNEL); +} + +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val) { if (val) - set_bit(pos, wp_idx); + set_bit(pos, hw->wp_idx); } -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos) +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos) { - return test_bit(pos, wp_idx); + return test_bit(pos, hw->wp_idx); +} + +static unsigned long *arm_cmn_alloc_wp_idx(void) +{ + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL); +} + +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw) +{ + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)); + if (hw->wp_idx) + memset(hw->wp_idx, 0, CMN_MAX_XPS); } struct arm_cmn_event_attr { @@ -1377,7 +1391,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event, struct arm_cmn_hw_event *hw, unsigned int pos) { - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos); + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos); } static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, @@ -1388,7 +1402,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, struct arm_cmn_hw_event *hw = to_cmn_hw(event); dtm->wp_event[wp_idx] = hw->dtc_idx[dtc]; - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event)); + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event)); } static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) @@ -1459,7 +1473,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset; reg = readq_relaxed(dtm->base + offset); } - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + dtm_idx = arm_cmn_get_dtm_idx(hw, i); count += (u16)(reg >> (dtm_idx * 16)); } return count; @@ -1506,7 +1520,7 @@ static void arm_cmn_event_read(struct perf_event *event) unsigned long flags; if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) { - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]); + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx); local64_add(delta, &event->count); return; } @@ -1573,7 +1587,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, dtc->base + CMN_DT_DTC_CTL); @@ -1591,7 +1605,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event), hw->wide_sel); @@ -1607,7 +1621,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; dtc->cc_active = false; writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); @@ -1620,7 +1634,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel); } @@ -1764,6 +1778,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn, } +static void arm_cmn_event_destroy(struct perf_event *event) +{ + struct arm_cmn_hw_event *hw = to_cmn_hw(event); + + kfree(hw->dtm_idx); + bitmap_free(hw->wp_idx); +} + static int arm_cmn_event_init(struct perf_event *event) { struct arm_cmn *cmn = to_cmn(event->pmu); @@ -1788,6 +1810,11 @@ static int arm_cmn_event_init(struct perf_event *event) if (type == CMN_TYPE_DTC) return arm_cmn_validate_group(cmn, event); + event->destroy = arm_cmn_event_destroy; + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); + if (!hw->dtm_idx) + return -ENOMEM; + eventid = CMN_EVENT_EVENTID(event); /* For watchpoints we need the actual XP node here */ if (type == CMN_TYPE_WP) { @@ -1798,6 +1825,9 @@ static int arm_cmn_event_init(struct perf_event *event) /* ...but the DTM may depend on which port we're watching */ if (cmn->multi_dtm) hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; + hw->wp_idx = arm_cmn_alloc_wp_idx(); + if (!hw->wp_idx) + return -ENOMEM; } else if (type == CMN_TYPE_XP && (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { hw->wide_sel = true; @@ -1848,7 +1878,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, while (i--) { struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset; - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i); if (type == CMN_TYPE_WP) { int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i); @@ -1862,8 +1892,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx); writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG); } - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx)); - memset(hw->wp_idx, 0, sizeof(hw->wp_idx)); + arm_cmn_clear_idx(hw); for_each_hw_dtc_idx(hw, j, idx) cmn->dtc[j].counters[idx] = NULL; @@ -1883,7 +1912,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) return -ENOSPC; cmn->dtc[i].cycles = event; - hw->dtc_idx[0] = i; + hw->cc_idx = i; if (flags & PERF_EF_START) arm_cmn_event_start(event, 0); @@ -1948,7 +1977,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) goto free_dtms; } - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx); + arm_cmn_set_dtm_idx(hw, i, dtm_idx); dtm->input_sel[dtm_idx] = input_sel; shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx); @@ -1981,7 +2010,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags) arm_cmn_event_stop(event, PERF_EF_UPDATE); if (type == CMN_TYPE_DTC) - cmn->dtc[hw->dtc_idx[0]].cycles = NULL; + cmn->dtc[hw->cc_idx].cycles = NULL; else arm_cmn_event_clear(cmn, event, hw->num_dns); } -- 2.54.0.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal @ 2026-05-24 15:38 ` Aviv Bakal 2026-05-24 16:40 ` sashiko-bot 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal 2 siblings, 1 reply; 28+ messages in thread From: Aviv Bakal @ 2026-05-24 15:38 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the XP's logical ID, since the node info register's logical ID field is also zeroed. Signed-off-by: Aviv Bakal <avivb@amazon.com> --- drivers/perf/arm-cmn.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index 3443b819afed..0184e598777a 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -208,6 +208,8 @@ enum cmn_part { PART_CMN700 = 0x43c, PART_CI700 = 0x43a, PART_CMN_S3 = 0x43e, + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ + PART_GRAVITON5 = 0xa5, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -2197,6 +2199,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); } +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) +{ + unsigned int x = (xp_id >> 7) & 0xf; + unsigned int y = (xp_id >> 3) & 0xf; + + /* + * The unit info register reads as zero; derive the DTC domain from + * the XP's mesh coordinates over the 10x14 mesh. + */ + return (x / 5) + (y / 7) * 2; +} + static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) { int level; @@ -2242,6 +2256,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) u64 reg; int i, j; size_t sz; + bool graviton5_workaround = false; arm_cmn_init_node_info(cmn, rgn_offset, &cfg); if (cfg.type != CMN_TYPE_CFG) @@ -2252,6 +2267,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; + + /* Graviton5 has a customised CMN-S3 which needs some fixups */ + if (cmn->part == PART_GRAVITON5) { + cmn->part = PART_CMN_S3; + graviton5_workaround = true; + } + /* 600AE is close enough that it's not really worth more complexity */ if (part == PART_CMN600AE) part = PART_CMN600; @@ -2341,6 +2363,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) if (cmn->part == PART_CMN600) xp->dtc = -1; + else if (graviton5_workaround) + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); else xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); @@ -2419,6 +2443,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) switch (dn->type) { case CMN_TYPE_DTC: + if (graviton5_workaround) { + /* Node info logical ID is zeroed; use the XP's */ + dn->logid = xp->logid; + } cmn->num_dtcs++; dn++; break; @@ -2634,6 +2662,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { { "ARMHC650" }, { "ARMHC700" }, { "ARMHC003" }, + { "AMZN0070", PART_GRAVITON5 }, {} }; MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-05-24 16:40 ` sashiko-bot 0 siblings, 0 replies; 28+ messages in thread From: sashiko-bot @ 2026-05-24 16:40 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [High] Setting the DTC node's logical ID to the parent XP's logical ID breaks the DTC array indexing, leading to incorrect PMU event programming and IRQ routing. -- commit 20d5a81cc2d5eb87a70576532c9a783cb6e011c3 Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 This commit introduces workarounds for a custom CMN-S3 implementation on Graviton5 where certain discovery registers report zeroed fields. It adds a dedicated ACPI HID and derives the DTC domain and logical ID from the XP node. > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c [ ... ] > @@ -2419,6 +2443,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; [Severity: High] Will setting the DTC's logical ID to the parent XP's logical ID break the DTC array indexing? For standard CMN hardware, a DTC's logid is its Domain ID. Nodes are sorted by type and logid, which means DTCs naturally sort by Domain ID. By using the XP's logid instead, the DTCs will be sorted by their physical XP order rather than their Domain ID. The cmn->dtc array and ACPI IRQs are populated sequentially based on this sorted order. Later, when programming PMU events, the driver indexes into the cmn->dtc array using the Domain ID: d = max_t(int, dn->dtc, 0); If the physical XP sort order doesn't match the Domain ID order, will this cause PMU events to be programmed into the wrong hardware blocks and misroute IRQs? Could this be assigned the Domain ID directly using dn->logid = xp->dtc? > + } > cmn->num_dtcs++; > dn++; > break; -- Sashiko AI review · https://sashiko.dev/#/patchset/20260524153848.16334-1-avivb@amazon.com?part=2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v4 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal 2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-05-31 11:04 ` Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal ` (2 more replies) 2 siblings, 3 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka This series adds support for Graviton5's customised CMN-S3 which has zeroed discovery registers. Patch 1 is based on Robin's suggestion [1] to move the DTM index and watchpoint bitmaps out of hw_perf_event into separate allocations, removing the size constraint that prevented increasing CMN_MAX_DIMENSION. Two fixes were applied on top: the kzalloc size now accounts for sizeof(u64), and bitmap_zero() is used instead of a raw memset for wp_idx. Patch 2 adds the Graviton5 workarounds. [1] https://lore.kernel.org/all/c1d27212-8e2b-4820-bee6-ad7e7de81238@arm.com/ Changes since v3: - Replace patch 1/2 with Robin's approach: dynamically allocate dtm_idx and wp_idx instead of moving the struct into perf_event.h Changes since v2: - Revert DTC logical ID assignment back to xp->logid (per Robin's review) Aviv Bakal (2): perf/arm-cmn: Move DTM index data out of hw_perf_event perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 drivers/perf/arm-cmn.c | 121 ++++++++++++++++++++++++++++++----------- 1 file changed, 90 insertions(+), 31 deletions(-) -- 2.47.3 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal @ 2026-05-31 11:04 ` Aviv Bakal 2026-06-01 18:23 ` Robin Murphy 2026-06-02 5:49 ` Ilkka Koskinen 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal 2 siblings, 2 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka The amount of data we need to store all the per-DTM counter and watchpoint allocations is already testing the limits of hw_perf_event, and future CMNs are only likely to keep growing larger, so move these arrays out to separate memory allocations. As part of that we can use an explicit union for allocating cycle counters to dtc_cycles events, which is arguably nicer anyway. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Aviv Bakal <avivb@amazon.com> --- drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++-------------- 1 file changed, 59 insertions(+), 30 deletions(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f5305c8fdca4..f1978a53d1c1 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} struct arm_cmn_hw_event { struct arm_cmn_node *dn; - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; + union { + u64 *dtm_idx; + int cc_idx; + }; + unsigned long *wp_idx; s8 dtc_idx[CMN_MAX_DTCS]; u8 num_dns; u8 dtm_offset; - - /* - * WP config registers are divided to UP and DOWN events. We need to - * keep to track only one of them. - */ - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); - bool wide_sel; enum cmn_filter_select filter_sel; }; @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) return (struct arm_cmn_hw_event *)&event->hw; } -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val) { - x[pos / 32] |= (u64)val << ((pos % 32) * 2); + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2); } -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos) +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos) { - return (x[pos / 32] >> ((pos % 32) * 2)) & 3; + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3; } -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val) +static u64 *arm_cmn_alloc_dtm_idx(void) +{ + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL); +} + +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val) { if (val) - set_bit(pos, wp_idx); + set_bit(pos, hw->wp_idx); +} + +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos) +{ + return test_bit(pos, hw->wp_idx); +} + +static unsigned long *arm_cmn_alloc_wp_idx(void) +{ + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL); } -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos) +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw) { - return test_bit(pos, wp_idx); + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64)); + if (hw->wp_idx) + bitmap_zero(hw->wp_idx, CMN_MAX_XPS); } struct arm_cmn_event_attr { @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event, struct arm_cmn_hw_event *hw, unsigned int pos) { - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos); + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos); } static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, struct arm_cmn_hw_event *hw = to_cmn_hw(event); dtm->wp_event[wp_idx] = hw->dtc_idx[dtc]; - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event)); + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event)); } static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset; reg = readq_relaxed(dtm->base + offset); } - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + dtm_idx = arm_cmn_get_dtm_idx(hw, i); count += (u16)(reg >> (dtm_idx * 16)); } return count; @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event) unsigned long flags; if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) { - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]); + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx); local64_add(delta, &event->count); return; } @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, dtc->base + CMN_DT_DTC_CTL); @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event), hw->wide_sel); @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; dtc->cc_active = false; writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel); } @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn, } +static void arm_cmn_event_destroy(struct perf_event *event) +{ + struct arm_cmn_hw_event *hw = to_cmn_hw(event); + + kfree(hw->dtm_idx); + bitmap_free(hw->wp_idx); +} + static int arm_cmn_event_init(struct perf_event *event) { struct arm_cmn *cmn = to_cmn(event->pmu); @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event) if (type == CMN_TYPE_DTC) return arm_cmn_validate_group(cmn, event); + event->destroy = arm_cmn_event_destroy; + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); + if (!hw->dtm_idx) + return -ENOMEM; + eventid = CMN_EVENT_EVENTID(event); /* For watchpoints we need the actual XP node here */ if (type == CMN_TYPE_WP) { @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event) /* ...but the DTM may depend on which port we're watching */ if (cmn->multi_dtm) hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; + hw->wp_idx = arm_cmn_alloc_wp_idx(); + if (!hw->wp_idx) + return -ENOMEM; } else if (type == CMN_TYPE_XP && (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { hw->wide_sel = true; @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, while (i--) { struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset; - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i); if (type == CMN_TYPE_WP) { int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i); @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx); writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG); } - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx)); - memset(hw->wp_idx, 0, sizeof(hw->wp_idx)); + arm_cmn_clear_idx(hw); for_each_hw_dtc_idx(hw, j, idx) cmn->dtc[j].counters[idx] = NULL; @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) return -ENOSPC; cmn->dtc[i].cycles = event; - hw->dtc_idx[0] = i; + hw->cc_idx = i; if (flags & PERF_EF_START) arm_cmn_event_start(event, 0); @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) goto free_dtms; } - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx); + arm_cmn_set_dtm_idx(hw, i, dtm_idx); dtm->input_sel[dtm_idx] = input_sel; shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx); @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags) arm_cmn_event_stop(event, PERF_EF_UPDATE); if (type == CMN_TYPE_DTC) - cmn->dtc[hw->dtc_idx[0]].cycles = NULL; + cmn->dtc[hw->cc_idx].cycles = NULL; else arm_cmn_event_clear(cmn, event, hw->num_dns); } -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal @ 2026-06-01 18:23 ` Robin Murphy 2026-06-02 5:49 ` Ilkka Koskinen 1 sibling, 0 replies; 28+ messages in thread From: Robin Murphy @ 2026-06-01 18:23 UTC (permalink / raw) To: Aviv Bakal, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof, ilkka On 2026-05-31 12:04 pm, Aviv Bakal wrote: > The amount of data we need to store all the per-DTM counter and > watchpoint allocations is already testing the limits of hw_perf_event, > and future CMNs are only likely to keep growing larger, so move these > arrays out to separate memory allocations. As part of that we can use > an explicit union for allocating cycle counters to dtc_cycles events, > which is arguably nicer anyway. Nit: not that I'm particularly precious about it, but this has lost my original authorship - I really should leave the technically-redundant "From:" line in when I post inline patches, as frankly even I prefer to just copy-paste the patch part rather than export the whole mail with its original headers for `git am --scissors`... Thanks, Robin. > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > Signed-off-by: Aviv Bakal <avivb@amazon.com> > --- > drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++-------------- > 1 file changed, 59 insertions(+), 30 deletions(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca4..f1978a53d1c1 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} > > struct arm_cmn_hw_event { > struct arm_cmn_node *dn; > - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; > + union { > + u64 *dtm_idx; > + int cc_idx; > + }; > + unsigned long *wp_idx; > s8 dtc_idx[CMN_MAX_DTCS]; > u8 num_dns; > u8 dtm_offset; > - > - /* > - * WP config registers are divided to UP and DOWN events. We need to > - * keep to track only one of them. > - */ > - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); > - > bool wide_sel; > enum cmn_filter_select filter_sel; > }; > @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) > return (struct arm_cmn_hw_event *)&event->hw; > } > > -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) > +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val) > { > - x[pos / 32] |= (u64)val << ((pos % 32) * 2); > + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2); > } > > -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos) > +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos) > { > - return (x[pos / 32] >> ((pos % 32) * 2)) & 3; > + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3; > } > > -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val) > +static u64 *arm_cmn_alloc_dtm_idx(void) > +{ > + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL); > +} > + > +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val) > { > if (val) > - set_bit(pos, wp_idx); > + set_bit(pos, hw->wp_idx); > +} > + > +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos) > +{ > + return test_bit(pos, hw->wp_idx); > +} > + > +static unsigned long *arm_cmn_alloc_wp_idx(void) > +{ > + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL); > } > > -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos) > +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw) > { > - return test_bit(pos, wp_idx); > + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64)); > + if (hw->wp_idx) > + bitmap_zero(hw->wp_idx, CMN_MAX_XPS); > } > > struct arm_cmn_event_attr { > @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event, > struct arm_cmn_hw_event *hw, > unsigned int pos) > { > - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos); > + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos); > } > > static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, > @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, > struct arm_cmn_hw_event *hw = to_cmn_hw(event); > > dtm->wp_event[wp_idx] = hw->dtc_idx[dtc]; > - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event)); > + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event)); > } > > static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) > @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, > dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset; > reg = readq_relaxed(dtm->base + offset); > } > - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + dtm_idx = arm_cmn_get_dtm_idx(hw, i); > count += (u16)(reg >> (dtm_idx * 16)); > } > return count; > @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event) > unsigned long flags; > > if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) { > - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]); > + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx); > local64_add(delta, &event->count); > return; > } > @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) > int i; > > if (type == CMN_TYPE_DTC) { > - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; > + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; > > writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, > dtc->base + CMN_DT_DTC_CTL); > @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) > writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx)); > } > } else for_each_hw_dn(hw, dn, i) { > - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event), > hw->wide_sel); > @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) > int i; > > if (type == CMN_TYPE_DTC) { > - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; > + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; > > dtc->cc_active = false; > writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); > @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) > writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx)); > } > } else for_each_hw_dn(hw, dn, i) { > - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel); > } > @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn, > } > > > +static void arm_cmn_event_destroy(struct perf_event *event) > +{ > + struct arm_cmn_hw_event *hw = to_cmn_hw(event); > + > + kfree(hw->dtm_idx); > + bitmap_free(hw->wp_idx); > +} > + > static int arm_cmn_event_init(struct perf_event *event) > { > struct arm_cmn *cmn = to_cmn(event->pmu); > @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event) > if (type == CMN_TYPE_DTC) > return arm_cmn_validate_group(cmn, event); > > + event->destroy = arm_cmn_event_destroy; > + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); > + if (!hw->dtm_idx) > + return -ENOMEM; > + > eventid = CMN_EVENT_EVENTID(event); > /* For watchpoints we need the actual XP node here */ > if (type == CMN_TYPE_WP) { > @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event) > /* ...but the DTM may depend on which port we're watching */ > if (cmn->multi_dtm) > hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; > + hw->wp_idx = arm_cmn_alloc_wp_idx(); > + if (!hw->wp_idx) > + return -ENOMEM; > } else if (type == CMN_TYPE_XP && > (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { > hw->wide_sel = true; > @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, > > while (i--) { > struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset; > - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > if (type == CMN_TYPE_WP) { > int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i); > @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, > dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx); > writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG); > } > - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx)); > - memset(hw->wp_idx, 0, sizeof(hw->wp_idx)); > + arm_cmn_clear_idx(hw); > > for_each_hw_dtc_idx(hw, j, idx) > cmn->dtc[j].counters[idx] = NULL; > @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) > return -ENOSPC; > > cmn->dtc[i].cycles = event; > - hw->dtc_idx[0] = i; > + hw->cc_idx = i; > > if (flags & PERF_EF_START) > arm_cmn_event_start(event, 0); > @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) > goto free_dtms; > } > > - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx); > + arm_cmn_set_dtm_idx(hw, i, dtm_idx); > > dtm->input_sel[dtm_idx] = input_sel; > shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx); > @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags) > arm_cmn_event_stop(event, PERF_EF_UPDATE); > > if (type == CMN_TYPE_DTC) > - cmn->dtc[hw->dtc_idx[0]].cycles = NULL; > + cmn->dtc[hw->cc_idx].cycles = NULL; > else > arm_cmn_event_clear(cmn, event, hw->num_dns); > } ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal 2026-06-01 18:23 ` Robin Murphy @ 2026-06-02 5:49 ` Ilkka Koskinen 1 sibling, 0 replies; 28+ messages in thread From: Ilkka Koskinen @ 2026-06-02 5:49 UTC (permalink / raw) To: Aviv Bakal Cc: robin.murphy, will, mark.rutland, linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof, ilkka On Sun, 31 May 2026, Aviv Bakal wrote: > The amount of data we need to store all the per-DTM counter and > watchpoint allocations is already testing the limits of hw_perf_event, > and future CMNs are only likely to keep growing larger, so move these > arrays out to separate memory allocations. As part of that we can use > an explicit union for allocating cycle counters to dtc_cycles events, > which is arguably nicer anyway. > > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > Signed-off-by: Aviv Bakal <avivb@amazon.com> Thanks for the patch. I was wondering a while ago when the mesh grows too big to handle it with the old way. The patch looks good to me, Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cheers, Ilkka > --- > drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++-------------- > 1 file changed, 59 insertions(+), 30 deletions(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca4..f1978a53d1c1 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} > > struct arm_cmn_hw_event { > struct arm_cmn_node *dn; > - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; > + union { > + u64 *dtm_idx; > + int cc_idx; > + }; > + unsigned long *wp_idx; > s8 dtc_idx[CMN_MAX_DTCS]; > u8 num_dns; > u8 dtm_offset; > - > - /* > - * WP config registers are divided to UP and DOWN events. We need to > - * keep to track only one of them. > - */ > - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); > - > bool wide_sel; > enum cmn_filter_select filter_sel; > }; > @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) > return (struct arm_cmn_hw_event *)&event->hw; > } > > -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) > +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val) > { > - x[pos / 32] |= (u64)val << ((pos % 32) * 2); > + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2); > } > > -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos) > +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos) > { > - return (x[pos / 32] >> ((pos % 32) * 2)) & 3; > + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3; > } > > -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val) > +static u64 *arm_cmn_alloc_dtm_idx(void) > +{ > + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL); > +} > + > +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val) > { > if (val) > - set_bit(pos, wp_idx); > + set_bit(pos, hw->wp_idx); > +} > + > +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos) > +{ > + return test_bit(pos, hw->wp_idx); > +} > + > +static unsigned long *arm_cmn_alloc_wp_idx(void) > +{ > + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL); > } > > -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos) > +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw) > { > - return test_bit(pos, wp_idx); > + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64)); > + if (hw->wp_idx) > + bitmap_zero(hw->wp_idx, CMN_MAX_XPS); > } > > struct arm_cmn_event_attr { > @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event, > struct arm_cmn_hw_event *hw, > unsigned int pos) > { > - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos); > + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos); > } > > static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, > @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, > struct arm_cmn_hw_event *hw = to_cmn_hw(event); > > dtm->wp_event[wp_idx] = hw->dtc_idx[dtc]; > - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event)); > + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event)); > } > > static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) > @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, > dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset; > reg = readq_relaxed(dtm->base + offset); > } > - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + dtm_idx = arm_cmn_get_dtm_idx(hw, i); > count += (u16)(reg >> (dtm_idx * 16)); > } > return count; > @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event) > unsigned long flags; > > if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) { > - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]); > + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx); > local64_add(delta, &event->count); > return; > } > @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) > int i; > > if (type == CMN_TYPE_DTC) { > - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; > + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; > > writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, > dtc->base + CMN_DT_DTC_CTL); > @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) > writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx)); > } > } else for_each_hw_dn(hw, dn, i) { > - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event), > hw->wide_sel); > @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) > int i; > > if (type == CMN_TYPE_DTC) { > - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; > + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; > > dtc->cc_active = false; > writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); > @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) > writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx)); > } > } else for_each_hw_dn(hw, dn, i) { > - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel); > } > @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn, > } > > > +static void arm_cmn_event_destroy(struct perf_event *event) > +{ > + struct arm_cmn_hw_event *hw = to_cmn_hw(event); > + > + kfree(hw->dtm_idx); > + bitmap_free(hw->wp_idx); > +} > + > static int arm_cmn_event_init(struct perf_event *event) > { > struct arm_cmn *cmn = to_cmn(event->pmu); > @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event) > if (type == CMN_TYPE_DTC) > return arm_cmn_validate_group(cmn, event); > > + event->destroy = arm_cmn_event_destroy; > + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); > + if (!hw->dtm_idx) > + return -ENOMEM; > + > eventid = CMN_EVENT_EVENTID(event); > /* For watchpoints we need the actual XP node here */ > if (type == CMN_TYPE_WP) { > @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event) > /* ...but the DTM may depend on which port we're watching */ > if (cmn->multi_dtm) > hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; > + hw->wp_idx = arm_cmn_alloc_wp_idx(); > + if (!hw->wp_idx) > + return -ENOMEM; > } else if (type == CMN_TYPE_XP && > (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { > hw->wide_sel = true; > @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, > > while (i--) { > struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset; > - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); > + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i); > > if (type == CMN_TYPE_WP) { > int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i); > @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, > dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx); > writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG); > } > - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx)); > - memset(hw->wp_idx, 0, sizeof(hw->wp_idx)); > + arm_cmn_clear_idx(hw); > > for_each_hw_dtc_idx(hw, j, idx) > cmn->dtc[j].counters[idx] = NULL; > @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) > return -ENOSPC; > > cmn->dtc[i].cycles = event; > - hw->dtc_idx[0] = i; > + hw->cc_idx = i; > > if (flags & PERF_EF_START) > arm_cmn_event_start(event, 0); > @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) > goto free_dtms; > } > > - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx); > + arm_cmn_set_dtm_idx(hw, i, dtm_idx); > > dtm->input_sel[dtm_idx] = input_sel; > shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx); > @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags) > arm_cmn_event_stop(event, PERF_EF_UPDATE); > > if (type == CMN_TYPE_DTC) > - cmn->dtc[hw->dtc_idx[0]].cycles = NULL; > + cmn->dtc[hw->cc_idx].cycles = NULL; > else > arm_cmn_event_clear(cmn, event, hw->num_dns); > } > -- > 2.47.3 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal @ 2026-05-31 11:04 ` Aviv Bakal 2026-05-31 11:32 ` sashiko-bot ` (2 more replies) 2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal 2 siblings, 3 replies; 28+ messages in thread From: Aviv Bakal @ 2026-05-31 11:04 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the XP's logical ID, since the node info register's logical ID field is also zeroed. Signed-off-by: Aviv Bakal <avivb@amazon.com> --- drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f1978a53d1c1..3fb71d9a57eb 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -31,7 +31,8 @@ #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) #define CMN_CHILD_NODE_EXTERNAL BIT(31) -#define CMN_MAX_DIMENSION 12 +/* Some implementations use a mesh larger than the architectural max of 12 */ +#define CMN_MAX_DIMENSION 14 #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) @@ -214,6 +215,8 @@ enum cmn_part { PART_CMN700 = 0x43c, PART_CI700 = 0x43a, PART_CMN_S3 = 0x43e, + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ + PART_GRAVITON5 = 0xa5, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); } +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) +{ + unsigned int x = (xp_id >> 7) & 0xf; + unsigned int y = (xp_id >> 3) & 0xf; + + /* + * The unit info register reads as zero; derive the DTC domain from + * the XP's mesh coordinates over the 10x14 mesh. + */ + return (x / 5) + (y / 7) * 2; +} + static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) { int level; @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) u64 reg; int i, j; size_t sz; + bool graviton5_workaround = false; arm_cmn_init_node_info(cmn, rgn_offset, &cfg); if (cfg.type != CMN_TYPE_CFG) @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; + + /* Graviton5 has a customised CMN-S3 which needs some fixups */ + if (cmn->part == PART_GRAVITON5) { + cmn->part = PART_CMN_S3; + graviton5_workaround = true; + } + /* 600AE is close enough that it's not really worth more complexity */ if (part == PART_CMN600AE) part = PART_CMN600; @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) if (cmn->part == PART_CMN600) xp->dtc = -1; + else if (graviton5_workaround) + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); else xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) switch (dn->type) { case CMN_TYPE_DTC: + if (graviton5_workaround) { + /* Node info logical ID is zeroed; use the XP's */ + dn->logid = xp->logid; + } cmn->num_dtcs++; dn++; break; @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { { "ARMHC650" }, { "ARMHC700" }, { "ARMHC003" }, + { "AMZN0070", PART_GRAVITON5 }, {} }; MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-05-31 11:32 ` sashiko-bot 2026-06-01 17:45 ` Robin Murphy 2026-06-01 17:48 ` Robin Murphy 2026-06-02 5:55 ` Ilkka Koskinen 2 siblings, 1 reply; 28+ messages in thread From: sashiko-bot @ 2026-05-31 11:32 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: New issues: - [High] Setting the DTC node's logical ID to the parent XP's logical ID causes DTC nodes to be sorted out of domain order, breaking PMU event routing. - [High] The hardcoded bit-shifts in arm_cmn_graviton5_dtc_domain() can produce DTC domains >= 4, leading to out-of-bounds writes in arm_cmn_event_init(). Pre-existing issues: - [High] Memory allocated for `hw->dtm_idx` and `hw->wp_idx` is leaked if `arm_cmn_event_init` fails. -- commit c34c59523c8b94745143d1d1114415bb33c12d3e Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. This patch adds workarounds to identify the variant, derive the DTC domain from the XP node ID, and set the DTC logical ID from the XP's logical ID. > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f1978a53d1c19..3fb71d9a57eb5 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} [Severity: High] Can the hardcoded bit-shifts extract overlapping bits if the mesh dimensions are smaller than 10x14? If the system describes a smaller mesh (e.g., 8x8), arm_cmn_xyidbits evaluates to 3. This would cause the hardcoded shifts (xp_id >> 7) and (xp_id >> 3) to extract overlapping bits that inflate y (e.g., y = 15), yielding a domain of 5. If a 14x10 mesh is used, the x coordinate can natively reach 13, yielding a domain index of 4 ((13 / 5) + (9 / 7) * 2). When an unvalidated domain index >= 4 is assigned to dn->dtc, does it cause an out-of-bounds write in arm_cmn_event_init()? drivers/perf/arm-cmn.c:arm_cmn_event_init() { ... hw->dtc_idx[dn->dtc] = 0; ... } Since hw->dtc_idx is a fixed 4-byte array (s8 dtc_idx[CMN_MAX_DTCS]), an index >= 4 would write past its bounds and corrupt adjacent fields like hw->num_dns. [ ... ] > @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; > + } [Severity: High] Does overriding the DTC logical ID with the parent XP's logical ID cause DTC nodes to be sorted out of domain order? In arm_cmn_init_dtcs(), CMN_TYPE_DTC nodes are sorted by their logical ID (dn->logid), which determines their index in the dynamically allocated cmn->dtc array. Standard CMN parts have their logid read from hardware as the exact domain index (0-3). Because the domain index here is a 2D block distribution, the physical XP IDs do not monotonically increase with the domain index. For example, an XP at x=7, y=0 belongs to domain 1 but has a smaller logid than an XP at x=2, y=3 which belongs to domain 0. If cmn->dtc[0] points to the hardware base of domain 1 instead of domain 0, will events programmed for domain 0 be routed to domain 1's hardware, breaking PMU event counting? Would it be better to explicitly set this as: dn->logid = arm_cmn_graviton5_dtc_domain(xp->id); [Severity: High] This is a pre-existing issue, but does arm_cmn_event_init() leak memory if initialization fails? drivers/perf/arm-cmn.c:arm_cmn_event_init() { ... if (arm_cmn_is_watchpoint(event)) if (eventid != CMN_WP_UP && eventid != CMN_WP_DOWN) return -EINVAL; ... return arm_cmn_validate_group(cmn, event); } The driver dynamically allocates memory using arm_cmn_alloc_dtm_idx() and arm_cmn_alloc_wp_idx(). If the initialization subsequently encounters an error, such as an invalid eventid for a watchpoint or a failure in arm_cmn_validate_group(), it returns an error code without freeing the allocated memory. The core perf code perf_try_init_event() does not call the driver's event->destroy() callback if pmu->event_init() fails, expecting the driver to clean up its own resources. Could this allow memory leaks if invalid events are repeatedly created? -- Sashiko AI review · https://sashiko.dev/#/patchset/20260531110447.10095-1-avivb@amazon.com?part=2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:32 ` sashiko-bot @ 2026-06-01 17:45 ` Robin Murphy 2026-06-03 14:57 ` Aviv Bakal 0 siblings, 1 reply; 28+ messages in thread From: Robin Murphy @ 2026-06-01 17:45 UTC (permalink / raw) To: sashiko-reviews, Aviv Bakal; +Cc: linux-perf-users For the benefit of any humans... On 2026-05-31 12:32 pm, sashiko-bot@kernel.org wrote: > Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: > > New issues: > - [High] Setting the DTC node's logical ID to the parent XP's logical ID causes DTC nodes to be sorted out of domain order, breaking PMU event routing. > - [High] The hardcoded bit-shifts in arm_cmn_graviton5_dtc_domain() can produce DTC domains >= 4, leading to out-of-bounds writes in arm_cmn_event_init(). > > Pre-existing issues: > - [High] Memory allocated for `hw->dtm_idx` and `hw->wp_idx` is leaked if `arm_cmn_event_init` fails. > -- > > commit c34c59523c8b94745143d1d1114415bb33c12d3e > Author: Aviv Bakal <avivb@amazon.com> > > perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 > > Graviton5 uses a customised CMN-S3 implementation where certain discovery > registers report zeroed fields. This patch adds workarounds to identify the > variant, derive the DTC domain from the XP node ID, and set the DTC logical > ID from the XP's logical ID. > >> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c >> index f1978a53d1c19..3fb71d9a57eb5 100644 >> --- a/drivers/perf/arm-cmn.c >> +++ b/drivers/perf/arm-cmn.c >> @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg >> return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); >> } >> >> +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) >> +{ >> + unsigned int x = (xp_id >> 7) & 0xf; >> + unsigned int y = (xp_id >> 3) & 0xf; >> + >> + /* >> + * The unit info register reads as zero; derive the DTC domain from >> + * the XP's mesh coordinates over the 10x14 mesh. >> + */ >> + return (x / 5) + (y / 7) * 2; >> +} > > [Severity: High] > Can the hardcoded bit-shifts extract overlapping bits if the mesh dimensions > are smaller than 10x14? No, because this code path is specific to things which firmware claims to be the Graviton5 CMN, and as such it hard-codes the known topology of the Graviton5 CMN. We have to assume we can trust what firmware tells us about the platform, or we wouldn't even be able to boot. > If the system describes a smaller mesh (e.g., 8x8), arm_cmn_xyidbits > evaluates to 3. This would cause the hardcoded shifts (xp_id >> 7) and > (xp_id >> 3) to extract overlapping bits that inflate y (e.g., y = 15), > yielding a domain of 5. > > If a 14x10 mesh is used, the x coordinate can natively reach 13, yielding > a domain index of 4 ((13 / 5) + (9 / 7) * 2). > > When an unvalidated domain index >= 4 is assigned to dn->dtc, does it > cause an out-of-bounds write in arm_cmn_event_init()? No, because there is no such thing. The maximum possible number of DTC domains is 4. Even when correctly generated, the ID field in the register is only 2 bits: https://developer.arm.com/documentation/107858/0205/Programmers-model/Register-summary/MXP-register-summary/por-dtm-unit-info?lang=en > drivers/perf/arm-cmn.c:arm_cmn_event_init() { > ... > hw->dtc_idx[dn->dtc] = 0; > ... > } > > Since hw->dtc_idx is a fixed 4-byte array (s8 dtc_idx[CMN_MAX_DTCS]), an > index >= 4 would write past its bounds and corrupt adjacent fields like > hw->num_dns. > > [ ... ] > >> @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) >> >> switch (dn->type) { >> case CMN_TYPE_DTC: >> + if (graviton5_workaround) { >> + /* Node info logical ID is zeroed; use the XP's */ >> + dn->logid = xp->logid; >> + } > > [Severity: High] > Does overriding the DTC logical ID with the parent XP's logical ID cause > DTC nodes to be sorted out of domain order? > > In arm_cmn_init_dtcs(), CMN_TYPE_DTC nodes are sorted by their logical ID > (dn->logid), which determines their index in the dynamically allocated > cmn->dtc array. Standard CMN parts have their logid read from hardware as > the exact domain index (0-3). No, the DTC logical ID is read from the logical_id field in por_dt_node_info, and while the main HN-D DTC always has an ID of 0, additional HN-T sub-DTCs should normally inherit the logical ID of their parent XP. > Because the domain index here is a 2D block distribution, the physical XP IDs > do not monotonically increase with the domain index. For example, an XP at > x=7, y=0 belongs to domain 1 but has a smaller logid than an XP at > x=2, y=3 which belongs to domain 0. That is true, but since neither of those XPs has a DTC in this configuration, it is also entirely irrelevant. > If cmn->dtc[0] points to the hardware base of domain 1 instead of domain 0, > will events programmed for domain 0 be routed to domain 1's hardware, breaking > PMU event counting? > > Would it be better to explicitly set this as: > > dn->logid = arm_cmn_graviton5_dtc_domain(xp->id); No, because that would then be inconsistent with how the driver works normally on non-Graviton5 systems, and as Aviv already confirmed, it doesn't change the outcome in this case anyway. > [Severity: High] > This is a pre-existing issue, but does arm_cmn_event_init() leak memory > if initialization fails? > > drivers/perf/arm-cmn.c:arm_cmn_event_init() { > ... > if (arm_cmn_is_watchpoint(event)) > if (eventid != CMN_WP_UP && eventid != CMN_WP_DOWN) > return -EINVAL; > ... > return arm_cmn_validate_group(cmn, event); > } > > The driver dynamically allocates memory using arm_cmn_alloc_dtm_idx() and > arm_cmn_alloc_wp_idx(). If the initialization subsequently encounters an > error, such as an invalid eventid for a watchpoint or a failure in > arm_cmn_validate_group(), it returns an error code without freeing the > allocated memory. > > The core perf code perf_try_init_event() does not call the driver's > event->destroy() callback if pmu->event_init() fails, expecting the driver > to clean up its own resources. Could this allow memory leaks if invalid events > are repeatedly created? No, __free_event() invokes event->destroy(), and if perf_alloc_event() could fail without freeing the event then that would be a much greater memory leak anyway. Robin. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-06-01 17:45 ` Robin Murphy @ 2026-06-03 14:57 ` Aviv Bakal 0 siblings, 0 replies; 28+ messages in thread From: Aviv Bakal @ 2026-06-03 14:57 UTC (permalink / raw) To: robin.murphy; +Cc: avivb, linux-perf-users, sashiko-reviews Thanks for clarifying these, Robin. Much appreciated. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-31 11:32 ` sashiko-bot @ 2026-06-01 17:48 ` Robin Murphy 2026-06-02 5:55 ` Ilkka Koskinen 2 siblings, 0 replies; 28+ messages in thread From: Robin Murphy @ 2026-06-01 17:48 UTC (permalink / raw) To: Aviv Bakal, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof, ilkka On 2026-05-31 12:04 pm, Aviv Bakal wrote: > Graviton5 uses a customised CMN-S3 implementation where certain > discovery registers report zeroed fields. Add the following workarounds: > > - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. > - Derive the DTC domain from the XP node ID, since the unit info > register reports it as zero. > - Set the DTC logical ID from the XP's logical ID, since the node info > register's logical ID field is also zeroed. Reviewed-by: Robin Murphy <robin.murphy@arm.com> > Signed-off-by: Aviv Bakal <avivb@amazon.com> > --- > drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f1978a53d1c1..3fb71d9a57eb 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -31,7 +31,8 @@ > #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) > #define CMN_CHILD_NODE_EXTERNAL BIT(31) > > -#define CMN_MAX_DIMENSION 12 > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 > #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) > > @@ -214,6 +215,8 @@ enum cmn_part { > PART_CMN700 = 0x43c, > PART_CI700 = 0x43a, > PART_CMN_S3 = 0x43e, > + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ > + PART_GRAVITON5 = 0xa5, > }; > > /* CMN-600 r0px shouldn't exist in silicon, thankfully */ > @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} > + > static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) > { > int level; > @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > u64 reg; > int i, j; > size_t sz; > + bool graviton5_workaround = false; > > arm_cmn_init_node_info(cmn, rgn_offset, &cfg); > if (cfg.type != CMN_TYPE_CFG) > @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); > part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); > part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; > + > + /* Graviton5 has a customised CMN-S3 which needs some fixups */ > + if (cmn->part == PART_GRAVITON5) { > + cmn->part = PART_CMN_S3; > + graviton5_workaround = true; > + } > + > /* 600AE is close enough that it's not really worth more complexity */ > if (part == PART_CMN600AE) > part = PART_CMN600; > @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > if (cmn->part == PART_CMN600) > xp->dtc = -1; > + else if (graviton5_workaround) > + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); > else > xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); > > @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; > + } > cmn->num_dtcs++; > dn++; > break; > @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { > { "ARMHC650" }, > { "ARMHC700" }, > { "ARMHC003" }, > + { "AMZN0070", PART_GRAVITON5 }, > {} > }; > MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-31 11:32 ` sashiko-bot 2026-06-01 17:48 ` Robin Murphy @ 2026-06-02 5:55 ` Ilkka Koskinen 2 siblings, 0 replies; 28+ messages in thread From: Ilkka Koskinen @ 2026-06-02 5:55 UTC (permalink / raw) To: Aviv Bakal Cc: robin.murphy, will, mark.rutland, linux-arm-kernel, linux-perf-users, linux-kernel, zeev, blakgeof, ilkka On Sun, 31 May 2026, Aviv Bakal wrote: > Graviton5 uses a customised CMN-S3 implementation where certain > discovery registers report zeroed fields. Add the following workarounds: > > - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. > - Derive the DTC domain from the XP node ID, since the unit info > register reports it as zero. > - Set the DTC logical ID from the XP's logical ID, since the node info > register's logical ID field is also zeroed. > > Signed-off-by: Aviv Bakal <avivb@amazon.com> Thanks for the patch. It seems surprisingly clean approach. And thanks to Robin for answering to Sashiko bot! The patch looks good to me, Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cheers, Ilkka > --- > drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f1978a53d1c1..3fb71d9a57eb 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c > @@ -31,7 +31,8 @@ > #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) > #define CMN_CHILD_NODE_EXTERNAL BIT(31) > > -#define CMN_MAX_DIMENSION 12 > +/* Some implementations use a mesh larger than the architectural max of 12 */ > +#define CMN_MAX_DIMENSION 14 > #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) > #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) > > @@ -214,6 +215,8 @@ enum cmn_part { > PART_CMN700 = 0x43c, > PART_CI700 = 0x43a, > PART_CMN_S3 = 0x43e, > + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ > + PART_GRAVITON5 = 0xa5, > }; > > /* CMN-600 r0px shouldn't exist in silicon, thankfully */ > @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} > + > static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) > { > int level; > @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > u64 reg; > int i, j; > size_t sz; > + bool graviton5_workaround = false; > > arm_cmn_init_node_info(cmn, rgn_offset, &cfg); > if (cfg.type != CMN_TYPE_CFG) > @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); > part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); > part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; > + > + /* Graviton5 has a customised CMN-S3 which needs some fixups */ > + if (cmn->part == PART_GRAVITON5) { > + cmn->part = PART_CMN_S3; > + graviton5_workaround = true; > + } > + > /* 600AE is close enough that it's not really worth more complexity */ > if (part == PART_CMN600AE) > part = PART_CMN600; > @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > if (cmn->part == PART_CMN600) > xp->dtc = -1; > + else if (graviton5_workaround) > + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); > else > xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); > > @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; > + } > cmn->num_dtcs++; > dn++; > break; > @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { > { "ARMHC650" }, > { "ARMHC700" }, > { "ARMHC003" }, > + { "AMZN0070", PART_GRAVITON5 }, > {} > }; > MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); > -- > 2.47.3 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 0/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-06-03 15:00 ` Aviv Bakal 2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal 2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2 siblings, 2 replies; 28+ messages in thread From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka This series adds support for Graviton5's customised CMN-S3 which has zeroed discovery registers. Changes since v4: - Fix patch 1 authorship to correctly attribute Robin as author - Collect Reviewed-by tags from Robin and Ilkka Aviv Bakal (1): perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Robin Murphy (1): perf/arm-cmn: Move DTM index data out of hw_perf_event drivers/perf/arm-cmn.c | 121 ++++++++++++++++++++++++++++++----------- 1 file changed, 90 insertions(+), 31 deletions(-) -- 2.47.3 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event 2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal @ 2026-06-03 15:00 ` Aviv Bakal 2026-06-03 15:21 ` sashiko-bot 2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 1 sibling, 1 reply; 28+ messages in thread From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka From: Robin Murphy <robin.murphy@arm.com> The amount of data we need to store all the per-DTM counter and watchpoint allocations is already testing the limits of hw_perf_event, and future CMNs are only likely to keep growing larger, so move these arrays out to separate memory allocations. As part of that we can use an explicit union for allocating cycle counters to dtc_cycles events, which is arguably nicer anyway. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Aviv Bakal <avivb@amazon.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> --- drivers/perf/arm-cmn.c | 89 ++++++++++++++++++++++++++++-------------- 1 file changed, 59 insertions(+), 30 deletions(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f5305c8fdca4..f1978a53d1c1 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -597,17 +597,14 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} struct arm_cmn_hw_event { struct arm_cmn_node *dn; - u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; + union { + u64 *dtm_idx; + int cc_idx; + }; + unsigned long *wp_idx; s8 dtc_idx[CMN_MAX_DTCS]; u8 num_dns; u8 dtm_offset; - - /* - * WP config registers are divided to UP and DOWN events. We need to - * keep to track only one of them. - */ - DECLARE_BITMAP(wp_idx, CMN_MAX_XPS); - bool wide_sel; enum cmn_filter_select filter_sel; }; @@ -625,25 +622,42 @@ static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) return (struct arm_cmn_hw_event *)&event->hw; } -static void arm_cmn_set_index(u64 x[], unsigned int pos, unsigned int val) +static void arm_cmn_set_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos, unsigned int val) { - x[pos / 32] |= (u64)val << ((pos % 32) * 2); + hw->dtm_idx[pos / 32] |= (u64)val << ((pos % 32) * 2); } -static unsigned int arm_cmn_get_index(u64 x[], unsigned int pos) +static unsigned int arm_cmn_get_dtm_idx(struct arm_cmn_hw_event *hw, unsigned int pos) { - return (x[pos / 32] >> ((pos % 32) * 2)) & 3; + return (hw->dtm_idx[pos / 32] >> ((pos % 32) * 2)) & 3; } -static void arm_cmn_set_wp_idx(unsigned long *wp_idx, unsigned int pos, bool val) +static u64 *arm_cmn_alloc_dtm_idx(void) +{ + return kzalloc(DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64), GFP_KERNEL); +} + +static void arm_cmn_set_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos, bool val) { if (val) - set_bit(pos, wp_idx); + set_bit(pos, hw->wp_idx); +} + +static unsigned int arm_cmn_get_wp_idx(struct arm_cmn_hw_event *hw, unsigned int pos) +{ + return test_bit(pos, hw->wp_idx); +} + +static unsigned long *arm_cmn_alloc_wp_idx(void) +{ + return bitmap_zalloc(CMN_MAX_XPS, GFP_KERNEL); } -static unsigned int arm_cmn_get_wp_idx(unsigned long *wp_idx, unsigned int pos) +static void arm_cmn_clear_idx(struct arm_cmn_hw_event *hw) { - return test_bit(pos, wp_idx); + memset(hw->dtm_idx, 0, DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64) * sizeof(u64)); + if (hw->wp_idx) + bitmap_zero(hw->wp_idx, CMN_MAX_XPS); } struct arm_cmn_event_attr { @@ -1376,7 +1390,7 @@ static int arm_cmn_get_assigned_wp_idx(struct perf_event *event, struct arm_cmn_hw_event *hw, unsigned int pos) { - return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw->wp_idx, pos); + return CMN_EVENT_EVENTID(event) + arm_cmn_get_wp_idx(hw, pos); } static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, @@ -1387,7 +1401,7 @@ static void arm_cmn_claim_wp_idx(struct arm_cmn_dtm *dtm, struct arm_cmn_hw_event *hw = to_cmn_hw(event); dtm->wp_event[wp_idx] = hw->dtc_idx[dtc]; - arm_cmn_set_wp_idx(hw->wp_idx, pos, wp_idx - CMN_EVENT_EVENTID(event)); + arm_cmn_set_wp_idx(hw, pos, wp_idx - CMN_EVENT_EVENTID(event)); } static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) @@ -1458,7 +1472,7 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset; reg = readq_relaxed(dtm->base + offset); } - dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + dtm_idx = arm_cmn_get_dtm_idx(hw, i); count += (u16)(reg >> (dtm_idx * 16)); } return count; @@ -1505,7 +1519,7 @@ static void arm_cmn_event_read(struct perf_event *event) unsigned long flags; if (CMN_EVENT_TYPE(event) == CMN_TYPE_DTC) { - delta = arm_cmn_read_cc(cmn->dtc + hw->dtc_idx[0]); + delta = arm_cmn_read_cc(cmn->dtc + hw->cc_idx); local64_add(delta, &event->count); return; } @@ -1572,7 +1586,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, dtc->base + CMN_DT_DTC_CTL); @@ -1590,7 +1604,7 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) writeq_relaxed(mask, base + CMN_DTM_WPn_MASK(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, CMN_EVENT_EVENTID(event), hw->wide_sel); @@ -1606,7 +1620,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->cc_idx; dtc->cc_active = false; writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); @@ -1619,7 +1633,7 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) writeq_relaxed(~0ULL, base + CMN_DTM_WPn_VAL(wp_idx)); } } else for_each_hw_dn(hw, dn, i) { - int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + int dtm_idx = arm_cmn_get_dtm_idx(hw, i); arm_cmn_set_event_sel_lo(dn, dtm_idx, 0, hw->wide_sel); } @@ -1763,6 +1777,14 @@ static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn, } +static void arm_cmn_event_destroy(struct perf_event *event) +{ + struct arm_cmn_hw_event *hw = to_cmn_hw(event); + + kfree(hw->dtm_idx); + bitmap_free(hw->wp_idx); +} + static int arm_cmn_event_init(struct perf_event *event) { struct arm_cmn *cmn = to_cmn(event->pmu); @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event) if (type == CMN_TYPE_DTC) return arm_cmn_validate_group(cmn, event); + event->destroy = arm_cmn_event_destroy; + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); + if (!hw->dtm_idx) + return -ENOMEM; + eventid = CMN_EVENT_EVENTID(event); /* For watchpoints we need the actual XP node here */ if (type == CMN_TYPE_WP) { @@ -1797,6 +1824,9 @@ static int arm_cmn_event_init(struct perf_event *event) /* ...but the DTM may depend on which port we're watching */ if (cmn->multi_dtm) hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; + hw->wp_idx = arm_cmn_alloc_wp_idx(); + if (!hw->wp_idx) + return -ENOMEM; } else if (type == CMN_TYPE_XP && (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { hw->wide_sel = true; @@ -1847,7 +1877,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, while (i--) { struct arm_cmn_dtm *dtm = &cmn->dtms[hw->dn[i].dtm] + hw->dtm_offset; - unsigned int dtm_idx = arm_cmn_get_index(hw->dtm_idx, i); + unsigned int dtm_idx = arm_cmn_get_dtm_idx(hw, i); if (type == CMN_TYPE_WP) { int wp_idx = arm_cmn_get_assigned_wp_idx(event, hw, i); @@ -1861,8 +1891,7 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn, struct perf_event *event, dtm->pmu_config_low &= ~CMN__PMEVCNT_PAIRED(dtm_idx); writel_relaxed(dtm->pmu_config_low, dtm->base + CMN_DTM_PMU_CONFIG); } - memset(hw->dtm_idx, 0, sizeof(hw->dtm_idx)); - memset(hw->wp_idx, 0, sizeof(hw->wp_idx)); + arm_cmn_clear_idx(hw); for_each_hw_dtc_idx(hw, j, idx) cmn->dtc[j].counters[idx] = NULL; @@ -1882,7 +1911,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) return -ENOSPC; cmn->dtc[i].cycles = event; - hw->dtc_idx[0] = i; + hw->cc_idx = i; if (flags & PERF_EF_START) arm_cmn_event_start(event, 0); @@ -1947,7 +1976,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) goto free_dtms; } - arm_cmn_set_index(hw->dtm_idx, i, dtm_idx); + arm_cmn_set_dtm_idx(hw, i, dtm_idx); dtm->input_sel[dtm_idx] = input_sel; shift = CMN__PMEVCNTn_GLOBAL_NUM_SHIFT(dtm_idx); @@ -1980,7 +2009,7 @@ static void arm_cmn_event_del(struct perf_event *event, int flags) arm_cmn_event_stop(event, PERF_EF_UPDATE); if (type == CMN_TYPE_DTC) - cmn->dtc[hw->dtc_idx[0]].cycles = NULL; + cmn->dtc[hw->cc_idx].cycles = NULL; else arm_cmn_event_clear(cmn, event, hw->num_dns); } -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event 2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal @ 2026-06-03 15:21 ` sashiko-bot 0 siblings, 0 replies; 28+ messages in thread From: sashiko-bot @ 2026-06-03 15:21 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: Pre-existing issues: - [Critical] Type confusion in `arm_cmn_validate_group` allows arbitrary kernel memory read and out-of-bounds heap write via mixed PMU event groups. -- commit 5cf0bca45fce2ef5c66d58f875b8d4b3f7c015e0 Author: Robin Murphy <robin.murphy@arm.com> perf/arm-cmn: Move DTM index data out of hw_perf_event This commit moves the DTM index arrays and watchpoint tracking out of hw_perf_event into separate dynamically allocated memory structures. This change is intended to accommodate growing CMN configurations without exceeding the limits of hw_perf_event. > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c > index f5305c8fdca43..f1978a53d1c19 100644 > --- a/drivers/perf/arm-cmn.c > +++ b/drivers/perf/arm-cmn.c [ ... ] > @@ -1787,6 +1809,11 @@ static int arm_cmn_event_init(struct perf_event *event) > if (type == CMN_TYPE_DTC) > return arm_cmn_validate_group(cmn, event); > [Severity: Critical] This isn't a bug introduced by this patch, but could this lead to a type confusion and out-of-bounds write when validating groups with mixed PMU events? When arm_cmn_validate_group() iterates over the group's siblings, it doesn't verify that sibling->pmu == event->pmu. If a sibling is a hardware event from a different PMU, it passes it to arm_cmn_val_add_event(): drivers/perf/arm-cmn.c:arm_cmn_validate_group() { ... for_each_sibling_event(sibling, leader) arm_cmn_val_add_event(cmn, val, sibling); ... } Inside arm_cmn_val_add_event(), it casts the sibling's hw_perf_event to an arm_cmn_hw_event. Since userspace controls attr.config, it can overlap with the hw->dn pointer, allowing arbitrary kernel memory reads: drivers/perf/arm-cmn.c:arm_cmn_val_add_event() { ... for_each_hw_dn(hw, dn, i) { int wp_idx, dtm = dn->dtm, sel = hw->filter_sel; val->dtm_count[dtm]++; ... } Because the read byte dtm can be up to 255 and the array bounds are CMN_MAX_DTMS (188), does this enable a controllable out-of-bounds write on the kernel heap? > + event->destroy = arm_cmn_event_destroy; > + hw->dtm_idx = arm_cmn_alloc_dtm_idx(); > + if (!hw->dtm_idx) > + return -ENOMEM; > + -- Sashiko AI review · https://sashiko.dev/#/patchset/20260603150025.30980-1-avivb@amazon.com?part=1 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal 2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal @ 2026-06-03 15:00 ` Aviv Bakal 2026-06-03 15:53 ` sashiko-bot 1 sibling, 1 reply; 28+ messages in thread From: Aviv Bakal @ 2026-06-03 15:00 UTC (permalink / raw) To: robin.murphy, will, mark.rutland Cc: linux-arm-kernel, linux-perf-users, linux-kernel, avivb, zeev, blakgeof, ilkka Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. Add the following workarounds: - Introduce a dedicated ACPI HID to identify the Graviton5 CMN variant. - Derive the DTC domain from the XP node ID, since the unit info register reports it as zero. - Set the DTC logical ID from the XP's logical ID, since the node info register's logical ID field is also zeroed. Signed-off-by: Aviv Bakal <avivb@amazon.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> --- drivers/perf/arm-cmn.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f1978a53d1c1..3fb71d9a57eb 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -31,7 +31,8 @@ #define CMN_CHILD_NODE_ADDR GENMASK(29, 0) #define CMN_CHILD_NODE_EXTERNAL BIT(31) -#define CMN_MAX_DIMENSION 12 +/* Some implementations use a mesh larger than the architectural max of 12 */ +#define CMN_MAX_DIMENSION 14 #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) @@ -214,6 +215,8 @@ enum cmn_part { PART_CMN700 = 0x43c, PART_CI700 = 0x43a, PART_CMN_S3 = 0x43e, + /* Synthetic part number, overridden to PART_CMN_S3 during discovery */ + PART_GRAVITON5 = 0xa5, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); } +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) +{ + unsigned int x = (xp_id >> 7) & 0xf; + unsigned int y = (xp_id >> 3) & 0xf; + + /* + * The unit info register reads as zero; derive the DTC domain from + * the XP's mesh coordinates over the 10x14 mesh. + */ + return (x / 5) + (y / 7) * 2; +} + static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_cmn_node *node) { int level; @@ -2295,6 +2310,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) u64 reg; int i, j; size_t sz; + bool graviton5_workaround = false; arm_cmn_init_node_info(cmn, rgn_offset, &cfg); if (cfg.type != CMN_TYPE_CFG) @@ -2305,6 +2321,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01); part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg); part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8; + + /* Graviton5 has a customised CMN-S3 which needs some fixups */ + if (cmn->part == PART_GRAVITON5) { + cmn->part = PART_CMN_S3; + graviton5_workaround = true; + } + /* 600AE is close enough that it's not really worth more complexity */ if (part == PART_CMN600AE) part = PART_CMN600; @@ -2394,6 +2417,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) if (cmn->part == PART_CMN600) xp->dtc = -1; + else if (graviton5_workaround) + xp->dtc = arm_cmn_graviton5_dtc_domain(xp->id); else xp->dtc = arm_cmn_dtc_domain(cmn, xp_region); @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) switch (dn->type) { case CMN_TYPE_DTC: + if (graviton5_workaround) { + /* Node info logical ID is zeroed; use the XP's */ + dn->logid = xp->logid; + } cmn->num_dtcs++; dn++; break; @@ -2687,6 +2716,7 @@ static const struct acpi_device_id arm_cmn_acpi_match[] = { { "ARMHC650" }, { "ARMHC700" }, { "ARMHC003" }, + { "AMZN0070", PART_GRAVITON5 }, {} }; MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match); -- 2.47.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal @ 2026-06-03 15:53 ` sashiko-bot 0 siblings, 0 replies; 28+ messages in thread From: sashiko-bot @ 2026-06-03 15:53 UTC (permalink / raw) To: Aviv Bakal; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: New issues: - [High] The `arm_cmn_graviton5_dtc_domain` function computes a DTC domain index without upper bounds, which can exceed the `CMN_MAX_DTCS` limit and cause memory corruption. - [High] The Graviton5 workaround assigns the XP's logical ID to the DTC node's logical ID, breaking the domain-based sorting order and causing DTC resources to be assigned incorrectly. Pre-existing issues: - [High] Lack of bounds checking on `cmn->num_dtcs` allows a `memset()` call to overflow the `dtc_idx` array. -- commit fd0610a07c973430036fe29d5eab5bfbb5c0eb4c Author: Aviv Bakal <avivb@amazon.com> perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Graviton5 uses a customised CMN-S3 implementation where certain discovery registers report zeroed fields. This patch introduces a dedicated ACPI HID, derives the DTC domain from the XP node ID, and sets the DTC logical ID from the XP's logical ID. diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index f1978a53d1c19..3fb71d9a57eb5 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c [ ... ] > @@ -2250,6 +2253,18 @@ static unsigned int arm_cmn_dtc_domain(struct arm_cmn *cmn, void __iomem *xp_reg > return FIELD_GET(CMN_DTM_UNIT_INFO_DTC_DOMAIN, readl_relaxed(xp_region + offset)); > } > > +static unsigned int arm_cmn_graviton5_dtc_domain(u16 xp_id) > +{ > + unsigned int x = (xp_id >> 7) & 0xf; > + unsigned int y = (xp_id >> 3) & 0xf; > + > + /* > + * The unit info register reads as zero; derive the DTC domain from > + * the XP's mesh coordinates over the 10x14 mesh. > + */ > + return (x / 5) + (y / 7) * 2; > +} [Severity: High] Can this calculation exceed the CMN_MAX_DTCS limit? If a compromised or malformed firmware reports an XP node ID with coordinate components up to 15 (e.g., x=15, y=15), this calculation yields a domain index of 7. The domain index is later inherited by the DTC node (dn->dtc) and used directly as an array index in arm_cmn_event_init(): hw->dtc_idx[dn->dtc] = 0; Since the hw->dtc_idx array has a fixed size of CMN_MAX_DTCS (4), an index of 4 or higher would overwrite adjacent struct fields like num_dns, leading to memory corruption. [ ... ] > @@ -2472,6 +2497,10 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) > > switch (dn->type) { > case CMN_TYPE_DTC: > + if (graviton5_workaround) { > + /* Node info logical ID is zeroed; use the XP's */ > + dn->logid = xp->logid; > + } [Severity: High] Does setting dn->logid to xp->logid break the domain-based sorting order for DTC resources? The CMN driver sequentially populates the cmn->dtc array based on the sorted order of DTC nodes: drivers/perf/arm-cmn.c:arm_cmn_init_dtcs() { ... int err = arm_cmn_init_dtc(cmn, dn, dtc_idx++); ... } Standard hardware sets a DTC node's logid to its domain ID (0..3), ensuring the array is populated in domain order. However, XP logical IDs do not monotonically increase with domain IDs (e.g., an XP in Domain 1 at X=5, Y=0 has a lower logid than an XP in Domain 0 at X=0, Y=1). Will this cause the DTC nodes to sort out of order, misaligning the cmn->dtc array and causing PMU events to be programmed into the wrong hardware domain counters? Would it be safer to use dn->logid = xp->dtc? > cmn->num_dtcs++; [Severity: High] This is a pre-existing issue, but does cmn->num_dtcs lack bounds checking? During hardware discovery, the driver increments cmn->num_dtcs for every CMN_TYPE_DTC node reported by the firmware, without capping it at CMN_MAX_DTCS (4). Later in arm_cmn_event_init(), the driver uses this unbounded value: memset(hw->dtc_idx, 0, cmn->num_dtcs); Since hw->dtc_idx is a 4-byte array, if the firmware provides more than 4 DTC nodes, will the memset overflow the array and corrupt the arm_cmn_hw_event struct? > dn++; > break; -- Sashiko AI review · https://sashiko.dev/#/patchset/20260603150025.30980-1-avivb@amazon.com?part=2 ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2026-06-03 15:53 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-03 15:51 [PATCH] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-03 16:20 ` sashiko-bot 2026-05-04 13:37 ` Aviv Bakal 2026-05-04 13:39 ` [PATCH v2] " Aviv Bakal 2026-05-04 19:38 ` sashiko-bot 2026-05-05 2:31 ` kernel test robot 2026-05-21 16:02 ` Robin Murphy 2026-05-24 15:38 ` [PATCH v3 0/2] " Aviv Bakal 2026-05-24 15:38 ` [PATCH v3 1/2] perf/arm-cmn: Move struct arm_cmn_hw_event into struct hw_perf_event Aviv Bakal 2026-05-24 16:13 ` sashiko-bot 2026-05-29 16:44 ` Robin Murphy 2026-05-24 15:38 ` [PATCH v3 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-24 16:40 ` sashiko-bot 2026-05-31 11:04 ` [PATCH v4 0/2] " Aviv Bakal 2026-05-31 11:04 ` [PATCH v4 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal 2026-06-01 18:23 ` Robin Murphy 2026-06-02 5:49 ` Ilkka Koskinen 2026-05-31 11:04 ` [PATCH v4 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-05-31 11:32 ` sashiko-bot 2026-06-01 17:45 ` Robin Murphy 2026-06-03 14:57 ` Aviv Bakal 2026-06-01 17:48 ` Robin Murphy 2026-06-02 5:55 ` Ilkka Koskinen 2026-06-03 15:00 ` [PATCH v5 0/2] " Aviv Bakal 2026-06-03 15:00 ` [PATCH v5 1/2] perf/arm-cmn: Move DTM index data out of hw_perf_event Aviv Bakal 2026-06-03 15:21 ` sashiko-bot 2026-06-03 15:00 ` [PATCH v5 2/2] perf/arm-cmn: Add workarounds for CMN-S3 on Graviton5 Aviv Bakal 2026-06-03 15:53 ` sashiko-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox