* [PATCH v3 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support
@ 2026-06-16 7:11 Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU Geetha sowjanya
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Geetha sowjanya @ 2026-06-16 7:11 UTC (permalink / raw)
To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
Cc: mark.rutland, will, krzk+dt, gakula
This series extends the Marvell LLC-TAD performance driver used on CN10K
and CN20K systems.
Patch 1 adds optional MPAM partition-id filtering for the subset of TAD
events that support it, exposes partid / partid_en in the PMU format string,
and keeps the reduced Odyssey event surface without advertising partid where
it does not apply. It also fixes probe resource handling (no in-place
mutation of platform_get_resource() bounds, validate MMIO window vs
tad-cnt), orders perf registration vs hotplug with unwind, and aligns the
filter-enable bit in config1 with the sysfs format (bit 9).
Patch 2 introduces CN20K LLC-TAD support: non-standard PFC/PRF offsets,
additional programmable events with visibility checks so CN10K does not
advertise V3-only events, CN20K-specific MPAM encoding for the V3 profile,
local64_set(prev_count) on counter start, and device discovery via OF and
ACPI.
Patch 3 extends the DeviceTree binding for marvell,cn20k-tad-pmu.
Changes since v2
----------------
- Validate the eventId using an appropriate mask to ensure it is restricted to 8 bits.
Changes since v1
----------------
- config1: use bit 9 for MPAM filter enable consistently with partid_en in
the PMU format; allow only bits 0..9 in event_init on CN10K/CN20K paths.
- Reject reserved bits in attr.config and use the same 8-bit event index in
start_counter as in event_init so MPAM validation cannot be bypassed.
- Hide V3-only sysfs events on V1.
- Reset prev_count when starting counters after clearing hardware.
- DT binding: explain non-fallback compatibles for CN10K vs CN20K.
Tanmay Jagdale (1):
perf: marvell: Add MPAM partid filtering to CN10K TAD PMU
Geetha sowjanya (2):
perf: marvell: Add CN20K LLC-TAD PMU support
dt-bindings: perf: marvell: Extend CN10K TAD PMU binding for CN20K
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
--
2.25.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU
2026-06-16 7:11 [PATCH v3 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support Geetha sowjanya
@ 2026-06-16 7:11 ` Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 3/3] dt-bindings: perf: marvell: add CN20K TAD " Geetha sowjanya
2 siblings, 0 replies; 5+ messages in thread
From: Geetha sowjanya @ 2026-06-16 7:11 UTC (permalink / raw)
To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
Cc: mark.rutland, will, krzk+dt, gakula
From: Tanmay Jagdale <tanmay@marvell.com>
The TAD PMU exposes counters that can be filtered by MPAM partition id
for a subset of allocation and hit events.
Add a 9-bit partid format attribute (config1) and route counter programming
through variant-specific ops so CN10K keeps MPAM-capable programming while
Odyssey keeps the reduced event set without advertising partid in sysfs.
Probe no longer mutates the platform_device MMIO resource (walk a local
map_start), rejects tad-cnt / page sizes of zero, validates the memory
window against tad-cnt, and registers the perf PMU before hotplug with
correct unwind.
Example:
perf stat -e tad/tad_alloc_any,partid=0x12,partid_en=1/ -- <program>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
---
Changelog (since v2)
--------------------
- Validate the eventId using an appropriate mask to ensure
it is restricted to 8 bits
Changelog (since v1)
--------------------
- Fix config1 filter enable to use bit 9 consistently with the PMU format
string (partid_en) and reject reserved bits with GENMASK(9, 0).
- Register perf_pmu_register before cpuhp_state_add_instance_nocalls and
unregister on hotplug failure.
drivers/perf/marvell_cn10k_tad_pmu.c | 216 ++++++++++++++++++++-------
1 file changed, 164 insertions(+), 52 deletions(-)
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 51ccb0befa05..69a6648fa664 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -7,6 +7,7 @@
#define pr_fmt(fmt) "tad_pmu: " fmt
#include <linux/io.h>
+#include <linux/bits.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/cpuhotplug.h>
@@ -14,12 +15,20 @@
#include <linux/platform_device.h>
#include <linux/acpi.h>
-#define TAD_PFC_OFFSET 0x800
-#define TAD_PFC(counter) (TAD_PFC_OFFSET | (counter << 3))
#define TAD_PRF_OFFSET 0x900
-#define TAD_PRF(counter) (TAD_PRF_OFFSET | (counter << 3))
+#define TAD_PFC_OFFSET 0x800
+#define TAD_PFC(base, counter) ((base) | ((u64)(counter) << 3))
+#define TAD_PRF(base, counter) ((base) | ((u64)(counter) << 3))
#define TAD_PRF_CNTSEL_MASK 0xFF
+#define TAD_PRF_MATCH_PARTID BIT(8)
+#define TAD_PRF_PARTID_NS BIT(10)
+/*
+ * config1: bits 0..8 MPAM partition id (including 0); bit 9 requests
+ * filtering for MPAM-capable events. All-zero config1 means no filter.
+ */
+#define TAD_PARTID_FILTER_EN BIT(9)
#define TAD_MAX_COUNTERS 8
+#define TAD_EVENT_SEL_MASK GENMASK(7, 0)
#define to_tad_pmu(p) (container_of(p, struct tad_pmu, pmu))
@@ -27,30 +36,92 @@ struct tad_region {
void __iomem *base;
};
+enum mrvl_tad_pmu_version {
+ TAD_PMU_V1 = 1,
+ TAD_PMU_V2,
+};
+
+struct tad_pmu_data {
+ int id;
+ u64 tad_prf_offset;
+ u64 tad_pfc_offset;
+};
+
struct tad_pmu {
struct pmu pmu;
struct tad_region *regions;
u32 region_cnt;
unsigned int cpu;
+ const struct tad_pmu_ops *ops;
+ const struct tad_pmu_data *pdata;
struct hlist_node node;
struct perf_event *events[TAD_MAX_COUNTERS];
DECLARE_BITMAP(counters_map, TAD_MAX_COUNTERS);
};
-enum mrvl_tad_pmu_version {
- TAD_PMU_V1 = 1,
- TAD_PMU_V2,
-};
-
-struct tad_pmu_data {
- int id;
+struct tad_pmu_ops {
+ void (*start_counter)(struct tad_pmu *pmu, struct perf_event *event);
};
static int tad_pmu_cpuhp_state;
+static void tad_pmu_start_counter(struct tad_pmu *pmu,
+ struct perf_event *event)
+{
+ const struct tad_pmu_data *pdata = pmu->pdata;
+ struct hw_perf_event *hwc = &event->hw;
+ u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+ u32 counter_idx = hwc->idx;
+ u64 partid_filter = 0;
+ u64 reg_val;
+ u64 cfg1 = event->attr.config1;
+ bool use_mpam = cfg1 & TAD_PARTID_FILTER_EN;
+ u32 partid = (u32)(cfg1 & GENMASK(8, 0));
+ int i;
+
+ for (i = 0; i < pmu->region_cnt; i++)
+ writeq_relaxed(0, pmu->regions[i].base +
+ TAD_PFC(pdata->tad_pfc_offset, counter_idx));
+
+ if (use_mpam && event_idx > 0x19 && event_idx < 0x21) {
+ partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_PARTID_NS |
+ ((u64)partid << 11);
+ }
+
+
+ for (i = 0; i < pmu->region_cnt; i++) {
+ reg_val = event_idx & 0xFF;
+ reg_val |= partid_filter;
+ writeq_relaxed(reg_val, pmu->regions[i].base +
+ TAD_PRF(pdata->tad_prf_offset, counter_idx));
+ }
+}
+
+static void tad_pmu_v2_start_counter(struct tad_pmu *pmu,
+ struct perf_event *event)
+{
+ const struct tad_pmu_data *pdata = pmu->pdata;
+ struct hw_perf_event *hwc = &event->hw;
+ u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+ u32 counter_idx = hwc->idx;
+ u64 reg_val;
+ int i;
+
+ for (i = 0; i < pmu->region_cnt; i++)
+ writeq_relaxed(0, pmu->regions[i].base +
+ TAD_PFC(pdata->tad_pfc_offset, counter_idx));
+
+ for (i = 0; i < pmu->region_cnt; i++) {
+ reg_val = event_idx & 0xFF;
+ writeq_relaxed(reg_val, pmu->regions[i].base +
+ TAD_PRF(pdata->tad_prf_offset, counter_idx));
+ }
+}
+
static void tad_pmu_event_counter_read(struct perf_event *event)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+ const struct tad_pmu_data *pdata = tad_pmu->pdata;
struct hw_perf_event *hwc = &event->hw;
u32 counter_idx = hwc->idx;
u64 prev, new;
@@ -60,7 +131,7 @@ static void tad_pmu_event_counter_read(struct perf_event *event)
prev = local64_read(&hwc->prev_count);
for (i = 0, new = 0; i < tad_pmu->region_cnt; i++)
new += readq(tad_pmu->regions[i].base +
- TAD_PFC(counter_idx));
+ TAD_PFC(pdata->tad_pfc_offset, counter_idx));
} while (local64_cmpxchg(&hwc->prev_count, prev, new) != prev);
local64_add(new - prev, &event->count);
@@ -69,16 +140,14 @@ static void tad_pmu_event_counter_read(struct perf_event *event)
static void tad_pmu_event_counter_stop(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+ const struct tad_pmu_data *pdata = tad_pmu->pdata;
struct hw_perf_event *hwc = &event->hw;
u32 counter_idx = hwc->idx;
int i;
- /* TAD()_PFC() stop counting on the write
- * which sets TAD()_PRF()[CNTSEL] == 0
- */
for (i = 0; i < tad_pmu->region_cnt; i++) {
writeq_relaxed(0, tad_pmu->regions[i].base +
- TAD_PRF(counter_idx));
+ TAD_PRF(pdata->tad_prf_offset, counter_idx));
}
tad_pmu_event_counter_read(event);
@@ -89,26 +158,10 @@ static void tad_pmu_event_counter_start(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- u32 event_idx = event->attr.config;
- u32 counter_idx = hwc->idx;
- u64 reg_val;
- int i;
hwc->state = 0;
- /* Typically TAD_PFC() are zeroed to start counting */
- for (i = 0; i < tad_pmu->region_cnt; i++)
- writeq_relaxed(0, tad_pmu->regions[i].base +
- TAD_PFC(counter_idx));
-
- /* TAD()_PFC() start counting on the write
- * which sets TAD()_PRF()[CNTSEL] != 0
- */
- for (i = 0; i < tad_pmu->region_cnt; i++) {
- reg_val = event_idx & 0xFF;
- writeq_relaxed(reg_val, tad_pmu->regions[i].base +
- TAD_PRF(counter_idx));
- }
+ tad_pmu->ops->start_counter(tad_pmu, event);
}
static void tad_pmu_event_counter_del(struct perf_event *event, int flags)
@@ -128,7 +181,6 @@ static int tad_pmu_event_counter_add(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
int idx;
- /* Get a free counter for this event */
idx = find_first_zero_bit(tad_pmu->counters_map, TAD_MAX_COUNTERS);
if (idx == TAD_MAX_COUNTERS)
return -EAGAIN;
@@ -148,6 +200,9 @@ static int tad_pmu_event_counter_add(struct perf_event *event, int flags)
static int tad_pmu_event_init(struct perf_event *event)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+ const struct tad_pmu_data *pdata = tad_pmu->pdata;
+ u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+ u64 cfg1 = event->attr.config1;
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -158,6 +213,23 @@ static int tad_pmu_event_init(struct perf_event *event)
if (event->state != PERF_EVENT_STATE_OFF)
return -EINVAL;
+ if (event->attr.config & ~TAD_EVENT_SEL_MASK)
+ return -EINVAL;
+
+ if (pdata->id == TAD_PMU_V2) {
+ if (cfg1)
+ return -EINVAL;
+ } else {
+ if ((cfg1 & GENMASK(8, 0)) && !(cfg1 & TAD_PARTID_FILTER_EN))
+ return -EINVAL;
+ if (cfg1 & TAD_PARTID_FILTER_EN) {
+ if (event_idx <= 0x19 || event_idx >= 0x21)
+ return -EINVAL;
+ }
+ if (cfg1 & ~GENMASK(9, 0))
+ return -EINVAL;
+ }
+
event->cpu = tad_pmu->cpu;
event->hw.idx = -1;
event->hw.config_base = event->attr.config;
@@ -232,7 +304,7 @@ static struct attribute *ody_tad_pmu_event_attrs[] = {
TAD_PMU_EVENT_ATTR(tad_hit_ltg, 0x1e),
TAD_PMU_EVENT_ATTR(tad_hit_any, 0x1f),
TAD_PMU_EVENT_ATTR(tad_tag_rd, 0x20),
- TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xFF),
+ TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xff),
NULL
};
@@ -242,9 +314,13 @@ static const struct attribute_group ody_tad_pmu_events_attr_group = {
};
PMU_FORMAT_ATTR(event, "config:0-7");
+PMU_FORMAT_ATTR(partid, "config1:0-8");
+PMU_FORMAT_ATTR(partid_en, "config1:9-9");
static struct attribute *tad_pmu_format_attrs[] = {
&format_attr_event.attr,
+ &format_attr_partid.attr,
+ &format_attr_partid_en.attr,
NULL
};
@@ -253,6 +329,16 @@ static struct attribute_group tad_pmu_format_attr_group = {
.attrs = tad_pmu_format_attrs,
};
+static struct attribute *ody_tad_pmu_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL
+};
+
+static struct attribute_group ody_tad_pmu_format_attr_group = {
+ .name = "format",
+ .attrs = ody_tad_pmu_format_attrs,
+};
+
static ssize_t tad_pmu_cpumask_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -281,16 +367,25 @@ static const struct attribute_group *tad_pmu_attr_groups[] = {
static const struct attribute_group *ody_tad_pmu_attr_groups[] = {
&ody_tad_pmu_events_attr_group,
- &tad_pmu_format_attr_group,
+ &ody_tad_pmu_format_attr_group,
&tad_pmu_cpumask_attr_group,
NULL
};
+static const struct tad_pmu_ops tad_pmu_ops = {
+ .start_counter = tad_pmu_start_counter,
+};
+
+static const struct tad_pmu_ops tad_pmu_v2_ops = {
+ .start_counter = tad_pmu_v2_start_counter,
+};
+
static int tad_pmu_probe(struct platform_device *pdev)
{
const struct tad_pmu_data *dev_data;
struct device *dev = &pdev->dev;
struct tad_region *regions;
+ resource_size_t map_start;
struct tad_pmu *tad_pmu;
struct resource *res;
u32 tad_pmu_page_size;
@@ -298,7 +393,6 @@ static int tad_pmu_probe(struct platform_device *pdev)
u32 tad_cnt;
int version;
int i, ret;
- char *name;
tad_pmu = devm_kzalloc(&pdev->dev, sizeof(*tad_pmu), GFP_KERNEL);
if (!tad_pmu)
@@ -312,6 +406,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
return -ENODEV;
}
version = dev_data->id;
+ tad_pmu->pdata = dev_data;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
@@ -338,22 +433,31 @@ static int tad_pmu_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "Can't find tad-cnt property\n");
return ret;
}
+ if (!tad_cnt || !tad_page_size || !tad_pmu_page_size) {
+ dev_err(&pdev->dev, "Invalid tad-cnt or page size\n");
+ return -EINVAL;
+ }
regions = devm_kcalloc(&pdev->dev, tad_cnt,
sizeof(*regions), GFP_KERNEL);
if (!regions)
return -ENOMEM;
- /* ioremap the distributed TAD pmu regions */
- for (i = 0; i < tad_cnt && res->start < res->end; i++) {
- regions[i].base = devm_ioremap(&pdev->dev,
- res->start,
+ map_start = res->start;
+ for (i = 0; i < tad_cnt; i++) {
+ if (map_start > res->end ||
+ tad_pmu_page_size > (resource_size_t)(res->end - map_start + 1)) {
+ dev_err(&pdev->dev, "TAD PMU mem window too small for tad-cnt=%u\n",
+ tad_cnt);
+ return -EINVAL;
+ }
+ regions[i].base = devm_ioremap(&pdev->dev, map_start,
tad_pmu_page_size);
if (!regions[i].base) {
dev_err(&pdev->dev, "TAD%d ioremap fail\n", i);
return -ENOMEM;
}
- res->start += tad_page_size;
+ map_start += tad_page_size;
}
tad_pmu->regions = regions;
@@ -374,28 +478,31 @@ static int tad_pmu_probe(struct platform_device *pdev)
.read = tad_pmu_event_counter_read,
};
- if (version == TAD_PMU_V1)
+ if (version == TAD_PMU_V1) {
tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
- else
+ tad_pmu->ops = &tad_pmu_ops;
+ } else {
tad_pmu->pmu.attr_groups = ody_tad_pmu_attr_groups;
+ tad_pmu->ops = &tad_pmu_v2_ops;
+ }
tad_pmu->cpu = raw_smp_processor_id();
- /* Register pmu instance for cpu hotplug */
+ ret = perf_pmu_register(&tad_pmu->pmu, "tad", -1);
+ if (ret) {
+ dev_err(&pdev->dev, "Error %d registering perf PMU\n", ret);
+ return ret;
+ }
+
ret = cpuhp_state_add_instance_nocalls(tad_pmu_cpuhp_state,
&tad_pmu->node);
if (ret) {
dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
+ perf_pmu_unregister(&tad_pmu->pmu);
return ret;
}
- name = "tad";
- ret = perf_pmu_register(&tad_pmu->pmu, name, -1);
- if (ret)
- cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
- &tad_pmu->node);
-
- return ret;
+ return 0;
}
static void tad_pmu_remove(struct platform_device *pdev)
@@ -410,12 +517,17 @@ static void tad_pmu_remove(struct platform_device *pdev)
#if defined(CONFIG_OF) || defined(CONFIG_ACPI)
static const struct tad_pmu_data tad_pmu_data = {
.id = TAD_PMU_V1,
+ .tad_prf_offset = TAD_PRF_OFFSET,
+ .tad_pfc_offset = TAD_PFC_OFFSET,
};
+
#endif
#ifdef CONFIG_ACPI
static const struct tad_pmu_data tad_pmu_v2_data = {
.id = TAD_PMU_V2,
+ .tad_prf_offset = TAD_PRF_OFFSET,
+ .tad_pfc_offset = TAD_PFC_OFFSET,
};
#endif
@@ -491,6 +603,6 @@ static void __exit tad_pmu_exit(void)
module_init(tad_pmu_init);
module_exit(tad_pmu_exit);
-MODULE_DESCRIPTION("Marvell CN10K LLC-TAD Perf driver");
+MODULE_DESCRIPTION("Marvell CN10K LLC-TAD perf driver");
MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
MODULE_LICENSE("GPL v2");
--
2.25.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support
2026-06-16 7:11 [PATCH v3 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU Geetha sowjanya
@ 2026-06-16 7:11 ` Geetha sowjanya
2026-06-16 7:22 ` sashiko-bot
2026-06-16 7:11 ` [PATCH v3 3/3] dt-bindings: perf: marvell: add CN20K TAD " Geetha sowjanya
2 siblings, 1 reply; 5+ messages in thread
From: Geetha sowjanya @ 2026-06-16 7:11 UTC (permalink / raw)
To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
Cc: mark.rutland, will, krzk+dt, gakula
Add support for the LLC Tag-and-Data (TAD) PMU present in
Marvell CN20K SoCs.
The CN20K TAD PMU is based on the CN10K design but differs in the
layout of PFC/PRF register offsets relative to each TAD base, and
introduces additional events. These offsets are selected by the driver
based on the compatible string and are not described via DT properties.
Because of this, "marvell,cn10k-tad-pmu" cannot be used as a fallback
for CN20K, as it would result in incorrect register programming.
Add support for "marvell,cn20k-tad-pmu" by:
- Introducing a TAD_PMU_V3 profile with CN20K-specific register bases
- Extending the event map for new CN20K events
- Matching the PMU via OF and ACPI (MRVL000F)
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
---
Changelog (since v2)
--------------------
- Validate the eventId using an appropriate mask to ensure
it is restricted to 8 bits.
Changelog (since v1)
--------------------
- Hide V3-only events on CN10K via sysfs is_visible and reject them in
event_init.
- Use CN20K-specific MPAM PRF bits (MATCH_MPAMNS, partid << 10) for V3;
software partid is limited to nine bits so this does not collide with
the fixed bit at 25.
- Reset hwc->prev_count when starting counters so reads match cleared HW.
drivers/perf/marvell_cn10k_tad_pmu.c | 54 ++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69a6648fa664..cd81bf8ff569 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -17,11 +17,14 @@
#define TAD_PRF_OFFSET 0x900
#define TAD_PFC_OFFSET 0x800
+#define TAD_PRF_NS_OFFSET 0x30900
+#define TAD_PFC_NS_OFFSET 0x30800
#define TAD_PFC(base, counter) ((base) | ((u64)(counter) << 3))
#define TAD_PRF(base, counter) ((base) | ((u64)(counter) << 3))
#define TAD_PRF_CNTSEL_MASK 0xFF
#define TAD_PRF_MATCH_PARTID BIT(8)
#define TAD_PRF_PARTID_NS BIT(10)
+#define TAD_PRF_MATCH_MPAMNS BIT(25)
/*
* config1: bits 0..8 MPAM partition id (including 0); bit 9 requests
* filtering for MPAM-capable events. All-zero config1 means no filter.
@@ -39,6 +42,7 @@ struct tad_region {
enum mrvl_tad_pmu_version {
TAD_PMU_V1 = 1,
TAD_PMU_V2,
+ TAD_PMU_V3,
};
struct tad_pmu_data {
@@ -86,8 +90,15 @@ static void tad_pmu_start_counter(struct tad_pmu *pmu,
if (use_mpam && event_idx > 0x19 && event_idx < 0x21) {
partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_PARTID_NS |
((u64)partid << 11);
+
+ if (pdata->id == TAD_PMU_V3)
+ partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_MATCH_MPAMNS |
+ ((u64)partid << 10);
}
+ /* CN10K support events 0:24*/
+ if (pdata->id == TAD_PMU_V1 && event_idx >= 0x25)
+ return;
for (i = 0; i < pmu->region_cnt; i++) {
reg_val = event_idx & 0xFF;
@@ -160,6 +171,7 @@ static void tad_pmu_event_counter_start(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
hwc->state = 0;
+ local64_set(&hwc->prev_count, 0);
tad_pmu->ops->start_counter(tad_pmu, event);
}
@@ -220,6 +232,8 @@ static int tad_pmu_event_init(struct perf_event *event)
if (cfg1)
return -EINVAL;
} else {
+ if (pdata->id == TAD_PMU_V1 && event_idx >= 0x25)
+ return -EINVAL;
if ((cfg1 & GENMASK(8, 0)) && !(cfg1 & TAD_PARTID_FILTER_EN))
return -EINVAL;
if (cfg1 & TAD_PARTID_FILTER_EN) {
@@ -246,6 +260,22 @@ static ssize_t tad_pmu_event_show(struct device *dev,
return sysfs_emit(page, "event=0x%02llx\n", pmu_attr->id);
}
+static umode_t tad_pmu_event_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int unused)
+{
+ struct pmu *pmu = dev_get_drvdata(kobj_to_dev(kobj));
+ struct tad_pmu *t = to_tad_pmu(pmu);
+ struct device_attribute *da = container_of(attr, struct device_attribute,
+ attr);
+ struct perf_pmu_events_attr *e = container_of(da, struct perf_pmu_events_attr,
+ attr);
+ u64 id = e->id;
+
+ if (t->pdata->id != TAD_PMU_V3 && id >= 0x25)
+ return 0;
+ return attr->mode;
+}
+
#define TAD_PMU_EVENT_ATTR(name, config) \
PMU_EVENT_ATTR_ID(name, tad_pmu_event_show, config)
@@ -287,12 +317,25 @@ static struct attribute *tad_pmu_event_attrs[] = {
TAD_PMU_EVENT_ATTR(tad_dat_rd_byp, 0x22),
TAD_PMU_EVENT_ATTR(tad_ifb_occ, 0x23),
TAD_PMU_EVENT_ATTR(tad_req_occ, 0x24),
+ TAD_PMU_EVENT_ATTR(tad_req_msh_out_dtg_evict, 0x25),
+ TAD_PMU_EVENT_ATTR(tad_req_msh_out_ltg_evict, 0x26),
+ TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_mpam, 0x28),
+ TAD_PMU_EVENT_ATTR(tad_replays, 0x29),
+ TAD_PMU_EVENT_ATTR(tad_req_byp0, 0x2a),
+ TAD_PMU_EVENT_ATTR(tad_req_byp1, 0x2b),
+ TAD_PMU_EVENT_ATTR(tad_txreq_byp, 0x2c),
+ TAD_PMU_EVENT_ATTR(tad_time_in_dslp, 0x2d),
+ TAD_PMU_EVENT_ATTR(tad_time_elapsed, 0x2e),
+ TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_rd_128mrg, 0x2f),
+ TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_wr_128mrg, 0x30),
+ TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xff),
NULL
};
static const struct attribute_group tad_pmu_events_attr_group = {
.name = "events",
.attrs = tad_pmu_event_attrs,
+ .is_visible = tad_pmu_event_attr_is_visible,
};
static struct attribute *ody_tad_pmu_event_attrs[] = {
@@ -478,7 +521,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
.read = tad_pmu_event_counter_read,
};
- if (version == TAD_PMU_V1) {
+ if (version == TAD_PMU_V1 || version == TAD_PMU_V3) {
tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
tad_pmu->ops = &tad_pmu_ops;
} else {
@@ -521,6 +564,11 @@ static const struct tad_pmu_data tad_pmu_data = {
.tad_pfc_offset = TAD_PFC_OFFSET,
};
+static const struct tad_pmu_data tad_pmu_cn20k_data = {
+ .id = TAD_PMU_V3,
+ .tad_prf_offset = TAD_PRF_NS_OFFSET,
+ .tad_pfc_offset = TAD_PFC_NS_OFFSET,
+};
#endif
#ifdef CONFIG_ACPI
@@ -534,6 +582,7 @@ static const struct tad_pmu_data tad_pmu_v2_data = {
#ifdef CONFIG_OF
static const struct of_device_id tad_pmu_of_match[] = {
{ .compatible = "marvell,cn10k-tad-pmu", .data = &tad_pmu_data },
+ { .compatible = "marvell,cn20k-tad-pmu", .data = &tad_pmu_cn20k_data },
{},
};
#endif
@@ -542,6 +591,7 @@ static const struct of_device_id tad_pmu_of_match[] = {
static const struct acpi_device_id tad_pmu_acpi_match[] = {
{"MRVL000B", (kernel_ulong_t)&tad_pmu_data},
{"MRVL000D", (kernel_ulong_t)&tad_pmu_v2_data},
+ {"MRVL000F", (kernel_ulong_t)&tad_pmu_cn20k_data},
{},
};
MODULE_DEVICE_TABLE(acpi, tad_pmu_acpi_match);
@@ -603,6 +653,6 @@ static void __exit tad_pmu_exit(void)
module_init(tad_pmu_init);
module_exit(tad_pmu_exit);
-MODULE_DESCRIPTION("Marvell CN10K LLC-TAD perf driver");
+MODULE_DESCRIPTION("Marvell CN10K/CN20K LLC-TAD perf driver");
MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
MODULE_LICENSE("GPL v2");
--
2.25.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 3/3] dt-bindings: perf: marvell: add CN20K TAD PMU support
2026-06-16 7:11 [PATCH v3 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support Geetha sowjanya
@ 2026-06-16 7:11 ` Geetha sowjanya
2 siblings, 0 replies; 5+ messages in thread
From: Geetha sowjanya @ 2026-06-16 7:11 UTC (permalink / raw)
To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
Cc: mark.rutland, will, krzk+dt, gakula
Marvell CN20K SoCs integrate a Performance Monitoring Unit (PMU)
associated with the LLC Tag-and-Data (TAD) blocks. The PMU provides
hardware counters to monitor cache traffic and performance events
via a dedicated MMIO region.
The CN20K LLC-TAD PMU is largely similar to CN10K, but differs in the
layout of PFC/PRF register offsets relative to each TAD base. These
offsets are derived from the compatible string in the driver and are
not described through Devicetree properties.
Because of this, using "marvell,cn10k-tad-pmu" as a fallback for CN20K
would result in incorrect register programming. Therefore, add a
separate compatible string:
"marvell,cn20k-tad-pmu"
Update the binding to document CN20K alongside CN10K.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
---
.../bindings/perf/marvell-cn10k-tad.yaml | 25 +++++++++++++------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
index 362142252667..d11121a1e2c9 100644
--- a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
+++ b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
@@ -4,23 +4,32 @@
$id: http://devicetree.org/schemas/perf/marvell-cn10k-tad.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Marvell CN10K LLC-TAD performance monitor
+title: Marvell CN10K / CN20K LLC-TAD performance monitor
maintainers:
- Bhaskara Budiredla <bbudiredla@marvell.com>
+ - Geetha sowjanya <gakula@marvell.com>
description: |
- The Tag-and-Data units (TADs) maintain coherence and contain CN10K
- shared on-chip last level cache (LLC). The tad pmu measures the
- performance of last-level cache. Each tad pmu supports up to eight
- counters.
+ The Tag-and-Data units (TADs) maintain coherence and contain the
+ shared on-chip last level cache (LLC) on Marvell CN10K and CN20K SoCs.
+ The TAD PMU measures last-level cache performance. Each TAD PMU
+ supports up to eight counters.
- The DT setup comprises of number of tad blocks, the sizes of pmu
- regions, tad blocks and overall base address of the HW.
+ The DT setup describes the number of TAD blocks, the sizes of PMU
+ regions and TAD pages, and the overall MMIO base of the hardware.
+
+ marvell,cn20k-tad-pmu is not a compatible fallback for
+ marvell,cn10k-tad-pmu (and vice versa): the driver selects different
+ PFC/PRF MMIO offsets from the compatible string, and those offsets are
+ not described by separate DT properties today.
properties:
compatible:
- const: marvell,cn10k-tad-pmu
+ items:
+ - enum:
+ - marvell,cn10k-tad-pmu
+ - marvell,cn20k-tad-pmu
reg:
maxItems: 1
--
2.25.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support
2026-06-16 7:11 ` [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support Geetha sowjanya
@ 2026-06-16 7:22 ` sashiko-bot
0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-16 7:22 UTC (permalink / raw)
To: Geetha sowjanya; +Cc: conor+dt, devicetree, linux-perf-users, robh
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch series reverses the registration order of the PMU and CPU hotplug, creating a race window where the PMU is exposed to userspace before hotplug migration is wired up.
--
commit 71b7ba2b271b1080739c44e37ec7465ecb45ee0e
Author: Geetha sowjanya <gakula@marvell.com>
perf: marvell: Add CN20K LLC-TAD PMU support
Add support for the LLC Tag-and-Data (TAD) PMU present in
Marvell CN20K SoCs.
> diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
> index 69a6648fa6648..cd81bf8ff5690 100644
> --- a/drivers/perf/marvell_cn10k_tad_pmu.c
> +++ b/drivers/perf/marvell_cn10k_tad_pmu.c
[ ... ]
> @@ -478,7 +521,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
> .read = tad_pmu_event_counter_read,
> };
>
> - if (version == TAD_PMU_V1) {
> + if (version == TAD_PMU_V1 || version == TAD_PMU_V3) {
> tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
> tad_pmu->ops = &tad_pmu_ops;
> } else {
[Severity: Medium]
Looking at the remainder of tad_pmu_probe(), the registration order is:
ret = perf_pmu_register(&tad_pmu->pmu, "tad", -1);
if (ret) {
dev_err(&pdev->dev, "Error %d registering perf PMU\n", ret);
return ret;
}
ret = cpuhp_state_add_instance_nocalls(tad_pmu_cpuhp_state,
&tad_pmu->node);
Does this sequence create a race window where the PMU is exposed to
userspace before hotplug migration is wired up?
If userspace immediately creates events assigned to tad_pmu->cpu, and that
CPU goes offline before the hotplug state instance is added, the hotplug
callback wouldn't be invoked to migrate the context. This could leave the
PMU tied to an offline CPU, preventing its events from being scheduled.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616071149.12523-1-gakula@marvell.com?part=2
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-16 7:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 7:11 [PATCH v3 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU Geetha sowjanya
2026-06-16 7:11 ` [PATCH v3 2/3] perf: marvell: Add CN20K LLC-TAD PMU support Geetha sowjanya
2026-06-16 7:22 ` sashiko-bot
2026-06-16 7:11 ` [PATCH v3 3/3] dt-bindings: perf: marvell: add CN20K TAD " Geetha sowjanya
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox