* [PATCH v14 01/32] x86,fs/resctrl: Remove unappropriate references to cacheinfo in the resctrl subsystem.
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-13 21:04 ` [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions Babu Moger
` (32 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
From: Qinyun Tan <qinyuntan@linux.alibaba.com>
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and the error is ignored), returning
zero values for "top-level events" without any error indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading the RMID counter via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v14: Picked up this patch on Tony's recommendation. This is expected to be
merged soon. It doesn't need to be reviewed.
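A minimal user-space sketch of the idea in the commit message above, not the
kernel code itself. It assumes a 4-CPU L3 whose SNC node (the domain) covers
CPUs 0-1, models cpumasks as plain bitmasks, and assumes that offlining a CPU
also drops it from the other CPUs' shared_cpu_map. The names l3_info[],
domain_init(), cpu_offline(), any_cpu() and cpus_for_read() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Simplified stand-in for the kernel's struct cacheinfo. */
struct cacheinfo {
	unsigned int id;             /* L3 cache id */
	unsigned int shared_cpu_map; /* bitmask of CPUs sharing this L3 */
};

/* Per-CPU L3 cacheinfo, indexed by CPU number (hypothetical model). */
static struct cacheinfo l3_info[NR_CPUS];

/* Domain keeps only the cache id, never a pointer into l3_info[]. */
struct mon_domain {
	unsigned int ci_id;
	unsigned int cpu_mask; /* online CPUs of this SNC node */
};

/* All CPUs share one L3; the domain covers CPUs 0 and 1. */
static struct mon_domain domain_init(unsigned int ci_id)
{
	struct mon_domain d = { .ci_id = ci_id, .cpu_mask = 0x3 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		l3_info[cpu].id = ci_id;
		l3_info[cpu].shared_cpu_map = 0xF;
	}
	return d;
}

/* Offline a CPU: its own map shrinks to itself, as the commit message says. */
static void cpu_offline(struct mon_domain *d, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	unsigned int bit = 1u << cpu;

	d->cpu_mask &= ~bit;
	for (int i = 0; i < NR_CPUS; i++)
		l3_info[i].shared_cpu_map &= ~bit;
	l3_info[cpu].shared_cpu_map = bit;
}

/* Lowest set bit of the mask, or -1 when the mask is empty. */
static int any_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/* Look the cacheinfo up at read time via a CPU that is still online. */
static unsigned int cpus_for_read(const struct mon_domain *d)
{
	int cpu = any_cpu(d->cpu_mask);

	if (cpu < 0 || l3_info[cpu].id != d->ci_id)
		return 0;
	return l3_info[cpu].shared_cpu_map;
}
```

In this model, a pointer cached to CPU 0's cacheinfo would see only CPU 0 once
that CPU goes offline, while the lookup by id still yields the online CPUs of
the L3.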
---
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++--
fs/resctrl/ctrlmondata.c | 13 +++++++++----
fs/resctrl/internal.h | 4 ++--
fs/resctrl/monitor.c | 6 ++++--
fs/resctrl/rdtgroup.c | 6 +++---
include/linux/resctrl.h | 4 ++--
6 files changed, 24 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7109cbfcad4f..187d527ef73b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -498,6 +498,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
struct rdt_hw_mon_domain *hw_dom;
struct rdt_domain_hdr *hdr;
struct rdt_mon_domain *d;
+ struct cacheinfo *ci;
int err;
lockdep_assert_held(&domain_list_lock);
@@ -525,12 +526,13 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
d = &hw_dom->d_resctrl;
d->hdr.id = id;
d->hdr.type = RESCTRL_MON_DOMAIN;
- d->ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
- if (!d->ci) {
+ ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+ if (!ci) {
pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
mon_domain_free(hw_dom);
return;
}
+ d->ci_id = ci->id;
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
arch_mon_domain_online(r, d);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 6ed2dfd4dbbd..d98e0d2de09f 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -594,9 +594,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
struct rmid_read rr = {0};
struct rdt_mon_domain *d;
struct rdtgroup *rdtgrp;
+ int domid, cpu, ret = 0;
struct rdt_resource *r;
+ struct cacheinfo *ci;
struct mon_data *md;
- int domid, ret = 0;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
@@ -623,10 +624,14 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
* one that matches this cache id.
*/
list_for_each_entry(d, &r->mon_domains, hdr.list) {
- if (d->ci->id == domid) {
- rr.ci = d->ci;
+ if (d->ci_id == domid) {
+ rr.ci_id = d->ci_id;
+ cpu = cpumask_any(&d->hdr.cpu_mask);
+ ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+ if (!ci)
+ continue;
mon_event_read(&rr, r, NULL, rdtgrp,
- &d->ci->shared_cpu_map, evtid, false);
+ &ci->shared_cpu_map, evtid, false);
goto checkresult;
}
}
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 9a8cf6f11151..0a1eedba2b03 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -98,7 +98,7 @@ struct mon_data {
* domains in @r sharing L3 @ci.id
* @evtid: Which monitor event to read.
* @first: Initialize MBM counter when true.
- * @ci: Cacheinfo for L3. Only set when @d is NULL. Used when summing domains.
+ * @ci_id: Cacheinfo id for L3. Only set when @d is NULL. Used when summing domains.
* @err: Error encountered when reading counter.
* @val: Returned value of event counter. If @rgrp is a parent resource group,
* @val includes the sum of event counts from its child resource groups.
@@ -112,7 +112,7 @@ struct rmid_read {
struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
bool first;
- struct cacheinfo *ci;
+ unsigned int ci_id;
int err;
u64 val;
void *arch_mon_ctx;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index bde2801289d3..f5637855c3ac 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -361,6 +361,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
int cpu = smp_processor_id();
struct rdt_mon_domain *d;
+ struct cacheinfo *ci;
struct mbm_state *m;
int err, ret;
u64 tval = 0;
@@ -388,7 +389,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
}
/* Summing domains that share a cache, must be on a CPU for that cache. */
- if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
+ ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+ if (!ci || ci->id != rr->ci_id)
return -EINVAL;
/*
@@ -400,7 +402,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
*/
ret = -EINVAL;
list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
- if (d->ci->id != rr->ci->id)
+ if (d->ci_id != rr->ci_id)
continue;
err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
rr->evtid, &tval, rr->arch_mon_ctx);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 1beb124e25f6..77d08229d855 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3036,7 +3036,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
char name[32];
snc_mode = r->mon_scope == RESCTRL_L3_NODE;
- sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
+ sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
if (snc_mode)
sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
@@ -3061,7 +3061,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
return -EPERM;
list_for_each_entry(mevt, &r->evt_list, list) {
- domid = do_sum ? d->ci->id : d->hdr.id;
+ domid = do_sum ? d->ci_id : d->hdr.id;
priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
if (WARN_ON_ONCE(!priv))
return -EINVAL;
@@ -3089,7 +3089,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
lockdep_assert_held(&rdtgroup_mutex);
snc_mode = r->mon_scope == RESCTRL_L3_NODE;
- sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
+ sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
kn = kernfs_find_and_get(parent_kn, name);
if (kn) {
/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9ba771f2ddea..6fb4894b8cfd 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -159,7 +159,7 @@ struct rdt_ctrl_domain {
/**
* struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
- * @ci: cache info for this domain
+ * @ci_id: cache info id for this domain
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
* @mbm_total: saved state for MBM total bandwidth
* @mbm_local: saved state for MBM local bandwidth
@@ -170,7 +170,7 @@ struct rdt_ctrl_domain {
*/
struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
- struct cacheinfo *ci;
+ unsigned int ci_id;
unsigned long *rmid_busy_llc;
struct mbm_state *mbm_total;
struct mbm_state *mbm_local;
--
2.34.1
* [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2025-06-13 21:04 ` [PATCH v14 01/32] x86,fs/resctrl: Remove unappropriate references to cacheinfo in the resctrl subsystem Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 21:28 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 03/32] x86,fs/resctrl: Replace architecture event enabled checks Babu Moger
` (31 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
From: Tony Luck <tony.luck@intel.com>
There are currently only three monitor events, all associated with
the RDT_RESOURCE_L3 resource. Growing support for additional events
will be easier with some restructuring to have a single point in
file system code where all attributes of all events are defined.
Place all event descriptions into an array mon_event_all[]. Doing
this has the beneficial side effect of removing the need for
rdt_resource::evt_list.
Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
checks for event ids and as the starting index to scan mon_event_all[].
Drop the code that builds evt_list and change the two places where
the list is scanned to scan mon_event_all[] instead using a new
helper macro for_each_mon_event().
Architecture code now informs file system code which events are
available with resctrl_enable_mon_event().
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
v14: This is Tony's work. This is part of Tony's telemetry series.
https://lore.kernel.org/lkml/20250521225049.132551-1-tony.luck@intel.com/
Tony made a special update for me to include in this series.
https://lore.kernel.org/lkml/20250609162139.91651-1-tony.luck@intel.com/
---
arch/x86/kernel/cpu/resctrl/core.c | 12 ++++--
fs/resctrl/internal.h | 13 ++++--
fs/resctrl/monitor.c | 63 +++++++++++++++---------------
fs/resctrl/rdtgroup.c | 11 +++---
include/linux/resctrl.h | 4 +-
include/linux/resctrl_types.h | 12 ++++--
6 files changed, 66 insertions(+), 49 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 187d527ef73b..7fcae25874fe 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -864,12 +864,18 @@ static __init bool get_rdt_mon_resources(void)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
+ if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
+ }
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
+ }
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+ }
if (!rdt_mon_features)
return false;
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0a1eedba2b03..20e2c45cea64 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
}
/**
- * struct mon_evt - Entry in the event list of a resource
+ * struct mon_evt - Description of a monitor event
* @evtid: event id
+ * @rid: index of the resource for this event
* @name: name of the event
* @configurable: true if the event is configurable
- * @list: entry in &rdt_resource->evt_list
+ * @enabled: true if the event is enabled
*/
struct mon_evt {
enum resctrl_event_id evtid;
+ enum resctrl_res_level rid;
char *name;
bool configurable;
- struct list_head list;
+ bool enabled;
};
+extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
+
+#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
+ mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
+
/**
* struct mon_data - Monitoring details for each event file.
* @list: Member of the global @mon_data_kn_priv_list list.
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f5637855c3ac..2313e48de55f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -844,38 +844,39 @@ static void dom_data_exit(struct rdt_resource *r)
mutex_unlock(&rdtgroup_mutex);
}
-static struct mon_evt llc_occupancy_event = {
- .name = "llc_occupancy",
- .evtid = QOS_L3_OCCUP_EVENT_ID,
-};
-
-static struct mon_evt mbm_total_event = {
- .name = "mbm_total_bytes",
- .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
-};
-
-static struct mon_evt mbm_local_event = {
- .name = "mbm_local_bytes",
- .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
-};
-
/*
- * Initialize the event list for the resource.
- *
- * Note that MBM events are also part of RDT_RESOURCE_L3 resource
- * because as per the SDM the total and local memory bandwidth
- * are enumerated as part of L3 monitoring.
+ * All available events. Architecture code marks the ones that
+ * are supported by a system using resctrl_enable_mon_event()
+ * to set .enabled.
*/
-static void l3_mon_evt_init(struct rdt_resource *r)
+struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
+ [QOS_L3_OCCUP_EVENT_ID] = {
+ .name = "llc_occupancy",
+ .evtid = QOS_L3_OCCUP_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+ [QOS_L3_MBM_TOTAL_EVENT_ID] = {
+ .name = "mbm_total_bytes",
+ .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+ [QOS_L3_MBM_LOCAL_EVENT_ID] = {
+ .name = "mbm_local_bytes",
+ .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+};
+
+void resctrl_enable_mon_event(enum resctrl_event_id eventid)
{
- INIT_LIST_HEAD(&r->evt_list);
+ if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+ return;
+ if (mon_event_all[eventid].enabled) {
+ pr_warn("Duplicate enable for event %d\n", eventid);
+ return;
+ }
- if (resctrl_arch_is_llc_occupancy_enabled())
- list_add_tail(&llc_occupancy_event.list, &r->evt_list);
- if (resctrl_arch_is_mbm_total_enabled())
- list_add_tail(&mbm_total_event.list, &r->evt_list);
- if (resctrl_arch_is_mbm_local_enabled())
- list_add_tail(&mbm_local_event.list, &r->evt_list);
+ mon_event_all[eventid].enabled = true;
}
/**
@@ -902,15 +903,13 @@ int resctrl_mon_resource_init(void)
if (ret)
return ret;
- l3_mon_evt_init(r);
-
if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- mbm_total_event.configurable = true;
+ mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
resctrl_file_fflags_init("mbm_total_bytes_config",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- mbm_local_event.configurable = true;
+ mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].configurable = true;
resctrl_file_fflags_init("mbm_local_bytes_config",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77d08229d855..b95501d4b5de 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1152,7 +1152,9 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
struct mon_evt *mevt;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ for_each_mon_event(mevt) {
+ if (mevt->rid != r->rid || !mevt->enabled)
+ continue;
seq_printf(seq, "%s\n", mevt->name);
if (mevt->configurable)
seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,10 +3059,9 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
struct mon_evt *mevt;
int ret, domid;
- if (WARN_ON(list_empty(&r->evt_list)))
- return -EPERM;
-
- list_for_each_entry(mevt, &r->evt_list, list) {
+ for_each_mon_event(mevt) {
+ if (mevt->rid != r->rid || !mevt->enabled)
+ continue;
domid = do_sum ? d->ci_id : d->hdr.id;
priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
if (WARN_ON_ONCE(!priv))
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 6fb4894b8cfd..2944042bd84c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -269,7 +269,6 @@ enum resctrl_schema_fmt {
* @mon_domains: RCU list of all monitor domains for this resource
* @name: Name to use in "schemata" file.
* @schema_fmt: Which format string and parser is used for this schema.
- * @evt_list: List of monitoring events
* @mbm_cfg_mask: Bandwidth sources that can be tracked when bandwidth
* monitoring events can be configured.
* @cdp_capable: Is the CDP feature available on this resource
@@ -287,7 +286,6 @@ struct rdt_resource {
struct list_head mon_domains;
char *name;
enum resctrl_schema_fmt schema_fmt;
- struct list_head evt_list;
unsigned int mbm_cfg_mask;
bool cdp_capable;
};
@@ -372,6 +370,8 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
u32 resctrl_arch_system_num_rmid_idx(void);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
/**
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index a25fb9c4070d..2dadbc54e4b3 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -34,11 +34,15 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
-/*
- * Event IDs, the values match those used to program IA32_QM_EVTSEL before
- * reading IA32_QM_CTR on RDT systems.
- */
+/* Event IDs */
enum resctrl_event_id {
+ /* Must match value of first event below */
+ QOS_FIRST_EVENT = 0x01,
+
+ /*
+ * These values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
QOS_L3_OCCUP_EVENT_ID = 0x01,
QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
--
2.34.1
* Re: [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-13 21:04 ` [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions Babu Moger
@ 2025-06-24 21:28 ` Reinette Chatre
2025-06-25 15:57 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:28 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu/Tony,
On 6/13/25 2:04 PM, Babu Moger wrote:
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 0a1eedba2b03..20e2c45cea64 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
> }
>
> /**
> - * struct mon_evt - Entry in the event list of a resource
> + * struct mon_evt - Description of a monitor event
nit: I still think "Properties" is more appropriate.
> * @evtid: event id
> + * @rid: index of the resource for this event
> * @name: name of the event
> * @configurable: true if the event is configurable
> - * @list: entry in &rdt_resource->evt_list
> + * @enabled: true if the event is enabled
> */
> struct mon_evt {
> enum resctrl_event_id evtid;
> + enum resctrl_res_level rid;
> char *name;
> bool configurable;
> - struct list_head list;
> + bool enabled;
> };
>
> +extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
> +
> +#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
> + mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
> +
From what I can tell this series does not build on some core changes
made by this patch:
- note that resource ID is added to struct mon_evt and the events
are *removed* from the evt_list associated with the resource. I'll try to point
out where I see it but this series still behaves as though it is traversing
evt_list associated with the resource. Take for example
patch #24 "fs/resctrl: Add event configuration directory under info/L3_MON/":
resctrl_mkdir_counter_configs() traverses mon_event_all[] that, after this
patch, contains all events for *all* resources, yet resctrl_mkdir_counter_configs(),
even though it has a struct rdt_resource as parameter, assumes that all events are
associated with its resource but there is no checking to enforce this.
- note the new for_each_mon_event() above. This should be used throughout
instead of open-coding the loop because the array starts at index 0 but
the first valid entry is at index 1. The above macro makes this easier to
get right.
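A small user-space sketch of both points, under stated assumptions: the enum
values and the macro follow the patch, while enum res_level, RES_OTHER, the
enabled settings and the two counting helpers are invented for illustration.
Note that the unused slot 0 is zero-initialized, so its rid compares equal to
the first resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum resctrl_event_id {
	QOS_FIRST_EVENT           = 0x01,
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
	QOS_NUM_EVENTS,
};

/* Hypothetical resource ids; RES_OTHER stands for any non-L3 resource. */
enum res_level { RES_L3 = 0, RES_OTHER = 1 };

struct mon_evt {
	enum resctrl_event_id evtid;
	enum res_level rid;
	const char *name;
	bool enabled;
};

/* Slot 0 is never initialized: event ids start at QOS_FIRST_EVENT. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID] = {
		.evtid = QOS_L3_OCCUP_EVENT_ID, .rid = RES_L3,
		.name = "llc_occupancy", .enabled = true,
	},
	[QOS_L3_MBM_TOTAL_EVENT_ID] = {
		.evtid = QOS_L3_MBM_TOTAL_EVENT_ID, .rid = RES_L3,
		.name = "mbm_total_bytes", .enabled = true,
	},
	[QOS_L3_MBM_LOCAL_EVENT_ID] = {
		.evtid = QOS_L3_MBM_LOCAL_EVENT_ID, .rid = RES_L3,
		.name = "mbm_local_bytes", .enabled = false,
	},
};

#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
				      mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)

/* Pattern used in the patch: macro plus resource and enabled filters. */
static int count_for_resource(enum res_level rid)
{
	struct mon_evt *mevt;
	int n = 0;

	for_each_mon_event(mevt) {
		if (mevt->rid != rid || !mevt->enabled)
			continue;
		assert(mevt->name != NULL);
		n++;
	}
	return n;
}

/* Open-coded loop from index 0: also visits the empty slot 0. */
static int count_from_index_zero(enum res_level rid)
{
	int n = 0;

	for (int i = 0; i < QOS_NUM_EVENTS; i++)
		if (mon_event_all[i].rid == rid)
			n++;
	return n;
}
```

With these settings count_for_resource(RES_L3) sees the two enabled L3 events,
while count_from_index_zero(RES_L3) also counts the nameless slot 0 and the
disabled event.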
Reinette
* Re: [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-24 21:28 ` Reinette Chatre
@ 2025-06-25 15:57 ` Moger, Babu
2025-06-25 17:55 ` Luck, Tony
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 15:57 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 16:28, Reinette Chatre wrote:
> Hi Babu/Tony,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 0a1eedba2b03..20e2c45cea64 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>> }
>>
>> /**
>> - * struct mon_evt - Entry in the event list of a resource
>> + * struct mon_evt - Description of a monitor event
>
> nit: I still think "Properties" is more appropriate.
I will let Tony take care of this.
Also, there is another comment
https://lore.kernel.org/lkml/b761e6ec-a874-4d06-8437-a3a717a91abb@intel.com/
I can pick up from your "aegl" tree. Let me know otherwise.
I am not in a hurry. I will plan to post the series next week.
>
>> * @evtid: event id
>> + * @rid: index of the resource for this event
>> * @name: name of the event
>> * @configurable: true if the event is configurable
>> - * @list: entry in &rdt_resource->evt_list
>> + * @enabled: true if the event is enabled
>> */
>> struct mon_evt {
>> enum resctrl_event_id evtid;
>> + enum resctrl_res_level rid;
>> char *name;
>> bool configurable;
>> - struct list_head list;
>> + bool enabled;
>> };
>>
>> +extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
>> +
>> +#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
>> + mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
>> +
>
> From what I can tell this series does not build on some core changes
> made by this patch:
> - note that resource ID is added to struct mon_evt and the events
> are *removed* from the evt_list associated with the resource. I'll try to point
> out where I see it but this series still behaves as though it is traversing
> evt_list associated with the resource. Take for example
> patch #24 "fs/resctrl: Add event configuration directory under info/L3_MON/":
> resctrl_mkdir_counter_configs() traverses mon_event_all[] that, after this
> patch, contains all events for *all* resources, yet resctrl_mkdir_counter_configs(),
> even though it has a struct rdt_resource as parameter, assumes that all events are
> associated with its resource but there is no checking to enforce this.
> - note the new for_each_mon_event() above. This should be used throughout
> instead of open-coding the loop because the array starts at index 0 but
> the first valid entry is at index 1. The above macro makes this easier to
> get right.
Yes. Makes sense. Will take care of this in patches #24, #28 and #29.
--
Thanks
Babu Moger
* RE: [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-25 15:57 ` Moger, Babu
@ 2025-06-25 17:55 ` Luck, Tony
2025-06-25 20:12 ` Luck, Tony
0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-06-25 17:55 UTC (permalink / raw)
To: babu.moger@amd.com, Chatre, Reinette, corbet@lwn.net,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
thomas.lendacky@amd.com, pawan.kumar.gupta@linux.intel.com,
manali.shukla@amd.com, perry.yuan@amd.com, Huang, Kai,
peterz@infradead.org, Li, Xiaoyao, kan.liang@linux.intel.com,
mario.limonciello@amd.com, Li, Xin3, gautham.shenoy@amd.com,
xin@zytor.com, Bae, Chang Seok, fenghuay@nvidia.com,
peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
> >> /**
> >> - * struct mon_evt - Entry in the event list of a resource
> >> + * struct mon_evt - Description of a monitor event
> >
> > nit: I still think "Properties" is more appropriate.
>
> I will let Tony take care of this.
> Also, there is another comment
> https://lore.kernel.org/lkml/b761e6ec-a874-4d06-8437-a3a717a91abb@intel.com/
>
> I can pick up from your "aegl" tree. Let me know otherwise.
> I am not in a hurry. I will plan to post the series next week.
I'm working on fixing these additional issues. I'll ping you when I
push to my GIT tree.
-Tony
* RE: [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-25 17:55 ` Luck, Tony
@ 2025-06-25 20:12 ` Luck, Tony
2025-06-25 22:31 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-06-25 20:12 UTC (permalink / raw)
To: babu.moger@amd.com, Chatre, Reinette, corbet@lwn.net,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
thomas.lendacky@amd.com, pawan.kumar.gupta@linux.intel.com,
manali.shukla@amd.com, perry.yuan@amd.com, Huang, Kai,
peterz@infradead.org, Li, Xiaoyao, kan.liang@linux.intel.com,
mario.limonciello@amd.com, Li, Xin3, gautham.shenoy@amd.com,
xin@zytor.com, Bae, Chang Seok, fenghuay@nvidia.com,
peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
> I'm working on fixing these additional issues. I'll ping you when I
> push to my GIT tree.
Pushed to the rdt-aet-v5.5 branch of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
You need these four commits:
2feb4e5716f7 x86,fs/resctrl: Prepare for more monitor events
3a86f90a9b81 x86/resctrl: Remove 'rdt_mon_features' global variable
3e720a9d3b46 x86,fs/resctrl: Replace architecture event enabled checks
ed06edafba78 x86,fs/resctrl: Consolidate monitor event descriptions
Only the first and last have substantive changes. The middle two might just
have changed line numbers because of the first.
To fix the "landmine" code using "while (--idx)" I added a macro to
do the iteration (originally suggested by Fenghua, but there were
only two places to use it then, so it didn't seem worth it.)
Now there are 4 ... so here's my macro:
/* Iterate over memory bandwidth arrays in domain structures */
#define for_each_mbm_idx(idx) \
for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)
Hopefully enough different from:
/* Iterate over all memory bandwidth events */
#define for_each_mbm_event_id(eventid) \
for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
to not cause confusion.
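A user-space sketch of how the two macros differ: one walks array indices
starting at 0, the other walks event ids starting at
QOS_L3_MBM_TOTAL_EVENT_ID. The definition of QOS_NUM_L3_MBM_EVENTS used here
and the helper mbm_idx_of() that maps an event id to an array index are
assumptions for illustration, as are mbm_bytes[], reset_all(), record() and
total_over_events().

```c
#include <assert.h>

enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

/* Assumed definition: number of MBM events in the id range above. */
#define QOS_NUM_L3_MBM_EVENTS \
	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)

/* Iterate over memory bandwidth arrays in domain structures */
#define for_each_mbm_idx(idx) \
	for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)

/* Iterate over all memory bandwidth events */
#define for_each_mbm_event_id(eventid) \
	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)

/* Hypothetical per-domain state, one slot per MBM event. */
static unsigned long long mbm_bytes[QOS_NUM_L3_MBM_EVENTS];

/* Hypothetical helper: translate an MBM event id into an array index. */
static int mbm_idx_of(int eventid)
{
	assert(eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
	       eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
	return eventid - QOS_L3_MBM_TOTAL_EVENT_ID;
}

/* Index-based walk: touches every array slot, including slot 0. */
static void reset_all(void)
{
	int idx;

	for_each_mbm_idx(idx)
		mbm_bytes[idx] = 0;
}

static void record(int eventid, unsigned long long bytes)
{
	mbm_bytes[mbm_idx_of(eventid)] += bytes;
}

/* Id-based walk: each id must be translated before indexing the array. */
static unsigned long long total_over_events(void)
{
	unsigned long long sum = 0;
	int eventid;

	for_each_mbm_event_id(eventid)
		sum += mbm_bytes[mbm_idx_of(eventid)];
	return sum;
}
```

Using an event id directly as an index into mbm_bytes[] would run past the end
of the array, which is why keeping the two iterators visibly distinct helps.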
-Tony
* Re: RE: [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions
2025-06-25 20:12 ` Luck, Tony
@ 2025-06-25 22:31 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 22:31 UTC (permalink / raw)
To: Luck, Tony, babu.moger@amd.com, Chatre, Reinette, corbet@lwn.net,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
thomas.lendacky@amd.com, pawan.kumar.gupta@linux.intel.com,
manali.shukla@amd.com, perry.yuan@amd.com, Huang, Kai,
peterz@infradead.org, Li, Xiaoyao, kan.liang@linux.intel.com,
mario.limonciello@amd.com, Li, Xin3, gautham.shenoy@amd.com,
xin@zytor.com, Bae, Chang Seok, fenghuay@nvidia.com,
peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Tony,
On 6/25/2025 3:12 PM, Luck, Tony wrote:
>> I'm working on fixing these additional issues. I'll ping you when I
>> push to my GIT tree.
>
> Pushed to the rdt-aet-v5.5 branch of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
>
> You need these four commits:
>
> 2feb4e5716f7 x86,fs/resctrl: Prepare for more monitor events
> 3a86f90a9b81 x86/resctrl: Remove 'rdt_mon_features' global variable
> 3e720a9d3b46 x86,fs/resctrl: Replace architecture event enabled checks
> ed06edafba78 x86,fs/resctrl: Consolidate monitor event descriptions
>
> Only first and last have substantive changes. Middle two might just have
> changed line numbers because of the first.
>
> To fix the "landmine" code using "while (--idx)" I added a macro to
> do the iteration (originally suggested by Fenghua, but there were
> only two places to use it then, so it didn't seem worth it.)
>
> Now there are 4 ... so here's my macro:
>
> /* Iterate over memory bandwidth arrays in domain structures */
> #define for_each_mbm_idx(idx) \
> for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)
>
> Hopefully enough different from:
>
> /* Iterate over all memory bandwidth events */
> #define for_each_mbm_event_id(eventid) \
> for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
> eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
>
> to not cause confusion.
>
Picked up the patches. Applied cleanly. Thanks a lot.
Thanks,
Babu
* [PATCH v14 03/32] x86,fs/resctrl: Replace architecture event enabled checks
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2025-06-13 21:04 ` [PATCH v14 01/32] x86,fs/resctrl: Remove unappropriate references to cacheinfo in the resctrl subsystem Babu Moger
2025-06-13 21:04 ` [PATCH v14 02/32] x86,fs/resctrl: Consolidate monitor event descriptions Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-13 21:04 ` [PATCH v14 04/32] x86/resctrl: Remove 'rdt_mon_features' global variable Babu Moger
` (30 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
From: Tony Luck <tony.luck@intel.com>
The resctrl file system now has complete knowledge of the status
of every event. So there is no need for per-event function calls
to check.
Replace each of the resctrl_arch_is_{event}_enabled() calls with
resctrl_is_mon_event_enabled(QOS_{EVENT}).
No functional change.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
v14: This is Tony's work. This is part of Tony's telemetry series.
https://lore.kernel.org/lkml/20250521225049.132551-1-tony.luck@intel.com/
Tony made a special update for me to include in this series.
https://lore.kernel.org/lkml/20250609162139.91651-1-tony.luck@intel.com/
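
Illustration only, not part of the patch: a minimal userspace model of the
check this patch introduces. It assumes stand-in enum values and a reduced
struct mon_evt holding only the 'enabled' flag (the real structure has more
members), and the enable helper here is simplified. The point it tries to
show is that an out-of-range event ID reports "not enabled" instead of
indexing outside mon_event_all[].

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in enum; the numeric values are assumptions for this sketch. */
enum resctrl_event_id {
	QOS_FIRST_EVENT = 1,
	QOS_L3_OCCUP_EVENT_ID = QOS_FIRST_EVENT,
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
	QOS_NUM_EVENTS,
};

/* Reduced stand-in: only the field this sketch needs. */
struct mon_evt {
	bool enabled;
};

/* One entry per event ID; static storage starts out all-disabled. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS];

/* Simplified model of the enable path: ignore IDs outside the table. */
static void resctrl_enable_mon_event(enum resctrl_event_id eventid)
{
	if (eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS)
		return;

	mon_event_all[eventid].enabled = true;
}

/* Same shape as the helper added by this patch: range check, then flag. */
static bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
{
	return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
	       mon_event_all[eventid].enabled;
}
```

With a single table-driven query, callers no longer need one architecture
helper per event, which is what lets later patches loop over event IDs.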
---
arch/x86/include/asm/resctrl.h | 15 ---------------
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++--
fs/resctrl/ctrlmondata.c | 4 ++--
fs/resctrl/monitor.c | 16 +++++++++++-----
fs/resctrl/rdtgroup.c | 18 +++++++++---------
include/linux/resctrl.h | 2 ++
7 files changed, 28 insertions(+), 35 deletions(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index feb93b50e990..b1dd5d6b87db 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -84,21 +84,6 @@ static inline void resctrl_arch_disable_mon(void)
static_branch_dec_cpuslocked(&rdt_enable_key);
}
-static inline bool resctrl_arch_is_llc_occupancy_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_OCCUP_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_total_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_MBM_TOTAL_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_local_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_MBM_LOCAL_EVENT_ID));
-}
-
/*
* __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
*
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7fcae25874fe..1a319ce9328c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -402,13 +402,13 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;
- if (resctrl_arch_is_mbm_total_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
tsize = sizeof(*hw_dom->arch_mbm_total);
hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
if (!hw_dom->arch_mbm_total)
return -ENOMEM;
}
- if (resctrl_arch_is_mbm_local_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
tsize = sizeof(*hw_dom->arch_mbm_local);
hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
if (!hw_dom->arch_mbm_local) {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c261558276cd..61d38517e2bf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -207,11 +207,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
memset(hw_dom->arch_mbm_total, 0,
sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
memset(hw_dom->arch_mbm_local, 0,
sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
}
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d98e0d2de09f..ad7ffc6acf13 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -473,12 +473,12 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
rdt_last_cmd_clear();
if (!strcmp(buf, "mbm_local_bytes")) {
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
else
ret = -EINVAL;
} else if (!strcmp(buf, "mbm_total_bytes")) {
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
else
ret = -EINVAL;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 2313e48de55f..9e988b2c1a22 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -336,7 +336,7 @@ void free_rmid(u32 closid, u32 rmid)
entry = __rmid_entry(idx);
- if (resctrl_arch_is_llc_occupancy_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
add_rmid_to_limbo(entry);
else
list_add_tail(&entry->list, &rmid_free_lru);
@@ -637,10 +637,10 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
* This is protected from concurrent reads from user as both
* the user and overflow handler hold the global mutex.
*/
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
}
@@ -879,6 +879,12 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
mon_event_all[eventid].enabled = true;
}
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
+{
+ return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
+ mon_event_all[eventid].enabled;
+}
+
/**
* resctrl_mon_resource_init() - Initialise global monitoring structures.
*
@@ -914,9 +920,9 @@ int resctrl_mon_resource_init(void)
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
- else if (resctrl_arch_is_mbm_total_enabled())
+ else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b95501d4b5de..a7eeb33501da 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -123,8 +123,8 @@ void rdt_staged_configs_clear(void)
static bool resctrl_is_mbm_enabled(void)
{
- return (resctrl_arch_is_mbm_total_enabled() ||
- resctrl_arch_is_mbm_local_enabled());
+ return (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID) ||
+ resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
}
static bool resctrl_is_mbm_event(int e)
@@ -196,7 +196,7 @@ static int closid_alloc(void)
lockdep_assert_held(&rdtgroup_mutex);
if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID) &&
- resctrl_arch_is_llc_occupancy_enabled()) {
+ resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
cleanest_closid = resctrl_find_cleanest_closid();
if (cleanest_closid < 0)
return cleanest_closid;
@@ -4051,7 +4051,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
if (resctrl_is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
- if (resctrl_arch_is_llc_occupancy_enabled() && has_busy_rmid(d)) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
/*
* When a package is going down, forcefully
* decrement rmid->ebusy. There is no way to know
@@ -4087,12 +4087,12 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
size_t tsize;
- if (resctrl_arch_is_llc_occupancy_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
if (!d->rmid_busy_llc)
return -ENOMEM;
}
- if (resctrl_arch_is_mbm_total_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
tsize = sizeof(*d->mbm_total);
d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
if (!d->mbm_total) {
@@ -4100,7 +4100,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
return -ENOMEM;
}
}
- if (resctrl_arch_is_mbm_local_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
tsize = sizeof(*d->mbm_local);
d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
if (!d->mbm_local) {
@@ -4145,7 +4145,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
RESCTRL_PICK_ANY_CPU);
}
- if (resctrl_arch_is_llc_occupancy_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
/*
@@ -4220,7 +4220,7 @@ void resctrl_offline_cpu(unsigned int cpu)
cancel_delayed_work(&d->mbm_over);
mbm_setup_overflow_handler(d, 0, cpu);
}
- if (resctrl_arch_is_llc_occupancy_enabled() &&
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
cancel_delayed_work(&d->cqm_limbo);
cqm_setup_limbo_handler(d, 0, cpu);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 2944042bd84c..40aba6b5d4f0 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -372,6 +372,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
+
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
/**
--
2.34.1
* [PATCH v14 04/32] x86/resctrl: Remove 'rdt_mon_features' global variable
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (2 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 03/32] x86,fs/resctrl: Replace architecture event enabled checks Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-13 21:04 ` [PATCH v14 05/32] x86,fs/resctrl: Prepare for more monitor events Babu Moger
` (29 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
From: Tony Luck <tony.luck@intel.com>
rdt_mon_features is used as a bitmask of enabled monitor events. A monitor
event's status is now maintained in mon_evt::enabled with all monitor
events' mon_evt structures found in the filesystem's mon_event_all[] array.
Remove the remaining uses of rdt_mon_features.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
v14: This is Tony's work. This is part of Tony's telemetry series.
https://lore.kernel.org/lkml/20250521225049.132551-1-tony.luck@intel.com/
Tony made a special update for me to include in this series.
https://lore.kernel.org/lkml/20250609162139.91651-1-tony.luck@intel.com/.
---
arch/x86/include/asm/resctrl.h | 1 -
arch/x86/kernel/cpu/resctrl/core.c | 9 +++++----
arch/x86/kernel/cpu/resctrl/monitor.c | 5 -----
3 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b1dd5d6b87db..575f8408a9e7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -44,7 +44,6 @@ DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
-extern unsigned int rdt_mon_features;
DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a319ce9328c..5d14f9a14eda 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -863,21 +863,22 @@ static __init bool get_rdt_alloc_resources(void)
static __init bool get_rdt_mon_resources(void)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
+ ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
+ ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+ ret = true;
}
- if (!rdt_mon_features)
+ if (!ret)
return false;
return !rdt_get_mon_l3_config(r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 61d38517e2bf..07f8ab097cbe 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -31,11 +31,6 @@
*/
bool rdt_mon_capable;
-/*
- * Global to indicate which monitoring events are enabled.
- */
-unsigned int rdt_mon_features;
-
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))
static int snc_nodes_per_l3_cache = 1;
--
2.34.1
* [PATCH v14 05/32] x86,fs/resctrl: Prepare for more monitor events
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (3 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 04/32] x86/resctrl: Remove 'rdt_mon_features' global variable Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 21:30 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (28 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
From: Tony Luck <tony.luck@intel.com>
There's a rule in computer programming that objects appear zero,
once, or many times. So code accordingly.
There are two MBM events and resctrl is coded with a lot of
if (local)
do one thing
if (total)
do a different thing
Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays
of pointers to per event data instead of explicit fields for total and
local bandwidth.
Simplify by coding for many events, looping over those that are enabled.
Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide a for_each_mbm_event_id() helper macro.
Clean up variable names in functions touched to consistently use
"eventid" for those with type enum resctrl_event_id.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
v14: This is Tony's work. This is part of Tony's telemetry series.
https://lore.kernel.org/lkml/20250521225049.132551-1-tony.luck@intel.com/
Tony made a special update for me to include in this series.
https://lore.kernel.org/lkml/20250609162139.91651-1-tony.luck@intel.com/.
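
Illustration only, not part of the patch: a userspace sketch of the
array-of-pointers layout described above, with one possible cleanup shape
that does not depend on the value of the loop index at the point of failure.
It assumes the domain structure is zero-initialized before setup, uses
calloc()/free() in place of kcalloc()/kfree(), and uses stand-in enum values.
struct mon_domain_model, event_enabled[], and the model_* functions are
hypothetical names; this is not necessarily the fix the series adopts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in enum; the numeric values are assumptions for this sketch. */
enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID = 1,
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
	QOS_NUM_EVENTS,
};

#define QOS_NUM_L3_MBM_EVENTS \
	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
#define MBM_STATE_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)

#define for_each_mbm_event_id(eventid) \
	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
#define for_each_mbm_idx(idx) \
	for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)

/* Reduced stand-ins for the kernel structures. */
struct mbm_state {
	unsigned long long prev_bw_bytes;
};

struct mon_domain_model {
	struct mbm_state *mbm_states[QOS_NUM_L3_MBM_EVENTS];
};

/* Hypothetical per-event enable flags, indexed by event ID. */
static bool event_enabled[QOS_NUM_EVENTS];

/* Allocate one state array per enabled MBM event; d must start zeroed. */
static int model_setup_mbm_states(struct mon_domain_model *d, size_t nr_idx)
{
	enum resctrl_event_id eventid;
	int idx;

	for_each_mbm_event_id(eventid) {
		if (!event_enabled[eventid])
			continue;
		idx = MBM_STATE_IDX(eventid);
		d->mbm_states[idx] = calloc(nr_idx, sizeof(*d->mbm_states[0]));
		if (!d->mbm_states[idx])
			goto cleanup;
	}

	return 0;

cleanup:
	/* Visit every slot: free(NULL) is a no-op, so unset slots are safe. */
	for_each_mbm_idx(idx) {
		free(d->mbm_states[idx]);
		d->mbm_states[idx] = NULL;
	}

	return -ENOMEM;
}

/* Look up per-index state; NULL for non-MBM or disabled events. */
static struct mbm_state *model_get_mbm_state(struct mon_domain_model *d,
					     enum resctrl_event_id eventid,
					     size_t i)
{
	struct mbm_state *state;

	if (eventid < QOS_L3_MBM_TOTAL_EVENT_ID ||
	    eventid > QOS_L3_MBM_LOCAL_EVENT_ID)
		return NULL;

	state = d->mbm_states[MBM_STATE_IDX(eventid)];

	return state ? &state[i] : NULL;
}
```

Freeing every slot keeps the error path correct even if a new failure point
is later added ahead of the allocation loop, at the cost of relying on the
zero-initialization assumption stated above.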
---
arch/x86/kernel/cpu/resctrl/core.c | 38 ++++++++++----------
arch/x86/kernel/cpu/resctrl/internal.h | 9 ++---
arch/x86/kernel/cpu/resctrl/monitor.c | 36 +++++++++----------
fs/resctrl/monitor.c | 13 ++++---
fs/resctrl/rdtgroup.c | 48 ++++++++++++--------------
include/linux/resctrl.h | 18 +++++++---
include/linux/resctrl_types.h | 3 ++
7 files changed, 88 insertions(+), 77 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5d14f9a14eda..6bf2103aac27 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -365,8 +365,8 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
- kfree(hw_dom->arch_mbm_total);
- kfree(hw_dom->arch_mbm_local);
+ for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++)
+ kfree(hw_dom->arch_mbm_states[i]);
kfree(hw_dom);
}
@@ -400,25 +400,27 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
*/
static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
- size_t tsize;
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- tsize = sizeof(*hw_dom->arch_mbm_total);
- hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
- if (!hw_dom->arch_mbm_total)
- return -ENOMEM;
- }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- tsize = sizeof(*hw_dom->arch_mbm_local);
- hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
- if (!hw_dom->arch_mbm_local) {
- kfree(hw_dom->arch_mbm_total);
- hw_dom->arch_mbm_total = NULL;
- return -ENOMEM;
- }
+ size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
+ enum resctrl_event_id eventid;
+ int idx;
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ hw_dom->arch_mbm_states[idx] = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_states[idx])
+ goto cleanup;
}
return 0;
+cleanup:
+ while (--idx >= 0) {
+ kfree(hw_dom->arch_mbm_states[idx]);
+ hw_dom->arch_mbm_states[idx] = NULL;
+ }
+
+ return -ENOMEM;
}
static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5e3c41b36437..44ef0d94131e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -54,15 +54,16 @@ struct rdt_hw_ctrl_domain {
* struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
* a resource for a monitor function
* @d_resctrl: Properties exposed to the resctrl file system
- * @arch_mbm_total: arch private state for MBM total bandwidth
- * @arch_mbm_local: arch private state for MBM local bandwidth
+ * @arch_mbm_states: arch private state for each MBM event
+ * @arch_mbm_states: Per-event pointer to the MBM event's saved state.
+ * An MBM event's state is an array of struct arch_mbm_state
+ * indexed by RMID on x86 or combined CLOSID, RMID on Arm.
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_mon_domain {
struct rdt_mon_domain d_resctrl;
- struct arch_mbm_state *arch_mbm_total;
- struct arch_mbm_state *arch_mbm_local;
+ struct arch_mbm_state *arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
};
static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 07f8ab097cbe..0add57b29a4d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -161,18 +161,14 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
u32 rmid,
enum resctrl_event_id eventid)
{
- switch (eventid) {
- case QOS_L3_OCCUP_EVENT_ID:
- return NULL;
- case QOS_L3_MBM_TOTAL_EVENT_ID:
- return &hw_dom->arch_mbm_total[rmid];
- case QOS_L3_MBM_LOCAL_EVENT_ID:
- return &hw_dom->arch_mbm_local[rmid];
- default:
- /* Never expect to get here */
- WARN_ON_ONCE(1);
+ struct arch_mbm_state *state;
+
+ if (!resctrl_is_mbm_event(eventid))
return NULL;
- }
+
+ state = hw_dom->arch_mbm_states[MBM_STATE_IDX(eventid)];
+
+ return state ? &state[rmid] : NULL;
}
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
@@ -201,14 +197,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
- memset(hw_dom->arch_mbm_total, 0,
- sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
- memset(hw_dom->arch_mbm_local, 0,
- sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+ enum resctrl_event_id eventid;
+ int idx;
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ memset(hw_dom->arch_mbm_states[idx], 0,
+ sizeof(struct arch_mbm_state) * r->num_rmid);
+ }
}
static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9e988b2c1a22..dcc6c00eb362 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -346,15 +346,14 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid)
{
u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ struct mbm_state *state;
- switch (evtid) {
- case QOS_L3_MBM_TOTAL_EVENT_ID:
- return &d->mbm_total[idx];
- case QOS_L3_MBM_LOCAL_EVENT_ID:
- return &d->mbm_local[idx];
- default:
+ if (!resctrl_is_mbm_event(evtid))
return NULL;
- }
+
+ state = d->mbm_states[MBM_STATE_IDX(evtid)];
+
+ return state ? &state[idx] : NULL;
}
static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a7eeb33501da..bd6718f0ffd6 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -127,12 +127,6 @@ static bool resctrl_is_mbm_enabled(void)
resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
}
-static bool resctrl_is_mbm_event(int e)
-{
- return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
- e <= QOS_L3_MBM_LOCAL_EVENT_ID);
-}
-
/*
* Trivial allocator for CLOSIDs. Use BITMAP APIs to manipulate a bitmap
* of free CLOSIDs.
@@ -4024,8 +4018,10 @@ static void rdtgroup_setup_default(void)
static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
+ for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++) {
+ kfree(d->mbm_states[i]);
+ d->mbm_states[i] = NULL;
+ }
}
void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
@@ -4085,32 +4081,34 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
- size_t tsize;
+ size_t tsize = sizeof(*d->mbm_states[0]);
+ enum resctrl_event_id eventid;
+ int idx;
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
if (!d->rmid_busy_llc)
return -ENOMEM;
}
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- tsize = sizeof(*d->mbm_total);
- d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
- if (!d->mbm_total) {
- bitmap_free(d->rmid_busy_llc);
- return -ENOMEM;
- }
- }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- tsize = sizeof(*d->mbm_local);
- d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
- if (!d->mbm_local) {
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- return -ENOMEM;
- }
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ d->mbm_states[idx] = kcalloc(idx_limit, tsize, GFP_KERNEL);
+ if (!d->mbm_states[idx])
+ goto cleanup;
}
return 0;
+cleanup:
+ bitmap_free(d->rmid_busy_llc);
+ while (--idx >= 0) {
+ kfree(d->mbm_states[idx]);
+ d->mbm_states[idx] = NULL;
+ }
+
+ return -ENOMEM;
}
int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 40aba6b5d4f0..bbe57eff962b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -161,8 +161,9 @@ struct rdt_ctrl_domain {
* @hdr: common header for different domain types
* @ci_id: cache info id for this domain
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
- * @mbm_total: saved state for MBM total bandwidth
- * @mbm_local: saved state for MBM local bandwidth
+ * @mbm_states: Per-event pointer to the MBM event's saved state.
+ * An MBM event's state is an array of struct mbm_state
+ * indexed by RMID on x86 or combined CLOSID, RMID on Arm.
* @mbm_over: worker to periodically read MBM h/w counters
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
@@ -172,8 +173,7 @@ struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
unsigned int ci_id;
unsigned long *rmid_busy_llc;
- struct mbm_state *mbm_total;
- struct mbm_state *mbm_local;
+ struct mbm_state *mbm_states[QOS_NUM_L3_MBM_EVENTS];
struct delayed_work mbm_over;
struct delayed_work cqm_limbo;
int mbm_work_cpu;
@@ -376,6 +376,16 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
+static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
+{
+ return (eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
+ eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+#define for_each_mbm_event_id(eventid) \
+ for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
+ eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
+
/**
* resctrl_arch_mon_event_config_write() - Write the config for an event.
* @config_info: struct resctrl_mon_config_info describing the resource, domain
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index 2dadbc54e4b3..d98351663c2c 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -51,4 +51,7 @@ enum resctrl_event_id {
QOS_NUM_EVENTS,
};
+#define QOS_NUM_L3_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
+#define MBM_STATE_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
+
#endif /* __LINUX_RESCTRL_TYPES_H */
--
2.34.1
* Re: [PATCH v14 05/32] x86,fs/resctrl: Prepare for more monitor events
2025-06-13 21:04 ` [PATCH v14 05/32] x86,fs/resctrl: Prepare for more monitor events Babu Moger
@ 2025-06-24 21:30 ` Reinette Chatre
0 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:30 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu/Tony,
On 6/13/25 2:04 PM, Babu Moger wrote:
> From: Tony Luck <tony.luck@intel.com>
...
> @@ -400,25 +400,27 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
> */
> static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
> {
> - size_t tsize;
> -
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
> - tsize = sizeof(*hw_dom->arch_mbm_total);
> - hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
> - if (!hw_dom->arch_mbm_total)
> - return -ENOMEM;
> - }
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
> - tsize = sizeof(*hw_dom->arch_mbm_local);
> - hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
> - if (!hw_dom->arch_mbm_local) {
> - kfree(hw_dom->arch_mbm_total);
> - hw_dom->arch_mbm_total = NULL;
> - return -ENOMEM;
> - }
> + size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
> + enum resctrl_event_id eventid;
> + int idx;
> +
> + for_each_mbm_event_id(eventid) {
> + if (!resctrl_is_mon_event_enabled(eventid))
> + continue;
> + idx = MBM_STATE_IDX(eventid);
> + hw_dom->arch_mbm_states[idx] = kcalloc(num_rmid, tsize, GFP_KERNEL);
> + if (!hw_dom->arch_mbm_states[idx])
> + goto cleanup;
> }
>
> return 0;
> +cleanup:
> + while (--idx >= 0) {
(please see note about this pattern below)
> + kfree(hw_dom->arch_mbm_states[idx]);
> + hw_dom->arch_mbm_states[idx] = NULL;
> + }
> +
> + return -ENOMEM;
> }
>
> static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5e3c41b36437..44ef0d94131e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -54,15 +54,16 @@ struct rdt_hw_ctrl_domain {
> * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
> * a resource for a monitor function
> * @d_resctrl: Properties exposed to the resctrl file system
> - * @arch_mbm_total: arch private state for MBM total bandwidth
> - * @arch_mbm_local: arch private state for MBM local bandwidth
> + * @arch_mbm_states: arch private state for each MBM event
Duplicate @arch_mbm_states
> + * @arch_mbm_states: Per-event pointer to the MBM event's saved state.
> + * An MBM event's state is an array of struct arch_mbm_state
> + * indexed by RMID on x86 or combined CLOSID, RMID on Arm.
The "or combined CLOSID, RMID on Arm" can be dropped from the x86 arch specific
docs.
> *
> * Members of this structure are accessed via helpers that provide abstraction.
> */
> struct rdt_hw_mon_domain {
> struct rdt_mon_domain d_resctrl;
> - struct arch_mbm_state *arch_mbm_total;
> - struct arch_mbm_state *arch_mbm_local;
> + struct arch_mbm_state *arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
> };
>
> static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 07f8ab097cbe..0add57b29a4d 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -161,18 +161,14 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
> u32 rmid,
> enum resctrl_event_id eventid)
> {
> - switch (eventid) {
> - case QOS_L3_OCCUP_EVENT_ID:
> - return NULL;
> - case QOS_L3_MBM_TOTAL_EVENT_ID:
> - return &hw_dom->arch_mbm_total[rmid];
> - case QOS_L3_MBM_LOCAL_EVENT_ID:
> - return &hw_dom->arch_mbm_local[rmid];
> - default:
> - /* Never expect to get here */
> - WARN_ON_ONCE(1);
> + struct arch_mbm_state *state;
> +
> + if (!resctrl_is_mbm_event(eventid))
> return NULL;
> - }
> +
> + state = hw_dom->arch_mbm_states[MBM_STATE_IDX(eventid)];
> +
> + return state ? &state[rmid] : NULL;
> }
>
> void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> @@ -201,14 +197,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
> {
> struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> -
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> - memset(hw_dom->arch_mbm_total, 0,
> - sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
> -
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> - memset(hw_dom->arch_mbm_local, 0,
> - sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
> + enum resctrl_event_id eventid;
> + int idx;
> +
> + for_each_mbm_event_id(eventid) {
> + if (!resctrl_is_mon_event_enabled(eventid))
> + continue;
> + idx = MBM_STATE_IDX(eventid);
> + memset(hw_dom->arch_mbm_states[idx], 0,
> + sizeof(struct arch_mbm_state) * r->num_rmid);
sizeof(struct arch_mbm_state) -> sizeof(*hw_dom->arch_mbm_states[0])?
> + }
> }
>
> static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
...
> void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
> @@ -4085,32 +4081,34 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
> static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
> {
> u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> - size_t tsize;
> + size_t tsize = sizeof(*d->mbm_states[0]);
> + enum resctrl_event_id eventid;
> + int idx;
>
> if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
> d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
> if (!d->rmid_busy_llc)
> return -ENOMEM;
> }
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
> - tsize = sizeof(*d->mbm_total);
> - d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
> - if (!d->mbm_total) {
> - bitmap_free(d->rmid_busy_llc);
> - return -ENOMEM;
> - }
> - }
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
> - tsize = sizeof(*d->mbm_local);
> - d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
> - if (!d->mbm_local) {
> - bitmap_free(d->rmid_busy_llc);
> - kfree(d->mbm_total);
> - return -ENOMEM;
> - }
> +
> + for_each_mbm_event_id(eventid) {
> + if (!resctrl_is_mon_event_enabled(eventid))
> + continue;
> + idx = MBM_STATE_IDX(eventid);
> + d->mbm_states[idx] = kcalloc(idx_limit, tsize, GFP_KERNEL);
> + if (!d->mbm_states[idx])
> + goto cleanup;
> }
Looks like this cleanup pattern is a landmine that this
series stepped on in patch #13. Any code added here that fails
and then runs the "cleanup" code will end up with either a memory
leak or an access to an uninitialized variable.
>
> return 0;
> +cleanup:
> + bitmap_free(d->rmid_busy_llc);
> + while (--idx >= 0) {
> + kfree(d->mbm_states[idx]);
> + d->mbm_states[idx] = NULL;
> + }
This pattern should be made safer, either by not relying on idx or
by ensuring here that idx is initialized correctly.
> +
> + return -ENOMEM;
> }
>
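One way the cleanup could avoid depending on idx is sketched below in
user-space C. This is only an illustration, not the code from the series:
struct mon_domain_sketch, mon_state_free(), mon_state_setup() and the fail_at
test hook are hypothetical names, calloc()/free() stand in for
kcalloc()/kfree(), and the structure is assumed to be zero-initialized before
setup runs. Because freeing a NULL pointer is a no-op, the error path can walk
every slot unconditionally.

```c
#include <stdlib.h>

#define NUM_MBM_EVENTS 2	/* hypothetical stand-in for QOS_NUM_L3_MBM_EVENTS */

/* Hypothetical, simplified stand-in for struct rdt_mon_domain. */
struct mon_domain_sketch {
	unsigned long *rmid_busy;
	void *mbm_states[NUM_MBM_EVENTS];
};

/* Release everything without consulting a loop index; free(NULL) is a no-op. */
static inline void mon_state_free(struct mon_domain_sketch *d)
{
	int i;

	free(d->rmid_busy);
	d->rmid_busy = NULL;
	for (i = 0; i < NUM_MBM_EVENTS; i++) {
		free(d->mbm_states[i]);
		d->mbm_states[i] = NULL;
	}
}

/*
 * Allocate all state. @fail_at is a test hook (not in the patch): the
 * allocation for that event index is treated as failed; pass -1 for none.
 * Assumes @d was zero-initialized by the caller.
 */
static inline int mon_state_setup(struct mon_domain_sketch *d, size_t nelem,
				  size_t elem_size, int fail_at)
{
	int i;

	d->rmid_busy = calloc(nelem, sizeof(*d->rmid_busy));
	if (!d->rmid_busy)
		goto cleanup;

	for (i = 0; i < NUM_MBM_EVENTS; i++) {
		d->mbm_states[i] = (i == fail_at) ? NULL : calloc(nelem, elem_size);
		if (!d->mbm_states[i])
			goto cleanup;
	}

	return 0;
cleanup:
	mon_state_free(d);
	return -1;	/* the kernel code would return -ENOMEM */
}
```

With this shape, any allocation added later can jump to "cleanup" without
having to reason about the current value of a loop variable.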
Reinette
* [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (4 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 05/32] x86,fs/resctrl: Prepare for more monitor events Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 21:31 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 07/32] x86/resctrl: Add ABMC feature in the command line options Babu Moger
` (27 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware. The
counters of any other RMIDs which are no longer being tracked will be reset
to zero. The MBM event counters return "Unavailable" for the RMIDs that are
not tracked by hardware. So, there can be only limited number of groups
that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups are
being tracked for certain point of time. Users do not have the option to
monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.
The ABMC feature allows users to assign a hardware counter ID to an RMID,
event pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user. There is no need to worry about counters being
reset during this period. Additionally, the user can specify a particular
type of memory transactions for the counter to track.
Without ABMC enabled, monitoring will work in current mode without
assignment option.
The Linux resctrl subsystem provides an interface that allows monitoring of
up to two memory bandwidth events per group, selected from a combination of
available total and local events. When ABMC is enabled, two events will be
assigned to each group by default, in line with the current interface
design. Users will also have the option to configure which types of memory
transactions are counted by these events.
Due to the limited number of available counters (32), users may quickly
exhaust the available counters. If the system runs out of assignable ABMC
counters, the kernel will report an error. In such cases, users will need
to unassign one or more active counters to free up counters for new
assignments. resctrl will provide options to assign or unassign events
through the group-specific interface file.
The feature is detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5 ABMC (Assignable Bandwidth Monitoring Counters)
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
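For illustration only, the enumeration described above can be probed from
user space roughly as follows. This is a sketch, not kernel code: it assumes a
GCC or Clang toolchain that provides <cpuid.h>, and the helper names
ebx_has_abmc() and cpu_has_abmc() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define ABMC_CPUID_LEAF	0x80000020u
#define ABMC_EBX_BIT	5

/* Test the ABMC bit in an EBX value read from leaf 0x80000020, subleaf 0. */
static inline bool ebx_has_abmc(uint32_t ebx)
{
	return (ebx >> ABMC_EBX_BIT) & 1u;
}

/* Query the running CPU; false on non-x86 builds or if the leaf is absent. */
static inline bool cpu_has_abmc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(ABMC_CPUID_LEAF, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_abmc(ebx);
#else
	return false;
#endif
}
```

In the kernel the same bit is picked up through the scattered CPUID table
entry added by this patch rather than by an open-coded query.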
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.
v14: Removed the dependency on X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL
as discussed in https://lore.kernel.org/lkml/5f8b21c6-5166-46a6-be14-0c7c9bfb7cde@intel.com/
Need to re-work ABMC enumeration during init.
Updated changelog with a few text updates.
v13: Updated the commit log with Linux interface details.
v12: Removed the dependency on X86_FEATURE_BMEC.
Removed the Reviewed-by tag as patch has changed.
v11: No changes.
v10: No changes.
v9: Took care of a couple of minor merge conflicts. No other changes.
v8: No changes.
v7: Removed "" from feature flags. Not required anymore.
https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/
v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.
v5: Minor rebase change and subject line update.
v4: Changes because of rebase. Feature word 21 has a few more additions now.
Changed the text to "tracked by hardware" instead of active.
v3: Change because of rebase. Actual patch did not change.
v2: Added dependency on X86_FEATURE_BMEC.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ee176236c2be..44ae69a8748d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -487,6 +487,7 @@
#define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
#define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
#define X86_FEATURE_INDIRECT_THUNK_ITS (21*32+10) /* Use thunk for indirect branches in lower half of cacheline */
+#define X86_FEATURE_ABMC (21*32+11) /* Assignable Bandwidth Monitoring Counters */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbf6d71bdf18..d5d4a573aaf7 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -50,6 +50,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
{ X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
+ { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
{ X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
--
2.34.1
* Re: [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-13 21:04 ` [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2025-06-24 21:31 ` Reinette Chatre
2025-06-25 16:28 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:31 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware. The
> counters of any other RMIDs which are no longer being tracked will be reset
> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
> not tracked by hardware. So, there can be only limited number of groups
> that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups are
> being tracked for certain point of time. Users do not have the option to
> monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.
>
> The ABMC feature allows users to assign a hardware counter ID to an RMID,
> event pair and monitor bandwidth usage as long as it is assigned. The
> hardware continues to track the assigned counter until it is explicitly
> unassigned by the user. There is no need to worry about counters being
> reset during this period. Additionally, the user can specify a particular
> type of memory transactions for the counter to track.
Looks like grammar updates from cover letter did not make it into this
copied text. For example,
"being tracked for certain point of time" -> "being tracked during a particular time"
"for certain period of time" -> "for a certain period of time"
"specify a particular type of memory transactions" -> "specify the type of
memory transactions (e.g., reads, writes)"
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> The Linux resctrl subsystem provides an interface that allows monitoring of
> up to two memory bandwidth events per group, selected from a combination of
> available total and local events. When ABMC is enabled, two events will be
> assigned to each group by default, in line with the current interface
> design. Users will also have the option to configure which types of memory
> transactions are counted by these events.
>
> Due to the limited number of available counters (32), users may quickly
> exhaust the available counters. If the system runs out of assignable ABMC
> counters, the kernel will report an error. In such cases, users will need
> to unassign one or more active counters to free up counters for new
> assignments. resctrl will provide options to assign or unassign events
> through the group-specific interface file.
>
> The feature is detected via CPUID_Fn80000020_EBX_x00 bit 5.
> Bits Description
> 5 ABMC (Assignable Bandwidth Monitoring Counters)
>
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
Reinette
* Re: [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-24 21:31 ` Reinette Chatre
@ 2025-06-25 16:28 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 16:28 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 16:31, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be reset
>> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
>> not tracked by hardware. So, there can be only limited number of groups
>> that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups are
>> being tracked for certain point of time. Users do not have the option to
>> monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature allows users to assign a hardware counter ID to an RMID,
>> event pair and monitor bandwidth usage as long as it is assigned. The
>> hardware continues to track the assigned counter until it is explicitly
>> unassigned by the user. There is no need to worry about counters being
>> reset during this period. Additionally, the user can specify a particular
>> type of memory transactions for the counter to track.
>
> Looks like grammar updates from cover letter did not make it into this
> copied text. For example,
> "being tracked for certain point of time" -> "being tracked during a particular time"
> "for certain period of time" -> "for a certain period of time"
> "specify a particular type of memory transactions" -> "specify the type of
> memory transactions (e.g., reads, writes)"
Yes. Missed it. Will take care of it now.
--
Thanks
Babu Moger
* [PATCH v14 07/32] x86/resctrl: Add ABMC feature in the command line options
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (5 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 06/32] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-13 21:04 ` [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
` (26 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Add a kernel command-line parameter to enable or disable the exposure of
the ABMC (Assignable Bandwidth Monitoring Counters) hardware feature to
resctrl.
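As a rough illustration of how an option string such as "rdt=cmt,!abmc" maps
onto per-feature force-on/force-off flags, consider the user-space sketch
below. It is not the kernel implementation: struct rdt_opt_sketch and
parse_rdt_opts() are hypothetical names, and the only behavior assumed is the
documented one, namely a comma-separated list in which a leading '!' turns a
feature off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of the option table. */
struct rdt_opt_sketch {
	const char *name;
	bool force_on;
	bool force_off;
};

/* Walk a comma-separated list; "name" forces on, "!name" forces off. */
static inline void parse_rdt_opts(const char *str, struct rdt_opt_sketch *opts,
				  size_t nopts)
{
	while (*str) {
		const char *end = strchr(str, ',');
		size_t len = end ? (size_t)(end - str) : strlen(str);
		bool off = len > 0 && str[0] == '!';
		const char *name = off ? str + 1 : str;
		size_t nlen = off ? len - 1 : len;
		size_t i;

		for (i = 0; i < nopts; i++) {
			if (strlen(opts[i].name) == nlen &&
			    !strncmp(opts[i].name, name, nlen)) {
				if (off)
					opts[i].force_off = true;
				else
					opts[i].force_on = true;
				break;
			}
		}
		if (!end)
			break;
		str = end + 1;
	}
}
```

Unknown tokens are silently skipped in this sketch; adding "abmc" to the table
is all that is needed for the new name to be recognized.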
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Slight changelog modification.
v13: Removed the Reviewed-by as the file resctrl.rst is moved to
Documentation/filesystems/resctrl.rst. In that sense patch has changed.
v12: No changes.
v11: No changes.
v10: No changes.
v9: No code changes. Added Reviewed-by.
v8: Commit message update.
v7: No changes
v6: No changes
v5: No changes
v4: No changes
v3: No changes
v2: No changes
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
Documentation/filesystems/resctrl.rst | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..f2f2511b0ec3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6066,7 +6066,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, abmc.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c7949dd44f2f..c97fd77a107d 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6bf2103aac27..6426b92492dc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -709,6 +709,7 @@ enum {
RDT_FLAG_MBA,
RDT_FLAG_SMBA,
RDT_FLAG_BMEC,
+ RDT_FLAG_ABMC,
};
#define RDT_OPT(idx, n, f) \
@@ -734,6 +735,7 @@ static struct rdt_options rdt_options[] __ro_after_init = {
RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
+ RDT_OPT(RDT_FLAG_ABMC, "abmc", X86_FEATURE_ABMC),
};
#define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
--
2.34.1
* [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (6 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 07/32] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 21:32 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
` (25 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.
In preparation for more monitoring properties that will clobber the
existing resource struct more, re-organize the monitoring specific
properties to also be in a separate structure.
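A minimal sketch of the resulting layout, and of how a caller reaches a
monitoring property through the new sub-structure, could look like this. The
_sketch types and mbm_cfg_valid() are hypothetical and trimmed to the two
fields moved by this patch; the mask comparison mirrors the check made when
a user writes an event configuration.

```c
#include <stdbool.h>

/* Trimmed, hypothetical copy of the monitoring sub-structure. */
struct resctrl_mon_sketch {
	int num_rmid;
	unsigned int mbm_cfg_mask;
};

/* Trimmed, hypothetical copy of the resource that embeds it. */
struct rdt_resource_sketch {
	bool mon_capable;
	struct resctrl_mon_sketch mon;
};

/* A requested configuration is valid only if it sets no unsupported bits. */
static inline bool mbm_cfg_valid(const struct rdt_resource_sketch *r,
				 unsigned int val)
{
	return (val & r->mon.mbm_cfg_mask) == val;
}
```

Callers change only in how they spell the access (r->mon.num_rmid instead of
r->num_rmid); the values themselves are unchanged.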
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the code comment in resctrl.h.
v13: Changes due to FS/ARCH restructure.
v12: Fixed the conflicts due to recent changes in rdt_resource data structure.
Added new mbm_cfg_mask field to resctrl_mon.
Removed Reviewed-by tag as patch has changed.
v11: No changes.
v10: No changes.
v9: No changes.
v8: Added Reviewed-by from Reinette. No other changes.
v7: Added kernel doc for data structure. Minor text update.
v6: Update commit message and update kernel doc for rdt_resource.
v5: Commit message update.
Also changes related to data structure updates due to SNC support.
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 10 +++++-----
fs/resctrl/rdtgroup.c | 6 +++---
include/linux/resctrl.h | 18 +++++++++++++-----
4 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6426b92492dc..22a414802cbb 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -107,7 +107,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
- return r->num_rmid;
+ return r->mon.num_rmid;
}
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
@@ -539,7 +539,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
- if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0add57b29a4d..42a9e3cc6654 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -130,7 +130,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
if (snc_nodes_per_l3_cache == 1)
return lrmid;
- return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+ return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
}
static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -205,7 +205,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
continue;
idx = MBM_STATE_IDX(eventid);
memset(hw_dom->arch_mbm_states[idx], 0,
- sizeof(struct arch_mbm_state) * r->num_rmid);
+ sizeof(struct arch_mbm_state) * r->mon.num_rmid);
}
}
@@ -344,7 +344,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
- r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+ r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -359,7 +359,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
/*
* Because num_rmid may not be a power of two, round the value
@@ -373,7 +373,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
/* Detect list of bandwidth sources that can be tracked */
cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
- r->mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
+ r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
}
r->mon_capable = true;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index bd6718f0ffd6..5874cfdf8d8d 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1135,7 +1135,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
- seq_printf(seq, "%d\n", r->num_rmid);
+ seq_printf(seq, "%d\n", r->mon.num_rmid);
return 0;
}
@@ -1731,9 +1731,9 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
}
/* Value from user cannot be more than the supported set of events */
- if ((val & r->mbm_cfg_mask) != val) {
+ if ((val & r->mon.mbm_cfg_mask) != val) {
rdt_last_cmd_printf("Invalid event configuration: max valid mask is 0x%02x\n",
- r->mbm_cfg_mask);
+ r->mon.mbm_cfg_mask);
return -EINVAL;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bbe57eff962b..22766b8b670b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -255,38 +255,46 @@ enum resctrl_schema_fmt {
RESCTRL_SCHEMA_RANGE,
};
+/**
+ * struct resctrl_mon - Monitoring related data of a resctrl resource.
+ * @num_rmid: Number of RMIDs available.
+ * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
+ * monitoring events are configured.
+ */
+struct resctrl_mon {
+ int num_rmid;
+ unsigned int mbm_cfg_mask;
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
- * @num_rmid: Number of RMIDs available
* @ctrl_scope: Scope of this resource for control functions
* @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
+ * @mon: Monitoring related data.
* @ctrl_domains: RCU list of all control domains for this resource
* @mon_domains: RCU list of all monitor domains for this resource
* @name: Name to use in "schemata" file.
* @schema_fmt: Which format string and parser is used for this schema.
- * @mbm_cfg_mask: Bandwidth sources that can be tracked when bandwidth
- * monitoring events can be configured.
* @cdp_capable: Is the CDP feature available on this resource
*/
struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
- int num_rmid;
enum resctrl_scope ctrl_scope;
enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
+ struct resctrl_mon mon;
struct list_head ctrl_domains;
struct list_head mon_domains;
char *name;
enum resctrl_schema_fmt schema_fmt;
- unsigned int mbm_cfg_mask;
bool cdp_capable;
};
--
2.34.1
* Re: [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource
2025-06-13 21:04 ` [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2025-06-24 21:32 ` Reinette Chatre
2025-06-25 16:53 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:32 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
...
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index bbe57eff962b..22766b8b670b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -255,38 +255,46 @@ enum resctrl_schema_fmt {
> RESCTRL_SCHEMA_RANGE,
> };
>
> +/**
> + * struct resctrl_mon - Monitoring related data of a resctrl resource.
> + * @num_rmid: Number of RMIDs available.
> + * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
> + * monitoring events are configured.
"are configured" -> "can be configured" (like it was before). This is a property
that is discovered from hardware. The feature need not be in use for the property
to be valid.
Also, this version switches "Bandwidth sources" -> "Memory transactions". I think this
is a good change but it may be unexpected. Perhaps a snippet in changelog to
point out the motivation for this change: "Also switch "bandwidth sources" term
to "memory transactions" to use consistent term within resctrl for related monitoring
features". Please feel free to improve.
Reinette
* Re: [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource
2025-06-24 21:32 ` Reinette Chatre
@ 2025-06-25 16:53 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 16:53 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 16:32, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>
> ...
>
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index bbe57eff962b..22766b8b670b 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -255,38 +255,46 @@ enum resctrl_schema_fmt {
>> RESCTRL_SCHEMA_RANGE,
>> };
>>
>> +/**
>> + * struct resctrl_mon - Monitoring related data of a resctrl resource.
>> + * @num_rmid: Number of RMIDs available.
>> + * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
>> + * monitoring events are configured.
>
> "are configured" -> "can be configured" (like it was before). This is a property
> that is discovered from hardware. The feature need not be in use for the property
> to be valid.
Sure.
> Also, this version switches "Bandwidth sources" -> "Memory transactions". I think this
> is a good change but it may be unexpected. Perhaps a snippet in changelog to
> point out the motivation for this change: "Also switch "bandwidth sources" term
> to "memory transactions" to use consistent term within resctrl for related monitoring
> features". Please feel free to improve.
>
Sure. This is how it looks now.
"The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.
In preparation for more monitoring properties that will clobber the
existing resource struct more, re-organize the monitoring specific
properties to also be in a separate structure.
Also switch "bandwidth sources" term to "memory transactions" to use
consistent term within resctrl for related monitoring features."
--
Thanks
Babu Moger
* [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (7 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 08/32] x86,fs/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 21:33 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
` (24 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
Monitoring Counter ID + 1
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Detect the feature and number of assignable counters supported. Also,
enable QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID upon
detecting the ABMC feature. The current expectation is to support
these two events by default when ABMC is enabled.
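As a rough illustration of the enumeration described above (not the kernel code itself), the counter count can be derived from the EBX value returned by CPUID Fn8000_0020 sub-leaf 5. The helper name abmc_num_counters() and the mask macro are hypothetical; the arithmetic mirrors the `(ebx & GENMASK(15, 0)) + 1` computation in this patch, and the register value is passed in as a parameter so the sketch can be exercised without ABMC hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 15:0 of CPUID Fn8000_0020_EBX_x5 (the MAX_ABMC field). */
#define ABMC_MAX_CNTR_MASK 0xffffu

/*
 * Hypothetical helper: derive the number of assignable counters from a raw
 * EBX value, the same way the patch does with GENMASK(15, 0).
 */
static unsigned int abmc_num_counters(uint32_t ebx)
{
	unsigned int n = (ebx & ABMC_MAX_CNTR_MASK) + 1;

	/* A 16-bit field plus one always lands in 1..65536. */
	assert(n >= 1 && n <= 0x10000u);
	return n;
}
```

Bits above 15 are masked off before the increment, so reserved upper bits in EBX do not affect the result.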
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated enumeration to support ABMC regardless of MBM total and local support.
Updated the changelog accordingly.
v13: No changes.
v12: Resolved conflicts because of latest merge.
Removed Reviewed-by as the patch has changed.
v11: No changes.
v10: No changes.
v9: Added Reviewed-by tag. No code changes
v8: Used GENMASK for the mask.
v7: Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the
bitmap. WARN_ON is not required anymore.
Removed redundant comments.
v6: Commit message update.
Renamed abmc_capable to mbm_cntr_assignable.
v5: Name change num_cntrs to num_mbm_cntrs.
Moved abmc_capable to resctrl_mon.
v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
need to separate this as arch code.
v3: Removed changes related to mon_features.
Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
rdt_resource. (James)
v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 11 ++++++++---
include/linux/resctrl.h | 4 ++++
3 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 22a414802cbb..01b210febc7d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -873,11 +873,11 @@ static __init bool get_rdt_mon_resources(void)
resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
ret = true;
}
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
ret = true;
}
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
ret = true;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 42a9e3cc6654..a6b9a6ba036d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -339,6 +339,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
unsigned int threshold;
+ u32 eax, ebx, ecx, edx;
snc_nodes_per_l3_cache = snc_get_config();
@@ -368,14 +369,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*/
resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
- if (rdt_cpu_has(X86_FEATURE_BMEC)) {
- u32 eax, ebx, ecx, edx;
-
+ if (rdt_cpu_has(X86_FEATURE_BMEC) || rdt_cpu_has(X86_FEATURE_ABMC)) {
/* Detect list of bandwidth sources that can be tracked */
cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
}
+ if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+ r->mon.mbm_cntr_assignable = true;
+ cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+ r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+ }
+
r->mon_capable = true;
return 0;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 22766b8b670b..c0195498bd4a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -260,10 +260,14 @@ enum resctrl_schema_fmt {
* @num_rmid: Number of RMIDs available.
* @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
* monitoring events are configured.
+ * @num_mbm_cntrs: Number of assignable counters.
+ * @mbm_cntr_assignable:Is system capable of supporting counter assignment?
*/
struct resctrl_mon {
int num_rmid;
unsigned int mbm_cfg_mask;
+ int num_mbm_cntrs;
+ bool mbm_cntr_assignable;
};
/**
--
2.34.1
* Re: [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2025-06-13 21:04 ` [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2025-06-24 21:33 ` Reinette Chatre
2025-06-25 17:58 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:33 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
> Monitoring Counter ID + 1
>
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Detect the feature and number of assignable counters supported. Also,
> enable QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID upon
> detecting the ABMC feature. The current expectation is to support
> these two events by default when ABMC is enabled.
"The current expectation ..." this need not be vague since this is what
this series does. Perhaps previous sentence can be:
"For backward compatibility, upon detecting the assignable counter feature,
enable the mbm_total_bytes and mbm_local_bytes events that users are
familiar with as part of original L3 MBM support." Although, when it comes to
this patch this may not be appropriate in that this is something that
resctrl fs should do, not the architecture.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
> arch/x86/kernel/cpu/resctrl/monitor.c | 11 ++++++++---
> include/linux/resctrl.h | 4 ++++
> 3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 22a414802cbb..01b210febc7d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -873,11 +873,11 @@ static __init bool get_rdt_mon_resources(void)
> resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
> ret = true;
> }
> - if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
> ret = true;
> }
> - if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
> resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
> ret = true;
This backward compatibility needs to be managed by resctrl fs, no? What do you think of
instead doing:
int resctrl_mon_resource_init(void) {
...
if (r->mon.mbm_cntr_assignable) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
}
...
}
There is another dependency that does not seem to be handled ... ABMC requires
properties enumerated in resctrl_cpu_detect(), but that enumeration is only
done if legacy monitoring features are supported, not ABMC. Does AMD support
enumeration CPUID.(EAX=0xF, ECX=1) if ABMC is supported but not the legacy MBM
total and local?
Reinette
* Re: [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2025-06-24 21:33 ` Reinette Chatre
@ 2025-06-25 17:58 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 17:58 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 16:33, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
>> Bits Description
>> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>> Monitoring Counter ID + 1
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Detect the feature and number of assignable counters supported. Also,
>> enable QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID upon
>> detecting the ABMC feature. The current expectation is to support
>> these two events by default when ABMC is enabled.
>
> "The current expectation ..." this need not be vague since this is what
> this series does. Perhaps previous sentence can be:
> "For backward compatibility, upon detecting the assignable counter feature,
> enable the mbm_total_bytes and mbm_local_bytes events that users are
> familiar with as part of original L3 MBM support." Although, when it comes to
> this patch this may not be appropriate in that this is something that
> resctrl fs should do, not the architecture.
Sure. Added the above line. Removed the "The current expectation" line.
>
>
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
>> arch/x86/kernel/cpu/resctrl/monitor.c | 11 ++++++++---
>> include/linux/resctrl.h | 4 ++++
>> 3 files changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 22a414802cbb..01b210febc7d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -873,11 +873,11 @@ static __init bool get_rdt_mon_resources(void)
>> resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
>> ret = true;
>> }
>> - if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
>> + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
>> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>> ret = true;
>> }
>> - if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
>> + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
>> resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
>> ret = true;
>
> This backward compatibility needs to be managed by resctrl fs, no? What do you think of
> instead doing:
Looks good to me.
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index dcc6c00eb362..7e816341da6a 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -924,6 +924,11 @@ int resctrl_mon_resource_init(void)
else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
+ if (r->mon.mbm_cntr_assignable) {
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+ }
+
return 0;
}
>
> int resctrl_mon_resource_init(void) {
>
> ...
> if (r->mon.mbm_cntr_assignable) {
> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
> resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
> }
> ...
> }
>
> There is another dependency that does not seem to be handled ... ABMC requires
> properties enumerated in resctrl_cpu_detect(), but that enumeration is only
> done if legacy monitoring features are supported, not ABMC. Does AMD support
> enumeration CPUID.(EAX=0xF, ECX=1) if ABMC is supported but not the legacy MBM
> total and local?
Yes. CPUID.(EAX=0xF, ECX=1) is a kind of building block, so I would expect
it to always be supported.
Added this check now.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 22a414802cbb..e9a8c4778d22 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -988,7 +988,8 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
- cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+ cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL) ||
+ cpu_has(c, X86_FEATURE_ABMC)) {
u32 eax, ebx, ecx, edx;
/* QoS sub-leaf, EAX=0Fh, ECX=1 */
--
Thanks
Babu Moger
* [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (8 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 09/32] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 22:37 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes Babu Moger
` (23 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Add the functionality to enable/disable the AMD ABMC feature.
The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs
to be updated on all the logical processors in the QOS Domain.
Hardware counters will reset when the ABMC state is changed.
The ABMC feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
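The enable/disable step described above amounts to a read-modify-write of bit 0 on every CPU in the domain. The following is only a userspace sketch under stated assumptions: the per-CPU MSR contents are simulated with a plain array, and abmc_cfg_apply() and abmc_cfg_apply_all() are hypothetical names standing in for what the patch does with msr_set_bit()/msr_clear_bit() and on_each_cpu_mask().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of L3_QOS_EXT_CFG enables ABMC (as defined in the patch). */
#define ABMC_ENABLE_BIT 0

/* Return @val with the ABMC enable bit set or cleared; other bits are kept. */
static uint64_t abmc_cfg_apply(uint64_t val, bool enable)
{
	uint64_t bit = UINT64_C(1) << ABMC_ENABLE_BIT;

	return enable ? (val | bit) : (val & ~bit);
}

/*
 * Hypothetical stand-in for the per-domain update: @msrs simulates the
 * L3_QOS_EXT_CFG value seen by each logical processor in the domain.
 */
static void abmc_cfg_apply_all(uint64_t *msrs, size_t ncpus, bool enable)
{
	assert(msrs != NULL || ncpus == 0);

	for (size_t i = 0; i < ncpus; i++)
		msrs[i] = abmc_cfg_apply(msrs[i], enable);
}
```

The sketch leaves out the counter reset that accompanies a state change; in the patch that is handled by calling resctrl_arch_reset_rmid_all() for each monitor domain.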
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Added lockdep_assert_cpus_held() in _resctrl_abmc_enable().
Removed inline for resctrl_arch_mbm_cntr_assign_enabled().
Added prototype descriptions for resctrl_arch_mbm_cntr_assign_enabled()
and resctrl_arch_mbm_cntr_assign_set() in include/linux/resctrl.h.
v13: Resolved minor conflicts with recent FS/ARCH restructure.
v12: Clarified the comment on _resctrl_abmc_enable().
Added the code to reset arch state in _resctrl_abmc_enable().
Resolved the conflicts with latest merge.
v11: Moved the monitoring related calls to monitor.c file.
Moved the changes from include/linux/resctrl.h to
arch/x86/kernel/cpu/resctrl/internal.h.
Removed the Reviewed-by tag as patch changed.
Actual code did not change.
v10: No changes.
v9: Re-ordered the MSR and added Reviewed-by tag.
v8: Commit message update and moved around the comments about L3_QOS_EXT_CFG
to _resctrl_abmc_enable.
v7: Renamed the function
resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().
Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
and renamed to resctrl_arch_mbm_cntr_assign_set().
Moved the function definition to linux/resctrl.h.
Passed the struct rdt_resource to these functions.
Removed resctrl_arch_reset_rmid_all() from arch code. This will be done
from the caller.
v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
Used msr_set_bit and msr_clear_bit for msr updates.
Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
Made _resctrl_abmc_enable to return void.
v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
Introduced resctrl_arch_get_abmc_enabled to get abmc state from
non-arch code.
Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
Modified commit log to make it clear about AMD ABMC feature.
v3: No changes.
v2: A few text changes in the commit message.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 5 +++
arch/x86/kernel/cpu/resctrl/monitor.c | 45 ++++++++++++++++++++++++++
include/linux/resctrl.h | 20 ++++++++++++
4 files changed, 71 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..b92b04fa9888 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1215,6 +1215,7 @@
/* - AMD: */
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
+#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
/* AMD-V MSRs */
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 44ef0d94131e..1a4e96044aac 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -37,6 +37,9 @@ struct arch_mbm_state {
u64 prev_msr;
};
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT 0
+
/**
* struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
* a resource for a control function
@@ -103,6 +106,7 @@ struct msr_param {
* @mon_scale: cqm counter * mon_scale = occupancy in bytes
* @mbm_width: Monitor width, to detect and correct for overflow.
* @cdp_enabled: CDP state of this resource
+ * @mbm_cntr_assign_enabled: ABMC feature is enabled
*
* Members of this structure are either private to the architecture
* e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -116,6 +120,7 @@ struct rdt_hw_resource {
unsigned int mon_scale;
unsigned int mbm_width;
bool cdp_enabled;
+ bool mbm_cntr_assign_enabled;
};
static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index a6b9a6ba036d..0ad9c731c13e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -399,3 +399,48 @@ void __init intel_rdt_mbm_apply_quirk(void)
mbm_cf_rmidthreshold = mbm_cf_table[cf_index].rmidthreshold;
mbm_cf = mbm_cf_table[cf_index].cf;
}
+
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+ bool *enable = arg;
+
+ if (*enable)
+ msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+ else
+ msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+/*
+ * ABMC enable/disable requires update of L3_QOS_EXT_CFG MSR on all the CPUs
+ * associated with all monitor domains.
+ */
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+ struct rdt_mon_domain *d;
+
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ on_each_cpu_mask(&d->hdr.cpu_mask, resctrl_abmc_set_one_amd,
+ &enable, 1);
+ resctrl_arch_reset_rmid_all(r, d);
+ }
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ if (r->mon.mbm_cntr_assignable &&
+ hw_res->mbm_cntr_assign_enabled != enable) {
+ _resctrl_abmc_enable(r, enable);
+ hw_res->mbm_cntr_assign_enabled = enable;
+ }
+
+ return 0;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
+}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index c0195498bd4a..f078ef24a8ad 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -440,6 +440,26 @@ static inline u32 resctrl_get_config_index(u32 closid,
bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
+/**
+ * resctrl_arch_mbm_cntr_assign_enabled() - Check if MBM counter assignment
+ * mode is enabled.
+ * @r: Pointer to the resource structure.
+ *
+ * Return:
+ * true if the assignment mode is enabled, false otherwise.
+ */
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
+
+/**
+ * resctrl_arch_mbm_cntr_assign_set() - Configure the MBM counter assignment mode.
+ * @r: Pointer to the resource structure.
+ * @enable: Set to true to enable, false to disable the assignment mode.
+ *
+ * Return:
+ * 0 on success, non-zero on failure.
+ */
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
+
/*
* Update the ctrl_val and apply this config right now.
* Must be called on one of the domain's CPUs.
--
2.34.1
* Re: [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature
2025-06-13 21:04 ` [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2025-06-24 22:37 ` Reinette Chatre
2025-06-25 19:50 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 22:37 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> +/**
> + * resctrl_arch_mbm_cntr_assign_set() - Configure the MBM counter assignment mode.
> + * @r: Pointer to the resource structure.
> + * @enable: Set to true to enable, false to disable the assignment mode.
> + *
> + * Return:
> + * 0 on success, non-zero on failure.
* 0 on success, <0 on failure.
(Just to be specific that it will be negative. Could also be "on error" to match
similar in same file, but resctrl is not consistent in this regard.)
> + */
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
> +
> /*
> * Update the ctrl_val and apply this config right now.
> * Must be called on one of the domain's CPUs.
Reinette
* Re: [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature
2025-06-24 22:37 ` Reinette Chatre
@ 2025-06-25 19:50 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 19:50 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 17:37, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> +/**
>> + * resctrl_arch_mbm_cntr_assign_set() - Configure the MBM counter assignment mode.
>> + * @r: Pointer to the resource structure.
>> + * @enable: Set to true to enable, false to disable the assignment mode.
>> + *
>> + * Return:
>> + * 0 on success, non-zero on failure.
>
> * 0 on success, <0 on failure.
Sure.
0 on success, < 0 on error.
>
> (Just to be specific that it will be negative. Could also be "on error" to match
> similar in same file, but resctrl is not consistent in this regard.)
>
--
Thanks
Babu Moger
* [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (9 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 10/32] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 22:47 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs Babu Moger
` (22 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Introduce the resctrl file "mbm_assign_mode" to list the supported
monitoring modes.
The "mbm_event" mode allows users to assign a hardware counter ID to an
RMID, event pair and monitor bandwidth usage as long as it is assigned.
The hardware continues to track the assigned counter until it is
explicitly unassigned by the user. Each event within a resctrl group
can be assigned independently in this mode.
On AMD systems "mbm_event" mode is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature and is enabled by default.
The "default" mode is the existing mode that works without the explicit
counter assignment, instead relying on dynamic counter assignment by
hardware that may result in hardware not dedicating a counter resulting
in monitoring data reads returning "Unavailable".
Provide an interface to display the monitor modes on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_event]
default
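For a userspace consumer of this file, the active mode is whichever entry is enclosed in brackets. A minimal sketch of extracting it from the file contents follows; the function name mbm_active_mode() is hypothetical and the text is passed in as a string, so nothing here depends on a mounted resctrl filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) mode name from @text,
 * the contents of "mbm_assign_mode", into @out.
 * Returns 0 on success, -1 if no bracketed entry exists or @out is too small.
 */
static int mbm_active_mode(const char *text, char *out, size_t outsz)
{
	const char *open, *close;
	size_t len;

	assert(text != NULL && out != NULL);

	open = strchr(text, '[');
	if (!open)
		return -1;

	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len + 1 > outsz)	/* need room for the terminating NUL */
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

The helper does not assume the bracketed entry is on the first line, so it keeps working if the order of the listed modes changes.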
Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check to support Arm64.
On x86, CONFIG_RESCTRL_ASSIGN_FIXED is not defined. On Arm64, it will be
defined when the "mbm_event" mode is supported.
Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check early to ensure the user
interface remains compatible with upcoming Arm64 support. IS_ENABLED()
safely evaluates to 0 when the configuration is not defined.
As a result, for MPAM, the display would be either:
[default]
or
[mbm_event]
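The display rules described above can be summarized in a small userspace sketch. This is not the kernel implementation: mbm_assign_mode_format() is a hypothetical name, the three booleans stand in for r->mon.mbm_cntr_assignable, the arch "enabled" state, and IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED), and snprintf() replaces seq_puts().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical formatter for "mbm_assign_mode".
 * @assignable: counter assignment is supported by the resource.
 * @enabled:    counter assignment ("mbm_event") mode is currently active.
 * @fixed:      the mode cannot be switched, so only the active mode is shown.
 * Returns the number of characters written (excluding the NUL), as snprintf().
 */
static int mbm_assign_mode_format(char *buf, size_t size, bool assignable,
				  bool enabled, bool fixed)
{
	assert(buf != NULL && size > 0);

	/* Without counter assignment support only the default mode exists. */
	if (!assignable)
		return snprintf(buf, size, "[default]\n");

	/* Fixed configuration: list only the active mode. */
	if (fixed)
		return snprintf(buf, size, "%s\n",
				enabled ? "[mbm_event]" : "[default]");

	/* Switchable: active mode in brackets first, then the other mode. */
	return snprintf(buf, size, "%s\n%s\n",
			enabled ? "[mbm_event]" : "[default]",
			enabled ? "default" : "mbm_event");
}
```

With assignable and enabled set and fixed clear, the buffer holds the two-line output shown in the example earlier in this changelog.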
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Changed the name of the monitor mode to mbm_cntr_evt_assign based on the discussion.
https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/
Changed the name of the mbm_assign_mode's.
Updated resctrl.rst for mbm_event mode.
Changed subject line to fs/resctrl.
v13: Updated the commit log with motivation for adding CONFIG_RESCTRL_ASSIGN_FIXED.
Added fflag RFTYPE_RES_CACHE for mbm_assign_mode file.
Updated user doc. Removed the references to "mbm_assign_control".
Resolved the conflicts with latest FS/ARCH code restructure.
v12: Minor text update in change log and user documentation.
Added the check CONFIG_RESCTRL_ASSIGN_FIXED to take care of arm platforms.
This will be defined only in arm and not in x86.
v11: Renamed rdtgroup_mbm_assign_mode_show() to resctrl_mbm_assign_mode_show().
Removed some text in resctrl.rst about AMD-specific information.
Updated some text.
v10: Added more text to the user documentation to clarify the default mode.
v9: Updated user documentation based on comments.
v8: Commit message update.
v7: Updated the descriptions/commit log in resctrl.rst to generic text.
Thanks to James and Reinette.
Rename mbm_mode to mbm_assign_mode.
Introduced mutex lock in rdtgroup_mbm_mode_show().
v6: Added documentation for mbm_cntr_assign and legacy mode.
Moved mbm_mode fflags initialization to static initialization.
v5: Changed interface name to mbm_mode.
It will be always available even if ABMC feature is not supported.
Added description in resctrl.rst about ABMC mode.
Fixed display of abmc and legacy consistently.
v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
v3: New patch to display ABMC capability.
---
Documentation/filesystems/resctrl.rst | 31 ++++++++++++++++++++++
fs/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++++
2 files changed, 68 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c97fd77a107d..4e76e4ac5d3a 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -257,6 +257,37 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
+"mbm_assign_mode":
+ The supported monitoring modes. The enclosed brackets indicate which mode
+ is enabled.
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ [mbm_event]
+ default
+
+ "mbm_event":
+
+ mbm_event mode allows users to assign a hardware counter ID to an RMID, event
+ pair and monitor the bandwidth usage as long as it is assigned. The hardware
+ continues to track the assigned counter until it is explicitly unassigned by
+ the user. Each event within a resctrl group can be assigned independently.
+
+ In this mode, a monitoring event can only accumulate data while it is backed
+ by a hardware counter. Use "mbm_L3_assignments" found in each CTRL_MON and MON
+ group to specify which of the events should have a counter assigned. The number
+ of counters available is described in the "num_mbm_cntrs" file. Changing the
+ mode may cause all counters on the resource to reset.
+
+ "default":
+
+ In default mode, resctrl assumes there is a hardware counter for each
+ event within every CTRL_MON and MON group. On AMD platforms, it is
+ recommended to use the mbm_event mode, if supported, to prevent reset of MBM
+ events between reads resulting from hardware re-allocating counters. This can
+ result in misleading values or display "Unavailable" if no counter is assigned
+ to the event.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5874cfdf8d8d..ba7a9a68c5a6 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1799,6 +1799,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+ bool enabled;
+
+ mutex_lock(&rdtgroup_mutex);
+ enabled = resctrl_arch_mbm_cntr_assign_enabled(r);
+
+ if (r->mon.mbm_cntr_assignable) {
+ if (enabled)
+ seq_puts(s, "[mbm_event]\n");
+ else
+ seq_puts(s, "[default]\n");
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED)) {
+ if (enabled)
+ seq_puts(s, "default\n");
+ else
+ seq_puts(s, "mbm_event\n");
+ }
+ } else {
+ seq_puts(s, "[default]\n");
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -1911,6 +1941,13 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_assign_mode",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = resctrl_mbm_assign_mode_show,
+ .fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
+ },
{
.name = "cpus",
.mode = 0644,
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
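For readers following the interface from user space, here is a minimal sketch of how a tool might pick out the enabled mode from the "mbm_assign_mode" text shown in the patch above. The helper name enabled_assign_mode() is hypothetical and not part of this series; the sketch assumes only the output format in the changelog, where the enabled mode is the line enclosed in square brackets.

```c
#include <string.h>

/*
 * Hypothetical user-space helper (not part of the patch): copy the enabled
 * mode, i.e. the line wrapped in square brackets, from the text of
 * "mbm_assign_mode" into out. Returns 0 on success, -1 if no bracketed
 * line is found or out is too small.
 */
static int enabled_assign_mode(const char *text, char *out, size_t outlen)
{
	const char *line = text;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* The enabled mode is printed as e.g. "[mbm_event]". */
		if (len >= 2 && line[0] == '[' && line[len - 1] == ']') {
			size_t n = len - 2;

			if (n + 1 > outlen)
				return -1;
			memcpy(out, line + 1, n);
			out[n] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

A caller would read /sys/fs/resctrl/info/L3_MON/mbm_assign_mode into a buffer and pass it in. Lines without brackets are the supported modes that are not currently enabled.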
* Re: [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes
2025-06-13 21:04 ` [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes Babu Moger
@ 2025-06-24 22:47 ` Reinette Chatre
2025-06-25 20:14 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 22:47 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> Introduce the resctrl file "mbm_assign_mode" to list the supported
> monitoring modes.
"the supported monitoring modes" -> "the supported counter assignment modes"?
>
> The "mbm_event" mode allows users to assign a hardware counter ID to an
nit: users do not assign/pick the ID, this is done by resctrl. So perhaps
just "users to assign a hardware counter to ..."
> RMID, event pair and monitor bandwidth usage as long as it is assigned.
> The hardware continues to track the assigned counter until it is
> explicitly unassigned by the user. Each event within a resctrl group
> can be assigned independently in this mode.
>
> On AMD systems "mbm_event" mode is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>
> The "default" mode is the existing mode that works without the explicit
> counter assignment, instead relying on dynamic counter assignment by
> hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
>
> Provide an interface to display the monitor modes on the system.
>
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_event]
> default
>
> Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check to support Arm64.
>
> On x86, CONFIG_RESCTRL_ASSIGN_FIXED is not defined. On Arm64, it will be
> defined when the "mbm_event" mode is supported.
>
> Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check early to ensure the user
> interface remains compatible with upcoming Arm64 support. IS_ENABLED()
> safely evaluates to 0 when the configuration is not defined.
>
> As a result, for MPAM, the display would be either:
> [default]
> or
> [mbm_event]
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 31 ++++++++++++++++++++++
> fs/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++++
> 2 files changed, 68 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index c97fd77a107d..4e76e4ac5d3a 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -257,6 +257,37 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_assign_mode":
> + The supported monitoring modes. The enclosed brackets indicate which mode
"The supported monitoring modes." -> "The supported counter assignment modes."?
> + is enabled.
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_event]
> + default
> +
> + "mbm_event":
> +
> + mbm_event mode allows users to assign a hardware counter ID to an RMID, event
"hardware counter ID" -> "hardware counter"
> + pair and monitor the bandwidth usage as long as it is assigned. The hardware
> + continues to track the assigned counter until it is explicitly unassigned by
> + the user. Each event within a resctrl group can be assigned independently.
> +
> + In this mode, a monitoring event can only accumulate data while it is backed
> + by a hardware counter. Use "mbm_L3_assignments" found in each CTRL_MON and MON
> + group to specify which of the events should have a counter assigned. The number
> + of counters available is described in the "num_mbm_cntrs" file. Changing the
> + mode may cause all counters on the resource to reset.
> +
> + "default":
> +
> + In default mode, resctrl assumes there is a hardware counter for each
> + event within every CTRL_MON and MON group. On AMD platforms, it is
> + recommended to use the mbm_event mode, if supported, to prevent reset of MBM
> + events between reads resulting from hardware re-allocating counters. This can
> + result in misleading values or display "Unavailable" if no counter is assigned
> + to the event.
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes
2025-06-24 22:47 ` Reinette Chatre
@ 2025-06-25 20:14 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 20:14 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 17:47, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> Introduce the resctrl file "mbm_assign_mode" to list the supported
>> monitoring modes.
>
> "the supported monitoring modes" -> "the supported counter assignment modes"?
Sure.
>
>>
>> The "mbm_event" mode allows users to assign a hardware counter ID to an
>
> nit: users do not assign/pick the ID, this is done by resctrl. So perhaps
> just "users to assign a hardware counter to ..."
Sure.
>
>> RMID, event pair and monitor bandwidth usage as long as it is assigned.
>> The hardware continues to track the assigned counter until it is
>> explicitly unassigned by the user. Each event within a resctrl group
>> can be assigned independently in this mode.
>>
>> On AMD systems "mbm_event" mode is backed by the ABMC (Assignable
>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>
>> The "default" mode is the existing mode that works without the explicit
>> counter assignment, instead relying on dynamic counter assignment by
>> hardware that may result in hardware not dedicating a counter resulting
>> in monitoring data reads returning "Unavailable".
>>
>> Provide an interface to display the monitor modes on the system.
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_event]
>> default
>>
>> Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check to support Arm64.
>>
>> On x86, CONFIG_RESCTRL_ASSIGN_FIXED is not defined. On Arm64, it will be
>> defined when the "mbm_event" mode is supported.
>>
>> Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check early to ensure the user
>> interface remains compatible with upcoming Arm64 support. IS_ENABLED()
>> safely evaluates to 0 when the configuration is not defined.
>>
>> As a result, for MPAM, the display would be either:
>> [default]
>> or
>> [mbm_event]
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 31 ++++++++++++++++++++++
>> fs/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++++
>> 2 files changed, 68 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index c97fd77a107d..4e76e4ac5d3a 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -257,6 +257,37 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x30;1=0x30;3=0x15;4=0x15
>>
>> +"mbm_assign_mode":
>> + The supported monitoring modes. The enclosed brackets indicate which mode
>
> "The supported monitoring modes." -> "The supported counter assignment modes."?
>
Sure.
>> + is enabled.
>> + ::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> + [mbm_event]
>> + default
>> +
>> + "mbm_event":
>> +
>> + mbm_event mode allows users to assign a hardware counter ID to an RMID, event
>
> "hardware counter ID" -> "hardware counter"
>
Sure.
>> + pair and monitor the bandwidth usage as long as it is assigned. The hardware
>> + continues to track the assigned counter until it is explicitly unassigned by
>> + the user. Each event within a resctrl group can be assigned independently.
>> +
>> + In this mode, a monitoring event can only accumulate data while it is backed
>> + by a hardware counter. Use "mbm_L3_assignments" found in each CTRL_MON and MON
>> + group to specify which of the events should have a counter assigned. The number
>> + of counters available is described in the "num_mbm_cntrs" file. Changing the
>> + mode may cause all counters on the resource to reset.
>> +
>> + "default":
>> +
>> + In default mode, resctrl assumes there is a hardware counter for each
>> + event within every CTRL_MON and MON group. On AMD platforms, it is
>> + recommended to use the mbm_event mode, if supported, to prevent reset of MBM
>> + events between reads resulting from hardware re-allocating counters. This can
>> + result in misleading values or display "Unavailable" if no counter is assigned
>> + to the event.
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>
> Reinette
>
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (10 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 11/32] fs/resctrl: Introduce the interface to display monitoring modes Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 23:05 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain Babu Moger
` (21 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The "mbm_event" mode allows users to assign a hardware counter ID to an
RMID, event pair and monitor bandwidth usage as long as it is assigned.
The hardware continues to track the assigned counter until it is
explicitly unassigned by the user.
Create 'num_mbm_cntrs' resctrl file that displays the number of counter
IDs supported in each domain. 'num_mbm_cntrs' is only visible to user
space when the system supports "mbm_event" mode.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Minor update to changelog and user doc (resctrl.rst).
Changed subject line to fs/resctrl.
v13: Updated the changelog.
Added fflags RFTYPE_RES_CACHE to the file num_mbm_cntrs.
Replaced seq_puts with seq_putc where applicable.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.
v12: Changed the code to display the max supported monitoring counters in
each domain. Also updated the documentation.
Resolved the conflict with the latest code.
v11: Renamed rdtgroup_num_mbm_cntrs_show() to resctrl_num_mbm_cntrs_show().
Few minor text updates.
v10: No changes.
v9: Updated user document based on the comments.
Will add a new file available_mbm_cntrs later in the series.
v8: Commit message update and documentation update.
v7: Minor commit log text changes.
v6: No changes.
v5: Changed the display name from num_cntrs to num_mbm_cntrs.
Updated the commit message.
Moved the patch after mbm_mode is introduced.
v4: Changed the counter name to num_cntrs. And few text changes.
v3: Changed the field name to mbm_assign_cntrs.
v2: Changed the field name to mbm_assignable_counters from abmc_counter.
---
Documentation/filesystems/resctrl.rst | 11 ++++++++++
fs/resctrl/monitor.c | 4 ++++
fs/resctrl/rdtgroup.c | 30 +++++++++++++++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 4e76e4ac5d3a..801914de0c81 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -288,6 +288,17 @@ with the following files:
result in misleading values or display "Unavailable" if no counter is assigned
to the event.
+"num_mbm_cntrs":
+ The maximum number of counter IDs (total of available and assigned counters)
+ in each domain when the system supports mbm_event mode.
+
+ For example, on a system with maximum of 32 memory bandwidth monitoring
+ counters in each of its L3 domains:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
+ 0=32;1=32
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index dcc6c00eb362..92a87aa97b0f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -924,6 +924,10 @@ int resctrl_mon_resource_init(void)
else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
+ if (r->mon.mbm_cntr_assignable)
+ resctrl_file_fflags_init("num_mbm_cntrs",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+
return 0;
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ba7a9a68c5a6..967e4df62a19 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1829,6 +1829,30 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
return 0;
}
+static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+ struct rdt_mon_domain *dom;
+ bool sep = false;
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+ if (sep)
+ seq_putc(s, ';');
+
+ seq_printf(s, "%d=%d", dom->hdr.id, r->mon.num_mbm_cntrs);
+ sep = true;
+ }
+ seq_putc(s, '\n');
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -1866,6 +1890,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdt_default_ctrl_show,
.fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE,
},
+ {
+ .name = "num_mbm_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = resctrl_num_mbm_cntrs_show,
+ },
{
.name = "min_cbm_bits",
.mode = 0444,
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
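As an illustration of the "num_mbm_cntrs" format documented in the patch above ("<domain id>=<count>" pairs separated by ';'), the following is a rough user-space parsing sketch. The struct dom_cntrs type and parse_num_mbm_cntrs() are made-up names for this example, and the sketch assumes the file content is a single line exactly as in the documented sample.

```c
#include <stdio.h>

/* Hypothetical container for one "<domain id>=<count>" pair. */
struct dom_cntrs {
	int dom_id;
	int num_cntrs;
};

/*
 * Parse a line such as "0=32;1=32" into out[]. Returns the number of
 * entries parsed, or -1 on malformed input or if out[] is too small.
 */
static int parse_num_mbm_cntrs(const char *line, struct dom_cntrs *out, int max)
{
	int n = 0;

	while (*line && *line != '\n') {
		int id, cnt, used = 0;

		if (n == max)
			return -1;
		/* %n records how many characters this pair consumed. */
		if (sscanf(line, "%d=%d%n", &id, &cnt, &used) != 2)
			return -1;
		out[n].dom_id = id;
		out[n].num_cntrs = cnt;
		n++;
		line += used;
		if (*line == ';')
			line++;
		else if (*line && *line != '\n')
			return -1;
	}
	return n;
}
```

With the documented sample "0=32;1=32", this yields two entries, one per L3 domain, each reporting 32 counters.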
* Re: [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs
2025-06-13 21:04 ` [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs Babu Moger
@ 2025-06-24 23:05 ` Reinette Chatre
2025-06-25 20:33 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 23:05 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
There seem to be many places referring to user space assigning "counter IDs".
As I understand the interface, the user has no control over the actual ID of the
counter being assigned. Please correct me if I am wrong.
Considering this, how about:
fs/resctrl: Add resctrl file to display number of assignable counters
If you agree, please check the whole series as this seems to be an often
copy&pasted term.
On 6/13/25 2:04 PM, Babu Moger wrote:
> The "mbm_event" mode allows users to assign a hardware counter ID to an
"a hardware counter ID" -> "a hardware counter"?
> RMID, event pair and monitor bandwidth usage as long as it is assigned.
> The hardware continues to track the assigned counter until it is
> explicitly unassigned by the user.
>
> Create 'num_mbm_cntrs' resctrl file that displays the number of counter
> IDs supported in each domain. 'num_mbm_cntrs' is only visible to user
"number of counter IDs" -> "number of counters"?
> space when the system supports "mbm_event" mode.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 11 ++++++++++
> fs/resctrl/monitor.c | 4 ++++
> fs/resctrl/rdtgroup.c | 30 +++++++++++++++++++++++++++
> 3 files changed, 45 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 4e76e4ac5d3a..801914de0c81 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -288,6 +288,17 @@ with the following files:
> result in misleading values or display "Unavailable" if no counter is assigned
> to the event.
>
> +"num_mbm_cntrs":
> + The maximum number of counter IDs (total of available and assigned counters)
"number of counter IDs" -> "number of counters"
> + in each domain when the system supports mbm_event mode.
> +
> + For example, on a system with maximum of 32 memory bandwidth monitoring
> + counters in each of its L3 domains:
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> + 0=32;1=32
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index dcc6c00eb362..92a87aa97b0f 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -924,6 +924,10 @@ int resctrl_mon_resource_init(void)
> else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>
> + if (r->mon.mbm_cntr_assignable)
> + resctrl_file_fflags_init("num_mbm_cntrs",
> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> +
> return 0;
> }
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index ba7a9a68c5a6..967e4df62a19 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1829,6 +1829,30 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> + struct rdt_mon_domain *dom;
> + bool sep = false;
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> + if (sep)
> + seq_putc(s, ';');
> +
> + seq_printf(s, "%d=%d", dom->hdr.id, r->mon.num_mbm_cntrs);
> + sep = true;
> + }
> + seq_putc(s, '\n');
> +
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -1866,6 +1890,12 @@ static struct rftype res_common_files[] = {
> .seq_show = rdt_default_ctrl_show,
> .fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE,
> },
> + {
> + .name = "num_mbm_cntrs",
> + .mode = 0444,
> + .kf_ops = &rdtgroup_kf_single_ops,
> + .seq_show = resctrl_num_mbm_cntrs_show,
> + },
> {
> .name = "min_cbm_bits",
> .mode = 0444,
Patch looks good.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
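The separator handling in resctrl_num_mbm_cntrs_show() (emit ';' before every entry except the first, then a trailing newline) can be modelled outside the kernel as follows. This is only a sketch: format_cntrs() is a hypothetical stand-in that writes into a caller-supplied buffer with snprintf() rather than a seq_file, and the domain IDs are passed as a plain array instead of walking r->mon_domains.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model of the show routine: write
 * "<id>=<num_cntrs>" for each domain, separated by ';', followed by a
 * newline. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if buf is too small.
 */
static int format_cntrs(char *buf, size_t len, const int *dom_ids, int ndoms,
			int num_cntrs)
{
	size_t off = 0;
	bool sep = false;

	for (int i = 0; i < ndoms; i++) {
		int n = snprintf(buf + off, len - off, "%s%d=%d",
				 sep ? ";" : "", dom_ids[i], num_cntrs);

		/* snprintf() reports truncation by returning >= the space given. */
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
		sep = true;
	}

	/* Room for the trailing newline and the NUL terminator. */
	if (off + 2 > len)
		return -1;
	buf[off++] = '\n';
	buf[off] = '\0';
	return (int)off;
}
```

Every domain reports the same value because, as in the patch, the count comes from the resource (r->mon.num_mbm_cntrs) rather than from per-domain state.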
* Re: [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs
2025-06-24 23:05 ` Reinette Chatre
@ 2025-06-25 20:33 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-25 20:33 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 18:05, Reinette Chatre wrote:
> Hi Babu,
>
> There seem to be many places referring to user space assigning "counter IDs".
> As I understand the interface, the user has no control over the actual ID of the
> counter being assigned. Please correct me if I am wrong.
Yes. Agree. Users have no control over which counter is assigned.
> Considering this, how about:
> fs/resctrl: Add resctrl file to display number of assignable counters
>
> If you agree, please check the whole series as this seems to be an often
> copy&pasted term.
Yes. Sure.
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> The "mbm_event" mode allows users to assign a hardware counter ID to an
>
> "a hardware counter ID" -> "a hardware counter"?
>
>> RMID, event pair and monitor bandwidth usage as long as it is assigned.
>> The hardware continues to track the assigned counter until it is
>> explicitly unassigned by the user.
>>
>> Create 'num_mbm_cntrs' resctrl file that displays the number of counter
>> IDs supported in each domain. 'num_mbm_cntrs' is only visible to user
>
> "number of counter IDs" -> "number of counters"?
>
>> space when the system supports "mbm_event" mode.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 11 ++++++++++
>> fs/resctrl/monitor.c | 4 ++++
>> fs/resctrl/rdtgroup.c | 30 +++++++++++++++++++++++++++
>> 3 files changed, 45 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 4e76e4ac5d3a..801914de0c81 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -288,6 +288,17 @@ with the following files:
>> result in misleading values or display "Unavailable" if no counter is assigned
>> to the event.
>>
>> +"num_mbm_cntrs":
>> + The maximum number of counter IDs (total of available and assigned counters)
>
> "number of counter IDs" -> "number of counters"
Sure.
>
>> + in each domain when the system supports mbm_event mode.
>> +
>> + For example, on a system with maximum of 32 memory bandwidth monitoring
>> + counters in each of its L3 domains:
>> + ::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> + 0=32;1=32
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index dcc6c00eb362..92a87aa97b0f 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -924,6 +924,10 @@ int resctrl_mon_resource_init(void)
>> else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>
>> + if (r->mon.mbm_cntr_assignable)
>> + resctrl_file_fflags_init("num_mbm_cntrs",
>> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> +
>> return 0;
>> }
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index ba7a9a68c5a6..967e4df62a19 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1829,6 +1829,30 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> + struct rdt_mon_domain *dom;
>> + bool sep = false;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> + if (sep)
>> + seq_putc(s, ';');
>> +
>> + seq_printf(s, "%d=%d", dom->hdr.id, r->mon.num_mbm_cntrs);
>> + sep = true;
>> + }
>> + seq_putc(s, '\n');
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> + return 0;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>> @@ -1866,6 +1890,12 @@ static struct rftype res_common_files[] = {
>> .seq_show = rdt_default_ctrl_show,
>> .fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE,
>> },
>> + {
>> + .name = "num_mbm_cntrs",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = resctrl_num_mbm_cntrs_show,
>> + },
>> {
>> .name = "min_cbm_bits",
>> .mode = 0444,
>
> Patch looks good.
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (11 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 12/32] fs/resctrl: Introduce interface to display number of assignable counter IDs Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 23:31 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters Babu Moger
` (20 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The "mbm_event" mode allows users to assign a hardware counter ID to an
RMID, event pair and monitor bandwidth usage as long as it is assigned.
The hardware continues to track the assigned counter until it is
explicitly unassigned by the user. Counters are assigned/unassigned at
monitoring domain level.
Manage a monitoring domain's hardware counters using a per monitoring
domain array of struct mbm_cntr_cfg that is indexed by the hardware
counter ID. A hardware counter's configuration contains the MBM event
ID and points to the monitoring group that it is assigned to, with a
NULL pointer meaning that the hardware counter is available for assignment.
There is no direct way to determine which hardware counters are assigned
to a particular monitoring group. Check every entry of every hardware
counter configuration array in every monitoring domain to query which
MBM events of a monitoring group are tracked by hardware. Such queries are
acceptable because of a very small number of assignable counters (32
to 64).
Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated code documentation and changelog.
Fixed up the indentation in resctrl.h.
Changed subject line to fs/resctrl.
v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.
v12: Fixed the struct mbm_cntr_cfg code documentation.
Removed a few strange characters in changelog.
Added the counter range for better understanding.
Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as
suggested by James.
v11: Refined the change log based on Reinette's feedback.
Fixed few style issues.
v10: Patch changed completely to handle the counters at domain level.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Removed Reviewed-by tag.
Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
v9: Added Reviewed-by tag. No other changes.
v8: Minor commit message changes.
v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
v6: New patch to add domain level assignment.
---
fs/resctrl/rdtgroup.c | 8 ++++++++
include/linux/resctrl.h | 19 +++++++++++++++++++
2 files changed, 27 insertions(+)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 967e4df62a19..90b52593ef29 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4084,6 +4084,7 @@ static void rdtgroup_setup_default(void)
static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
+ kfree(d->cntr_cfg);
bitmap_free(d->rmid_busy_llc);
for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++) {
kfree(d->mbm_states[i]);
@@ -4167,6 +4168,13 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
goto cleanup;
}
+ if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+ tsize = sizeof(*d->cntr_cfg);
+ d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
+ if (!d->cntr_cfg)
+ goto cleanup;
+ }
+
return 0;
cleanup:
bitmap_free(d->rmid_busy_llc);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f078ef24a8ad..468a4ebabc64 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
u32 *mbps_val;
};
+/**
+ * struct mbm_cntr_cfg - Assignable counter configuration.
+ * @evtid: MBM event to which the counter is assigned. Only valid
+ * if @rdtgrp is not NULL.
+ * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
+ * READS_TO_REMOTE_MEM, etc. bits that represent the memory
+ * transactions being counted.
+ * @rdtgrp: resctrl group assigned to the counter. NULL if the
+ * counter is free.
+ */
+struct mbm_cntr_cfg {
+ enum resctrl_event_id evtid;
+ u32 evt_cfg;
+ struct rdtgroup *rdtgrp;
+};
+
/**
* struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
@@ -168,6 +184,8 @@ struct rdt_ctrl_domain {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
+ * @cntr_cfg: array of assignable counters' configuration (indexed
+ * by counter ID)
*/
struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
@@ -178,6 +196,7 @@ struct rdt_mon_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
+ struct mbm_cntr_cfg *cntr_cfg;
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
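The changelog above notes that there is no direct mapping from a monitoring group to its counters, so a query has to walk every entry of every domain's configuration array. A simplified user-space model of that scan might look like the sketch below. The types struct group, struct cntr_cfg and struct mon_domain, and the helpers find_cntr() and count_assigned(), are illustrative stand-ins and not the kernel definitions; only the "indexed by counter ID, NULL group means free" convention is taken from the patch.

```c
#include <stddef.h>

/* Illustrative stand-ins for the kernel types; all names are hypothetical. */
struct group {
	const char *name;
};

struct cntr_cfg {
	int evtid;			/* only meaningful while grp != NULL */
	const struct group *grp;	/* NULL means the counter is free */
};

struct mon_domain {
	struct cntr_cfg *cntr_cfg;	/* indexed by counter ID */
	int num_cntrs;
};

/*
 * Return the ID of the counter tracking (grp, evtid) in dom, or -1 if
 * none is assigned. Callers are expected to pass a non-NULL grp.
 */
static int find_cntr(const struct mon_domain *dom, const struct group *grp,
		     int evtid)
{
	for (int id = 0; id < dom->num_cntrs; id++) {
		if (dom->cntr_cfg[id].grp == grp &&
		    dom->cntr_cfg[id].evtid == evtid)
			return id;
	}
	return -1;
}

/* Count how many counters grp holds across all domains (linear scan). */
static int count_assigned(const struct mon_domain *doms, int ndoms,
			  const struct group *grp)
{
	int total = 0;

	for (int d = 0; d < ndoms; d++)
		for (int id = 0; id < doms[d].num_cntrs; id++)
			if (doms[d].cntr_cfg[id].grp == grp)
				total++;
	return total;
}
```

With only 32 to 64 counters per domain, as stated in the changelog, the cost of this scan stays small, which is the argument for not keeping a reverse index per group.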
* Re: [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain
2025-06-13 21:04 ` [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain Babu Moger
@ 2025-06-24 23:31 ` Reinette Chatre
2025-06-26 1:31 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 23:31 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> The "mbm_event" mode allows users to assign a hardware counter ID to an
"hardware counter ID" -> "hardware counter" (I'll stop pointing these out)
> RMID, event pair and monitor bandwidth usage as long as it is assigned.
> The hardware continues to track the assigned counter until it is
> explicitly unassigned by the user. Counters are assigned/unassigned at
> monitoring domain level.
>
> Manage a monitoring domain's hardware counters using a per monitoring
> domain array of struct mbm_cntr_cfg that is indexed by the hardware
> counter ID. A hardware counter's configuration contains the MBM event
> ID and points to the monitoring group that it is assigned to, with a
> NULL pointer meaning that the hardware counter is available for assignment.
>
> There is no direct way to determine which hardware counters are assigned
> to a particular monitoring group. Check every entry of every hardware
> counter configuration array in every monitoring domain to query which
> MBM events of a monitoring group is tracked by hardware. Such queries are
> acceptable because of a very small number of assignable counters (32
> to 64).
>
> Suggested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> fs/resctrl/rdtgroup.c | 8 ++++++++
> include/linux/resctrl.h | 19 +++++++++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 967e4df62a19..90b52593ef29 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4084,6 +4084,7 @@ static void rdtgroup_setup_default(void)
>
> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
> {
> + kfree(d->cntr_cfg);
> bitmap_free(d->rmid_busy_llc);
> for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++) {
> kfree(d->mbm_states[i]);
> @@ -4167,6 +4168,13 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
> goto cleanup;
> }
>
> + if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
> + tsize = sizeof(*d->cntr_cfg);
> + d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
> + if (!d->cntr_cfg)
> + goto cleanup;
> + }
> +
Please see my earlier comment https://lore.kernel.org/lkml/b761e6ec-a874-4d06-8437-a3a717a91abb@intel.com/
Before this addition the "cleanup" goto label can only be called when
(a) idx is guaranteed to be initialized and (b) d->mbm_states[idx] == NULL.
Using that goto label in snippet above cannot guarantee either.
> return 0;
> cleanup:
> bitmap_free(d->rmid_busy_llc);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f078ef24a8ad..468a4ebabc64 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
> u32 *mbps_val;
> };
>
> +/**
> + * struct mbm_cntr_cfg - Assignable counter configuration.
> + * @evtid: MBM event to which the counter is assigned. Only valid
> + * if @rdtgroup is not NULL.
> + * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
> + * READS_TO_REMOTE_MEM, etc. bits that represent the memory
> + * transactions being counted.
> + * @rdtgrp: resctrl group assigned to the counter. NULL if the
> + * counter is free.
> + */
> +struct mbm_cntr_cfg {
> + enum resctrl_event_id evtid;
> + u32 evt_cfg;
It is not clear to me why the event configuration needs to be duplicated
between mbm_cntr_cfg::evt_cfg and mon_evt::evt_cfg (done in patch #16).
I think there should be only one "source of truth" and mon_evt::evt_cfg
seems most appropriate since then it can be shared with BMEC.
It also seems unnecessary to make so many copies of the event configuration
if it can just be determined from the event ID.
Looking ahead at how this is used, for example in event_filter_write()
introduced in patch #25:
ret = resctrl_process_configs(buf, &evt_cfg);
if (!ret && mevt->evt_cfg != evt_cfg) {
mevt->evt_cfg = evt_cfg;
resctrl_assign_cntr_allrdtgrp(r, mevt);
}
After user provides new event configuration the mon_evt::evt_cfg is
updated. Since there is this initial check to determine if counters need
to be updated I think it is unnecessary to have a second copy of mbm_cntr_cfg::evt_cfg
that needs to be checked again. The functions called by resctrl_assign_cntr_allrdtgrp(r, mevt)
should just update the counters without any additional comparison.
For example, rdtgroup_assign_cntr() can be simplified to:
rdtgroup_assign_cntr() {
...
list_for_each_entry(d, &r->mon_domains, hdr.list) {
cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
if (cntr_id >= 0)
resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
rdtgrp->closid, cntr_id, true);
}
}
Reinette
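
For reference, the error-path concern above can be sketched in user space. This is only an illustration under stated assumptions, not the resctrl code: struct mon_state, try_calloc(), allocs_left, setup_mon_state() and destroy_mon_state() are hypothetical stand-ins, and calloc()/free() take the place of kcalloc()/kfree(). The idea shown is that a shared cleanup label is safe from any failure point once every pointer starts out NULL and the cleanup frees all members unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_MBM_EVENTS 2	/* hypothetical stand-in for QOS_NUM_L3_MBM_EVENTS */

/* Hypothetical user-space stand-in for the per-domain monitor state. */
struct mon_state {
	unsigned long *rmid_busy;
	int *mbm_states[NUM_MBM_EVENTS];
	int *cntr_cfg;
};

/* Test hook: allocations allowed before a forced failure; negative = never fail. */
static int allocs_left = -1;

static void *try_calloc(size_t n, size_t size)
{
	if (allocs_left == 0)
		return NULL;
	if (allocs_left > 0)
		allocs_left--;
	return calloc(n, size);
}

/* Free everything; safe for any mix of NULL and allocated members. */
static void destroy_mon_state(struct mon_state *s)
{
	int i;

	free(s->cntr_cfg);
	s->cntr_cfg = NULL;
	for (i = 0; i < NUM_MBM_EVENTS; i++) {
		free(s->mbm_states[i]);
		s->mbm_states[i] = NULL;
	}
	free(s->rmid_busy);
	s->rmid_busy = NULL;
}

/* Returns 0 on success, -1 on allocation failure with nothing left allocated. */
static int setup_mon_state(struct mon_state *s, size_t num_rmid, size_t num_cntrs)
{
	int i;

	assert(s && num_rmid > 0 && num_cntrs > 0);

	/* Start from all-NULL so the cleanup path never sees stale pointers. */
	s->rmid_busy = NULL;
	s->cntr_cfg = NULL;
	for (i = 0; i < NUM_MBM_EVENTS; i++)
		s->mbm_states[i] = NULL;

	s->rmid_busy = try_calloc(num_rmid, sizeof(*s->rmid_busy));
	if (!s->rmid_busy)
		goto cleanup;

	for (i = 0; i < NUM_MBM_EVENTS; i++) {
		s->mbm_states[i] = try_calloc(num_rmid, sizeof(*s->mbm_states[i]));
		if (!s->mbm_states[i])
			goto cleanup;
	}

	s->cntr_cfg = try_calloc(num_cntrs, sizeof(*s->cntr_cfg));
	if (!s->cntr_cfg)
		goto cleanup;

	return 0;

cleanup:
	/* Valid no matter which allocation failed: free(NULL) is a no-op. */
	destroy_mon_state(s);
	return -1;
}
```

With this shape the cleanup label does not depend on a loop index being initialized or on which array entry failed, which is the property the comment above asks for.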
* Re: [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain
2025-06-24 23:31 ` Reinette Chatre
@ 2025-06-26 1:31 ` Moger, Babu
2025-06-26 15:05 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 1:31 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/2025 6:31 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> The "mbm_event" mode allows users to assign a hardware counter ID to an
>
> "hardware counter ID" -> "hardware counter" (I'll stop pointing these out)
Sure.
>
>> RMID, event pair and monitor bandwidth usage as long as it is assigned.
>> The hardware continues to track the assigned counter until it is
>> explicitly unassigned by the user. Counters are assigned/unassigned at
>> monitoring domain level.
>>
>> Manage a monitoring domain's hardware counters using a per monitoring
>> domain array of struct mbm_cntr_cfg that is indexed by the hardware
>> counter ID. A hardware counter's configuration contains the MBM event
>> ID and points to the monitoring group that it is assigned to, with a
>> NULL pointer meaning that the hardware counter is available for assignment.
>>
>> There is no direct way to determine which hardware counters are assigned
>> to a particular monitoring group. Check every entry of every hardware
>> counter configuration array in every monitoring domain to query which
>> MBM events of a monitoring group is tracked by hardware. Such queries are
>> acceptable because of a very small number of assignable counters (32
>> to 64).
>>
>> Suggested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
>
>> ---
>> fs/resctrl/rdtgroup.c | 8 ++++++++
>> include/linux/resctrl.h | 19 +++++++++++++++++++
>> 2 files changed, 27 insertions(+)
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 967e4df62a19..90b52593ef29 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -4084,6 +4084,7 @@ static void rdtgroup_setup_default(void)
>>
>> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>> {
>> + kfree(d->cntr_cfg);
>> bitmap_free(d->rmid_busy_llc);
>> for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++) {
>> kfree(d->mbm_states[i]);
>> @@ -4167,6 +4168,13 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
>> goto cleanup;
>> }
>>
>> + if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> + tsize = sizeof(*d->cntr_cfg);
>> + d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
>> + if (!d->cntr_cfg)
>> + goto cleanup;
>> + }
>> +
>
> Please see my earlier comment https://lore.kernel.org/lkml/b761e6ec-a874-4d06-8437-a3a717a91abb@intel.com/
> Before this addition the "cleanup" goto label can only be called when
> (a) idx is guaranteed to be initialized and (b) d->mbm_states[idx] == NULL.
> Using that goto label in snippet above cannot guarantee either.
Yes. Tony took care of this.
cleanup:
bitmap_free(d->rmid_busy_llc);
for_each_mbm_idx(idx) {
kfree(d->mbm_states[idx]);
d->mbm_states[idx] = NULL;
}
return -ENOMEM;
}
>
>> return 0;
>> cleanup:
>> bitmap_free(d->rmid_busy_llc);
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f078ef24a8ad..468a4ebabc64 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
>> u32 *mbps_val;
>> };
>>
>> +/**
>> + * struct mbm_cntr_cfg - Assignable counter configuration.
>> + * @evtid: MBM event to which the counter is assigned. Only valid
>> + * if @rdtgroup is not NULL.
>> + * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
>> + * READS_TO_REMOTE_MEM, etc. bits that represent the memory
>> + * transactions being counted.
>> + * @rdtgrp: resctrl group assigned to the counter. NULL if the
>> + * counter is free.
>> + */
>> +struct mbm_cntr_cfg {
>> + enum resctrl_event_id evtid;
>> + u32 evt_cfg;
>
> It is not clear to me why the event configuration needs to be duplicated
> between mbm_cntr_cfg::evt_cfg and mon_evt::evt_cfg (done in patch #16).
> I think there should be only one "source of truth" and mon_evt::evt_cfg
> seems most appropriate since then it can be shared with BMEC.
>
> It also seems unnecessary to make so many copies of the event configuration
> if it can just be determined from the event ID.
>
> Looking ahead at how this is used, for example in event_filter_write()
> introduced in patch #25:
> ret = resctrl_process_configs(buf, &evt_cfg);
> if (!ret && mevt->evt_cfg != evt_cfg) {
> mevt->evt_cfg = evt_cfg;
> resctrl_assign_cntr_allrdtgrp(r, mevt);
> }
>
> After user provides new event configuration the mon_evt::evt_cfg is
> updated. Since there is this initial check to determine if counters need
> to be updated I think it is unnecessary to have a second copy of mbm_cntr_cfg::evt_cfg
> that needs to be checked again. The functions called by resctrl_assign_cntr_allrdtgrp(r, mevt)
> should just update the counters without any additional comparison.
>
> For example, rdtgroup_assign_cntr() can be simplified to:
> rdtgroup_assign_cntr() {
> ...
> list_for_each_entry(d, &r->mon_domains, hdr.list) {
> cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
> if (cntr_id >= 0)
> resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
> rdtgrp->closid, cntr_id, true);
> }
> }
>
>
Actually, this interaction works as intended.
It serves as an optimization for cases where the user repeatedly tries
to assign the same event to a group. Since we have no way of knowing
whether the event is up-to-date, this mechanism helps us avoid
unnecessary MSR writes.
For example:
mbm_L3_assignments_write() → resctrl_assign_cntr_event() →
resctrl_alloc_config_cntr() → resctrl_config_cntr() →
resctrl_arch_config_cntr()
resctrl_alloc_config_cntr()
{
..
/*
* Skip reconfiguration if the event setup is current; otherwise,
* update and apply the new configuration to the domain.
*/
if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
rdtgrp->closid, cntr_id, true);
}
}
Thanks
Babu
* Re: [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain
2025-06-26 1:31 ` Moger, Babu
@ 2025-06-26 15:05 ` Reinette Chatre
2025-06-26 15:46 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-26 15:05 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/25/25 6:31 PM, Moger, Babu wrote:
> On 6/24/2025 6:31 PM, Reinette Chatre wrote:
>> On 6/13/25 2:04 PM, Babu Moger wrote:
>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index f078ef24a8ad..468a4ebabc64 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
>>> u32 *mbps_val;
>>> };
>>> +/**
>>> + * struct mbm_cntr_cfg - Assignable counter configuration.
>>> + * @evtid: MBM event to which the counter is assigned. Only valid
>>> + * if @rdtgroup is not NULL.
>>> + * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
>>> + * READS_TO_REMOTE_MEM, etc. bits that represent the memory
>>> + * transactions being counted.
>>> + * @rdtgrp: resctrl group assigned to the counter. NULL if the
>>> + * counter is free.
>>> + */
>>> +struct mbm_cntr_cfg {
>>> + enum resctrl_event_id evtid;
>>> + u32 evt_cfg;
>>
>> It is not clear to me why the event configuration needs to be duplicated
>> between mbm_cntr_cfg::evt_cfg and mon_evt::evt_cfg (done in patch #16).
>> I think there should be only one "source of truth" and mon_evt::evt_cfg
>> seems most appropriate since then it can be shared with BMEC.
>>
>> It also seems unnecessary to make so many copies of the event configuration
>> if it can just be determined from the event ID.
>>
>> Looking ahead at how this is used, for example in event_filter_write()
>> introduced in patch #25:
>> ret = resctrl_process_configs(buf, &evt_cfg);
>> if (!ret && mevt->evt_cfg != evt_cfg) {
>> mevt->evt_cfg = evt_cfg;
>> resctrl_assign_cntr_allrdtgrp(r, mevt);
>> }
>>
>> After user provides new event configuration the mon_evt::evt_cfg is
>> updated. Since there is this initial check to determine if counters need
>> to be updated I think it is unnecessary to have a second copy of mbm_cntr_cfg::evt_cfg
>> that needs to be checked again. The functions called by resctrl_assign_cntr_allrdtgrp(r, mevt)
>> should just update the counters without any additional comparison.
>>
>> For example, rdtgroup_assign_cntr() can be simplified to:
>> rdtgroup_assign_cntr() {
>> ...
>> list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
>> if (cntr_id >= 0)
>> resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
>> rdtgrp->closid, cntr_id, true);
>> }
>> }
>>
>>
>
> Actually, this interaction works as intended.
>
> It serves as an optimization for cases where the user repeatedly tries to assign the same event to a group. Since we have no way of knowing whether the event is up-to-date, this mechanism helps us avoid unnecessary MSR writes.
>
> For example:
> mbm_L3_assignments_write() → resctrl_assign_cntr_event() → resctrl_alloc_config_cntr() → resctrl_config_cntr() → resctrl_arch_config_cntr()
>
>
> resctrl_alloc_config_cntr()
>
> {
> ..
>
> /*
> * Skip reconfiguration if the event setup is current; otherwise,
> * update and apply the new configuration to the domain.
> */
> if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
> d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
> resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
> rdtgrp->closid, cntr_id, true);
> }
> }
This ties in with the feedback to patch #18 where this snippet is
introduced. Please see
https://lore.kernel.org/lkml/77ce3646-2213-4987-a438-a69f6d7c6cfd@intel.com/
It is not clear to me that this reconfiguration should be done, if the
counter is assigned to a group then it should be up to date, no? If there
was any change in configuration after assignment then event_filter_write()
will ensure that all resource groups are updated.
If a user repeatedly assigns the same event to a group then mbm_cntr_get()
will return a valid counter and resctrl_alloc_config_cntr() in above flow
can just return success without doing a reconfigure.
Reinette
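
The flow suggested above can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the resctrl implementation: struct group, struct cntr_cfg, cntr_get(), cntr_alloc(), config_cntr_hw(), assign_cntr() and the hw_writes counter are hypothetical stand-ins (hw_writes models the MSR writes that the comparison against a cached evt_cfg was meant to avoid). It shows that a repeated assignment of the same event to the same group can return success from the lookup alone, without a per-counter copy of the event configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a monitoring group and a counter's configuration. */
struct group {
	int rmid;
};

struct cntr_cfg {
	int evtid;
	struct group *grp;	/* NULL means the counter is free */
};

/* Counts simulated hardware programming operations (stand-in for MSR writes). */
static unsigned int hw_writes;

static void config_cntr_hw(int cntr_id, int evtid, const struct group *grp)
{
	(void)cntr_id;
	(void)evtid;
	(void)grp;
	hw_writes++;
}

/* Return the counter already assigned to (grp, evtid), or -1 if there is none. */
static int cntr_get(const struct cntr_cfg *cfg, int num,
		    const struct group *grp, int evtid)
{
	int i;

	for (i = 0; i < num; i++) {
		if (cfg[i].grp == grp && cfg[i].evtid == evtid)
			return i;
	}
	return -1;
}

/* Claim the first free counter for (grp, evtid), or return -1 if none is free. */
static int cntr_alloc(struct cntr_cfg *cfg, int num, struct group *grp, int evtid)
{
	int i;

	for (i = 0; i < num; i++) {
		if (!cfg[i].grp) {
			cfg[i].grp = grp;
			cfg[i].evtid = evtid;
			return i;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 when no counter is available. */
static int assign_cntr(struct cntr_cfg *cfg, int num, struct group *grp, int evtid)
{
	int id;

	assert(cfg && grp);

	/* Already assigned: nothing to reconfigure, report success. */
	if (cntr_get(cfg, num, grp, evtid) >= 0)
		return 0;

	id = cntr_alloc(cfg, num, grp, evtid);
	if (id < 0)
		return -1;

	/* Program the hardware exactly once, at assignment time. */
	config_cntr_hw(id, evtid, grp);
	return 0;
}
```

In this sketch a later change to the event configuration would be handled by a separate path that reprograms every assigned counter for that event, which is the role described for event_filter_write() above.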
* Re: [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain
2025-06-26 15:05 ` Reinette Chatre
@ 2025-06-26 15:46 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 15:46 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/26/25 10:05, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/25/25 6:31 PM, Moger, Babu wrote:
>> On 6/24/2025 6:31 PM, Reinette Chatre wrote:
>>> On 6/13/25 2:04 PM, Babu Moger wrote:
>
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index f078ef24a8ad..468a4ebabc64 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
>>>> u32 *mbps_val;
>>>> };
>>>> +/**
>>>> + * struct mbm_cntr_cfg - Assignable counter configuration.
>>>> + * @evtid: MBM event to which the counter is assigned. Only valid
>>>> + * if @rdtgroup is not NULL.
>>>> + * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
>>>> + * READS_TO_REMOTE_MEM, etc. bits that represent the memory
>>>> + * transactions being counted.
>>>> + * @rdtgrp: resctrl group assigned to the counter. NULL if the
>>>> + * counter is free.
>>>> + */
>>>> +struct mbm_cntr_cfg {
>>>> + enum resctrl_event_id evtid;
>>>> + u32 evt_cfg;
>>>
>>> It is not clear to me why the event configuration needs to be duplicated
>>> between mbm_cntr_cfg::evt_cfg and mon_evt::evt_cfg (done in patch #16).
>>> I think there should be only one "source of truth" and mon_evt::evt_cfg
>>> seems most appropriate since then it can be shared with BMEC.
>>>
>>> It also seems unnecessary to make so many copies of the event configuration
>>> if it can just be determined from the event ID.
>>>
>>> Looking ahead at how this is used, for example in event_filter_write()
>>> introduced in patch #25:
>>> ret = resctrl_process_configs(buf, &evt_cfg);
>>> if (!ret && mevt->evt_cfg != evt_cfg) {
>>> mevt->evt_cfg = evt_cfg;
>>> resctrl_assign_cntr_allrdtgrp(r, mevt);
>>> }
>>>
>>> After user provides new event configuration the mon_evt::evt_cfg is
>>> updated. Since there is this initial check to determine if counters need
>>> to be updated I think it is unnecessary to have a second copy of mbm_cntr_cfg::evt_cfg
>>> that needs to be checked again. The functions called by resctrl_assign_cntr_allrdtgrp(r, mevt)
>>> should just update the counters without any additional comparison.
>>>
>>> For example, rdtgroup_assign_cntr() can be simplified to:
>>> rdtgroup_assign_cntr() {
>>> ...
>>> list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
>>> if (cntr_id >= 0)
>>> resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
>>> rdtgrp->closid, cntr_id, true);
>>> }
>>> }
>>>
>>>
>>
>> Actually, this interaction works as intended.
>>
>> It serves as an optimization for cases where the user repeatedly tries to assign the same event to a group. Since we have no way of knowing whether the event is up-to-date, this mechanism helps us avoid unnecessary MSR writes.
>>
>> For example:
>> mbm_L3_assignments_write() → resctrl_assign_cntr_event() → resctrl_alloc_config_cntr() → resctrl_config_cntr() → resctrl_arch_config_cntr()
>>
>>
>> resctrl_alloc_config_cntr()
>>
>> {
>> ..
>>
>> /*
>> * Skip reconfiguration if the event setup is current; otherwise,
>> * update and apply the new configuration to the domain.
>> */
>> if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
>> d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
>> resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
>> rdtgrp->closid, cntr_id, true);
>> }
>> }
>
> This ties in with the feedback to patch #18 where this snippet is
> introduced. Please see
> https://lore.kernel.org/lkml/77ce3646-2213-4987-a438-a69f6d7c6cfd@intel.com/
>
> It is not clear to me that this reconfiguration should be done, if the
> counter is assigned to a group then it should be up to date, no? If there
> was any change in configuration after assignment then event_filter_write()
> will ensure that all resource groups are updated.
Yes. That is a good point. We can do that. I think we started this code
before we introduced event_filter_write().
>
> If a user repeatedly assigns the same event to a group then mbm_cntr_get()
> will return a valid counter and resctrl_alloc_config_cntr() in above flow
> can just return success without doing a reconfigure.
Sure. We can do that. Will remove evt_cfg from struct mbm_cntr_cfg.
Thanks for pointing this out.
--
Thanks
Babu Moger
* [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (12 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track assignable counters per domain Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-24 23:39 ` Reinette Chatre
2025-06-24 23:41 ` Reinette Chatre
2025-06-13 21:04 ` [PATCH v14 15/32] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
` (19 subsequent siblings)
33 siblings, 2 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Provide the interface to display the number of counter IDs available for
assignment in each domain when "mbm_event" mode is enabled.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Minor changelog update.
Changed subject line to fs/resctrl.
v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c and rdtgroup.c files have now been split between
the FS and ARCH directories.
v12: Minor change to change log.
Updated the documentation text with an example.
Replaced seq_puts(s, ";") with seq_putc(s, ';');
Added missing rdt_last_cmd_clear() in resctrl_available_mbm_cntrs_show().
v11: Rename rdtgroup_available_mbm_cntrs_show() to resctrl_available_mbm_cntrs_show().
Few minor text changes.
v10: Patch changed to handle the counters at domain level.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
So, display logic also changed now.
v9: New patch
---
Documentation/filesystems/resctrl.rst | 11 ++++++
fs/resctrl/monitor.c | 5 ++-
fs/resctrl/rdtgroup.c | 48 +++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 801914de0c81..8a2050098091 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -299,6 +299,17 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=32;1=32
+"available_mbm_cntrs":
+ The number of assignable counters available in each domain when
+ mbm_event mode is enabled on the system.
+
+ For example, on a system with 30 available [hardware] assignable counters
+ in each of its L3 domains:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
+ 0=30;1=30
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 92a87aa97b0f..2893da994f3c 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -924,9 +924,12 @@ int resctrl_mon_resource_init(void)
else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
- if (r->mon.mbm_cntr_assignable)
+ if (r->mon.mbm_cntr_assignable) {
resctrl_file_fflags_init("num_mbm_cntrs",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+ resctrl_file_fflags_init("available_mbm_cntrs",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+ }
return 0;
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 90b52593ef29..08bcca9bd8b6 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1853,6 +1853,48 @@ static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
return 0;
}
+static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+ struct rdt_mon_domain *dom;
+ bool sep = false;
+ u32 cntrs, i;
+ int ret = 0;
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ ret = -EINVAL;
+ goto unlock_cntrs_show;
+ }
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+ if (sep)
+ seq_putc(s, ';');
+
+ cntrs = 0;
+ for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
+ if (!dom->cntr_cfg[i].rdtgrp)
+ cntrs++;
+ }
+
+ seq_printf(s, "%d=%u", dom->hdr.id, cntrs);
+ sep = true;
+ }
+ seq_putc(s, '\n');
+
+unlock_cntrs_show:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -1876,6 +1918,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdt_mon_features_show,
.fflags = RFTYPE_MON_INFO,
},
+ {
+ .name = "available_mbm_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = resctrl_available_mbm_cntrs_show,
+ },
{
.name = "num_rmids",
.mode = 0444,
--
2.34.1
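
As an illustration of the output format documented in this patch, here is a user-space sketch of the free-counter count. It is a sketch under stated assumptions rather than the kernel code: struct group, struct cntr_cfg, struct domain, count_free_cntrs() and format_available() are hypothetical stand-ins, and snprintf() into a caller-provided buffer takes the place of the seq_file helpers. A counter is treated as free when no group is assigned to it, and the per-domain results are joined with ';' in the same "<domain id>=<count>" form shown in the documentation hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct group {
	int id;
};

struct cntr_cfg {
	int evtid;
	struct group *grp;	/* NULL means the counter is free */
};

struct domain {
	int id;
	struct cntr_cfg *cntr_cfg;	/* indexed by counter ID */
};

/* Count the counters in one domain that have no group assigned. */
static unsigned int count_free_cntrs(const struct domain *d, unsigned int num_cntrs)
{
	unsigned int i, free_cntrs = 0;

	for (i = 0; i < num_cntrs; i++) {
		if (!d->cntr_cfg[i].grp)
			free_cntrs++;
	}
	return free_cntrs;
}

/*
 * Write "<id>=<free>" entries separated by ';' and terminated by '\n'.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the buffer is too small.
 */
static int format_available(char *buf, size_t len, const struct domain *doms,
			    size_t num_doms, unsigned int num_cntrs)
{
	size_t off = 0;
	size_t i;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < num_doms; i++) {
		int n = snprintf(buf + off, len - off, "%s%d=%u",
				 i ? ";" : "", doms[i].id,
				 count_free_cntrs(&doms[i], num_cntrs));

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	/* Room is needed for the trailing newline and the NUL terminator. */
	if (off + 2 > len)
		return -1;
	buf[off++] = '\n';
	buf[off] = '\0';
	return (int)off;
}
```

Walking every entry of every domain is linear in the number of counters, which matches the reasoning given earlier in the series that such scans are acceptable for a small number of assignable counters.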
* Re: [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters
2025-06-13 21:04 ` [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters Babu Moger
@ 2025-06-24 23:39 ` Reinette Chatre
2025-06-26 14:17 ` Moger, Babu
2025-06-24 23:41 ` Reinette Chatre
1 sibling, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 23:39 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> Provide the interface to display the number of counter IDs available for
"Introduce the "available_mbm_cntrs" resctrl file to display the number of
counters available ..."
> assignment in each domain when "mbm_event" mode is enabled.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 11 ++++++
> fs/resctrl/monitor.c | 5 ++-
> fs/resctrl/rdtgroup.c | 48 +++++++++++++++++++++++++++
> 3 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 801914de0c81..8a2050098091 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -299,6 +299,17 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=32;1=32
>
> +"available_mbm_cntrs":
> + The number of assignable counters available in each domain when
"The number of counters available for assignment in each domain ..."
> + mbm_event mode is enabled on the system.
> +
> + For example, on a system with 30 available [hardware] assignable counters
> + in each of its L3 domains:
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> + 0=30;1=30
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 92a87aa97b0f..2893da994f3c 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -924,9 +924,12 @@ int resctrl_mon_resource_init(void)
> else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>
> - if (r->mon.mbm_cntr_assignable)
> + if (r->mon.mbm_cntr_assignable) {
> resctrl_file_fflags_init("num_mbm_cntrs",
> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> + resctrl_file_fflags_init("available_mbm_cntrs",
> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> + }
>
> return 0;
> }
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 90b52593ef29..08bcca9bd8b6 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1853,6 +1853,48 @@ static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> + struct rdt_mon_domain *dom;
> + bool sep = false;
> + u32 cntrs, i;
> + int ret = 0;
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> + ret = -EINVAL;
> + goto unlock_cntrs_show;
unlock_cntrs_show -> out_unlock (to be consistent with rest of resctrl)
Reinette
* Re: [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters
2025-06-24 23:39 ` Reinette Chatre
@ 2025-06-26 14:17 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 14:17 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette
On 6/24/25 18:39, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> Provide the interface to display the number of counter IDs available for
>
> "Introduce the "available_mbm_cntrs" resctrl file to display the number of
> counters available ..."
Sure.
>
>> assignment in each domain when "mbm_event" mode is enabled.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 11 ++++++
>> fs/resctrl/monitor.c | 5 ++-
>> fs/resctrl/rdtgroup.c | 48 +++++++++++++++++++++++++++
>> 3 files changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 801914de0c81..8a2050098091 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -299,6 +299,17 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> 0=32;1=32
>>
>> +"available_mbm_cntrs":
>> + The number of assignable counters available in each domain when
>
> "The number of counters available for assignment in each domain ..."
Sure.
>
>> + mbm_event mode is enabled on the system.
>> +
>> + For example, on a system with 30 available [hardware] assignable counters
>> + in each of its L3 domains:
>> + ::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>> + 0=30;1=30
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 92a87aa97b0f..2893da994f3c 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -924,9 +924,12 @@ int resctrl_mon_resource_init(void)
>> else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>
>> - if (r->mon.mbm_cntr_assignable)
>> + if (r->mon.mbm_cntr_assignable) {
>> resctrl_file_fflags_init("num_mbm_cntrs",
>> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> + resctrl_file_fflags_init("available_mbm_cntrs",
>> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> + }
>>
>> return 0;
>> }
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 90b52593ef29..08bcca9bd8b6 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1853,6 +1853,48 @@ static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> + struct rdt_mon_domain *dom;
>> + bool sep = false;
>> + u32 cntrs, i;
>> + int ret = 0;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> + ret = -EINVAL;
>> + goto unlock_cntrs_show;
>
> unlock_cntrs_show -> out_unlock (to be consistent with rest of resctrl)
>
Sure.
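For reference, the per-domain "id=count" output shown in the documentation hunk above (e.g. "0=30;1=30") can be sketched in user space as below. This is only an illustration of the separator handling, not the kernel implementation: format_domain_counts() and its parameters are hypothetical names, and it writes into a caller-supplied buffer instead of a seq_file.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "id=count" pairs separated by ';' into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_domain_counts(char *buf, size_t len,
                                const unsigned int *dom_ids,
                                const unsigned int *counts, size_t n)
{
    size_t off = 0;
    int sep = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* Emit the separator before every entry except the first. */
        int w = snprintf(buf + off, len - off, "%s%u=%u",
                         sep ? ";" : "", dom_ids[i], counts[i]);

        if (w < 0 || (size_t)w >= len - off)
            return -1;
        off += (size_t)w;
        sep = 1;
    }

    return (int)off;
}
```

The "sep" flag plays the same role as the "bool sep" local in the quoted function: nothing is printed before the first domain, and ';' is printed before each later one.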
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters
2025-06-13 21:04 ` [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters Babu Moger
2025-06-24 23:39 ` Reinette Chatre
@ 2025-06-24 23:41 ` Reinette Chatre
2025-06-26 14:19 ` Moger, Babu
1 sibling, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 23:41 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
> +static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> + struct rdt_mon_domain *dom;
> + bool sep = false;
> + u32 cntrs, i;
> + int ret = 0;
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
Missed this in earlier response ... needs update to new terms.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters
2025-06-24 23:41 ` Reinette Chatre
@ 2025-06-26 14:19 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 14:19 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 18:41, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> +static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> + struct rdt_mon_domain *dom;
>> + bool sep = false;
>> + u32 cntrs, i;
>> + int ret = 0;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>
> Missed this in earlier response ... needs update to new terms.
Sure. Changed it to:
rdt_last_cmd_puts("mbm_event mode is not enabled\n");
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 15/32] x86/resctrl: Add data structures and definitions for ABMC assignment
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (13 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 14/32] fs/resctrl: Introduce interface to display number of free MBM counters Babu Moger
@ 2025-06-13 21:04 ` Babu Moger
2025-06-13 21:05 ` [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt Babu Moger
` (18 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:04 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The ABMC feature allows users to assign a hardware counter ID to an
(RMID, event) pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.
The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
ABMC counter assignment is done by setting the counter ID, bandwidth
source (RMID) and bandwidth configuration.
Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.
Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(C000_03FDh):
==========================================================================
Bits    Mnemonic    Description                   Access Type   Reset Value
==========================================================================
63      CfgEn       Configuration Enable          R/W           0
62      CtrEn       Enable/disable counting       R/W           0
61:53   –           Reserved                      MBZ           0
52:48   CtrID       Counter Identifier            R/W           0
47      IsCOS       BwSrc field is a CLOSID       R/W           0
                    (not an RMID)
46:44   –           Reserved                      MBZ           0
43:32   BwSrc       Bandwidth Source              R/W           0
                    (RMID or CLOSID)
31:0    BwType      Bandwidth configuration       R/W           0
                    tracked by the CtrID
==========================================================================
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
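As a rough illustration of the layout in the table above, the 64-bit register value can be composed with explicit shifts and masks. This is a user-space sketch only: abmc_cfg_pack() and the ABMC_* macro names are hypothetical, and the bit positions are taken from the table, not from any header in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the L3_QOS_ABMC_CFG table (macro names are illustrative). */
#define ABMC_CFG_EN_SHIFT  63
#define ABMC_CTR_EN_SHIFT  62
#define ABMC_CTR_ID_SHIFT  48
#define ABMC_IS_COS_SHIFT  47
#define ABMC_BW_SRC_SHIFT  32

#define ABMC_CTR_ID_MASK   0x1fULL   /* bits 52:48, 5 bits  */
#define ABMC_BW_SRC_MASK   0xfffULL  /* bits 43:32, 12 bits */

/* Hypothetical helper: pack the fields into one 64-bit value. */
static uint64_t abmc_cfg_pack(bool cfg_en, bool ctr_en, uint32_t ctr_id,
                              bool is_cos, uint32_t bw_src, uint32_t bw_type)
{
    uint64_t v = 0;

    v |= (uint64_t)cfg_en << ABMC_CFG_EN_SHIFT;
    v |= (uint64_t)ctr_en << ABMC_CTR_EN_SHIFT;
    v |= ((uint64_t)ctr_id & ABMC_CTR_ID_MASK) << ABMC_CTR_ID_SHIFT;
    v |= (uint64_t)is_cos << ABMC_IS_COS_SHIFT;
    v |= ((uint64_t)bw_src & ABMC_BW_SRC_MASK) << ABMC_BW_SRC_SHIFT;
    v |= bw_type;                    /* bits 31:0 */

    return v;                        /* reserved bits stay zero (MBZ) */
}
```

For example, enabling counter 5 for RMID 0x12 with a bandwidth configuration of 0x7f would give 0xC00500120000007F under this layout.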
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Removed BMEC reference from internal.h.
Updated the changelog and code documentation.
v13: Removed the Reviewed-by tag as there is commit log change to remove
BMEC reference.
v12: No changes.
v11: No changes.
v10: No changes.
v9: Removed the references of L3_QOS_ABMC_DSC.
Text changes about configuration in kernel doc.
v8: Update the configuration notes in kernel_doc.
Few commit message update.
v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
Moved the configuration notes to kernel_doc.
Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.
v6: Removed all the fs related changes.
Added note on CfgEn,CtrEn.
Removed the definitions which are not used.
Removed cntr_id initialization.
v5: Moved assignment flags here (patch 10/19 of v4).
Added MON_CNTR_UNSET definition to initialize cntr_id's.
More details in commit log.
Renamed few fields in l3_qos_abmc_cfg for readability.
v4: Added more descriptions.
Changed the name abmc_ctr_id to ctr_id.
Added L3_QOS_ABMC_DSC. Used for reading the configuration.
v3: No changes.
v2: No changes.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 36 ++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b92b04fa9888..7342ff03a5a0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1215,6 +1215,7 @@
/* - AMD: */
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
+#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1a4e96044aac..23c17ce172d3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -165,6 +165,42 @@ union cpuid_0x10_x_edx {
unsigned int full;
};
+/*
+ * ABMC counters are configured by writing to L3_QOS_ABMC_CFG.
+ *
+ * @bw_type : Event configuration that represent the memory
+ * transactions being tracked by the @cntr_id.
+ * @bw_src : Bandwidth source (RMID or CLOSID).
+ * @reserved1 : Reserved.
+ * @is_clos : @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id : Counter identifier.
+ * @reserved : Reserved.
+ * @cntr_en : Counting enable bit.
+ * @cfg_en : Configuration enable bit.
+ *
+ * Configuration and counting:
+ * Counter can be configured across multiple writes to MSR. Configuration
+ * is applied only when @cfg_en = 1. Counter @cntr_id is reset when the
+ * configuration is applied.
+ * @cfg_en = 1, @cntr_en = 0 : Apply @cntr_id configuration but do not
+ * count events.
+ * @cfg_en = 1, @cntr_en = 1 : Apply @cntr_id configuration and start
+ * counting events.
+ */
+union l3_qos_abmc_cfg {
+ struct {
+ unsigned long bw_type :32,
+ bw_src :12,
+ reserved1: 3,
+ is_clos : 1,
+ cntr_id : 5,
+ reserved : 9,
+ cntr_en : 1,
+ cfg_en : 1;
+ } split;
+ unsigned long full;
+};
+
void rdt_ctrl_update(void *arg);
int rdt_get_mon_l3_config(struct rdt_resource *r);
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (14 preceding siblings ...)
2025-06-13 21:04 ` [PATCH v14 15/32] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-24 23:51 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
` (17 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
When supported, mbm_assign_mode allows the user to configure events to
track specific types of memory transactions.
Introduce the evt_cfg field in struct mon_evt to define the type of memory
transactions tracked by a monitoring event. Also add helper functions to
get and set the evt_cfg value.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: This is updated patch from previous patch.
https://lore.kernel.org/lkml/95b7f4e9d72773e8fda327fc80b429646efc3a8a.1747349530.git.babu.moger@amd.com/
Removed mbm_mode as it is not required anymore.
Added resctrl_get_mon_evt_cfg() and resctrl_set_mon_evt_cfg().
v13: New patch to handle different event configuration types with
mbm_cntr_assign mode.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ++++
fs/resctrl/internal.h | 4 ++++
fs/resctrl/monitor.c | 10 ++++++++++
include/linux/resctrl.h | 3 +++
4 files changed, 21 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 01b210febc7d..1df171d04bea 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -875,10 +875,14 @@ static __init bool get_rdt_mon_resources(void)
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+ resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID, MAX_EVT_CONFIG_BITS);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+ resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
+ READS_TO_LOCAL_MEM | READS_TO_LOCAL_S_MEM |
+ NON_TEMP_WRITE_TO_LOCAL_MEM);
ret = true;
}
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 20e2c45cea64..71059c2cda16 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -56,6 +56,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
* @evtid: event id
* @rid: index of the resource for this event
* @name: name of the event
+ * @@evt_cfg: Event configuration value that represents the
+ * memory transactions (e.g., READS_TO_LOCAL_MEM,
+ * READS_TO_REMOTE_MEM) being tracked by @evtid.
* @configurable: true if the event is configurable
* @enabled: true if the event is enabled
*/
@@ -63,6 +66,7 @@ struct mon_evt {
enum resctrl_event_id evtid;
enum resctrl_res_level rid;
char *name;
+ u32 evt_cfg;
bool configurable;
bool enabled;
};
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 2893da994f3c..3e1a8239b0d3 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -884,6 +884,16 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
mon_event_all[eventid].enabled;
}
+u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id evtid)
+{
+ return mon_event_all[evtid].evt_cfg;
+}
+
+void resctrl_set_mon_evt_cfg(enum resctrl_event_id evtid, u32 evt_cfg)
+{
+ mon_event_all[evtid].evt_cfg = evt_cfg;
+}
+
/**
* resctrl_mon_resource_init() - Initialise global monitoring structures.
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 468a4ebabc64..a58dd40b7246 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -413,6 +413,9 @@ static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
}
+u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id eventid);
+void resctrl_set_mon_evt_cfg(enum resctrl_event_id eventid, u32 evt_cfg);
+
#define for_each_mbm_event_id(eventid) \
for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt
2025-06-13 21:05 ` [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt Babu Moger
@ 2025-06-24 23:51 ` Reinette Chatre
2025-06-26 16:47 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 23:51 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> When supported, mbm_assign_mode allows the user to configure events to
> track specific types of memory transactions.
Since there is also a "default" mbm_assign_mode this should be made specific
to mbm_event.
>
> Introduce the evt_cfg field in struct mon_evt to define the type of memory
> transactions tracked by a monitoring event. Also add helper functions to
> get and set the evt_cfg value.
hmmm ... this does not sound right (more below)
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v14: This is updated patch from previous patch.
> https://lore.kernel.org/lkml/95b7f4e9d72773e8fda327fc80b429646efc3a8a.1747349530.git.babu.moger@amd.com/
> Removed mbm_mode as it is not required anymore.
> Added resctrl_get_mon_evt_cfg() and resctrl_set_mon_evt_cfg().
>
> v13: New patch to handle different event configuration types with
> mbm_cntr_assign mode.
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 4 ++++
> fs/resctrl/internal.h | 4 ++++
> fs/resctrl/monitor.c | 10 ++++++++++
> include/linux/resctrl.h | 3 +++
> 4 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 01b210febc7d..1df171d04bea 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -875,10 +875,14 @@ static __init bool get_rdt_mon_resources(void)
> }
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID, MAX_EVT_CONFIG_BITS);
> ret = true;
> }
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
> resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
> + READS_TO_LOCAL_MEM | READS_TO_LOCAL_S_MEM |
> + NON_TEMP_WRITE_TO_LOCAL_MEM);
> ret = true;
> }
The architecture should have no business setting the event configuration. This should
all be managed via resctrl fs, no? I think the resctrl_set_mon_evt_cfg() helper should
be dropped. The above initialization can be done as part of mon_event_all[] initialization
within resctrl.
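One way to read this suggestion, as a sketch with simplified stand-in types: the defaults live in a static table initialised with designated initializers, and only a getter is exposed. All model_* names are hypothetical, and the numeric values of the transaction flags below are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments for the memory-transaction flags (illustrative). */
#define READS_TO_LOCAL_MEM            (1u << 0)
#define READS_TO_REMOTE_MEM           (1u << 1)
#define NON_TEMP_WRITE_TO_LOCAL_MEM   (1u << 2)
#define NON_TEMP_WRITE_TO_REMOTE_MEM  (1u << 3)
#define READS_TO_LOCAL_S_MEM          (1u << 4)
#define READS_TO_REMOTE_S_MEM         (1u << 5)
#define DIRTY_VICTIMS_TO_ALL_MEM      (1u << 6)
#define MAX_EVT_CONFIG_BITS           0x7fu

/* Hypothetical stand-ins for enum resctrl_event_id and struct mon_evt. */
enum model_event_id { MODEL_MBM_TOTAL, MODEL_MBM_LOCAL, MODEL_NUM_EVENTS };

struct model_mon_evt {
    const char *name;
    uint32_t evt_cfg;
    bool enabled;
};

/* Defaults are set once, at table definition, and are the same for every arch. */
static struct model_mon_evt model_event_all[MODEL_NUM_EVENTS] = {
    [MODEL_MBM_TOTAL] = {
        .name    = "mbm_total_bytes",
        .evt_cfg = MAX_EVT_CONFIG_BITS,
    },
    [MODEL_MBM_LOCAL] = {
        .name    = "mbm_local_bytes",
        .evt_cfg = READS_TO_LOCAL_MEM | READS_TO_LOCAL_S_MEM |
                   NON_TEMP_WRITE_TO_LOCAL_MEM,
    },
};

/* Read-only accessor; no setter is exposed outside the file. */
static uint32_t model_get_mon_evt_cfg(enum model_event_id id)
{
    return model_event_all[id].evt_cfg;
}
```

With this shape the architecture code only reads the configuration; any later change would come from the filesystem side.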
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 20e2c45cea64..71059c2cda16 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -56,6 +56,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
> * @evtid: event id
> * @rid: index of the resource for this event
> * @name: name of the event
> + * @@evt_cfg: Event configuration value that represents the
Extra @ in "@@evt_cfg"
> + * memory transactions (e.g., READS_TO_LOCAL_MEM,
> + * READS_TO_REMOTE_MEM) being tracked by @evtid.
Can append "Only valid if @evtid is an MBM event."
> * @configurable: true if the event is configurable
> * @enabled: true if the event is enabled
> */
> @@ -63,6 +66,7 @@ struct mon_evt {
> enum resctrl_event_id evtid;
> enum resctrl_res_level rid;
> char *name;
> + u32 evt_cfg;
> bool configurable;
> bool enabled;
> };
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 2893da994f3c..3e1a8239b0d3 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -884,6 +884,16 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
> mon_event_all[eventid].enabled;
> }
>
> +u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id evtid)
> +{
> + return mon_event_all[evtid].evt_cfg;
> +}
> +
> +void resctrl_set_mon_evt_cfg(enum resctrl_event_id evtid, u32 evt_cfg)
> +{
> + mon_event_all[evtid].evt_cfg = evt_cfg;
> +}
> +
> /**
> * resctrl_mon_resource_init() - Initialise global monitoring structures.
> *
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 468a4ebabc64..a58dd40b7246 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -413,6 +413,9 @@ static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
> eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
> }
>
> +u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id eventid);
> +void resctrl_set_mon_evt_cfg(enum resctrl_event_id eventid, u32 evt_cfg);
I think resctrl_set_mon_evt_cfg() should be dropped. Any changes to mon_evt:evt_cfg
should be via resctrl, either via initialization (all archs should use same default)
or when user writes to the event configuration's file.
> +
> #define for_each_mbm_event_id(eventid) \
> for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
> eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
sidenote: This change looks to be a good foundation to bring back the BMEC optimization
you worked on earlier where it is no longer needed to read event configuration from
hardware.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt
2025-06-24 23:51 ` Reinette Chatre
@ 2025-06-26 16:47 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 16:47 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 18:51, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> When supported, mbm_assign_mode allows the user to configure events to
>> track specific types of memory transactions.
>
> Since there is also a "default" mbm_assign_mode this should be made specific
> to mbm_event.
>
Changed to:
"When supported, mbm_event mode allows the user to configure events to
track specific types of memory transactions."
>>
>> Introduce the evt_cfg field in struct mon_evt to define the type of memory
>> transactions tracked by a monitoring event. Also add helper functions to
>> get and set the evt_cfg value.
>
> hmmm ... this does not sound right (more below)
>
Removed the resctrl_set_mon_evt_cfg().
Changed to
"Introduce the evt_cfg field in struct mon_evt to define the type of
memory transactions tracked by a monitoring event. Also add a helper
function to get the evt_cfg value."
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v14: This is updated patch from previous patch.
>> https://lore.kernel.org/lkml/95b7f4e9d72773e8fda327fc80b429646efc3a8a.1747349530.git.babu.moger@amd.com/
>> Removed mbm_mode as it is not required anymore.
>> Added resctrl_get_mon_evt_cfg() and resctrl_set_mon_evt_cfg().
>>
>> v13: New patch to handle different event configuration types with
>> mbm_cntr_assign mode.
>> ---
>> arch/x86/kernel/cpu/resctrl/core.c | 4 ++++
>> fs/resctrl/internal.h | 4 ++++
>> fs/resctrl/monitor.c | 10 ++++++++++
>> include/linux/resctrl.h | 3 +++
>> 4 files changed, 21 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 01b210febc7d..1df171d04bea 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -875,10 +875,14 @@ static __init bool get_rdt_mon_resources(void)
>> }
>> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
>> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID, MAX_EVT_CONFIG_BITS);
>> ret = true;
>> }
>> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL) || rdt_cpu_has(X86_FEATURE_ABMC)) {
>> resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
>> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
>> + READS_TO_LOCAL_MEM | READS_TO_LOCAL_S_MEM |
>> + NON_TEMP_WRITE_TO_LOCAL_MEM);
>> ret = true;
>> }
>
> The architecture should have no business setting the event configuration. This should
> all be managed via resctrl fs, no? I think the resctrl_set_mon_evt_cfg() helper should
> be dropped. The above initialization can be done as part of mon_event_all[] initialization
> within resctrl.
>
Moved it to resctrl_mon_resource_init().
@@ -926,7 +931,12 @@ int resctrl_mon_resource_init(void)
if (r->mon.mbm_cntr_assignable) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+ mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = MAX_EVT_CONFIG_BITS;
+
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+ mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = READS_TO_LOCAL_MEM |
+ READS_TO_LOCAL_S_MEM |
+ NON_TEMP_WRITE_TO_LOCAL_MEM;
resctrl_file_fflags_init("num_mbm_cntrs",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
resctrl_file_fflags_init("available_mbm_cntrs",
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 20e2c45cea64..71059c2cda16 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -56,6 +56,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>> * @evtid: event id
>> * @rid: index of the resource for this event
>> * @name: name of the event
>> + * @@evt_cfg: Event configuration value that represents the
>
> Extra @ in "@@evt_cfg"
Sure.
>
>> + * memory transactions (e.g., READS_TO_LOCAL_MEM,
>> + * READS_TO_REMOTE_MEM) being tracked by @evtid.
>
> Can append "Only valid if @evtid is an MBM event."
Sure.
>
>> * @configurable: true if the event is configurable
>> * @enabled: true if the event is enabled
>> */
>> @@ -63,6 +66,7 @@ struct mon_evt {
>> enum resctrl_event_id evtid;
>> enum resctrl_res_level rid;
>> char *name;
>> + u32 evt_cfg;
>> bool configurable;
>> bool enabled;
>> };
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 2893da994f3c..3e1a8239b0d3 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -884,6 +884,16 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
>> mon_event_all[eventid].enabled;
>> }
>>
>> +u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id evtid)
>> +{
>> + return mon_event_all[evtid].evt_cfg;
>> +}
>> +
>> +void resctrl_set_mon_evt_cfg(enum resctrl_event_id evtid, u32 evt_cfg)
>> +{
>> + mon_event_all[evtid].evt_cfg = evt_cfg;
>> +}
>> +
>> /**
>> * resctrl_mon_resource_init() - Initialise global monitoring structures.
>> *
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 468a4ebabc64..a58dd40b7246 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -413,6 +413,9 @@ static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
>> eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
>> }
>>
>> +u32 resctrl_get_mon_evt_cfg(enum resctrl_event_id eventid);
>> +void resctrl_set_mon_evt_cfg(enum resctrl_event_id eventid, u32 evt_cfg);
>
> I think resctrl_set_mon_evt_cfg() should be dropped. Any changes to mon_evt:evt_cfg
> should be via resctrl, either via initialization (all archs should use same default)
> or when user writes to the event configuration's file.
Sure.
>
>> +
>> #define for_each_mbm_event_id(eventid) \
>> for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
>> eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
>
> sidenote: This change looks to be a good foundation to bring back the BMEC optimization
> you worked on earlier where it is no longer needed to read event configuration from
> hardware.
>
Yes. Sure. Will work on it later.
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (15 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 16/32] x86,fs/resctrl: Introduce event configuration field in struct mon_evt Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 3:03 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events Babu Moger
` (16 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The ABMC feature allows users to assign a hardware counter ID to an
(RMID, event) pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.
Implement an architecture-specific handler to assign and unassign the
counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
specifying the counter ID, bandwidth source (RMID), and event
configuration.
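The assign/unassign behaviour described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the patch itself: the model_* names are hypothetical, the MSR write (done via an IPI in the real code) is replaced by storing the value in a struct field, and the bit positions follow the L3_QOS_ABMC_CFG layout from the previous patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-(RMID, event) software state. */
struct model_mbm_state {
    uint64_t chunks;
    uint64_t prev_msr;
};

/*
 * Hypothetical stand-in for a monitoring domain. last_msr records the value
 * the real code would write to L3_QOS_ABMC_CFG on a CPU in the domain.
 */
struct model_domain {
    uint64_t last_msr;
    struct model_mbm_state state;
};

static void model_config_cntr(struct model_domain *d, uint32_t evt_cfg,
                              uint32_t rmid, uint32_t cntr_id, bool assign)
{
    uint64_t cfg = 0;

    cfg |= 1ULL << 63;                        /* CfgEn: apply the configuration */
    cfg |= (uint64_t)(assign ? 1 : 0) << 62;  /* CtrEn: count only when assigned */
    cfg |= (uint64_t)(cntr_id & 0x1f) << 48;  /* CtrID */
    cfg |= (uint64_t)(rmid & 0xfff) << 32;    /* BwSrc (RMID) */
    if (assign)
        cfg |= evt_cfg;                       /* BwType only matters on assign */

    d->last_msr = cfg;                        /* models the MSR write */

    /* The hardware counter restarts from zero, so drop stale software state. */
    if (assign)
        memset(&d->state, 0, sizeof(d->state));
}
```

Unassigning still sets CfgEn so the write takes effect, but leaves CtrEn and BwType clear and does not touch the saved state.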
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Removed evt_cfg parameter in resctrl_arch_config_cntr(). Get evt_cfg
only when assign is required.
Minor update to changelog.
v13: Moved resctrl_arch_config_cntr() prototype to include/linux/resctrl.h.
Changed resctrl_arch_config_cntr() to return void instead of int.
Updated the kernel doc for the prototype.
Updated the code comment.
v12: Added the check to reset the architecture-specific state only when
assign is requested.
Added evt_cfg as the parameter as the user will be passing the event
configuration from /info/L3_MON/event_configs/.
v11: Moved resctrl_arch_assign_cntr() and resctrl_abmc_config_one_amd() to
monitor.c.
Added the code to reset the arch state in resctrl_arch_assign_cntr().
Also removed resctrl_arch_reset_rmid() inside IPI as the counters are
reset from the callers.
Re-wrote commit message.
v10: Added call resctrl_arch_reset_rmid() to reset the RMID in the domain
inside IPI call.
SMP and non-SMP call support is not required in resctrl_arch_config_cntr
with new domain specific assign approach/data structure.
Commit message update.
v9: Removed the code to reset the architectural state. It will be done
in another patch.
v8: Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
v7: Separated arch and fs functions. This patch only has arch implementation.
Added struct rdt_resource to the interface resctrl_arch_assign_cntr.
Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
separate.
Added code to update the counter assignment at domain level.
v5: Few name changes to match cntr_id.
Changed the function names to
rdtgroup_assign_cntr
resctrl_arch_assign_cntr
More comments on commit log.
Added function summary.
v4: Commit message update.
Used bitmap APIs where applicable.
Changed the interfaces considering MPAM(arm).
Added domain specific assignment.
v3: Removed the static from the prototype of rdtgroup_assign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: Minor text changes in commit message.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
include/linux/resctrl.h | 19 ++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0ad9c731c13e..6b0ea4b17c7a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
{
return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
}
+
+static void resctrl_abmc_config_one_amd(void *info)
+{
+ union l3_qos_abmc_cfg *abmc_cfg = info;
+
+ wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter to RMID, event pair.
+ */
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+ union l3_qos_abmc_cfg abmc_cfg = { 0 };
+ struct arch_mbm_state *am;
+
+ abmc_cfg.split.cfg_en = 1;
+ abmc_cfg.split.cntr_en = assign ? 1 : 0;
+ abmc_cfg.split.cntr_id = cntr_id;
+ abmc_cfg.split.bw_src = rmid;
+ if (assign)
+ abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
+
+ smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
+
+ /*
+ * The hardware counter is reset (because cfg_en == 1) so there is no
+ * need to record initial non-zero counts.
+ */
+ if (assign) {
+ am = get_arch_mbm_state(hw_dom, rmid, evtid);
+ if (am)
+ memset(am, 0, sizeof(*am));
+ }
+}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index a58dd40b7246..1539d1faa1a1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -594,6 +594,25 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
*/
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
+/**
+ * resctrl_arch_config_cntr() - Configure the counter with its new RMID
+ * and event details.
+ * @r: Resource structure.
+ * @d: The domain in which the counter ID is to be configured.
+ * @evtid: Monitoring event type (e.g., QOS_L3_MBM_TOTAL_EVENT_ID
+ * or QOS_L3_MBM_LOCAL_EVENT_ID).
+ * @rmid: RMID.
+ * @closid: CLOSID.
+ * @cntr_id: Counter ID to configure.
+ * @assign: True to assign the counter, false to unassign
+ * the counter.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign);
+
extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
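For readers following the thread without the earlier patches at hand, the configuration word that resctrl_arch_config_cntr() builds can be modelled in user space roughly as below. This is only a sketch: the bit positions (bw_type in bits 31:0, bw_src in bits 43:32, cntr_id in bits 52:48, cntr_en in bit 62, cfg_en in bit 63) are an assumed L3_QOS_ABMC_CFG layout and should be checked against the APM, and abmc_cfg_pack() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; verify against the APM before relying on them. */
#define ABMC_BW_TYPE_SHIFT	0	/* 32-bit event configuration */
#define ABMC_BW_SRC_SHIFT	32	/* 12-bit bandwidth source (RMID) */
#define ABMC_CNTR_ID_SHIFT	48	/* 5-bit counter ID */
#define ABMC_CNTR_EN_SHIFT	62	/* counting enable */
#define ABMC_CFG_EN_SHIFT	63	/* configuration enable */

/*
 * Hypothetical helper mirroring the field assignments in the patch:
 * cfg_en is always set, cntr_en follows @assign, and the event
 * configuration is only filled in when assigning.
 */
static uint64_t abmc_cfg_pack(uint32_t cntr_id, uint32_t rmid,
			      uint32_t evt_cfg, int assign)
{
	uint64_t v = 0;

	assert(cntr_id < 32);	/* must fit the assumed 5-bit field */
	assert(rmid < 4096);	/* must fit the assumed 12-bit field */

	v |= 1ULL << ABMC_CFG_EN_SHIFT;
	v |= (uint64_t)(assign ? 1 : 0) << ABMC_CNTR_EN_SHIFT;
	v |= (uint64_t)cntr_id << ABMC_CNTR_ID_SHIFT;
	v |= (uint64_t)rmid << ABMC_BW_SRC_SHIFT;
	if (assign)
		v |= (uint64_t)evt_cfg << ABMC_BW_TYPE_SHIFT;

	return v;
}
```

Under those assumptions, assigning counter 3 to RMID 0x21 with event configuration 0x7f and then unassigning it differ only in cntr_en and bw_type, which matches the "if (assign)" guard in the patch.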
* Re: [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-13 21:05 ` [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2025-06-25 3:03 ` Reinette Chatre
2025-06-26 17:41 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 3:03 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
With the new arch API this should have the "x86,fs/resctrl" prefix.
On 6/13/25 2:05 PM, Babu Moger wrote:
> The ABMC feature allows users to assign a hardware counter ID to an RMID,
> event pair and monitor bandwidth usage as long as it is assigned. The
> hardware continues to track the assigned counter until it is explicitly
> unassigned by the user.
>
> Implement an architecture-specific handler to assign and unassign the
> counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
> specifying the counter ID, bandwidth source (RMID), and event
> configuration.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
> include/linux/resctrl.h | 19 ++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0ad9c731c13e..6b0ea4b17c7a 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
> {
> return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
> }
> +
> +static void resctrl_abmc_config_one_amd(void *info)
> +{
> + union l3_qos_abmc_cfg *abmc_cfg = info;
> +
> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
> +}
> +
> +/*
> + * Send an IPI to the domain to assign the counter to RMID, event pair.
> + */
> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign)
> +{
> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
> + struct arch_mbm_state *am;
> +
> + abmc_cfg.split.cfg_en = 1;
> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
> + abmc_cfg.split.cntr_id = cntr_id;
> + abmc_cfg.split.bw_src = rmid;
> + if (assign)
> + abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
> +
> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
An optimization to consider is to direct the IPI to a housekeeping CPU.
If one exists, a further optimization could be to queue the work on that CPU
instead of sending an IPI. Your call since I am not familiar with the use cases here.
Looks like all paths leading to this are triggered by a user space action
(mount, mkdir, or write to update event config). This would require exposing
the housekeeping CPU code to arch.
> +
> + /*
> + * The hardware counter is reset (because cfg_en == 1) so there is no
> + * need to record initial non-zero counts.
> + */
> + if (assign) {
> + am = get_arch_mbm_state(hw_dom, rmid, evtid);
> + if (am)
> + memset(am, 0, sizeof(*am));
> + }
I am not able to recognize how the struct rdt_resource parameter is used. What am I missing?
> +}
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index a58dd40b7246..1539d1faa1a1 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -594,6 +594,25 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
> */
> void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
>
> +/**
> + * resctrl_arch_config_cntr() - Configure the counter with its new RMID
> + * and event details.
> + * @r: Resource structure.
> + * @d: The domain in which the counter ID is to be configured.
"The domain in which the counter should be configured." or "The domain in which counter
with ID @cntr_id should be configured."?
> + * @evtid: Monitoring event type (e.g., QOS_L3_MBM_TOTAL_EVENT_ID
> + * or QOS_L3_MBM_LOCAL_EVENT_ID).
> + * @rmid: RMID.
> + * @closid: CLOSID.
> + * @cntr_id: Counter ID to configure.
> + * @assign: True to assign the counter, false to unassign
> + * the counter.
The changelog and comments only mention counter "assignment" but this same call
is used to update an existing assignment, which on ABMC is done the same way.
This may be ok for now but I think it will be helpful to amend the above to say
something like:
* @assign: True to assign the counter or update an existing assignment, false to unassign
* the counter.
> + *
> + * This can be called from any CPU.
> + */
> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign);
> +
> extern unsigned int resctrl_rmid_realloc_threshold;
> extern unsigned int resctrl_rmid_realloc_limit;
>
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-25 3:03 ` Reinette Chatre
@ 2025-06-26 17:41 ` Moger, Babu
2025-06-26 18:02 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 17:41 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 22:03, Reinette Chatre wrote:
> Hi Babu,
>
> With the new arch API this have "x86,fs/resctrl" prefix.
Sure.
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> The ABMC feature allows users to assign a hardware counter ID to an RMID,
>> event pair and monitor bandwidth usage as long as it is assigned. The
>> hardware continues to track the assigned counter until it is explicitly
>> unassigned by the user.
>>
>> Implement an architecture-specific handler to assign and unassign the
>> counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
>> specifying the counter ID, bandwidth source (RMID), and event
>> configuration.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
>> include/linux/resctrl.h | 19 ++++++++++++++
>> 2 files changed, 57 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 0ad9c731c13e..6b0ea4b17c7a 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>> {
>> return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>> }
>> +
>> +static void resctrl_abmc_config_one_amd(void *info)
>> +{
>> + union l3_qos_abmc_cfg *abmc_cfg = info;
>> +
>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>> +}
>> +
>> +/*
>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>> + */
>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign)
>> +{
>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> + struct arch_mbm_state *am;
>> +
>> + abmc_cfg.split.cfg_en = 1;
>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>> + abmc_cfg.split.cntr_id = cntr_id;
>> + abmc_cfg.split.bw_src = rmid;
>> + if (assign)
>> + abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
>> +
>> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
>
> An optimization to consider is to direct the IPI to a housekeeping CPU.
> If one exist, a further optimization could be to queue it on that CPU
> instead of IPI. Your call since I am not familiar with the use cases here.
> Looks like all paths leading to this is triggered by a user space action
> (mount, mkdir, or write to update event config). This would require exposing
> the housekeeping CPU code to arch.
Do you mean something like this?
cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask, RESCTRL_PICK_ANY_CPU);
smp_call_on_cpu(cpu, resctrl_abmc_config_one_amd, &abmc_cfg, false);
Do you want these changes made now or later? It requires a few other changes
to move the code around.
>
>> +
>> + /*
>> + * The hardware counter is reset (because cfg_en == 1) so there is no
>> + * need to record initial non-zero counts.
>> + */
>> + if (assign) {
>> + am = get_arch_mbm_state(hw_dom, rmid, evtid);
>> + if (am)
>> + memset(am, 0, sizeof(*am));
>> + }
>
> I am not able to recognize how the struct rdt_resource parameter is used. What am I missing?
No, it is not used here. It is kept because other architectures can use it.
I think James commented on it earlier.
>
>> +}
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index a58dd40b7246..1539d1faa1a1 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -594,6 +594,25 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
>> */
>> void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
>>
>> +/**
>> + * resctrl_arch_config_cntr() - Configure the counter with its new RMID
>> + * and event details.
>> + * @r: Resource structure.
>> + * @d: The domain in which the counter ID is to be configured.
>
> "The domain in which the counter should be configured." or "The domain in which counter
> with ID @cntr_id should be configured."?
Added
"The domain in which counter with ID @cntr_id should be configured."
>
>> + * @evtid: Monitoring event type (e.g., QOS_L3_MBM_TOTAL_EVENT_ID
>> + * or QOS_L3_MBM_LOCAL_EVENT_ID).
>> + * @rmid: RMID.
>> + * @closid: CLOSID.
>> + * @cntr_id: Counter ID to configure.
>> + * @assign: True to assign the counter, false to unassign
>> + * the counter.
>
> The changelog and comments only mention counter "assignment" but this same call
> is used to update an existing assignment, which on ABMC are done the same.
> This may be ok for now but I think it will be helpful to amend the above to say
> something like:
>
> * @assign: True to assign the counter or update an existing assignment, false to unassign
> * the counter.
Sure.
>
>> + *
>> + * This can be called from any CPU.
>> + */
>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign);
>> +
>> extern unsigned int resctrl_rmid_realloc_threshold;
>> extern unsigned int resctrl_rmid_realloc_limit;
>>
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-26 17:41 ` Moger, Babu
@ 2025-06-26 18:02 ` Reinette Chatre
2025-06-26 18:35 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-26 18:02 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/26/25 10:41 AM, Moger, Babu wrote:
> On 6/24/25 22:03, Reinette Chatre wrote:
>> On 6/13/25 2:05 PM, Babu Moger wrote:
..
>>> ---
>>> arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
>>> include/linux/resctrl.h | 19 ++++++++++++++
>>> 2 files changed, 57 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 0ad9c731c13e..6b0ea4b17c7a 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>>> {
>>> return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>>> }
>>> +
>>> +static void resctrl_abmc_config_one_amd(void *info)
>>> +{
>>> + union l3_qos_abmc_cfg *abmc_cfg = info;
>>> +
>>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>>> +}
>>> +
>>> +/*
>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>> + */
>>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>> + u32 cntr_id, bool assign)
>>> +{
>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>>> + struct arch_mbm_state *am;
>>> +
>>> + abmc_cfg.split.cfg_en = 1;
>>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>>> + abmc_cfg.split.cntr_id = cntr_id;
>>> + abmc_cfg.split.bw_src = rmid;
>>> + if (assign)
>>> + abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
>>> +
>>> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
>>
>> An optimization to consider is to direct the IPI to a housekeeping CPU.
>> If one exist, a further optimization could be to queue it on that CPU
>> instead of IPI. Your call since I am not familiar with the use cases here.
>> Looks like all paths leading to this is triggered by a user space action
>> (mount, mkdir, or write to update event config). This would require exposing
>> the housekeeping CPU code to arch.
>
> Do you mean something like this?
>
> cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask, RESCTRL_PICK_ANY_CPU);
>
> smp_call_on_cpu(cpu, resctrl_abmc_config_one_amd, &abmc_cfg, false);
Please note the returned "cpu" may be nohz_full, and if it is, it would need
an IPI, similar to mon_event_read().
>
>
> You want to do these changes now or later? It requires few other changes
> to move around the code.
I'll leave this up to you. I think it would be ideal if cpumask_any_housekeeping()
can be hosted in include/linux/cpumask.h instead of moving it around within
resctrl.
>
>>
>>> +
>>> + /*
>>> + * The hardware counter is reset (because cfg_en == 1) so there is no
>>> + * need to record initial non-zero counts.
>>> + */
>>> + if (assign) {
>>> + am = get_arch_mbm_state(hw_dom, rmid, evtid);
>>> + if (am)
>>> + memset(am, 0, sizeof(*am));
>>> + }
>>
>> I am not able to recognize how the struct rdt_resource parameter is used. What am I missing?
>
> No. It is not used here. It is kept as other arch's can use it. I think
> James commented about it earlier.
I see, thank you.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-26 18:02 ` Reinette Chatre
@ 2025-06-26 18:35 ` Moger, Babu
2025-06-26 20:24 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 18:35 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/26/25 13:02, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/26/25 10:41 AM, Moger, Babu wrote:
>> On 6/24/25 22:03, Reinette Chatre wrote:
>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>
> ..
>
>>>> ---
>>>> arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
>>>> include/linux/resctrl.h | 19 ++++++++++++++
>>>> 2 files changed, 57 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> index 0ad9c731c13e..6b0ea4b17c7a 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> @@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>>>> {
>>>> return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>>>> }
>>>> +
>>>> +static void resctrl_abmc_config_one_amd(void *info)
>>>> +{
>>>> + union l3_qos_abmc_cfg *abmc_cfg = info;
>>>> +
>>>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>> + */
>>>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>> + u32 cntr_id, bool assign)
>>>> +{
>>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>>>> + struct arch_mbm_state *am;
>>>> +
>>>> + abmc_cfg.split.cfg_en = 1;
>>>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>>>> + abmc_cfg.split.cntr_id = cntr_id;
>>>> + abmc_cfg.split.bw_src = rmid;
>>>> + if (assign)
>>>> + abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
>>>> +
>>>> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
>>>
>>> An optimization to consider is to direct the IPI to a housekeeping CPU.
>>> If one exist, a further optimization could be to queue it on that CPU
>>> instead of IPI. Your call since I am not familiar with the use cases here.
>>> Looks like all paths leading to this is triggered by a user space action
>>> (mount, mkdir, or write to update event config). This would require exposing
>>> the housekeeping CPU code to arch.
>>
>> Do you mean something like this?
>>
>> cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask, RESCTRL_PICK_ANY_CPU);
>>
>> smp_call_on_cpu(cpu, resctrl_abmc_config_one_amd, &abmc_cfg, false);
>
> Please note the returned "cpu" may be nohz_full and if it is it would need
> an IPI. Similar to mon_event_read().
>
Yes. Got that.
>>
>>
>> You want to do these changes now or later? It requires few other changes
>> to move around the code.
>
> I'll leave this up to you. I think it would be ideal if cpumask_any_housekeeping()
> can be hosted in include/linux/cpumask.h instead of moving it around within
> resctrl.
>
OK. It will take a couple more patches to rearrange all the related code. I
would prefer that it is done separately.
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2025-06-26 18:35 ` Moger, Babu
@ 2025-06-26 20:24 ` Reinette Chatre
0 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-06-26 20:24 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/26/25 11:35 AM, Moger, Babu wrote:
> On 6/26/25 13:02, Reinette Chatre wrote:
>> On 6/26/25 10:41 AM, Moger, Babu wrote:
>>> On 6/24/25 22:03, Reinette Chatre wrote:
>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>
>> ..
>>
>>>>> ---
>>>>> arch/x86/kernel/cpu/resctrl/monitor.c | 38 +++++++++++++++++++++++++++
>>>>> include/linux/resctrl.h | 19 ++++++++++++++
>>>>> 2 files changed, 57 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index 0ad9c731c13e..6b0ea4b17c7a 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> @@ -444,3 +444,41 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>>>>> {
>>>>> return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>>>>> }
>>>>> +
>>>>> +static void resctrl_abmc_config_one_amd(void *info)
>>>>> +{
>>>>> + union l3_qos_abmc_cfg *abmc_cfg = info;
>>>>> +
>>>>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>>> + */
>>>>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>>> + u32 cntr_id, bool assign)
>>>>> +{
>>>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>>>>> + struct arch_mbm_state *am;
>>>>> +
>>>>> + abmc_cfg.split.cfg_en = 1;
>>>>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>>>>> + abmc_cfg.split.cntr_id = cntr_id;
>>>>> + abmc_cfg.split.bw_src = rmid;
>>>>> + if (assign)
>>>>> + abmc_cfg.split.bw_type = resctrl_get_mon_evt_cfg(evtid);
>>>>> +
>>>>> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
>>>>
>>>> An optimization to consider is to direct the IPI to a housekeeping CPU.
>>>> If one exist, a further optimization could be to queue it on that CPU
>>>> instead of IPI. Your call since I am not familiar with the use cases here.
>>>> Looks like all paths leading to this is triggered by a user space action
>>>> (mount, mkdir, or write to update event config). This would require exposing
>>>> the housekeeping CPU code to arch.
>>>
>>> Do you mean something like this?
>>>
>>> cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask, RESCTRL_PICK_ANY_CPU);
>>>
>>> smp_call_on_cpu(cpu, resctrl_abmc_config_one_amd, &abmc_cfg, false);
>>
>> Please note the returned "cpu" may be nohz_full and if it is it would need
>> an IPI. Similar to mon_event_read().
>>
>
> Yes. Got that.
>
>>>
>>>
>>> You want to do these changes now or later? It requires few other changes
>>> to move around the code.
>>
>> I'll leave this up to you. I think it would be ideal if cpumask_any_housekeeping()
>> can be hosted in include/linux/cpumask.h instead of moving it around within
>> resctrl.
>>
>
> ok. It will be couple more patches to re-arrange all the related code. I
> would prefer its done separately.
>
This is ok with me. Thank you.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
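The selection logic discussed in this sub-thread can be sketched in user space as below. It is a toy model under stated assumptions, not the resctrl implementation: CPU masks are plain 64-bit words, and first_cpu(), pick_housekeeping_cpu() and choose_dispatch() are hypothetical names. The idea is to prefer a CPU in the domain that is not nohz_full and queue the work there, and to fall back to an IPI when only nohz_full CPUs are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dispatch { DISPATCH_NONE, DISPATCH_QUEUE, DISPATCH_IPI };

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return -1;
}

/*
 * Prefer a CPU in @domain that is not in @nohz_full. If every CPU in
 * the domain is nohz_full, fall back to any CPU in the domain.
 */
static int pick_housekeeping_cpu(uint64_t domain, uint64_t nohz_full)
{
	int cpu = first_cpu(domain & ~nohz_full);

	return cpu >= 0 ? cpu : first_cpu(domain);
}

/*
 * Decide how to reach the chosen CPU: queue work on a housekeeping CPU,
 * send an IPI if the only choice is a nohz_full CPU.
 */
static enum dispatch choose_dispatch(uint64_t domain, uint64_t nohz_full,
				     int *cpu)
{
	assert(cpu != NULL);

	*cpu = pick_housekeeping_cpu(domain, nohz_full);
	if (*cpu < 0)
		return DISPATCH_NONE;

	return (nohz_full & (1ULL << *cpu)) ? DISPATCH_IPI : DISPATCH_QUEUE;
}
```

In the kernel the two branches would correspond to queueing via smp_call_on_cpu() and sending an IPI via smp_call_function_any(), as mon_event_read() is said to do above.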
* [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (16 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 17/32] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 3:32 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign " Babu Moger
` (15 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
When supported, "mbm_event" mode offers "num_mbm_cntrs" counters that can
be assigned to RMID, event pairs to monitor bandwidth usage for as long as
they remain assigned.
Add the functionality to allocate and assign a counter ID to an RMID, event
pair in the domain.
If all the counters are in use, the kernel logs the error message "Unable
to allocate counter in domain" in /sys/fs/resctrl/info/last_cmd_status
when a new assignment is requested. Exit on the first failure when
assigning counters across all the domains.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog a little bit.
Updated the code documentation for mbm_cntr_alloc() and mbm_cntr_get().
Passed struct mon_evt to resctrl_assign_cntr_event() to avoid
back and forth calls to get event details.
Updated the code documentation about the failure when counters are exhausted.
Changed subject line to fs/resctrl.
v13: Updated changelog.
Changed resctrl_arch_config_cntr() to return void instead of int.
Just passing evtid to resctrl_alloc_config_cntr() and
resctrl_assign_cntr_event(). Event configuration value can be easily
obtained from mon_evt list.
Introduced new function mbm_get_mon_event() to get event configuration value.
Added prototype descriptions to mbm_cntr_get() and mbm_cntr_alloc().
Resolved conflicts caused by the recent FS/ARCH code restructure.
The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.
v12: Fixed typo in the subject line.
Replaced several counters with "num_mbm_cntrs" counters.
Changed the check in resctrl_alloc_config_cntr() to reduce the indentation.
Fixed the handling error on first failure.
Added domain id and event id on failure.
Fixed the return error override.
Added new parameter event configuration (evt_cfg) to get the event configuration
from user space.
v11: Patch changed again quite a bit.
Moved the functions to monitor.c.
Renamed rdtgroup_assign_cntr_event() to resctrl_assign_cntr_event().
Refactored the resctrl_assign_cntr_event().
Added functionality to exit on the first error during assignment.
Simplified mbm_cntr_free().
Removed the function mbm_cntr_assigned(). Will be using mbm_cntr_get() to
figure out if the counter is assigned or not.
Updated commit message and code comments.
v10: Patch changed completely.
Counters are managed at the domain based on the discussion.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Reset non-architectural MBM state.
Commit message update.
v9: Introduced new function resctrl_config_cntr to assign the counter, update
the bitmap and reset the architectural state.
Taken care of error handling (freeing the counter) when assignment fails.
Moved mbm_cntr_assigned_to_domain here as it is used in this patch.
Minor text changes.
v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
Added the code to return the error if rdtgroup_assign_cntr_event fails.
Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
Updated typo in the comments.
v7: New patch. Moved all the FS code here.
Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
Added new #define MBM_EVENT_ARRAY_INDEX.
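The allocation behaviour described in the changelog above (reuse an existing assignment, otherwise take the first free counter, and stop at the first domain that has none left) can be modelled in user space as follows. This is a simplified sketch, not the patch itself: NUM_CNTRS stands in for r->mon.num_mbm_cntrs, groups are opaque pointers, and cntr_get(), cntr_alloc(), cntr_get_or_alloc() and assign_all() are hypothetical names loosely following mbm_cntr_get(), mbm_cntr_alloc() and resctrl_assign_cntr_event().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_CNTRS 2	/* stand-in for r->mon.num_mbm_cntrs */

struct cntr_cfg {
	const void *grp;	/* owning group, NULL when the counter is free */
	int evtid;
};

struct domain {
	struct cntr_cfg cfg[NUM_CNTRS];
};

/* Return the counter already assigned to (@grp, @evtid), or -ENOENT. */
static int cntr_get(const struct domain *d, const void *grp, int evtid)
{
	for (int i = 0; i < NUM_CNTRS; i++)
		if (d->cfg[i].grp == grp && d->cfg[i].evtid == evtid)
			return i;
	return -ENOENT;
}

/* Claim the first free counter for (@grp, @evtid), or return -ENOSPC. */
static int cntr_alloc(struct domain *d, const void *grp, int evtid)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (!d->cfg[i].grp) {
			d->cfg[i].grp = grp;
			d->cfg[i].evtid = evtid;
			return i;
		}
	}
	return -ENOSPC;
}

/* Reuse an existing assignment if there is one, otherwise allocate. */
static int cntr_get_or_alloc(struct domain *d, const void *grp, int evtid)
{
	int id;

	assert(grp != NULL);	/* NULL marks a free slot in this model */

	id = cntr_get(d, grp, evtid);
	return id >= 0 ? id : cntr_alloc(d, grp, evtid);
}

/*
 * Assign in every domain, stopping at the first failure. Domains visited
 * before the failure keep their assignment, so the result may be partial.
 */
static int assign_all(struct domain *doms, size_t n, const void *grp,
		      int evtid)
{
	for (size_t i = 0; i < n; i++) {
		int id = cntr_get_or_alloc(&doms[i], grp, evtid);

		if (id < 0)
			return id;
	}
	return 0;
}
```

The model makes the partial-assignment case easy to see: if the first domain has a free counter and the second does not, the call fails with -ENOSPC while the first domain keeps the new assignment.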
---
fs/resctrl/internal.h | 3 +
fs/resctrl/monitor.c | 134 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 137 insertions(+)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 71059c2cda16..0767a1c46f26 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -386,6 +386,9 @@ bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
+int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt);
+
#ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 3e1a8239b0d3..38800fe45931 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -950,3 +950,137 @@ void resctrl_mon_resource_exit(void)
dom_data_exit(r);
}
+
+/**
+ * resctrl_config_cntr() - Configure the counter ID for the event, RMID pair in
+ * the domain.
+ *
+ * Assign the counter if @assign is true else unassign the counter. Reset the
+ * associated non-architectural state.
+ */
+static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct mbm_state *m;
+
+ resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
+
+ m = get_mbm_state(d, closid, rmid, evtid);
+ if (m)
+ memset(m, 0, sizeof(struct mbm_state));
+}
+
+/**
+ * mbm_cntr_get() - Return the counter ID for the matching @evtid and @rdtgrp.
+ *
+ * Return:
+ * Valid counter ID on success, or -ENOENT on failure.
+ */
+static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ if (!resctrl_is_mbm_event(evtid))
+ return -ENOENT;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
+ d->cntr_cfg[cntr_id].evtid == evtid)
+ return cntr_id;
+ }
+
+ return -ENOENT;
+}
+
+/**
+ * mbm_cntr_alloc() - Initialize and return a new counter ID in the domain @d.
+ *
+ * Return:
+ * Valid counter ID on success, or -ENOSPC on failure.
+ */
+static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (!d->cntr_cfg[cntr_id].rdtgrp) {
+ d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
+ d->cntr_cfg[cntr_id].evtid = evtid;
+ return cntr_id;
+ }
+ }
+
+ return -ENOSPC;
+}
+
+/**
+ * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
+ * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
+ *
+ * Return:
+ * 0 on success, or a non-zero value on failure.
+ */
+static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt)
+{
+ int cntr_id;
+
+ /* No need to allocate a new counter if it is already assigned */
+ cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
+ if (cntr_id >= 0)
+ goto cntr_configure;
+
+ cntr_id = mbm_cntr_alloc(r, d, rdtgrp, mevt->evtid);
+ if (cntr_id < 0) {
+ rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
+ d->hdr.id);
+ return cntr_id;
+ }
+
+cntr_configure:
+ /*
+ * Skip reconfiguration if the event setup is current; otherwise,
+ * update and apply the new configuration to the domain.
+ */
+ if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
+ d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
+ resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, true);
+ }
+
+ return 0;
+}
+
+/**
+ * resctrl_assign_cntr_event() - Assign a hardware counter for the event in
+ * @mevt to the resctrl group @rdtgrp. Assign counters to all domains if @d is
+ * NULL; otherwise, assign the counter to the specified domain @d.
+ *
+ * If all counters in a domain are already in use, resctrl_alloc_config_cntr()
+ * will fail. The assignment process will abort at the first failure encountered
+ * during domain traversal, which may result in the event being only partially
+ * assigned.
+ *
+ * Return:
+ * 0 on success, or a non-zero value on failure.
+ */
+int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt)
+{
+ int ret = 0;
+
+ if (!d) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
+ if (ret)
+ return ret;
+ }
+ } else {
+ ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
+ }
+
+ return ret;
+}
--
2.34.1
* Re: [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events
2025-06-13 21:05 ` [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events Babu Moger
@ 2025-06-25 3:32 ` Reinette Chatre
2025-06-26 19:31 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 3:32 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> When supported "mbm_event" mode offers "num_mbm_cntrs" number of counters
"When supported, "mbm_event" counter assignment mode offers ..."?
> that can be assigned to RMID, event pairs and monitor bandwidth usage as
> long as it is assigned.
>
> Add the functionality to allocate and assign a counter ID to an RMID, event
> pair in the domain.
>
> If all the counters are in use, kernel will log the error message "Unable
> to allocate counter in domain" in /sys/fs/resctrl/info/last_cmd_status
> when a new assignment is requested. Exit on the first failure when
> assigning counters across all the domains.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> fs/resctrl/internal.h | 3 +
> fs/resctrl/monitor.c | 134 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 137 insertions(+)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 71059c2cda16..0767a1c46f26 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -386,6 +386,9 @@ bool closid_allocated(unsigned int closid);
>
> int resctrl_find_cleanest_closid(void);
>
> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt);
> +
> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 3e1a8239b0d3..38800fe45931 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -950,3 +950,137 @@ void resctrl_mon_resource_exit(void)
>
> dom_data_exit(r);
> }
> +
> +/**
> + * resctrl_config_cntr() - Configure the counter ID for the event, RMID pair in
> + * the domain.
> + *
> + * Assign the counter if @assign is true else unassign the counter. Reset the
> + * associated non-architectural state.
A few reports came through about the kernel-doc issues but I did not see a
discussion finalize on how to resolve them. I do not think it is required for these
static functions to have full kernel-doc. Just having useful comments without
kernel-doc style is valuable. Some kernel-doc syntax can still be useful though, like
above when referring to the parameters. It is ok to keep doing so even if section
does not start with /**.
Where I think kernel-doc is important is include/linux/resctrl.h.
> + */
> +static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign)
If resctrl_arch_config_cntr() does not need a struct resource then resctrl_config_cntr()
may not either?
> +{
> + struct mbm_state *m;
> +
> + resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
> +
> + m = get_mbm_state(d, closid, rmid, evtid);
> + if (m)
> + memset(m, 0, sizeof(struct mbm_state));
sizeof(*m).
> +}
> +
> +/**
> + * mbm_cntr_get() - Return the counter ID for the matching @evtid and @rdtgrp.
> + *
> + * Return:
> + * Valid counter ID on success, or -ENOENT on failure.
> + */
> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
Since mbm_cntr_get() is called in regular flows, could you please also
add an explicit check to return -ENOENT if !r->mon.mbm_cntr_assignable?
Otherwise this is quite subtle with the assumption that
r->mon.num_mbm_cntrs is zero in this case.
> + if (!resctrl_is_mbm_event(evtid))
> + return -ENOENT;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
> + d->cntr_cfg[cntr_id].evtid == evtid)
> + return cntr_id;
> + }
> +
> + return -ENOENT;
> +}
> +
> +/**
> + * mbm_cntr_alloc() - Initilialize and return a new counter ID in the domain @d.
"Initilialize" -> "Initialize"
> + *
mbm_cntr_alloc() will allocate a counter to a RMID/event pair even
if that pair already has a counter assigned. The doc should note that caveat
here with documentation that the caller is responsible for checking that
a counter is not already assigned.
> + * Return:
> + * Valid counter ID on success, or -ENOSPC on failure.
> + */
> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (!d->cntr_cfg[cntr_id].rdtgrp) {
> + d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
> + d->cntr_cfg[cntr_id].evtid = evtid;
> + return cntr_id;
> + }
> + }
> +
> + return -ENOSPC;
> +}
> +
> +/**
> + * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
> + * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
> + *
> + * Return:
> + * 0 on success, or a non-zero value on failure.
"or a non-zero value on failure." -> "<0 on failure"
> + */
> +static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
> +{
> + int cntr_id;
> +
> + /* No need to allocate a new counter if it is already assigned */
> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
> + if (cntr_id >= 0)
> + goto cntr_configure;
> +
> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, mevt->evtid);
> + if (cntr_id < 0) {
> + rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
> + d->hdr.id);
> + return cntr_id;
> + }
> +
> +cntr_configure:
> + /*
> + * Skip reconfiguration if the event setup is current; otherwise,
> + * update and apply the new configuration to the domain.
When could "event setup" *not* be current? As mentioned in earlier patch
I do not see why mon_evt::evt_cfg as well as mbm_cntr_cfg::evt_cfg is
needed. There should be no need to keep these two "in sync" with
only mon_evt::evt_cfg as the source of configuration. I seem to be missing
something here, could you please detail this scenario?
> + */
> + if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
> + d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
> + resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, true);
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * resctrl_assign_cntr_event() - Assign a hardware counter for the event in
> + * @mevt to the resctrl group @rdtgrp. Assign counters to all domains if @d is
> + * NULL; otherwise, assign the counter to the specified domain @d.
> + *
> + * If all counters in a domain are already in use, resctrl_alloc_config_cntr()
> + * will fail. The assignment process will abort at the first failure encountered
> + * during domain traversal, which may result in the event being only partially
> + * assigned.
> + *
> + * Return:
> + * 0 on success, or a non-zero value on failure.
"or a non-zero value on failure" -> "<0 on failure"
> + */
> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
> +{
> + int ret = 0;
> +
> + if (!d) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
> + if (ret)
> + return ret;
> + }
> + } else {
> + ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
> + }
> +
> + return ret;
> +}
Reinette
* Re: [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events
2025-06-25 3:32 ` Reinette Chatre
@ 2025-06-26 19:31 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 19:31 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
On 6/24/25 22:32, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> When supported "mbm_event" mode offers "num_mbm_cntrs" number of counters
>
> "When supported, "mbm_event" counter assignment mode offers ..."?
Sure.
>
>> that can be assigned to RMID, event pairs and monitor bandwidth usage as
>> long as it is assigned.
>>
>> Add the functionality to allocate and assign a counter ID to an RMID, event
>> pair in the domain.
>>
>> If all the counters are in use, kernel will log the error message "Unable
>> to allocate counter in domain" in /sys/fs/resctrl/info/last_cmd_status
>> when a new assignment is requested. Exit on the first failure when
>> assigning counters across all the domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> fs/resctrl/internal.h | 3 +
>> fs/resctrl/monitor.c | 134 ++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 137 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 71059c2cda16..0767a1c46f26 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -386,6 +386,9 @@ bool closid_allocated(unsigned int closid);
>>
>> int resctrl_find_cleanest_closid(void);
>>
>> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt);
>> +
>> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 3e1a8239b0d3..38800fe45931 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -950,3 +950,137 @@ void resctrl_mon_resource_exit(void)
>>
>> dom_data_exit(r);
>> }
>> +
>> +/**
>> + * resctrl_config_cntr() - Configure the counter ID for the event, RMID pair in
>> + * the domain.
>> + *
>> + * Assign the counter if @assign is true else unassign the counter. Reset the
>> + * associated non-architectural state.
>
> A few reports came through about the kernel-doc issues but I did not see a
> discussion finalize on how to resolve them. I do not think it is required for these
> static functions to have full kernel-doc. Just having useful comments without
> kernel-doc style is valuable. Some kernel-doc syntax can still be useful though, like
> above when referring to the parameters. It is ok to keep doing so even if section
> does not start with /**.
Sure. Thanks
>
> Where I think kernel-doc is important is include/linux/resctrl.h.
Sure.
>
>> + */
>> +static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign)
>
> If resctrl_arch_config_cntr() does not need a struct resource then resctrl_config_cntr()
> may not either?
>
>> +{
>> + struct mbm_state *m;
>> +
>> + resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
>> +
>> + m = get_mbm_state(d, closid, rmid, evtid);
>> + if (m)
>> + memset(m, 0, sizeof(struct mbm_state));
>
> sizeof(*m).
Sure.
>
>> +}
>> +
>> +/**
>> + * mbm_cntr_get() - Return the counter ID for the matching @evtid and @rdtgrp.
>> + *
>> + * Return:
>> + * Valid counter ID on success, or -ENOENT on failure.
>> + */
>> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>
> Since mbm_cntr_get() is called in regular flows, could you please also
> add an explicit check to return -ENOENT if !r->mon.mbm_cntr_assignable?
> Otherwise this is quite subtle with the assumption that
> r->mon.num_mbm_cntrs is zero in this case.
Sure. Added the check.
if (!r->mon.mbm_cntr_assignable)
return -ENOENT;
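As a rough user-space illustration of why the explicit check helps (this is not the kernel code; struct group, struct cntr_cfg, struct mon and cntr_get() are simplified stand-ins made up for the sketch), the lookup no longer depends on the counter count being zero when assignment is unsupported:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the resctrl structures. */
struct group { int id; };

struct cntr_cfg {
	int evtid;
	struct group *grp;	/* NULL marks a free counter */
};

struct mon {
	bool cntr_assignable;
	int num_cntrs;
};

/* Return the counter ID assigned to (grp, evtid), or -ENOENT. */
static int cntr_get(const struct mon *mon, const struct cntr_cfg *cfg,
		    const struct group *grp, int evtid)
{
	int id;

	/* Explicit check: do not rely on num_cntrs happening to be zero. */
	if (!mon->cntr_assignable)
		return -ENOENT;

	for (id = 0; id < mon->num_cntrs; id++) {
		if (cfg[id].grp == grp && cfg[id].evtid == evtid)
			return id;
	}

	return -ENOENT;
}

/* Usage: the same table is searched with and without assignment support. */
static void example(void)
{
	struct group g = { 1 };
	struct cntr_cfg cfg[2] = { { 0, NULL }, { 3, &g } };
	struct mon on = { true, 2 };
	struct mon off = { false, 2 };	/* count deliberately left non-zero */

	assert(cntr_get(&on, cfg, &g, 3) == 1);
	assert(cntr_get(&on, cfg, &g, 4) == -ENOENT);
	assert(cntr_get(&off, cfg, &g, 3) == -ENOENT);
}
```

The last assertion is the case the review comment is about: without the early return, the result would depend on the count field alone.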
>
>> + if (!resctrl_is_mbm_event(evtid))
>> + return -ENOENT;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
>> + d->cntr_cfg[cntr_id].evtid == evtid)
>> + return cntr_id;
>> + }
>> +
>> + return -ENOENT;
>> +}
>> +
>> +/**
>> + * mbm_cntr_alloc() - Initilialize and return a new counter ID in the domain @d.
>
> "Initilialize" -> "Initialize"
Sure.
>
>> + *
>
> mbm_cntr_alloc() will allocate a counter to a RMID/event pair even
> if that pair already has a counter assigned. The doc should note that caveat
> here with documentation that the caller is responsible for checking that
> a counter is not already assigned.
Added the text.
Caller must ensure that the specified event is not assigned already.
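A minimal user-space sketch of that caveat, under the assumption that a free slot is marked by a NULL group pointer as in the patch. NUM_CNTRS, struct group, struct cntr_cfg and cntr_alloc() are illustrative names only, not the resctrl definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_CNTRS 2	/* hypothetical counter count for the sketch */

struct group { int id; };

struct cntr_cfg {
	int evtid;
	struct group *grp;	/* NULL marks a free counter */
};

/* Claim the first free counter; does NOT look for an existing assignment. */
static int cntr_alloc(struct cntr_cfg *cfg, struct group *grp, int evtid)
{
	int id;

	for (id = 0; id < NUM_CNTRS; id++) {
		if (!cfg[id].grp) {
			cfg[id].grp = grp;
			cfg[id].evtid = evtid;
			return id;
		}
	}

	return -ENOSPC;
}

/* Usage: the same group/event pair allocated twice holds two counters. */
static void example(void)
{
	struct group g = { 1 };
	struct cntr_cfg cfg[NUM_CNTRS] = { { 0, NULL }, { 0, NULL } };

	assert(cntr_alloc(cfg, &g, 7) == 0);
	assert(cntr_alloc(cfg, &g, 7) == 1);	/* duplicate assignment */
	assert(cntr_alloc(cfg, &g, 7) == -ENOSPC);
}
```

Because the allocator only scans for a free slot, a caller that skips the lookup can exhaust the counters with duplicates of one pair.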
>
>> + * Return:
>> + * Valid counter ID on success, or -ENOSPC on failure.
>> + */
>> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (!d->cntr_cfg[cntr_id].rdtgrp) {
>> + d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
>> + d->cntr_cfg[cntr_id].evtid = evtid;
>> + return cntr_id;
>> + }
>> + }
>> +
>> + return -ENOSPC;
>> +}
>> +
>> +/**
>> + * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
>> + * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
>> + *
>> + * Return:
>> + * 0 on success, or a non-zero value on failure.
>
> "or a non-zero value on failure." -> "<0 on failure"
Sure.
>
>> + */
>> +static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
>> +{
>> + int cntr_id;
>> +
>> + /* No need to allocate a new counter if it is already assigned */
>> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
>> + if (cntr_id >= 0)
>> + goto cntr_configure;
>> +
>> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, mevt->evtid);
>> + if (cntr_id < 0) {
>> + rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
>> + d->hdr.id);
>> + return cntr_id;
>> + }
>> +
>> +cntr_configure:
>> + /*
>> + * Skip reconfiguration if the event setup is current; otherwise,
>> + * update and apply the new configuration to the domain.
>
> When could "event setup" *not* be current? As mentioned in earlier patch
> I do not see why mon_evt::evt_cfg as well as mbm_cntr_cfg::evt_cfg is
> needed. There should be no need to keep these two "in sync" with
> only mon_evt::evt_cfg as the source of configuration. I seem to be missing
> something here, could you please detail this scenario?
As discussed earlier, I removed the check below. The function now returns
success if the counter is already assigned.
https://lore.kernel.org/lkml/887bad33-7f4a-4b6d-95a7-fdfe0451f42b@intel.com/
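One possible reading of the reworked flow, sketched in user space. Everything here is an assumption for illustration: the structures are simplified stand-ins, config_cntr() only counts calls instead of programming hardware, and the next revision of the patch may look different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_CNTRS 2	/* hypothetical per-domain counter count */

struct group { int id; };
struct cntr_cfg { int evtid; struct group *grp; };

struct domain {
	struct cntr_cfg cfg[NUM_CNTRS];
	int hw_config_calls;	/* counts simulated hardware programming */
};

static int cntr_get(struct domain *d, struct group *grp, int evtid)
{
	for (int id = 0; id < NUM_CNTRS; id++)
		if (d->cfg[id].grp == grp && d->cfg[id].evtid == evtid)
			return id;
	return -ENOENT;
}

static int cntr_alloc(struct domain *d, struct group *grp, int evtid)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		if (!d->cfg[id].grp) {
			d->cfg[id].grp = grp;
			d->cfg[id].evtid = evtid;
			return id;
		}
	}
	return -ENOSPC;
}

/* Stand-in for the configuration step: only counts invocations. */
static void config_cntr(struct domain *d)
{
	d->hw_config_calls++;
}

static int alloc_config_cntr(struct domain *d, struct group *grp, int evtid)
{
	int id = cntr_get(d, grp, evtid);

	if (id >= 0)
		return 0;	/* already assigned: success, nothing to do */

	id = cntr_alloc(d, grp, evtid);
	if (id < 0)
		return id;

	config_cntr(d);
	return 0;
}

/* Assign in every domain, stopping at the first failure. */
static int assign_all(struct domain *doms, int n, struct group *grp, int evtid)
{
	for (int i = 0; i < n; i++) {
		int ret = alloc_config_cntr(&doms[i], grp, evtid);

		if (ret)
			return ret;
	}
	return 0;
}

static void example(void)
{
	static struct domain doms[2];	/* zero-initialized */
	struct group a = { 1 }, b = { 2 }, c = { 3 };

	assert(assign_all(doms, 2, &a, 1) == 0);
	assert(assign_all(doms, 2, &a, 1) == 0);	/* repeat is a no-op */
	assert(doms[0].hw_config_calls == 1);

	assert(alloc_config_cntr(&doms[1], &b, 1) == 0);	/* fill domain 1 */
	assert(assign_all(doms, 2, &c, 1) == -ENOSPC);
	assert(cntr_get(&doms[0], &c, 1) == 1);		/* partial assignment */
	assert(cntr_get(&doms[1], &c, 1) == -ENOENT);
}
```

The final two assertions show the partial-assignment case mentioned in the function comment: the first domain keeps its counter even though the traversal failed in the second.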
>
>> + */
>> + if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
>> + d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
>> + resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, true);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * resctrl_assign_cntr_event() - Assign a hardware counter for the event in
>> + * @mevt to the resctrl group @rdtgrp. Assign counters to all domains if @d is
>> + * NULL; otherwise, assign the counter to the specified domain @d.
>> + *
>> + * If all counters in a domain are already in use, resctrl_alloc_config_cntr()
>> + * will fail. The assignment process will abort at the first failure encountered
>> + * during domain traversal, which may result in the event being only partially
>> + * assigned.
>> + *
>> + * Return:
>> + * 0 on success, or a non-zero value on failure.
>
> "or a non-zero value on failure" -> "<0 on failure"
>
Sure.
>> + */
>> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
>> +{
>> + int ret = 0;
>> +
>> + if (!d) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
>> + if (ret)
>> + return ret;
>> + }
>> + } else {
>> + ret = resctrl_alloc_config_cntr(r, d, rdtgrp, mevt);
>> + }
>> +
>> + return ret;
>> +}
>
> Reinette
>
--
Thanks
Babu Moger
* [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign MBM events
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (17 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 18/32] fs/resctrl: Add the functionality to assign MBM events Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 3:38 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode Babu Moger
` (14 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The "mbm_event" mode offers "num_mbm_cntrs" number of counters that can be
assigned to RMID, event pairs and monitor bandwidth usage as long as it is
assigned. If all the counters are in use, the kernel logs the error message
"Unable to allocate counter in domain" in
/sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
To make space for a new assignment, users must unassign an already
assigned counter and retry the assignment again.
Add the functionality to unassign and free the counters in the domain.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Passed struct mon_evt to resctrl_free_config_cntr(), removing
the need for the mbm_get_mon_event() call.
Corrected the code documentation for mbm_cntr_free().
Changed resctrl_free_config_cntr() and resctrl_unassign_cntr_event()
to return void.
Changed subject line to fs/resctrl.
Updated the changelog.
v13: Moved mbm_cntr_free() to this patch as it is first used here.
Not required to pass evt_cfg to resctrl_unassign_cntr_event(). It is
available via mbm_get_mon_event().
Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c file has now been split between the FS and ARCH directories.
v12: Updated the commit text to make it a bit clearer.
Replaced several counters with "num_mbm_cntrs" counters.
Fixed typo in the subject line.
Fixed the error handling on first failure.
Added domain id and event id on failure.
Added new parameter event configuration (evt_cfg) to provide the event from
user space.
v11: Moved the functions to monitor.c.
Renamed rdtgroup_unassign_cntr_event() to resctrl_unassign_cntr_event().
Refactored the resctrl_unassign_cntr_event().
Updated commit message and code comments.
v10: Patch changed again.
Counters are managed at the domain based on the discussion.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Updated the commit message.
v9: Changes related to addition of new function resctrl_config_cntr().
Removed rdtgroup_mbm_cntr_is_assigned() as it was introduced
already.
Text changes to address comments.
v8: Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
Added return error handling in resctrl_arch_config_cntr().
v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
Reworded the commit log a little bit.
v6: Removed mbm_cntr_free from this patch.
Added counter test in all the domains and free if it is not assigned to
any domains.
v5: A few name changes to match cntr_id.
Changed the function names to rdtgroup_unassign_cntr.
More comments in the commit log.
v4: Added domain specific unassign feature.
Few name changes.
v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: No changes.
---
fs/resctrl/internal.h | 2 ++
fs/resctrl/monitor.c | 47 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0767a1c46f26..4496c359ac22 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -388,6 +388,8 @@ int resctrl_find_cleanest_closid(void);
int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, struct mon_evt *mevt);
+void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt);
#ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 38800fe45931..f2636aea6545 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1016,6 +1016,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
return -ENOSPC;
}
+/**
+ * mbm_cntr_free() - Clear the counter ID configuration details in the domain @d.
+ */
+static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
+{
+ memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
+}
+
/**
* resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
* event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
@@ -1084,3 +1092,42 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
return ret;
}
+
+/**
+ * resctrl_free_config_cntr() - Unassign and reset the counter ID configuration
+ * for the event pointed to by @mevt within the domain @d and resctrl group @rdtgrp.
+ */
+static void resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt)
+{
+ int cntr_id;
+
+ cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
+
+ /* If there is no cntr_id assigned, nothing to do */
+ if (cntr_id < 0)
+ return;
+
+ resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid, rdtgrp->closid,
+ cntr_id, false);
+
+ mbm_cntr_free(d, cntr_id);
+
+ return;
+}
+
+/**
+ * resctrl_unassign_cntr_event() - Unassign a hardware counter associated with
+ * the event structure @mevt from the domain @d and the group @rdtgrp. Unassign
+ * the counters from all the domains if @d is NULL else unassign from @d.
+ */
+void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, struct mon_evt *mevt)
+{
+ if (!d) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list)
+ resctrl_free_config_cntr(r, d, rdtgrp, mevt);
+ } else {
+ resctrl_free_config_cntr(r, d, rdtgrp, mevt);
+ }
+}
--
2.34.1
* Re: [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign MBM events
2025-06-13 21:05 ` [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign " Babu Moger
@ 2025-06-25 3:38 ` Reinette Chatre
2025-06-26 21:12 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 3:38 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> The "mbm_event" mode offers "num_mbm_cntrs" number of counters that can be
"The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"?
> assigned to RMID, event pairs and monitor bandwidth usage as long as it is
> assigned. If all the counters are in use, the kernel logs the error message
> "Unable to allocate counter in domain" in
> /sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
>
> To make space for a new assignment, users must unassign an already
> assigned counter and retry the assignment again.
>
> Add the functionality to unassign and free the counters in the domain.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> fs/resctrl/internal.h | 2 ++
> fs/resctrl/monitor.c | 47 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 0767a1c46f26..4496c359ac22 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -388,6 +388,8 @@ int resctrl_find_cleanest_closid(void);
>
> int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> struct rdtgroup *rdtgrp, struct mon_evt *mevt);
> +void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt);
>
> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 38800fe45931..f2636aea6545 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1016,6 +1016,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
> return -ENOSPC;
> }
>
> +/**
> + * mbm_cntr_free() - Clear the counter ID configuration details in the domain @d.
> + */
> +static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
> +{
> + memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
sizeof(struct mbm_cntr_cfg) -> sizeof(*d->cntr_cfg[0])
> +}
> +
> /**
> * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
> * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
> @@ -1084,3 +1092,42 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>
> return ret;
> }
> +
> +/**
> + * resctrl_free_config_cntr() - Unassign and reset the counter ID configuration
> + * for the event pointed to by @mevt within the domain @d and resctrl group @rdtgrp.
> + */
> +static void resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
> +{
> + int cntr_id;
> +
> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
> +
> + /* If there is no cntr_id assigned, nothing to do */
> + if (cntr_id < 0)
> + return;
> +
> + resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid, rdtgrp->closid,
> + cntr_id, false);
> +
> + mbm_cntr_free(d, cntr_id);
> +
> + return;
No need for this return.
> +}
> +
> +/**
> + * resctrl_unassign_cntr_event() - Unassign a hardware counter associated with
> + * the event structure @mevt from the domain @d and the group @rdtgrp. Unassign
> + * the counters from all the domains if @d is NULL else unassign from @d.
> + */
> +void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
> +{
> + if (!d) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list)
> + resctrl_free_config_cntr(r, d, rdtgrp, mevt);
> + } else {
> + resctrl_free_config_cntr(r, d, rdtgrp, mevt);
> + }
> +}
Reinette
* Re: [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign MBM events
2025-06-25 3:38 ` Reinette Chatre
@ 2025-06-26 21:12 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-26 21:12 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 22:38, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> The "mbm_event" mode offers "num_mbm_cntrs" number of counters that can be
>
> "The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"?
Sure.
>
>> assigned to RMID, event pairs and monitor bandwidth usage as long as it is
>> assigned. If all the counters are in use, the kernel logs the error message
>> "Unable to allocate counter in domain" in
>> /sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
>>
>> To make space for a new assignment, users must unassign an already
>> assigned counter and retry the assignment again.
>>
>> Add the functionality to unassign and free the counters in the domain.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
>
>> ---
>> fs/resctrl/internal.h | 2 ++
>> fs/resctrl/monitor.c | 47 +++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 49 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 0767a1c46f26..4496c359ac22 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -388,6 +388,8 @@ int resctrl_find_cleanest_closid(void);
>>
>> int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> struct rdtgroup *rdtgrp, struct mon_evt *mevt);
>> +void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt);
>>
>> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 38800fe45931..f2636aea6545 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1016,6 +1016,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>> return -ENOSPC;
>> }
>>
>> +/**
>> + * mbm_cntr_free() - Clear the counter ID configuration details in the domain @d.
>> + */
>> +static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
>> +{
>> + memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
>
> sizeof(struct mbm_cntr_cfg) -> sizeof(*d->cntr_cfg[0])
Sure. Changed it to:
memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
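For reference, a minimal userspace sketch of the sizeof idiom used in the changed line. The struct and function names ending in _model are hypothetical stand-ins, not the kernel definitions. It assumes cntr_cfg points to an array of structs, so sizeof(*d->cntr_cfg) is the size of one element. The literally suggested sizeof(*d->cntr_cfg[0]) would not compile under that assumption, because cntr_cfg[0] is a struct and cannot be dereferenced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mbm_cntr_cfg; the fields are illustrative. */
struct cfg_model {
	int evtid;
	void *owner;
};

/* Hypothetical stand-in for the domain; cntr_cfg has one entry per counter. */
struct dom_model {
	struct cfg_model *cntr_cfg;
};

/*
 * Clear one counter slot. The size is taken from the pointee type of
 * cntr_cfg, so it stays correct if the element type is ever changed.
 */
static void cntr_free_model(struct dom_model *d, int cntr_id)
{
	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
}
```

Only the selected slot is cleared; neighbouring entries are left untouched.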
>
>> +}
>> +
>> /**
>> * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
>> * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
>> @@ -1084,3 +1092,42 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>>
>> return ret;
>> }
>> +
>> +/**
>> + * resctrl_free_config_cntr() - Unassign and reset the counter ID configuration
>> + * for the event pointed to by @mevt within the domain @d and resctrl group @rdtgrp.
>> + */
>> +static void resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
>> +{
>> + int cntr_id;
>> +
>> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
>> +
>> + /* If there is no cntr_id assigned, nothing to do */
>> + if (cntr_id < 0)
>> + return;
>> +
>> + resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid, rdtgrp->closid,
>> + cntr_id, false);
>> +
>> + mbm_cntr_free(d, cntr_id);
>> +
>> + return;
>
> No need for this return.
Sure.
>
>> +}
>> +
>> +/**
>> + * resctrl_unassign_cntr_event() - Unassign a hardware counter associated with
>> + * the event structure @mevt from the domain @d and the group @rdtgrp. Unassign
>> + * the counters from all the domains if @d is NULL else unassign from @d.
>> + */
>> +void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, struct mon_evt *mevt)
>> +{
>> + if (!d) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>> + resctrl_free_config_cntr(r, d, rdtgrp, mevt);
>> + } else {
>> + resctrl_free_config_cntr(r, d, rdtgrp, mevt);
>> + }
>> +}
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (18 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 19/32] fs/resctrl: Add the functionality to unassign " Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 4:14 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
` (13 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
When "mbm_event" mode is enabled, a hardware counter must be assigned to
read the event.
Report 'Unassigned' in case the user attempts to read the event without
assigning a hardware counter.
Export mbm_cntr_get() to allow usage from other functions within
fs/resctrl.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog.
Added the code comment for "-ENOENT" when counter is read without assignment.
Removed the references to resctrl_is_mbm_event().
v13: Minor commitlog and user doc update.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.
v12: Updated the documentation for more clarity.
v11: Domain can be NULL with SNC support so moved the unassign check in
rdtgroup_mondata_show().
v10: Moved the code to check the assign state inside mon_event_read().
Fixed a few text comments.
v9: Used is_mbm_event() to check the event type.
Minor user documentation update.
v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
Documentation update to make the text generic.
v7: Moved the documentation under "mon_data".
Updated the text a little bit.
v6: Added more explanation in the resctrl.rst
Added checks to detect "Unassigned" before reading RMID.
v5: New patch.
---
Documentation/filesystems/resctrl.rst | 8 ++++++++
fs/resctrl/ctrlmondata.c | 19 ++++++++++++++++++-
fs/resctrl/internal.h | 2 ++
fs/resctrl/monitor.c | 4 ++--
4 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 8a2050098091..18de335e1ff8 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -434,6 +434,14 @@ When monitoring is enabled all MON groups will also contain:
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
where "YY" is the node number.
+ The "mbm_event" mode offers "num_mbm_cntrs" number of counters and
+ allows users to assign counter IDs to mon_hw_id, event pairs enabling
+ bandwidth monitoring for as long as the counter remains assigned. The
+ hardware will continue tracking the assigned mon_hw_id until the user
+ manually unassigns it, ensuring that event data is not reset during this
+ period. An MBM event returns 'Unassigned' when the event does not have
+ a hardware counter assigned.
+
"mon_hw_id":
Available only with debug option. The identifier used by hardware
for the monitor group. On x86 this is the RMID.
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index ad7ffc6acf13..8a182f506877 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -648,15 +648,32 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
goto out;
}
d = container_of(hdr, struct rdt_mon_domain, hdr);
+
+ /*
+ * Report 'Unassigned' if "mbm_event" mode is enabled and counter
+ * is unassigned.
+ */
+ if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+ resctrl_is_mbm_event(evtid) &&
+ (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
+ rr.err = -ENOENT;
+ goto checkresult;
+ }
+
mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
}
checkresult:
-
+ /*
+ * -ENOENT is a special case, set only when "mbm_event" mode is enabled
+ * and no counter has been assigned.
+ */
if (rr.err == -EIO)
seq_puts(m, "Error\n");
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
+ else if (rr.err == -ENOENT)
+ seq_puts(m, "Unassigned\n");
else
seq_printf(m, "%llu\n", rr.val);
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 4496c359ac22..4a7130018aa1 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -390,6 +390,8 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, struct mon_evt *mevt);
void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, struct mon_evt *mevt);
+int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
#ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f2636aea6545..cf7f6a22ea51 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -977,8 +977,8 @@ static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d
* Return:
* Valid counter ID on success, or -ENOENT on failure.
*/
-static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
- struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
{
int cntr_id;
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode
2025-06-13 21:05 ` [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode Babu Moger
@ 2025-06-25 4:14 ` Reinette Chatre
2025-06-27 1:34 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 4:14 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> When "mbm_event" mode is enabled, a hardware counter must be assigned to
"When the "mbm_event" counter assignment mode is enabled ..."
> read the event.
>
> Report 'Unassigned' in case the user attempts to read the event without
> assigning a hardware counter.
>
> Export mbm_cntr_get() to allow usage from other functions within
"Export" can be a loaded term in the Linux kernel. Perhaps:
"Export mbm_cntr_get() ... " -> "Declare mbm_cntr_get() in fs/resctrl/internal.h ..."
> fs/resctrl.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 8 ++++++++
> fs/resctrl/ctrlmondata.c | 19 ++++++++++++++++++-
> fs/resctrl/internal.h | 2 ++
> fs/resctrl/monitor.c | 4 ++--
> 4 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 8a2050098091..18de335e1ff8 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -434,6 +434,14 @@ When monitoring is enabled all MON groups will also contain:
> for the L3 cache they occupy). These are named "mon_sub_L3_YY"
> where "YY" is the node number.
>
> + The "mbm_event" mode offers "num_mbm_cntrs" number of counters and
"The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"?
> + allows users to assign counter IDs to mon_hw_id, event pairs enabling
"users to assign counter IDs" -> "users to assign counters"
> + bandwidth monitoring for as long as the counter remains assigned. The
> + hardware will continue tracking the assigned mon_hw_id until the user
"assigned mon_hw_id" -> "assigned counter"?
> + manually unassigns it, ensuring that event data is not reset during this
> + period. An MBM event returns 'Unassigned' when the event does not have
> + a hardware counter assigned.
> +
> "mon_hw_id":
> Available only with debug option. The identifier used by hardware
> for the monitor group. On x86 this is the RMID.
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index ad7ffc6acf13..8a182f506877 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -648,15 +648,32 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
> goto out;
> }
> d = container_of(hdr, struct rdt_mon_domain, hdr);
> +
> + /*
> + * Report 'Unassigned' if "mbm_event" mode is enabled and counter
> + * is unassigned.
> + */
> + if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
> + resctrl_is_mbm_event(evtid) &&
> + (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
> + rr.err = -ENOENT;
> + goto checkresult;
> + }
> +
When looking at this snippet in combination with patch #22 that adds the support for
reading counters the flow does not look ideal. While above adds a check whether
this is dealing with counters, it only does so to check if a counter is *not* assigned.
I cannot see *any* other check by resctrl whether it is dealing with counters while
it lumps all information into parameters to resctrl_arch_reset_rmid() and
resctrl_arch_rmid_read(), needing to provide "dummy" parameters when not all information
is relevant, and leaving the arch to need to determine if it is
dealing with counters and then use provided parameters based on that information.
I think it will be simpler for resctrl to determine if a counter or RMID needs to be
read and then call appropriate arch API for each and provide only necessary information
to support that call.
I think this can be accomplished with following changes:
- drop above snippet from rdtgroup_mondata_show() (this will be done in mon_event_read())
- introduce new rmid_read::is_cntr that is a boolean that is true if it is a counter
that should be read.
- mon_event_read() initializes rmid_read::is_cntr and returns with rmid_read::err
set if a counter should be read but no counter is assigned (above snippet). The
added benefit of doing this in mon_event_read() is that if a counter is not
assigned on new monitor group create or domain add then the mon_add_all_files()->mon_event_read()
will return immediately with this error instead of trying to read the unassigned
counter.
- __mon_event_count() should *only* attempt to initialize the counter ID (call mbm_cntr_get)
if rmid_read::is_cntr is true.
- Introduce two new arch calls (naming TBD):
resctrl_arch_cntr_read() and resctrl_arch_reset_cntr() that will respectively read
and reset the counter.
- __mon_event_count() calls appropriate API based on rmid_read::is_cntr.
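The flow proposed in the list above could be sketched roughly as follows. This is a userspace model, not kernel code: every name ending in _model, the field layout, and the fake values returned by the two arch stand-ins are assumptions for illustration only, and the naming of the real arch calls is still TBD.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct rmid_read; only the fields needed here. */
struct rr_model {
	bool is_cntr;		/* true when a counter, not an RMID, is read */
	int cntr_id;		/* meaningful only when is_cntr is true */
	int err;
	unsigned long long val;
};

/* Hypothetical arch read paths; the returned values are made up. */
static int arch_rmid_read_model(unsigned int rmid, unsigned long long *val)
{
	*val = 1000ULL + rmid;
	return 0;
}

static int arch_cntr_read_model(int cntr_id, unsigned long long *val)
{
	*val = 2000ULL + (unsigned long long)cntr_id;
	return 0;
}

/*
 * Models mon_event_read(): decide once whether a counter must be read and
 * return early with -ENOENT when none is assigned. assigned_cntr < 0
 * stands for "the counter lookup found nothing".
 */
static void mon_event_read_model(struct rr_model *rr, bool cntr_mode,
				 int assigned_cntr, unsigned int rmid)
{
	memset(rr, 0, sizeof(*rr));
	rr->is_cntr = cntr_mode;

	if (rr->is_cntr && assigned_cntr < 0) {
		rr->err = -ENOENT;
		return;
	}

	/* Models __mon_event_count(): select the arch call from is_cntr. */
	if (rr->is_cntr) {
		rr->cntr_id = assigned_cntr;
		rr->err = arch_cntr_read_model(rr->cntr_id, &rr->val);
	} else {
		rr->err = arch_rmid_read_model(rmid, &rr->val);
	}
}

/* Models the result reporting at the end of rdtgroup_mondata_show(). */
static void format_result_model(const struct rr_model *rr, char *buf, size_t len)
{
	if (rr->err == -EIO)
		snprintf(buf, len, "Error");
	else if (rr->err == -EINVAL)
		snprintf(buf, len, "Unavailable");
	else if (rr->err == -ENOENT)
		snprintf(buf, len, "Unassigned");
	else
		snprintf(buf, len, "%llu", rr->val);
}
```

In this model each arch stand-in receives only what it needs (an RMID or a counter ID), so no "dummy" parameters are required, and the "Unassigned" case never reaches the arch layer.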
What do you think?
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode
2025-06-25 4:14 ` Reinette Chatre
@ 2025-06-27 1:34 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-27 1:34 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/2025 11:14 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> When "mbm_event" mode is enabled, a hardware counter must be assigned to
>
> "When the "mbm_event" counter assignment mode is enabled ..."
Sure.
>
>> read the event.
>>
>> Report 'Unassigned' in case the user attempts to read the event without
>> assigning a hardware counter.
>>
>> Export mbm_cntr_get() to allow usage from other functions within
>
> "Export" can be a loaded term in the Linux kernel. Perhaps:
> "Export mbm_cntr_get() ... " -> "Declare mbm_cntr_get() in fs/resctrl/internal.h ..."
>
Sure.
>> fs/resctrl.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 8 ++++++++
>> fs/resctrl/ctrlmondata.c | 19 ++++++++++++++++++-
>> fs/resctrl/internal.h | 2 ++
>> fs/resctrl/monitor.c | 4 ++--
>> 4 files changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 8a2050098091..18de335e1ff8 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -434,6 +434,14 @@ When monitoring is enabled all MON groups will also contain:
>> for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>> where "YY" is the node number.
>>
>> + The "mbm_event" mode offers "num_mbm_cntrs" number of counters and
>
> "The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"?
Sure.
>
>> + allows users to assign counter IDs to mon_hw_id, event pairs enabling
>
> "users to assign counter IDs" -> "users to assign counters"
>
Sure.
>> + bandwidth monitoring for as long as the counter remains assigned. The
>> + hardware will continue tracking the assigned mon_hw_id until the user
>
> "assigned mon_hw_id" -> "assigned counter"?
>
Sure.
>> + manually unassigns it, ensuring that event data is not reset during this
>> + period. An MBM event returns 'Unassigned' when the event does not have
>> + a hardware counter assigned.
>> +
>> "mon_hw_id":
>> Available only with debug option. The identifier used by hardware
>> for the monitor group. On x86 this is the RMID.
>> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
>> index ad7ffc6acf13..8a182f506877 100644
>> --- a/fs/resctrl/ctrlmondata.c
>> +++ b/fs/resctrl/ctrlmondata.c
>> @@ -648,15 +648,32 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>> goto out;
>> }
>> d = container_of(hdr, struct rdt_mon_domain, hdr);
>> +
>> + /*
>> + * Report 'Unassigned' if "mbm_event" mode is enabled and counter
>> + * is unassigned.
>> + */
>> + if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
>> + resctrl_is_mbm_event(evtid) &&
>> + (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
>> + rr.err = -ENOENT;
>> + goto checkresult;
>> + }
>> +
>
> When looking at this snippet in combination with patch #22 that adds the support for
> reading counters the flow does not look ideal. While above adds a check whether
> this is dealing with counters, it only does so to check if a counter is *not* assigned.
> I cannot see *any* other check by resctrl whether it is dealing with counters while
> it lumps all information into parameters to resctrl_arch_reset_rmid() and
> resctrl_arch_rmid_read(), needing to provide "dummy" parameters when not all information
> is relevant, and leaving the arch to need to determine if it is
> dealing with counters and then use provided parameters based on that information.
>
> I think it will be simpler for resctrl to determine if a counter or RMID needs to be
> read and then call appropriate arch API for each and provide only necessary information
> to support that call.
>
> I think this can be accomplished with following changes:
> - drop above snippet from rdtgroup_mondata_show() (this will be done in mon_event_read())
> - introduce new rmid_read::is_cntr that is a boolean that is true if it is a counter
> that should be read.
> - mon_event_read() initializes rmid_read::is_cntr and returns with rmid_read::err
> set if a counter should be read but no counter is assigned (above snippet). The
> added benefit of doing this in mon_event_read() is that if a counter is not
> assigned on new monitor group create or domain add then the mon_add_all_files()->mon_event_read()
> will return immediately with this error instead of trying to read the unassigned
> counter.
> - __mon_event_count() should *only* attempt to initialize the counter ID (call mbm_cntr_get)
> if rmid_read::is_cntr is true.
> - Introduce two new arch calls (naming TBD):
> resctrl_arch_cntr_read() and resctrl_arch_reset_cntr() that will respectively read
> and reset the counter.
It may be necessary to restructure resctrl_arch_cntr_read(), as there is
some shared logic that applies to both resctrl_arch_rmid_read() and
resctrl_arch_cntr_read().
> - __mon_event_count() calls appropriate API based on rmid_read::is_cntr.
>
> What do you think?
Sounds good to me. This seems like a much cleaner approach. I'll start
making the changes on Monday (out of the office tomorrow). I'll let you
know if I run into any issues. I might post a snippet if necessary.
Thanks
Babu
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (19 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 20/32] fs/resctrl: Report 'Unassigned' for MBM events in mbm_event mode Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 4:18 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 22/32] x86,fs/resctrl: Add the support for reading ABMC counters Babu Moger
` (12 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Reading the monitoring data requires RMID, CLOSID, and event ID, among
other parameters. These are passed individually, resulting in architecture
specific function calls.
Passing the pointer to the full rdtgroup structure simplifies access to
these parameters.
Additionally, when "mbm_event" mode is enabled, a counter ID is required
to read the event. The counter ID is obtained through mbm_cntr_get(),
which expects a struct rdtgroup pointer.
Refactor the code to pass a pointer to struct rdtgroup instead of
individual members in preparation for this requirement.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: A few text updates to the commit log.
v13: New patch to pass the entire struct rdtgroup to __mon_event_count(),
mbm_update(), and related functions.
---
fs/resctrl/monitor.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index cf7f6a22ea51..31e08d891db2 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -356,9 +356,11 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
return state ? &state[idx] : NULL;
}
-static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
int cpu = smp_processor_id();
+ u32 closid = rdtgrp->closid;
+ u32 rmid = rdtgrp->mon.rmid;
struct rdt_mon_domain *d;
struct cacheinfo *ci;
struct mbm_state *m;
@@ -429,9 +431,11 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
* __mon_event_count() is compared with the chunks value from the previous
* invocation. This must be called once per second to maintain values in MBps.
*/
-static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
+static void mbm_bw_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
u64 cur_bw, bytes, cur_bytes;
+ u32 closid = rdtgrp->closid;
+ u32 rmid = rdtgrp->mon.rmid;
struct mbm_state *m;
m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
@@ -460,7 +464,7 @@ void mon_event_count(void *info)
rdtgrp = rr->rgrp;
- ret = __mon_event_count(rdtgrp->closid, rdtgrp->mon.rmid, rr);
+ ret = __mon_event_count(rdtgrp, rr);
/*
* For Ctrl groups read data from child monitor groups and
@@ -471,8 +475,7 @@ void mon_event_count(void *info)
if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
- if (__mon_event_count(entry->closid, entry->mon.rmid,
- rr) == 0)
+ if (__mon_event_count(entry, rr) == 0)
ret = 0;
}
}
@@ -603,7 +606,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
}
static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 closid, u32 rmid, enum resctrl_event_id evtid)
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
{
struct rmid_read rr = {0};
@@ -617,30 +620,30 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
return;
}
- __mon_event_count(closid, rmid, &rr);
+ __mon_event_count(rdtgrp, &rr);
/*
* If the software controller is enabled, compute the
* bandwidth for this event id.
*/
if (is_mba_sc(NULL))
- mbm_bw_count(closid, rmid, &rr);
+ mbm_bw_count(rdtgrp, &rr);
resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
}
static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 closid, u32 rmid)
+ struct rdtgroup *rdtgrp)
{
/*
* This is protected from concurrent reads from user as both
* the user and overflow handler hold the global mutex.
*/
if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
- mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
+ mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
- mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
+ mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
}
/*
@@ -713,11 +716,11 @@ void mbm_handle_overflow(struct work_struct *work)
d = container_of(work, struct rdt_mon_domain, mbm_over.work);
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
- mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
+ mbm_update(r, d, prgrp);
head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list)
- mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);
+ mbm_update(r, d, crgrp);
if (is_mba_sc(NULL))
update_mba_bw(prgrp, d);
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-13 21:05 ` [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
@ 2025-06-25 4:18 ` Reinette Chatre
2025-06-30 13:57 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 4:18 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Reading the monitoring data requires RMID, CLOSID, and event ID, among
> other parameters. These are passed individually, resulting in architecture
It is not clear how "event ID" and "other parameters" are relevant to this
change since (in this context) it is only RMID and CLOSID that can be
found in rdtgroup.
> specific function calls.
Could you please elaborate what you meant with: "These are passed individually,
resulting in architecture specific function calls."?
>
> Passing the pointer to the full rdtgroup structure simplifies access to
> these parameters.
>
> Additionally, when "mbm_event" mode is enabled, a counter ID is required
> to read the event. The counter ID is obtained through mbm_cntr_get(),
> which expects a struct rdtgroup pointer.
>
> Refactor the code to pass a pointer to struct rdtgroup instead of
> individual members in preparation for this requirement.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-25 4:18 ` Reinette Chatre
@ 2025-06-30 13:57 ` Moger, Babu
2025-06-30 15:44 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 13:57 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
Shukla, Manali, Yuan, Perry, kai.huang@intel.com,
peterz@infradead.org, xiaoyao.li@intel.com,
kan.liang@linux.intel.com, Limonciello, Mario, xin3.li@intel.com,
Shenoy, Gautham Ranjal, xin@zytor.com, chang.seok.bae@intel.com,
fenghuay@nvidia.com, peternewman@google.com,
maciej.wieczor-retman@intel.com, eranian@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/24/2025 11:18 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> Reading the monitoring data requires RMID, CLOSID, and event ID, among
>> other parameters. These are passed individually, resulting in architecture
>
> It is not clear how "event ID" and "other parameters" are relevant to this
> change since (in this context) it is only RMID and CLOSID that can be
> found in rdtgroup.
>
>> specific function calls.
>
> Could you please elaborate what you meant with: "These are passed individually,
> resulting in architecture specific function calls."?
Rephrased the whole changelog.
"fs/resctrl: Pass the full rdtgroup structure instead of individual RMID
and CLOSID
The functions resctrl_arch_reset_rmid() and resctrl_arch_rmid_read()
require several parameters, including RMID and CLOSID. Currently, RMID and
CLOSID are passed individually, even though they are available within the
rdtgroup structure.
Refactor the code to pass a pointer to struct rdtgroup instead of
individual members in preparation for this requirement.
Additionally, when "mbm_event" counter assignment mode is enabled, a
counter ID is required to read the event. The counter ID is obtained
through mbm_cntr_get(), which expects a struct rdtgroup pointer."
Thanks
Babu
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-30 13:57 ` Moger, Babu
@ 2025-06-30 15:44 ` Reinette Chatre
2025-06-30 20:58 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-30 15:44 UTC (permalink / raw)
To: Moger, Babu, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
Shukla, Manali, Yuan, Perry, kai.huang@intel.com,
peterz@infradead.org, xiaoyao.li@intel.com,
kan.liang@linux.intel.com, Limonciello, Mario, xin3.li@intel.com,
Shenoy, Gautham Ranjal, xin@zytor.com, chang.seok.bae@intel.com,
fenghuay@nvidia.com, peternewman@google.com,
maciej.wieczor-retman@intel.com, eranian@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Babu,
On 6/30/25 6:57 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 6/24/2025 11:18 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>> Reading the monitoring data requires RMID, CLOSID, and event ID, among
>>> other parameters. These are passed individually, resulting in architecture
>>
>> It is not clear how "event ID" and "other parameters" are relevant to this
>> change since (in this context) it is only RMID and CLOSID that can be
>> found in rdtgroup.
>>
>>> specific function calls.
>>
>> Could you please elaborate what you meant with: "These are passed individually,
>> resulting in architecture specific function calls."?
>
> Rephrased the whole changelog.
>
> "fs/resctrl: Pass the full rdtgroup structure instead of individual RMID
> and CLOSID
nit, can be simplified to:
fs/resctrl: Pass struct rdtgroup instead of individual members
>
> The functions resctrl_arch_reset_rmid() and resctrl_arch_rmid_read()
(No need to say "function" when using ().)
But wait ... this now changes to different functions from what the original
patch touched and even more so it changes _arch_ functions that should not
have access to struct rdtgroup. This new changelog does not seem to document
the original patch but something new that has not yet been posted.
> require several parameters, including RMID and CLOSID. Currently, RMID and
> CLOSID are passed individually, even though they are available within the
> rdtgroup structure.
>
> Refactor the code to pass a pointer to struct rdtgroup instead of
> individual members in preparation for this requirement.
"this requirement" .. what requirement are you referring to?
There is no requirement that individual members of a struct cannot be passed
as separate parameters and there is no problem doing so.
From "Changelog" in Documentation/process/maintainer-tip.rst:
"A good structure is to explain the context, the problem and the solution in
separate paragraphs and this order."
This new changelog has structure of "context, solution, problem".
>
> Additionally, when "mbm_event" counter assignment mode is enabled, a
This seems to be primary motivation since passing struct rdtgroup will
simplify the code (when I consider the original patch, not what this new
changelog implies) ... but if this change is indeed to the arch API as the
context suggest then passing the individual members is the right thing to
do because arch code should not access struct rdtgroup.
> counter ID is required to read the event. The counter ID is obtained
> through mbm_cntr_get(), which expects a struct rdtgroup pointer."
This is even stranger. mbm_cntr_get() is private to resctrl fs while
the new changelog describes how the arch functions resctrl_arch_reset_rmid()
and resctrl_arch_rmid_read() need struct rdtgroup to call mbm_cntr_get()?
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-30 15:44 ` Reinette Chatre
@ 2025-06-30 20:58 ` Moger, Babu
2025-06-30 21:59 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 20:58 UTC (permalink / raw)
To: Reinette Chatre, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
Shukla, Manali, Yuan, Perry, kai.huang@intel.com,
peterz@infradead.org, xiaoyao.li@intel.com,
kan.liang@linux.intel.com, Limonciello, Mario, xin3.li@intel.com,
Shenoy, Gautham Ranjal, xin@zytor.com, chang.seok.bae@intel.com,
fenghuay@nvidia.com, peternewman@google.com,
maciej.wieczor-retman@intel.com, eranian@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/30/25 10:44, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/30/25 6:57 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 6/24/2025 11:18 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>> Reading the monitoring data requires RMID, CLOSID, and event ID, among
>>>> other parameters. These are passed individually, resulting in architecture
>>>
>>> It is not clear how "event ID" and "other parameters" are relevant to this
>>> change since (in this context) it is only RMID and CLOSID that can be
>>> found in rdtgroup.
>>>
>>>> specific function calls.
>>>
>>> Could you please elaborate what you meant with: "These are passed individually,
>>> resulting in architecture specific function calls."?
>>
>> Rephrased the whole changelog.
>>
>> "fs/resctrl: Pass the full rdtgroup structure instead of individual RMID
>> and CLOSID
>
> nit, can be simplified to:
> fs/resctrl: Pass struct rdtgroup instead of individual members
sure.
>
>>
>> The functions resctrl_arch_reset_rmid() and resctrl_arch_rmid_read()
>
> (No need to say "function" when using ().)
>
> But wait ... this now changes to different functions from what the original
> patch touched and even more so it changes _arch_ functions that should not
> have access to struct rdtgroup. This new changelog does not seem to document
> the original patch but something new that has not yet been posted.
No. patch has not changed.
>
>> require several parameters, including RMID and CLOSID. Currently, RMID and
>> CLOSID are passed individually, even though they are available within the
>> rdtgroup structure.
>>
>> Refactor the code to pass a pointer to struct rdtgroup instead of
>> individual members in preparation for this requirement.
>
> "this requirement" .. what requirement are you referring to?
> There is no requirement that individual members of a struct cannot be passed
> as separate parameters and there is no problem doing so.
>
> From "Changelog" in Documentation/process/maintainer-tip.rst:
> "A good structure is to explain the context, the problem and the solution in
> separate paragraphs and this order."
>
> This new changelog has structure of "context, solution, problem".
>
>>
>> Additionally, when "mbm_event" counter assignment mode is enabled, a
>
> This seems to be primary motivation since passing struct rdtgroup will
> simplify the code (when I consider the original patch, not what this new
> changelog implies) ... but if this change is indeed to the arch API as the
> context suggest then passing the individual members is the right thing to
> do because arch code should not access struct rdtgroup.
Again. patch did not change.
>
>> counter ID is required to read the event. The counter ID is obtained
>> through mbm_cntr_get(), which expects a struct rdtgroup pointer."
>
> This is even stranger. mbm_cntr_get() is private to resctrl fs while
> the new changelog describes how the arch functions resctrl_arch_reset_rmid()
> and resctrl_arch_rmid_read() need struct rdtgroup to call mbm_cntr_get()?
>
> Reinette
>
>
Patch is same.. I am having trouble with changelog. ):
How does this look?
"fs/resctrl: Pass struct rdtgroup instead of individual members
Reading monitoring data for a resctrl group requires both the RMID and
CLOSID. These parameters are passed to functions like __mon_event_count(),
mbm_bw_count(), mbm_update_one_event(), and mbm_update(), where they are
ultimately used to retrieve event data.
When "mbm_event" counter assignment mode is enabled, a counter ID is
required to read the event. The counter ID is obtained through
mbm_cntr_get(), which expects a struct rdtgroup pointer.
Passing the pointer to the full rdtgroup structure simplifies access to
these parameters. Refactor the code to pass a pointer to struct rdtgroup
instead of individual members in preparation for this requirement."
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-30 20:58 ` Moger, Babu
@ 2025-06-30 21:59 ` Reinette Chatre
2025-06-30 22:47 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-30 21:59 UTC (permalink / raw)
To: babu.moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
Shukla, Manali, Yuan, Perry, kai.huang@intel.com,
peterz@infradead.org, xiaoyao.li@intel.com,
kan.liang@linux.intel.com, Limonciello, Mario, xin3.li@intel.com,
Shenoy, Gautham Ranjal, xin@zytor.com, chang.seok.bae@intel.com,
fenghuay@nvidia.com, peternewman@google.com,
maciej.wieczor-retman@intel.com, eranian@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Babu,
On 6/30/25 1:58 PM, Moger, Babu wrote:
>
> How does this look?
>
> "fs/resctrl: Pass struct rdtgroup instead of individual members
>
> Reading monitoring data for a resctrl group requires both the RMID and
> CLOSID. These parameters are passed to functions like __mon_event_count(),
> mbm_bw_count(), mbm_update_one_event(), and mbm_update(), where they are
> ultimately used to retrieve event data.
>
> When "mbm_event" counter assignment mode is enabled, a counter ID is
> required to read the event. The counter ID is obtained through
> mbm_cntr_get(), which expects a struct rdtgroup pointer.
>
> Passing the pointer to the full rdtgroup structure simplifies access to
> these parameters. Refactor the code to pass a pointer to struct rdtgroup
> instead of individual members in preparation for this requirement."
This looks good. I made a few adjustments that result in below. What do you think?
Reading monitoring data for a monitoring group requires both the RMID and
CLOSID. The RMID and CLOSID are members of struct rdtgroup but passed
separately to several functions involved in retrieving event data.
When "mbm_event" counter assignment mode is enabled, a counter ID is
required to read event data. The counter ID is obtained through
mbm_cntr_get(), which expects a struct rdtgroup pointer.
Provide a pointer to the struct rdtgroup as parameter to functions
involved in retrieving event data to simplify access to RMID, CLOSID,
and counter ID.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members
2025-06-30 21:59 ` Reinette Chatre
@ 2025-06-30 22:47 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 22:47 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
Shukla, Manali, Yuan, Perry, kai.huang@intel.com,
peterz@infradead.org, xiaoyao.li@intel.com,
kan.liang@linux.intel.com, Limonciello, Mario, xin3.li@intel.com,
Shenoy, Gautham Ranjal, xin@zytor.com, chang.seok.bae@intel.com,
fenghuay@nvidia.com, peternewman@google.com,
maciej.wieczor-retman@intel.com, eranian@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/30/2025 4:59 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/30/25 1:58 PM, Moger, Babu wrote:
>>
>> How does this look?
>>
>> "fs/resctrl: Pass struct rdtgroup instead of individual members
>>
>> Reading monitoring data for a resctrl group requires both the RMID and
>> CLOSID. These parameters are passed to functions like __mon_event_count(),
>> mbm_bw_count(), mbm_update_one_event(), and mbm_update(), where they are
>> ultimately used to retrieve event data.
>>
>> When "mbm_event" counter assignment mode is enabled, a counter ID is
>> required to read the event. The counter ID is obtained through
>> mbm_cntr_get(), which expects a struct rdtgroup pointer.
>>
>> Passing the pointer to the full rdtgroup structure simplifies access to
>> these parameters. Refactor the code to pass a pointer to struct rdtgroup
>> instead of individual members in preparation for this requirement."
>
> This looks good. I made a few adjustments that result in below. What do you think?
Looks good. Thanks
>
> Reading monitoring data for a monitoring group requires both the RMID and
> CLOSID. The RMID and CLOSID are members of struct rdtgroup but passed
> separately to several functions involved in retrieving event data.
>
> When "mbm_event" counter assignment mode is enabled, a counter ID is
> required to read event data. The counter ID is obtained through
> mbm_cntr_get(), which expects a struct rdtgroup pointer.
>
> Provide a pointer to the struct rdtgroup as parameter to functions
> involved in retrieving event data to simplify access to RMID, CLOSID,
> and counter ID.
>
> Reinette
>
>
-Babu
^ permalink raw reply [flat|nested] 114+ messages in thread
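The agreed changelog argues that a single struct rdtgroup pointer gives the callee access to RMID, CLOSID and, in "mbm_event" mode, the counter ID. A minimal user-space sketch of that idea follows. All names ending in _sketch are hypothetical stand-ins and are not the resctrl types or functions; in particular cntr_get_sketch() only imitates the role the thread attributes to mbm_cntr_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event indices, standing in for the two MBM events. */
enum evt_sketch { EVT_TOTAL_SKETCH, EVT_LOCAL_SKETCH, EVT_COUNT_SKETCH };

/* Hypothetical stand-in for the parts of struct rdtgroup used here. */
struct group_sketch {
	uint32_t closid;
	uint32_t rmid;
	int cntr_id[EVT_COUNT_SKETCH];	/* -1 when no counter is assigned */
};

/* What a reader needs once the group has been resolved. */
struct read_request_sketch {
	bool use_counter;	/* true: read by counter ID, false: read by RMID */
	uint32_t closid;
	uint32_t id;		/* counter ID or RMID, depending on use_counter */
};

/* Imitates a counter lookup that needs the whole group, not just its IDs. */
static int cntr_get_sketch(const struct group_sketch *grp, enum evt_sketch evt)
{
	return grp->cntr_id[evt];
}

/*
 * One pointer parameter replaces separate closid/rmid parameters and also
 * makes the counter lookup possible. Returns 0 on success, -1 when counter
 * assignment is enabled but no counter is assigned to the event.
 */
static int prepare_read_sketch(const struct group_sketch *grp,
			       enum evt_sketch evt, bool cntr_assign_enabled,
			       struct read_request_sketch *req)
{
	int cntr_id;

	assert(grp != NULL && req != NULL);

	req->closid = grp->closid;
	if (!cntr_assign_enabled) {
		req->use_counter = false;
		req->id = grp->rmid;
		return 0;
	}

	cntr_id = cntr_get_sketch(grp, evt);
	if (cntr_id < 0)
		return -1;

	req->use_counter = true;
	req->id = (uint32_t)cntr_id;
	return 0;
}
```

Had the helper taken closid and rmid as separate parameters, the counter lookup would have needed a third parameter or a second lookup of the group, which is the simplification the changelog describes.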
* [PATCH v14 22/32] x86,fs/resctrl: Add the support for reading ABMC counters
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (20 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 21/32] fs/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-13 21:05 ` [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration Babu Moger
` (11 subsequent siblings)
33 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
System software can read resctrl event data for a particular resource by
writing the RMID and Event Identifier (EvtID) to the QM_EVTSEL register
and then reading the event data from the QM_CTR register.
In ABMC mode, the event data of a specific counter ID can be read by
setting the following fields: QM_EVTSEL.ExtendedEvtID = 1,
QM_EVTSEL.EvtID = L3CacheABMC (=1), and QM_EVTSEL[RMID] = the desired
counter ID. Reading QM_CTR will then return the contents of the specified
counter ID. The E bit will be set if the counter configuration was invalid,
or if an invalid counter ID was set in the QM_EVTSEL[RMID] field.
Introduce __cntr_id_read_phys() to read event data for a specific counter
ID. In ABMC mode, ensure QM_EVTSEL is properly configured by setting the
counter ID, Extended Event Identifier, and Event Identifier.
QM_EVTSEL Register definition:
=======================================================
Bits Mnemonic Description
=======================================================
63:44 -- Reserved
43:32 RMID Resource Monitoring Identifier
31 ExtEvtID Extended Event Identifier
30:8 -- Reserved
7:0 EvtID Event Identifier
=======================================================
Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the context in changelog. Added text in imperative tone.
Added WARN_ON_ONCE() when cntr_id < 0.
Improved code documentation in include/linux/resctrl.h.
Added the check in mbm_update() to skip overflow handler when counter is unassigned.
v13: Split the patch into 2. First one to handle the passing of rdtgroup structure to a few
functions (__mon_event_count() and mbm_update()). Second one to handle ABMC counter reading.
Added new function __cntr_id_read_phys() to handle ABMC event reading.
Updated kernel doc for resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().
Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c file has now been split between the FS and ARCH directories.
v12: New patch to support extended event mode when ABMC is enabled.
---
arch/x86/kernel/cpu/resctrl/internal.h | 6 +++
arch/x86/kernel/cpu/resctrl/monitor.c | 66 ++++++++++++++++++++++----
fs/resctrl/monitor.c | 26 +++++++---
include/linux/resctrl.h | 13 +++--
4 files changed, 94 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 23c17ce172d3..77a9ce4a8403 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -40,6 +40,12 @@ struct arch_mbm_state {
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
#define ABMC_ENABLE_BIT 0
+/*
+ * Qos Event Identifiers.
+ */
+#define ABMC_EXTENDED_EVT_ID BIT(31)
+#define ABMC_EVT_ID BIT(0)
+
/**
* struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
* a resource for a control function
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6b0ea4b17c7a..ee0aa741cf6c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -157,6 +157,41 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}
+static int __cntr_id_read_phys(u32 cntr_id, u64 *val)
+{
+ u64 msr_val;
+
+ /*
+ * QM_EVTSEL Register definition:
+ * =======================================================
+ * Bits Mnemonic Description
+ * =======================================================
+ * 63:44 -- Reserved
+ * 43:32 RMID Resource Monitoring Identifier
+ * 31 ExtEvtID Extended Event Identifier
+ * 30:8 -- Reserved
+ * 7:0 EvtID Event Identifier
+ * =======================================================
+ * The contents of a specific counter can be read by setting the
+ * following fields in QM_EVTSEL.ExtendedEvtID(=1) and
+ * QM_EVTSEL.EvtID = L3CacheABMC (=1) and setting [RMID] to the
+ * desired counter ID. Reading QM_CTR will then return the
+ * contents of the specified counter. The E bit will be set if the
+ * counter configuration was invalid, or if an invalid counter ID
+ * was set in the QM_EVTSEL[RMID] field.
+ */
+ wrmsr(MSR_IA32_QM_EVTSEL, ABMC_EXTENDED_EVT_ID | ABMC_EVT_ID, cntr_id);
+ rdmsrl(MSR_IA32_QM_CTR, msr_val);
+
+ if (msr_val & RMID_VAL_ERROR)
+ return -EIO;
+ if (msr_val & RMID_VAL_UNAVAIL)
+ return -EINVAL;
+
+ *val = msr_val;
+ return 0;
+}
+
static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
@@ -172,7 +207,7 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
}
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 unused, u32 rmid,
+ u32 unused, u32 rmid, int cntr_id,
enum resctrl_event_id eventid)
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
@@ -184,9 +219,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
if (am) {
memset(am, 0, sizeof(*am));
- prmid = logical_rmid_to_physical_rmid(cpu, rmid);
- /* Record any initial, non-zero count value. */
- __rmid_read_phys(prmid, eventid, &am->prev_msr);
+ if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+ resctrl_is_mbm_event(eventid)) {
+ if (WARN_ON_ONCE(cntr_id < 0))
+ return;
+ __cntr_id_read_phys(cntr_id, &am->prev_msr);
+ } else {
+ prmid = logical_rmid_to_physical_rmid(cpu, rmid);
+ /* Record any initial, non-zero count value. */
+ __rmid_read_phys(prmid, eventid, &am->prev_msr);
+ }
}
}
@@ -218,8 +260,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
}
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 unused, u32 rmid, enum resctrl_event_id eventid,
- u64 *val, void *ignored)
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid, u64 *val, void *ignored)
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -231,8 +273,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
resctrl_arch_rmid_read_context_check();
- prmid = logical_rmid_to_physical_rmid(cpu, rmid);
- ret = __rmid_read_phys(prmid, eventid, &msr_val);
+ if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+ resctrl_is_mbm_event(eventid)) {
+ if (WARN_ON_ONCE(cntr_id < 0))
+ return cntr_id;
+ ret = __cntr_id_read_phys(cntr_id, &msr_val);
+ } else {
+ prmid = logical_rmid_to_physical_rmid(cpu, rmid);
+ ret = __rmid_read_phys(prmid, eventid, &msr_val);
+ }
+
if (ret)
return ret;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 31e08d891db2..ef6ef58f180b 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -159,7 +159,11 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
break;
entry = __rmid_entry(idx);
- if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
+ /*
+ * cntr_id is not relevant for QOS_L3_OCCUP_EVENT_ID.
+ * Pass dummy value -1.
+ */
+ if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid, -1,
QOS_L3_OCCUP_EVENT_ID, &val,
arch_mon_ctx)) {
rmid_dirty = true;
@@ -358,6 +362,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
+ int cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);
int cpu = smp_processor_id();
u32 closid = rdtgrp->closid;
u32 rmid = rdtgrp->mon.rmid;
@@ -368,7 +373,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
u64 tval = 0;
if (rr->first) {
- resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
+ resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
if (m)
memset(m, 0, sizeof(struct mbm_state));
@@ -379,7 +384,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
/* Reading a single domain, must be on a CPU in that domain. */
if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
return -EINVAL;
- rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, cntr_id,
rr->evtid, &tval, rr->arch_mon_ctx);
if (rr->err)
return rr->err;
@@ -405,7 +410,8 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
if (d->ci_id != rr->ci_id)
continue;
- err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
+ cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
+ err = resctrl_arch_rmid_read(rr->r, d, closid, rmid, cntr_id,
rr->evtid, &tval, rr->arch_mon_ctx);
if (!err) {
rr->val += tval;
@@ -638,12 +644,20 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
/*
* This is protected from concurrent reads from user as both
* the user and overflow handler hold the global mutex.
+ * Skip the update if the counter is unassigned while mbm_event
+ * mode is enabled.
*/
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID) &&
+ (!resctrl_arch_mbm_cntr_assign_enabled(r) ||
+ mbm_cntr_get(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID) >= 0)) {
mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+ }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID) &&
+ (!resctrl_arch_mbm_cntr_assign_enabled(r) ||
+ mbm_cntr_get(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID) >= 0)) {
mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+ }
}
/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 1539d1faa1a1..4b52bac5dbbc 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -507,6 +507,9 @@ void resctrl_offline_cpu(unsigned int cpu);
* counter may match traffic of both @closid and @rmid, or @rmid
* only.
* @rmid: rmid of the counter to read.
+ * @cntr_id: Counter ID used to read MBM events in mbm_event mode. Only valid
+ * when mbm_event mode is enabled and @eventid is an MBM event.
+ * Can be negative when invalid.
* @eventid: eventid to read, e.g. L3 occupancy.
* @val: result of the counter read in bytes.
* @arch_mon_ctx: An architecture specific value from
@@ -524,8 +527,9 @@ void resctrl_offline_cpu(unsigned int cpu);
* 0 on success, or -EIO, -EINVAL etc on error.
*/
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 closid, u32 rmid, enum resctrl_event_id eventid,
- u64 *val, void *arch_mon_ctx);
+ u32 closid, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid, u64 *val,
+ void *arch_mon_ctx);
/**
* resctrl_arch_rmid_read_context_check() - warn about invalid contexts
@@ -566,12 +570,15 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
* @closid: closid that matches the rmid. Depending on the architecture, the
* counter may match traffic of both @closid and @rmid, or @rmid only.
* @rmid: The rmid whose counter values should be reset.
+ * @cntr_id: Counter ID used to read MBM events in mbm_event mode. Only valid
+ * when mbm_event mode is enabled and @eventid is an MBM event. Can
+ * be negative when invalid.
* @eventid: The eventid whose counter values should be reset.
*
* This can be called from any CPU.
*/
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 closid, u32 rmid,
+ u32 closid, u32 rmid, int cntr_id,
enum resctrl_event_id eventid);
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
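The QM_EVTSEL layout described in the changelog above can be checked with a small user-space sketch. The macro and function names below are hypothetical and chosen for illustration only; the bit positions come from the register table in the patch. The sketch assumes the counter ID fits in the 12-bit RMID field (bits 43:32).

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the QM_EVTSEL table in the changelog. */
#define QM_EVTSEL_EXT_EVT_ID_SKETCH	(UINT64_C(1) << 31)	/* bit 31: ExtEvtID */
#define QM_EVTSEL_EVT_ID_ABMC_SKETCH	UINT64_C(1)		/* bits 7:0: L3CacheABMC (=1) */
#define QM_EVTSEL_RMID_SHIFT_SKETCH	32			/* bits 43:32: RMID field */
#define QM_EVTSEL_RMID_MASK_SKETCH	UINT64_C(0xfff)		/* 12 bits wide */

/* Build the 64-bit QM_EVTSEL value that selects an ABMC counter. */
static uint64_t qm_evtsel_abmc_sketch(uint32_t cntr_id)
{
	/* The counter ID travels in the RMID field and must fit in it. */
	assert(cntr_id <= QM_EVTSEL_RMID_MASK_SKETCH);

	return ((uint64_t)cntr_id << QM_EVTSEL_RMID_SHIFT_SKETCH) |
	       QM_EVTSEL_EXT_EVT_ID_SKETCH | QM_EVTSEL_EVT_ID_ABMC_SKETCH;
}

/* Low and high 32-bit halves, as a two-argument MSR write would take them. */
static uint32_t qm_evtsel_lo_sketch(uint64_t val)
{
	return (uint32_t)val;
}

static uint32_t qm_evtsel_hi_sketch(uint64_t val)
{
	return (uint32_t)(val >> 32);
}
```

For counter ID 5 the low half is 0x80000001 (ExtEvtID plus EvtID) and the high half is 5, which corresponds to the low/high split passed to wrmsr() in __cntr_id_read_phys().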
* [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (21 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 22/32] x86,fs/resctrl: Add the support for reading ABMC counters Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 4:32 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
` (10 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The "mbm_event" mode allows the user to assign a hardware counter ID to
an RMID, event pair and monitor the bandwidth as long as it is assigned.
Additionally, the user can specify a particular type of memory
transactions for the counter to track.
By default, each resctrl group supports two MBM events: mbm_total_bytes
and mbm_local_bytes. Each event corresponds to an MBM configuration that
specifies the memory transactions being tracked by the event.
Add definitions for supported memory transactions (e.g., read, write,
etc.).
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Changed the term memory events to memory transactions to be consistent.
Changed the name of the structure to mbm_config_value (from mbm_evt_value).
Changed name to memory transactions where applicable.
Changed subject line to fs/resctrl.
v13: Updated the changelog.
Removed the definitions from resctrl_types.h and moved to internal.h.
Removed mbm_assign_config definition. Configurations will be part of
mon_evt list.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The rdtgroup.c file has now been split between the FS and ARCH directories.
v12: New patch to support event configurations via new counter_configs
method.
---
fs/resctrl/internal.h | 11 +++++++++++
fs/resctrl/rdtgroup.c | 14 ++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 4a7130018aa1..84a136194d9a 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -212,6 +212,17 @@ struct rdtgroup {
struct pseudo_lock_region *plr;
};
+/**
+ * struct mbm_config_value - Memory transaction an MBM event can be configured with.
+ * @name: Name of memory transaction (read, write ...).
+ * @val: The bit used to represent the memory transaction within an
+ * event's configuration.
+ */
+struct mbm_config_value {
+ char name[32];
+ u32 val;
+};
+
/* rdtgroup.flags */
#define RDT_DELETED 1
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 08bcca9bd8b6..5fb6a9939e23 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
struct dentry *debugfs_resctrl;
+/* Number of memory transactions that an MBM event can be configured with. */
+#define NUM_MBM_EVT_VALUES 7
+
+/* Decoded values for each type of memory transactions */
+struct mbm_config_value mbm_config_values[NUM_MBM_EVT_VALUES] = {
+ {"local_reads", READS_TO_LOCAL_MEM},
+ {"remote_reads", READS_TO_REMOTE_MEM},
+ {"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
+ {"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
+ {"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
+ {"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
+ {"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
+};
+
/*
* Memory bandwidth monitoring event to use for the default CTRL_MON group
* and each new CTRL_MON group created by the user. Only relevant when
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
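A name/bit table like the one added by this patch is typically used to translate between an event's configuration bitmask and human-readable transaction names. The user-space sketch below shows one way such a decode could work. The struct, table and function names are hypothetical, and the bit positions (bit 0 through bit 6 in table order) are an assumption made for illustration; the patch itself only references the symbolic constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the name/bit pair described in the patch. */
struct mbm_transaction_sketch {
	const char *name;
	uint32_t val;
};

/* Bit positions are assumed for illustration, not taken from the patch. */
static const struct mbm_transaction_sketch transactions_sketch[] = {
	{"local_reads",			1u << 0},
	{"remote_reads",		1u << 1},
	{"local_non_temporal_writes",	1u << 2},
	{"remote_non_temporal_writes",	1u << 3},
	{"local_reads_slow_memory",	1u << 4},
	{"remote_reads_slow_memory",	1u << 5},
	{"dirty_victim_writes_all",	1u << 6},
};

#define NUM_TRANSACTIONS_SKETCH \
	(sizeof(transactions_sketch) / sizeof(transactions_sketch[0]))

/*
 * Write the comma-separated names of all transactions set in @cfg to @buf.
 * Returns the number of names written; stops early if @buf is too small.
 */
static int mbm_decode_sketch(uint32_t cfg, char *buf, size_t len)
{
	size_t used = 0;
	int matched = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < NUM_TRANSACTIONS_SKETCH; i++) {
		int w;

		if (!(cfg & transactions_sketch[i].val))
			continue;

		w = snprintf(buf + used, len - used, "%s%s",
			     matched ? "," : "", transactions_sketch[i].name);
		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)w;
		matched++;
	}

	return matched;
}
```

Sizing the table from the initializer avoids a separate count constant that has to be kept in sync by hand, which is one of the concerns behind the NUM_MBM_EVT_VALUES naming and placement discussion in the review that follows.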
* Re: [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration
2025-06-13 21:05 ` [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration Babu Moger
@ 2025-06-25 4:32 ` Reinette Chatre
2025-06-30 17:20 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 4:32 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> The "mbm_event" mode allows the user to assign a hardware counter ID to
"The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"
(I'll stop noting this)
> an RMID, event pair and monitor the bandwidth as long as it is assigned.
> Additionally, the user can specify a particular type of memory
> transactions for the counter to track.
hmmm ... this is not "Additionally" since the event is used to specify
the memory transactions to track, no? Also please note mix of singular
and plural: *a* particular type of memory *transactions*.
>
> By default, each resctrl group supports two MBM events: mbm_total_bytes
> and mbm_local_bytes. Each event corresponds to an MBM configuration that
> specifies the memory transactions being tracked by the event.
Unclear how this is relevant to this change. This is just about the
memory transactions.
>
> Add definitions for supported memory transactions (e.g., read, write,
> etc.).
I think this changelog needs to connect that the memory transactions
defined here are what MBM events can be configured with.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> fs/resctrl/internal.h | 11 +++++++++++
> fs/resctrl/rdtgroup.c | 14 ++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 4a7130018aa1..84a136194d9a 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -212,6 +212,17 @@ struct rdtgroup {
> struct pseudo_lock_region *plr;
> };
>
> +/**
> + * struct mbm_config_value - Memory transaction an MBM event can be configured with.
> + * @name: Name of memory transaction (read, write ...).
> + * @val: The bit used to represent the memory transaction within an
> + * event's configuration.
> + */
> +struct mbm_config_value {
> + char name[32];
> + u32 val;
> +};
"value" in struct name and "val" in member seems redundant. "config"
is also very generic. How about "struct mbm_transaction"? All the
descriptions already reflect this :)
> +
> /* rdtgroup.flags */
> #define RDT_DELETED 1
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 08bcca9bd8b6..5fb6a9939e23 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
>
> struct dentry *debugfs_resctrl;
>
> +/* Number of memory transactions that an MBM event can be configured with. */
> +#define NUM_MBM_EVT_VALUES 7
I think this should be in include/linux/resctrl_types.h to be with the
values it represents. Regarding name, how about "NUM_MBM_TRANSACTIONS"?
> +
> +/* Decoded values for each type of memory transactions */
> +struct mbm_config_value mbm_config_values[NUM_MBM_EVT_VALUES] = {
> + {"local_reads", READS_TO_LOCAL_MEM},
> + {"remote_reads", READS_TO_REMOTE_MEM},
> + {"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
> + {"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
> + {"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
> + {"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
> + {"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
> +};
> +
> /*
> * Memory bandwidth monitoring event to use for the default CTRL_MON group
> * and each new CTRL_MON group created by the user. Only relevant when
Reinette
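
For illustration, the two renaming suggestions above (struct mbm_transaction
and NUM_MBM_TRANSACTIONS) could look roughly like the user-space sketch below.
The BIT() positions and the mbm_transaction_bit() lookup helper are assumptions
made for this example and are not taken from the patch; in the kernel the bit
values come from the existing READS_TO_LOCAL_MEM ... DIRTY_VICTIMS_TO_ALL_MEM
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit positions, for illustration only. */
#define BIT(n)				(1U << (n))
#define READS_TO_LOCAL_MEM		BIT(0)
#define READS_TO_REMOTE_MEM		BIT(1)
#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
#define READS_TO_LOCAL_S_MEM		BIT(4)
#define READS_TO_REMOTE_S_MEM		BIT(5)
#define DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)

/* Number of memory transactions that an MBM event can be configured with. */
#define NUM_MBM_TRANSACTIONS		7

/* Memory transaction an MBM event can be configured with. */
struct mbm_transaction {
	char name[32];	/* Name of the memory transaction. */
	uint32_t val;	/* Bit representing it within an event's configuration. */
};

/* The longest name in the table must fit, including the terminating NUL. */
static_assert(sizeof("remote_non_temporal_writes") <= 32, "name too long");

static const struct mbm_transaction mbm_transactions[NUM_MBM_TRANSACTIONS] = {
	{"local_reads", READS_TO_LOCAL_MEM},
	{"remote_reads", READS_TO_REMOTE_MEM},
	{"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
	{"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
	{"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
	{"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
	{"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
};

/* Hypothetical helper: return the bit for @name, or 0 if @name is unknown. */
static uint32_t mbm_transaction_bit(const char *name)
{
	int i;

	for (i = 0; i < NUM_MBM_TRANSACTIONS; i++) {
		if (!strcmp(mbm_transactions[i].name, name))
			return mbm_transactions[i].val;
	}

	return 0;
}
```

With these names the table entries read as "transaction name, transaction bit",
which matches the wording already used in the kernel-doc of the patch.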
* Re: [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration
2025-06-25 4:32 ` Reinette Chatre
@ 2025-06-30 17:20 ` Moger, Babu
2025-06-30 21:58 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 17:20 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/24/25 23:32, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> The "mbm_event" mode allows the user to assign a hardware counter ID to
>
> "The "mbm_event" mode" -> "The "mbm_event" counter assignment mode"
> (I'll stop noting this)
>
Sure.
>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>> Additionally, the user can specify a particular type of memory
>> transactions for the counter to track.
>
> hmmm ... this is not "Additionally" since the event is used to specify
> the memory transactions to track, no? Also please note mix of singular
> and plural: *a* particular type of memory *transactions*.
Sure.
>
>>
>> By default, each resctrl group supports two MBM events: mbm_total_bytes
>> and mbm_local_bytes. Each event corresponds to an MBM configuration that
>> specifies the memory transactions being tracked by the event.
>
> Unclear how this is relevant to this change. This is just about the
> memory transactions.
Removed it.
>
>>
>> Add definitions for supported memory transactions (e.g., read, write,
>> etc.).
>
> I think this changelog needs to make the connection that the memory
> transactions defined here are what MBM events can be configured with.
Yes.
Changed the whole changelog.
fs/resctrl: Add definitions for MBM event configuration
The "mbm_event" counter assignment mode allows the user to assign a
hardware counter to an RMID, event pair and monitor the bandwidth as long
as it is assigned. The user can specify a particular type of memory
transaction for the counter to track.
Add the definitions for supported memory transactions (e.g., read, write,
etc.) the counter can be configured with.
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
>
>> ---
>> fs/resctrl/internal.h | 11 +++++++++++
>> fs/resctrl/rdtgroup.c | 14 ++++++++++++++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 4a7130018aa1..84a136194d9a 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -212,6 +212,17 @@ struct rdtgroup {
>> struct pseudo_lock_region *plr;
>> };
>>
>> +/**
>> + * struct mbm_config_value - Memory transaction an MBM event can be configured with.
>> + * @name: Name of memory transaction (read, write ...).
>> + * @val: The bit used to represent the memory transaction within an
>> + * event's configuration.
>> + */
>> +struct mbm_config_value {
>> + char name[32];
>> + u32 val;
>> +};
>
> "value" in struct name and "val" in member seems redundant. "config"
> is also very generic. How about "struct mbm_transaction"? All the
> descriptions already reflect this :)
Sure.
>
>> +
>> /* rdtgroup.flags */
>> #define RDT_DELETED 1
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 08bcca9bd8b6..5fb6a9939e23 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
>>
>> struct dentry *debugfs_resctrl;
>>
>> +/* Number of memory transactions that an MBM event can be configured with. */
>> +#define NUM_MBM_EVT_VALUES 7
>
> I think this should be in include/linux/resctrl_types.h to be with the
> values it represents. Regarding name, how about "NUM_MBM_TRANSACTIONS"?
Sure.
>
>> +
>> +/* Decoded values for each type of memory transactions */
>> +struct mbm_config_value mbm_config_values[NUM_MBM_EVT_VALUES] = {
>> + {"local_reads", READS_TO_LOCAL_MEM},
>> + {"remote_reads", READS_TO_REMOTE_MEM},
>> + {"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
>> + {"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
>> + {"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
>> + {"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
>> + {"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
>> +};
>> +
>> /*
>> * Memory bandwidth monitoring event to use for the default CTRL_MON group
>> * and each new CTRL_MON group created by the user. Only relevant when
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration
2025-06-30 17:20 ` Moger, Babu
@ 2025-06-30 21:58 ` Reinette Chatre
2025-06-30 22:51 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-30 21:58 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/30/25 10:20 AM, Moger, Babu wrote:
> On 6/24/25 23:32, Reinette Chatre wrote:
>> On 6/13/25 2:05 PM, Babu Moger wrote:
>
> Changed the whole changelog.
>
> fs/resctrl: Add definitions for MBM event configuration
>
> The "mbm_event" counter assignment mode allows the user to assign a
> hardware counter to an RMID, event pair and monitor the bandwidth as long
> as it is assigned. The user can specify a particular type of memory
> transaction for the counter to track.
Since an event can be configured with multiple memory transactions I think the
last sentence can be something like:
The user can specify the memory transaction(s) for the counter to
track.
>
> Add the definitions for supported memory transactions (e.g., read, write,
> etc.) the counter can be configured with.
Looks good. Thank you.
Reinette
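
The point that one event can be configured with several memory transactions at
once can be illustrated with a small sketch. The enum names, the bit positions
and the count_transactions() helper are assumptions for illustration; the two
masks mirror the event_filter contents shown for mbm_total_bytes and
mbm_local_bytes in patch 24/32 of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, one bit per memory transaction type. */
enum {
	LOCAL_READS			= 1U << 0,
	REMOTE_READS			= 1U << 1,
	LOCAL_NON_TEMPORAL_WRITES	= 1U << 2,
	REMOTE_NON_TEMPORAL_WRITES	= 1U << 3,
	LOCAL_READS_SLOW_MEMORY		= 1U << 4,
	REMOTE_READS_SLOW_MEMORY	= 1U << 5,
	DIRTY_VICTIM_WRITES_ALL		= 1U << 6,
};

/* mbm_local_bytes: the three local transaction types in one configuration. */
#define MBM_LOCAL_BYTES_CFG	(LOCAL_READS | LOCAL_NON_TEMPORAL_WRITES | \
				 LOCAL_READS_SLOW_MEMORY)

/* mbm_total_bytes: all seven transaction types. */
#define MBM_TOTAL_BYTES_CFG	(MBM_LOCAL_BYTES_CFG | REMOTE_READS | \
				 REMOTE_NON_TEMPORAL_WRITES | \
				 REMOTE_READS_SLOW_MEMORY | \
				 DIRTY_VICTIM_WRITES_ALL)

/* With the assumed layout, "all seven" is the low seven bits. */
static_assert(MBM_TOTAL_BYTES_CFG == 0x7f, "unexpected total mask");

/* Hypothetical helper: number of transaction types selected in @cfg. */
static unsigned int count_transactions(uint32_t cfg)
{
	unsigned int n = 0;

	while (cfg) {
		n += cfg & 1;
		cfg >>= 1;
	}

	return n;
}
```

A single counter therefore tracks the union of the selected transactions, which
is why "memory transaction(s)" is the more accurate wording for the changelog.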
* Re: [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration
2025-06-30 21:58 ` Reinette Chatre
@ 2025-06-30 22:51 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 22:51 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/30/2025 4:58 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/30/25 10:20 AM, Moger, Babu wrote:
>> On 6/24/25 23:32, Reinette Chatre wrote:
>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>
>> Changed the whole changelog.
>>
>> fs/resctrl: Add definitions for MBM event configuration
>>
>> The "mbm_event" counter assignment mode allows the user to assign a
>> hardware counter to an RMID, event pair and monitor the bandwidth as long
>> as it is assigned. The user can specify a particular type of memory
>> transaction for the counter to track.
>
> Since an event can be configured with multiple memory transactions I think the
> last sentence can be something like:
> The user can specify the memory transaction(s) for the counter to
> track.
Sure. Thanks
Babu
* [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (22 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 23/32] fs/resctrl: Add definitions for MBM event configuration Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:23 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations Babu Moger
` (9 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
When assignable counters are supported the
/sys/fs/resctrl/info/L3_MON/event_configs directory contains a
sub-directory for each MBM event that can be assigned to a counter.
The MBM event sub-directory contains a file named "event_filter" that
is used to view and modify which memory transactions the MBM event is
configured with.
Create the /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl
mount and pre-populate it with directories for the two existing MBM events:
mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
each MBM event directory with the needed *show() that displays the memory
transactions with which the MBM event is configured.
Example:
$ mount -t resctrl resctrl /sys/fs/resctrl
$ cd /sys/fs/resctrl/
$ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
local_reads, remote_reads, local_non_temporal_writes,
remote_non_temporal_writes, local_reads_slow_memory,
remote_reads_slow_memory, dirty_victim_writes_all
$ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads, local_non_temporal_writes, local_reads_slow_memory
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog with context. Thanks to Reinette.
Changed the name of directory to event_configs from counter_config.
Updated user doc about the memory transactions supported by assignment.
Removed mbm_mode from struct mon_evt. Not required anymore.
v13: Updated user doc (resctrl.rst).
Changed the name of the function resctrl_mkdir_info_configs to
resctrl_mkdir_counter_configs().
Replaced seq_puts() with seq_putc() where applicable.
Removed RFTYPE_MON_CONFIG definition. Not required.
Changed the name of the flag RFTYPE_CONFIG to RFTYPE_ASSIGN_CONFIG.
Reinette suggested RFTYPE_MBM_EVENT_CONFIG but RFTYPE_ASSIGN_CONFIG
seemed shorter and precise.
The configuration is created using evt_list.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.
v12: New patch to hold the MBM event configurations for mbm_cntr_assign mode.
---
Documentation/filesystems/resctrl.rst | 32 +++++++++++
fs/resctrl/internal.h | 2 +
fs/resctrl/monitor.c | 1 +
fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++++
4 files changed, 113 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 18de335e1ff8..b1db1a53db2a 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -310,6 +310,38 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
0=30;1=30
+"event_configs":
+ Directory that exists when "mbm_event" mode is supported. Contains
+ sub-directory for each MBM event that can be assigned to a counter.
+
+ Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
+ Each MBM event's sub-directory contains a file named "event_filter" that is
+ used to view and modify which memory transactions the MBM event is configured
+ with.
+
+ List of memory transaction types supported:
+
+ ========================== ========================================================
+ Name Description
+ ========================== ========================================================
+ dirty_victim_writes_all Dirty Victims from the QOS domain to all types of memory
+ remote_reads_slow_memory Reads to slow memory in the non-local NUMA domain
+ local_reads_slow_memory Reads to slow memory in the local NUMA domain
+ remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
+ local_non_temporal_writes Non-temporal writes to local NUMA domain
+ remote_reads Reads to memory in the non-local NUMA domain
+ local_reads Reads to memory in the local NUMA domain
+ ========================== ========================================================
+
+ For example::
+
+ # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
+ local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
+ local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
+
+ # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
+ local_reads, local_non_temporal_writes, local_reads_slow_memory
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 84a136194d9a..ed0e3b695ad5 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -248,6 +248,8 @@ struct mbm_config_value {
#define RFTYPE_DEBUG BIT(10)
+#define RFTYPE_ASSIGN_CONFIG BIT(11)
+
#define RFTYPE_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL)
#define RFTYPE_MON_INFO (RFTYPE_INFO | RFTYPE_MON)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index ef6ef58f180b..09a49029a800 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -956,6 +956,7 @@ int resctrl_mon_resource_init(void)
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
resctrl_file_fflags_init("available_mbm_cntrs",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+ resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
}
return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5fb6a9939e23..e2fa5e10c2dd 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1909,6 +1909,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
return ret;
}
+static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
+{
+ struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
+ bool sep = false;
+ int i;
+
+ for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
+ if (mevt->evt_cfg & mbm_config_values[i].val) {
+ if (sep)
+ seq_putc(seq, ',');
+ seq_printf(seq, "%s", mbm_config_values[i].name);
+ sep = true;
+ }
+ }
+ seq_putc(seq, '\n');
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -2033,6 +2052,12 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "event_filter",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = event_filter_show,
+ },
{
.name = "mbm_assign_mode",
.mode = 0444,
@@ -2315,6 +2340,53 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
return ret;
}
+static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
+{
+ struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
+ struct mon_evt *mevt;
+ int ret;
+
+ l3_mon_kn = kernfs_find_and_get(kn_info, name);
+ if (!l3_mon_kn)
+ return -ENOENT;
+
+ kn_subdir = kernfs_create_dir(l3_mon_kn, "event_configs", l3_mon_kn->mode, NULL);
+ if (IS_ERR(kn_subdir)) {
+ kernfs_put(l3_mon_kn);
+ return PTR_ERR(kn_subdir);
+ }
+
+ ret = rdtgroup_kn_set_ugid(kn_subdir);
+ if (ret) {
+ kernfs_put(l3_mon_kn);
+ return ret;
+ }
+
+ for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
+ if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
+ continue;
+
+ kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name, kn_subdir->mode, mevt);
+ if (IS_ERR(kn_subdir2)) {
+ ret = PTR_ERR(kn_subdir2);
+ goto out_config;
+ }
+
+ ret = rdtgroup_kn_set_ugid(kn_subdir2);
+ if (ret)
+ goto out_config;
+
+ ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
+ if (!ret)
+ kernfs_activate(kn_subdir);
+ }
+
+out_config:
+ kernfs_put(l3_mon_kn);
+
+ return ret;
+}
+
static unsigned long fflags_from_resource(struct rdt_resource *r)
{
switch (r->rid) {
@@ -2361,6 +2433,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
if (ret)
goto out_destroy;
+
+ if (r->mon.mbm_cntr_assignable) {
+ ret = resctrl_mkdir_counter_configs(r, name);
+ if (ret)
+ goto out_destroy;
+ }
}
ret = rdtgroup_kn_set_ugid(kn_info);
--
2.34.1
* Re: [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/
2025-06-13 21:05 ` [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
@ 2025-06-25 23:23 ` Reinette Chatre
2025-06-30 19:06 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:23 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> When assignable counters are supported the
> /sys/fs/resctrl/info/L3_MON/event_configs directory contains a
> sub-directory for each MBM event that can be assigned to a counter.
> The MBM event sub-directory contains a file named "event_filter" that
> is used to view and modify which memory transactions the MBM event is
> configured with.
>
> Create the /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl
> mount and pre-populate it with directories for the two existing MBM events:
> mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
> each MBM event directory with the needed *show() that displays the memory
> transactions with which the MBM event is configured.
>
> Example:
> $ mount -t resctrl resctrl /sys/fs/resctrl
> $ cd /sys/fs/resctrl/
> $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
> local_reads, remote_reads, local_non_temporal_writes,
> remote_non_temporal_writes, local_reads_slow_memory,
> remote_reads_slow_memory, dirty_victim_writes_all
>
> $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory
Please let these examples match what the patch does wrt spacing.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 32 +++++++++++
> fs/resctrl/internal.h | 2 +
> fs/resctrl/monitor.c | 1 +
> fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++++
> 4 files changed, 113 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 18de335e1ff8..b1db1a53db2a 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -310,6 +310,38 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> 0=30;1=30
>
> +"event_configs":
> + Directory that exists when "mbm_event" mode is supported. Contains
""mbm_event" mode" -> ""mbm_event" counter assignment mode"
> + sub-directory for each MBM event that can be assigned to a counter.
> +
> + Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
> + Each MBM event's sub-directory contains a file named "event_filter" that is
> + used to view and modify which memory transactions the MBM event is configured
> + with.
> +
> + List of memory transaction types supported:
> +
> + ========================== ========================================================
> + Name Description
> + ========================== ========================================================
> + dirty_victim_writes_all Dirty Victims from the QOS domain to all types of memory
> + remote_reads_slow_memory Reads to slow memory in the non-local NUMA domain
> + local_reads_slow_memory Reads to slow memory in the local NUMA domain
> + remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
> + local_non_temporal_writes Non-temporal writes to local NUMA domain
> + remote_reads Reads to memory in the non-local NUMA domain
> + local_reads Reads to memory in the local NUMA domain
> + ========================== ========================================================
> +
> + For example::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> + local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> + local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> + local_reads, local_non_temporal_writes, local_reads_slow_memory
> +
Please let these examples match what the patch does wrt spacing.
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 84a136194d9a..ed0e3b695ad5 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -248,6 +248,8 @@ struct mbm_config_value {
>
> #define RFTYPE_DEBUG BIT(10)
>
> +#define RFTYPE_ASSIGN_CONFIG BIT(11)
> +
> #define RFTYPE_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL)
>
> #define RFTYPE_MON_INFO (RFTYPE_INFO | RFTYPE_MON)
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index ef6ef58f180b..09a49029a800 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -956,6 +956,7 @@ int resctrl_mon_resource_init(void)
> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> resctrl_file_fflags_init("available_mbm_cntrs",
> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> + resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
> }
>
> return 0;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 5fb6a9939e23..e2fa5e10c2dd 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1909,6 +1909,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
> +{
> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
> + bool sep = false;
> + int i;
> +
> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
> + if (mevt->evt_cfg & mbm_config_values[i].val) {
> + if (sep)
> + seq_putc(seq, ',');
> + seq_printf(seq, "%s", mbm_config_values[i].name);
Taking a closer look I think we need to be more careful about how the
code is organized. Ideally the monitoring related code and data should
be located in fs/resctrl/monitor.c. Having event_filter_show() here is
ok because of its use in res_common_files[]. Since it is monitoring related
I expected its code/data to be in fs/resctrl/monitor.c, and thus
mbm_config_values[] (mbm_transactions[]?) to be in fs/resctrl/monitor.c
(just like mon_event_all[]).
> + sep = true;
> + }
> + }
> + seq_putc(seq, '\n');
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -2033,6 +2052,12 @@ static struct rftype res_common_files[] = {
> .seq_show = mbm_local_bytes_config_show,
> .write = mbm_local_bytes_config_write,
> },
> + {
> + .name = "event_filter",
> + .mode = 0444,
> + .kf_ops = &rdtgroup_kf_single_ops,
> + .seq_show = event_filter_show,
> + },
> {
> .name = "mbm_assign_mode",
> .mode = 0444,
> @@ -2315,6 +2340,53 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
> return ret;
> }
>
> +static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
This can now be named resctrl_mkdir_event_configs()?
Also, I cannot see where the struct rdt_resource parameter is used. It should not
be removed though: as mentioned earlier, it should be used to check the
mon_evt::rid values so that only events associated with the resource
are considered.
> +{
> + struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
> + struct mon_evt *mevt;
> + int ret;
> +
> + l3_mon_kn = kernfs_find_and_get(kn_info, name);
> + if (!l3_mon_kn)
> + return -ENOENT;
Needing to figure out this kn does not seem necessary. Can it not be
provided via parameter instead?
For example, resctrl_mkdir_counter_configs() (rather resctrl_mkdir_event_configs())
can be called from rdtgroup_mkdir_info_resdir(). I understand rdtgroup_mkdir_info_resdir()
is also called with a struct resctrl_schema parameter but I think the fflags can be used
to make the right decision. Something like:
rdtgroup_mkdir_info_resdir() {
struct rdt_resource *r;
...
if (fflags & RFTYPE_MON_INFO) {
r = priv;
if (r->mon.mbm_cntr_assignable) {
ret = resctrl_mkdir_event_configs(kn_subdir, r);
...
}
}
}
What do you think?
> +
> + kn_subdir = kernfs_create_dir(l3_mon_kn, "event_configs", l3_mon_kn->mode, NULL);
> + if (IS_ERR(kn_subdir)) {
> + kernfs_put(l3_mon_kn);
> + return PTR_ERR(kn_subdir);
> + }
> +
> + ret = rdtgroup_kn_set_ugid(kn_subdir);
> + if (ret) {
> + kernfs_put(l3_mon_kn);
> + return ret;
> + }
> +
> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
Here is a spot where the for_each_mon_event() should be used.
> + if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
> + continue;
> +
> + kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name, kn_subdir->mode, mevt);
> + if (IS_ERR(kn_subdir2)) {
> + ret = PTR_ERR(kn_subdir2);
> + goto out_config;
> + }
> +
> + ret = rdtgroup_kn_set_ugid(kn_subdir2);
> + if (ret)
> + goto out_config;
> +
> + ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
> + if (!ret)
> + kernfs_activate(kn_subdir);
> + }
> +
> +out_config:
> + kernfs_put(l3_mon_kn);
> +
> + return ret;
> +}
> +
> static unsigned long fflags_from_resource(struct rdt_resource *r)
> {
> switch (r->rid) {
> @@ -2361,6 +2433,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
> ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
> if (ret)
> goto out_destroy;
> +
> + if (r->mon.mbm_cntr_assignable) {
> + ret = resctrl_mkdir_counter_configs(r, name);
> + if (ret)
> + goto out_destroy;
> + }
> }
>
> ret = rdtgroup_kn_set_ugid(kn_info);
Reinette
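
To make the spacing remark concrete: event_filter_show() in the patch emits a
bare ',' between names and a single trailing newline, so the file contains no
spaces. The user-space sketch below reproduces that formatting logic so the
expected text can be checked. The format_event_filter() name, the buffer-based
interface and the bit positions are assumptions for illustration and are not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mbm_transaction {
	const char *name;
	uint32_t val;
};

/* Same order as the table in the patch; bit positions are assumed. */
static const struct mbm_transaction transactions[] = {
	{"local_reads",			1U << 0},
	{"remote_reads",		1U << 1},
	{"local_non_temporal_writes",	1U << 2},
	{"remote_non_temporal_writes",	1U << 3},
	{"local_reads_slow_memory",	1U << 4},
	{"remote_reads_slow_memory",	1U << 5},
	{"dirty_victim_writes_all",	1U << 6},
};

/*
 * Write the names of all transactions set in @cfg into @buf, separated by
 * ',' (no space) and terminated by '\n'. Returns the number of characters
 * written, or -1 if @buf is too small.
 */
static int format_event_filter(uint32_t cfg, char *buf, size_t len)
{
	size_t i, pos = 0;
	int sep = 0;

	assert(buf);

	for (i = 0; i < sizeof(transactions) / sizeof(transactions[0]); i++) {
		int n;

		if (!(cfg & transactions[i].val))
			continue;

		n = snprintf(buf + pos, len - pos, "%s%s",
			     sep ? "," : "", transactions[i].name);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;

		pos += (size_t)n;
		sep = 1;
	}

	/* Room is needed for the newline and the terminating NUL. */
	if (len - pos < 2)
		return -1;

	buf[pos++] = '\n';
	buf[pos] = '\0';

	return (int)pos;
}
```

Under the assumed bit layout, a configuration of 0x15 yields
"local_reads,local_non_temporal_writes,local_reads_slow_memory" followed by a
newline, which is the form the changelog and resctrl.rst examples would need to
show in order to match the code.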
* Re: [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/
2025-06-25 23:23 ` Reinette Chatre
@ 2025-06-30 19:06 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-30 19:06 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:23, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> When assignable counters are supported the
>> /sys/fs/resctrl/info/L3_MON/event_configs directory contains a
>> sub-directory for each MBM event that can be assigned to a counter.
>> The MBM event sub-directory contains a file named "event_filter" that
>> is used to view and modify which memory transactions the MBM event is
>> configured with.
>>
>> Create the /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl
>> mount and pre-populate it with directories for the two existing MBM events:
>> mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
>> each MBM event directory with the needed *show() that displays the memory
>> transactions with which the MBM event is configured.
>>
>> Example:
>> $ mount -t resctrl resctrl /sys/fs/resctrl
>> $ cd /sys/fs/resctrl/
>> $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
>> local_reads, remote_reads, local_non_temporal_writes,
>> remote_non_temporal_writes, local_reads_slow_memory,
>> remote_reads_slow_memory, dirty_victim_writes_all
>>
>> $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
>> local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> Please let these examples match what the patch does wrt spacing.
Sure.
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 32 +++++++++++
>> fs/resctrl/internal.h | 2 +
>> fs/resctrl/monitor.c | 1 +
>> fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++++
>> 4 files changed, 113 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 18de335e1ff8..b1db1a53db2a 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -310,6 +310,38 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>> 0=30;1=30
>>
>> +"event_configs":
>> + Directory that exists when "mbm_event" mode is supported. Contains
>
> ""mbm_event" mode" -> ""mbm_event" counter assignment mode"
Sure.
>
>> + sub-directory for each MBM event that can be assigned to a counter.
>> +
>> + Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
>> + Each MBM event's sub-directory contains a file named "event_filter" that is
>> + used to view and modify which memory transactions the MBM event is configured
>> + with.
>> +
>> + List of memory transaction types supported:
>> +
>> + ========================== ========================================================
>> + Name Description
>> + ========================== ========================================================
>> + dirty_victim_writes_all Dirty Victims from the QOS domain to all types of memory
>> + remote_reads_slow_memory Reads to slow memory in the non-local NUMA domain
>> + local_reads_slow_memory Reads to slow memory in the local NUMA domain
>> + remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
>> + local_non_temporal_writes Non-temporal writes to local NUMA domain
>> + remote_reads Reads to memory in the non-local NUMA domain
>> + local_reads Reads to memory in the local NUMA domain
>> + ========================== ========================================================
>> +
>> + For example::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
>> + local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>> + local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
>> + local_reads, local_non_temporal_writes, local_reads_slow_memory
>> +
>
> Please let these examples match what the patch does wrt spacing.
Sure.
>
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 84a136194d9a..ed0e3b695ad5 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -248,6 +248,8 @@ struct mbm_config_value {
>>
>> #define RFTYPE_DEBUG BIT(10)
>>
>> +#define RFTYPE_ASSIGN_CONFIG BIT(11)
>> +
>> #define RFTYPE_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL)
>>
>> #define RFTYPE_MON_INFO (RFTYPE_INFO | RFTYPE_MON)
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index ef6ef58f180b..09a49029a800 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -956,6 +956,7 @@ int resctrl_mon_resource_init(void)
>> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> resctrl_file_fflags_init("available_mbm_cntrs",
>> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> + resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>> }
>>
>> return 0;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 5fb6a9939e23..e2fa5e10c2dd 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1909,6 +1909,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> return ret;
>> }
>>
>> +static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
>> +{
>> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>> + bool sep = false;
>> + int i;
>> +
>> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>> + if (mevt->evt_cfg & mbm_config_values[i].val) {
>> + if (sep)
>> + seq_putc(seq, ',');
>> + seq_printf(seq, "%s", mbm_config_values[i].name);
>
> Taking a closer look I think we need to be more careful about how the
> code is organized. Ideally the monitoring related code and data should
> be located in fs/resctrl/monitor.c. Having event_filter_show() here is
> ok because of its use in res_common_files[]. Since it is monitoring related,
> I expected its code/data to be in fs/resctrl/monitor.c, and thus
> mbm_config_values[] (mbm_transactions[]?) to be in fs/resctrl/monitor.c as well
> (just like mon_event_all[]).
Sure. Moved mbm_transactions[] to monitor.c.
Defined it as extern in fs/resctrl/rdtgroup.c
extern struct mbm_transaction mbm_transactions[NUM_MBM_TRANSACTIONS];
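
For illustration only, a minimal userspace sketch of how such a table can drive the "event_filter" output with the comma-only separator that the show function emits. This is not the kernel code: the struct layout, the bit positions and format_event_filter() are assumptions made up for this sketch, and snprintf() stands in for seq_printf()/seq_putc().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's transaction table entry. */
struct mbm_transaction {
	const char *name;
	uint32_t val;
};

/* Bit positions are illustrative assumptions, not taken from the patch. */
static const struct mbm_transaction mbm_transactions[] = {
	{ "local_reads",                1u << 0 },
	{ "remote_reads",               1u << 1 },
	{ "local_non_temporal_writes",  1u << 2 },
	{ "remote_non_temporal_writes", 1u << 3 },
	{ "local_reads_slow_memory",    1u << 4 },
	{ "remote_reads_slow_memory",   1u << 5 },
	{ "dirty_victim_writes_all",    1u << 6 },
};

#define NUM_MBM_TRANSACTIONS \
	(sizeof(mbm_transactions) / sizeof(mbm_transactions[0]))

/*
 * Write the names of all transaction types set in evt_cfg into out,
 * separated by ',' with no spaces and terminated by a newline.
 * Returns the number of bytes written, or -1 if out is too small.
 */
static int format_event_filter(uint32_t evt_cfg, char *out, size_t len)
{
	size_t used = 0;
	int sep = 0;

	for (size_t i = 0; i < NUM_MBM_TRANSACTIONS; i++) {
		int n;

		if (!(evt_cfg & mbm_transactions[i].val))
			continue;

		n = snprintf(out + used, len - used, "%s%s",
			     sep ? "," : "", mbm_transactions[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
		sep = 1;
	}

	/* Room is needed for the trailing newline and the terminator. */
	if (len - used < 2)
		return -1;
	out[used++] = '\n';
	out[used] = '\0';
	return (int)used;
}
```

With these assumed bit positions, a mask with bits 0, 2 and 4 set formats as "local_reads,local_non_temporal_writes,local_reads_slow_memory", which is the spacing the documentation examples are asked to match.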
>
>> + sep = true;
>> + }
>> + }
>> + seq_putc(seq, '\n');
>> +
>> + return 0;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>> @@ -2033,6 +2052,12 @@ static struct rftype res_common_files[] = {
>> .seq_show = mbm_local_bytes_config_show,
>> .write = mbm_local_bytes_config_write,
>> },
>> + {
>> + .name = "event_filter",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = event_filter_show,
>> + },
>> {
>> .name = "mbm_assign_mode",
>> .mode = 0444,
>> @@ -2315,6 +2340,53 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
>> return ret;
>> }
>>
>> +static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
>
> This can now be named resctrl_mkdir_event_configs()?
Sure.
>
> Also, I cannot see where the struct rdt_resource parameter is used. It should not
> be removed though: as mentioned earlier, it should be used to check the
> mon_evt::rid values so that only events associated with the resource
> are considered.
Added the check.
mevt->rid != r->rid
>
>> +{
>> + struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
>> + struct mon_evt *mevt;
>> + int ret;
>> +
>> + l3_mon_kn = kernfs_find_and_get(kn_info, name);
>> + if (!l3_mon_kn)
>> + return -ENOENT;
>
> Needing to figure out this kn does not seem necessary. Can it not be
> provided via parameter instead?
>
> For example, resctrl_mkdir_counter_configs() (rather resctrl_mkdir_event_configs())
> can be called from rdtgroup_mkdir_info_resdir(). I understand rdtgroup_mkdir_info_resdir()
> is also called for struct resctrl_schema parameter but I think the fflags can be used
> to make the right decision. Something like:
>
> rdtgroup_mkdir_info_resdir() {
> struct rdt_resource *r;
>
> ...
> if (fflags & RFTYPE_MON_INFO) {
> r = priv;
> if (r->mon.mbm_cntr_assignable) {
> ret = resctrl_mkdir_event_configs(kn_subdir, r);
> ...
> }
> }
> }
>
> What do you think?
We can do that. Need to test it. Hopefully it's fine.
>
>> +
>> + kn_subdir = kernfs_create_dir(l3_mon_kn, "event_configs", l3_mon_kn->mode, NULL);
>> + if (IS_ERR(kn_subdir)) {
>> + kernfs_put(l3_mon_kn);
>> + return PTR_ERR(kn_subdir);
>> + }
>> +
>> + ret = rdtgroup_kn_set_ugid(kn_subdir);
>> + if (ret) {
>> + kernfs_put(l3_mon_kn);
>> + return ret;
>> + }
>> +
>> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
>
> Here is a spot where the for_each_mon_event() should be used.
Sure.
>
>> + if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
>> + continue;
>> +
>> + kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name, kn_subdir->mode, mevt);
>> + if (IS_ERR(kn_subdir2)) {
>> + ret = PTR_ERR(kn_subdir2);
>> + goto out_config;
>> + }
>> +
>> + ret = rdtgroup_kn_set_ugid(kn_subdir2);
>> + if (ret)
>> + goto out_config;
>> +
>> + ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
>> + if (!ret)
>> + kernfs_activate(kn_subdir);
>> + }
>> +
>> +out_config:
>> + kernfs_put(l3_mon_kn);
>> +
>> + return ret;
>> +}
>> +
>> static unsigned long fflags_from_resource(struct rdt_resource *r)
>> {
>> switch (r->rid) {
>> @@ -2361,6 +2433,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>> ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
>> if (ret)
>> goto out_destroy;
>> +
>> + if (r->mon.mbm_cntr_assignable) {
>> + ret = resctrl_mkdir_counter_configs(r, name);
>> + if (ret)
>> + goto out_destroy;
>> + }
>> }
>>
>> ret = rdtgroup_kn_set_ugid(kn_info);
>
>
> Reinette
>
--
Thanks
Babu Moger
* [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (23 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 24/32] fs/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:21 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir Babu Moger
` (8 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
When assignable counters are supported the users can modify the event
configuration by writing to the 'event_filter' interface file. The event
configurations for mbm_event mode are located in
/sys/fs/resctrl/info/L3_MON/event_configs/.
Update the assignments of all groups when the event configuration is
modified.
Example:
$ mount -t resctrl resctrl /sys/fs/resctrl
$ cd /sys/fs/resctrl/
$ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads,local_non_temporal_writes,local_reads_slow_memory
$ echo "local_reads,local_non_temporal_writes" >
info/L3_MON/event_configs/mbm_total_bytes/event_filter
$ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
local_reads,local_non_temporal_writes
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Passed struct mon_evt where applicable instead of just the event type.
Fixed a few text issues about the memory transaction type.
Renamed few functions resctrl_group_assign() -> rdtgroup_assign_cntr()
resctrl_update_assign() -> resctrl_assign_cntr_allrdtgrp()
Removed few extra bases.
v13: Updated changelog for imperative mode.
Added function description in the prototype.
Updated the user doc resctrl.rst to address few feedback.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
v12: New patch to modify event configurations.
---
Documentation/filesystems/resctrl.rst | 12 +++
fs/resctrl/rdtgroup.c | 120 +++++++++++++++++++++++++-
2 files changed, 131 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index b1db1a53db2a..2cd6107ca452 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -342,6 +342,18 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads, local_non_temporal_writes, local_reads_slow_memory
+ Modify the event configuration by writing to the "event_filter" file within the
+ configuration directory. The read/write event_filter file contains the configuration
+ of the event that reflects which memory transactions are counted by it.
+
+ For example::
+
+ # echo "local_reads, local_non_temporal_writes" >
+ /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
+
+ # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
+ local_reads, local_non_temporal_writes
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index e2fa5e10c2dd..fdea608e0796 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1928,6 +1928,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
return 0;
}
+/**
+ * rdtgroup_assign_cntr - Update the counter assignments for the event in
+ * a group.
+ * @r: Resource to which update needs to be done.
+ * @rdtgrp: Resctrl group.
+ * @mevt: MBM monitor event.
+ */
+static int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ struct mon_evt *mevt)
+{
+ struct rdt_mon_domain *d;
+ int cntr_id;
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
+ if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != mevt->evt_cfg) {
+ d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
+ resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, true);
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * resctrl_assign_cntr_allrdtgrp - Update the counter assignments for the event
+ * for all the groups.
+ * @r: Resource to which update needs to be done.
+ * @mevt MBM Monitor event.
+ */
+static void resctrl_assign_cntr_allrdtgrp(struct rdt_resource *r, struct mon_evt *mevt)
+{
+ struct rdtgroup *prgrp, *crgrp;
+
+ /*
+ * Check all the groups where the event tyoe is assigned and update
+ * the assignment
+ */
+ list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+ rdtgroup_assign_cntr(r, prgrp, mevt);
+
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
+ rdtgroup_assign_cntr(r, crgrp, mevt);
+ }
+}
+
+static int resctrl_process_configs(char *tok, u32 *val)
+{
+ char *evt_str;
+ u32 temp_val;
+ bool found;
+ int i;
+
+next_config:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each memory transaction type */
+ evt_str = strim(strsep(&tok, ","));
+ found = false;
+ for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
+ if (!strcmp(mbm_config_values[i].name, evt_str)) {
+ temp_val = mbm_config_values[i].val;
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ rdt_last_cmd_printf("Invalid memory transaction type %s\n", evt_str);
+ return -EINVAL;
+ }
+
+ *val |= temp_val;
+
+ goto next_config;
+}
+
+static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
+ u32 evt_cfg = 0;
+ int ret = 0;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ ret = resctrl_process_configs(buf, &evt_cfg);
+ if (!ret && mevt->evt_cfg != evt_cfg) {
+ mevt->evt_cfg = evt_cfg;
+ resctrl_assign_cntr_allrdtgrp(r, mevt);
+ }
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -2054,9 +2171,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "event_filter",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = event_filter_show,
+ .write = event_filter_write,
},
{
.name = "mbm_assign_mode",
--
2.34.1
* Re: [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations
2025-06-13 21:05 ` [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations Babu Moger
@ 2025-06-25 23:21 ` Reinette Chatre
2025-07-01 0:43 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:21 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> When assignable counters are supported the users can modify the event
"the users" -> "the user" or "users"?
> configuration by writing to the 'event_filter' interface file. The event
nit: "interface file" -> "resctrl file"
> configurations for mbm_event mode are located in
> /sys/fs/resctrl/info/L3_MON/event_configs/.
>
> Update the assignments of all groups when the event configuration is
(just to help be specific) "all groups" -> "all CTRL_MON and MON resource groups"
> modified.
>
> Example:
> $ mount -t resctrl resctrl /sys/fs/resctrl
>
> $ cd /sys/fs/resctrl/
>
> $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads,local_non_temporal_writes,local_reads_slow_memory
>
> $ echo "local_reads,local_non_temporal_writes" >
> info/L3_MON/event_configs/mbm_total_bytes/event_filter
>
> $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
> local_reads,local_non_temporal_writes
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 12 +++
> fs/resctrl/rdtgroup.c | 120 +++++++++++++++++++++++++-
> 2 files changed, 131 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index b1db1a53db2a..2cd6107ca452 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -342,6 +342,18 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> + Modify the event configuration by writing to the "event_filter" file within the
> + configuration directory. The read/write event_filter file contains the configuration
(to help be specific)
"within the configuration directory" -> "within the "event_configs" directory"
Note that "event_filter" is not consistently in quotes.
> + of the event that reflects which memory transactions are counted by it.
> +
> + For example::
> +
> + # echo "local_reads, local_non_temporal_writes" >
> + /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
counter_configs -> event_configs
> +
> + # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
counter_configs -> event_configs
> + local_reads, local_non_temporal_writes
Please let example match what code does wrt spacing.
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index e2fa5e10c2dd..fdea608e0796 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1928,6 +1928,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
> return 0;
> }
>
> +/**
> + * rdtgroup_assign_cntr - Update the counter assignments for the event in
Could this please be renamed to rdtgroup_update_cntr()? Actually, how about
rdtgroup_update_cntr_event() to pair with a rdtgroup_assign_cntr_event()?
After staring at this code it becomes confusing when the term "assign" is used
for both allocating and just updating.
Compare for example: rdtgroup_assign_cntrs() with this function ... the
only difference is "cntr" vs "cntrs" in the name but instead of both functions
doing the same just on single vs multiple counters as the name implies they do
significantly different things. I find this very confusing.
> + * a group.
> + * @r: Resource to which update needs to be done.
> + * @rdtgrp: Resctrl group.
> + * @mevt: MBM monitor event.
> + */
> +static int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + struct mon_evt *mevt)
> +{
> + struct rdt_mon_domain *d;
> + int cntr_id;
> +
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
> + if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != mevt->evt_cfg) {
> + d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
I referred to this snippet in earlier comment
https://lore.kernel.org/lkml/887bad33-7f4a-4b6d-95a7-fdfe0451f42b@intel.com/
> + resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, true);
> + }
> + }
> +
> + return 0;
Looks like this can be a void function.
> +}
> +
> +/**
> + * resctrl_assign_cntr_allrdtgrp - Update the counter assignments for the event
> + * for all the groups.
> + * @r: Resource to which update needs to be done.
> + * @mevt MBM Monitor event.
> + */
> +static void resctrl_assign_cntr_allrdtgrp(struct rdt_resource *r, struct mon_evt *mevt)
resctrl_assign_cntr_allrdtgrp() -> resctrl_update_cntr_allrdtgrp()/resctrl_update_cntr_event_allrdtgrp()
> +{
> + struct rdtgroup *prgrp, *crgrp;
> +
> + /*
> + * Check all the groups where the event tyoe is assigned and update
I am not sure what is meant with "Check" here. Maybe "Find"?
tyoe -> type?
> + * the assignment
> + */
> + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> + rdtgroup_assign_cntr(r, prgrp, mevt);
> +
> + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
> + rdtgroup_assign_cntr(r, crgrp, mevt);
> + }
> +}
> +
> +static int resctrl_process_configs(char *tok, u32 *val)
> +{
> + char *evt_str;
> + u32 temp_val;
> + bool found;
> + int i;
> +
> +next_config:
> + if (!tok || tok[0] == '\0')
> + return 0;
> +
> + /* Start processing the strings for each memory transaction type */
> + evt_str = strim(strsep(&tok, ","));
> + found = false;
> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
> + if (!strcmp(mbm_config_values[i].name, evt_str)) {
> + temp_val = mbm_config_values[i].val;
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found) {
> + rdt_last_cmd_printf("Invalid memory transaction type %s\n", evt_str);
> + return -EINVAL;
> + }
> +
> + *val |= temp_val;
This still returns a partially initialized value on failure. Please only set
provided parameter on success.
> +
> + goto next_config;
> +}
> +
> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
> + size_t nbytes, loff_t off)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
With mon_evt::rid available it should not be necessary to hardcode the resource?
Do any of these new functions need a struct rdt_resource parameter in addition
to struct mon_evt?
> + u32 evt_cfg = 0;
> + int ret = 0;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
Needs update to new names.
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + ret = resctrl_process_configs(buf, &evt_cfg);
> + if (!ret && mevt->evt_cfg != evt_cfg) {
> + mevt->evt_cfg = evt_cfg;
> + resctrl_assign_cntr_allrdtgrp(r, mevt);
Could only event_filter_write() be in fs/resctrl/rdtgroup.c with the rest
of the functions introduced here located with rest of monitoring code
in fs/resctrl/monitor.c?
> + }
> +
> +out_unlock:
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -2054,9 +2171,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "event_filter",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = event_filter_show,
> + .write = event_filter_write,
> },
> {
> .name = "mbm_assign_mode",
Reinette
* Re: [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations
2025-06-25 23:21 ` Reinette Chatre
@ 2025-07-01 0:43 ` Moger, Babu
2025-07-01 1:33 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-01 0:43 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/2025 6:21 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> When assignable counters are supported the users can modify the event
>
> "the users" -> "the user" or "users"?
"users"
>
>> configuration by writing to the 'event_filter' interface file. The event
>
> nit: "interface file" -> "resctrl file"
Sure.
>
>> configurations for mbm_event mode are located in
>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>
>> Update the assignments of all groups when the event configuration is
>
> (just to help be specific) "all groups" -> "all CTRL_MON and MON resource groups"
sure.
>
>> modified.
>>
>> Example:
>> $ mount -t resctrl resctrl /sys/fs/resctrl
>>
>> $ cd /sys/fs/resctrl/
>>
>> $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
>> local_reads,local_non_temporal_writes,local_reads_slow_memory
>>
>> $ echo "local_reads,local_non_temporal_writes" >
>> info/L3_MON/event_configs/mbm_total_bytes/event_filter
>>
>> $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
>> local_reads,local_non_temporal_writes
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 12 +++
>> fs/resctrl/rdtgroup.c | 120 +++++++++++++++++++++++++-
>> 2 files changed, 131 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index b1db1a53db2a..2cd6107ca452 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -342,6 +342,18 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
>> local_reads, local_non_temporal_writes, local_reads_slow_memory
>>
>> + Modify the event configuration by writing to the "event_filter" file within the
>> + configuration directory. The read/write event_filter file contains the configuration
>
> (to help be specific)
> "within the configuration directory" -> "within the "event_configs" directory"
Sure.
>
> Note that "event_filter" is not consistently in quotes.
Sure.
>
>> + of the event that reflects which memory transactions are counted by it.
>> +
>> + For example::
>> +
>> + # echo "local_reads, local_non_temporal_writes" >
>> + /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>
> counter_configs -> event_configs
Sure.
>
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>
> counter_configs -> event_configs
>
Sure.
>> + local_reads, local_non_temporal_writes
>
> Please let example match what code does wrt spacing.
Sure.
>
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index e2fa5e10c2dd..fdea608e0796 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1928,6 +1928,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
>> return 0;
>> }
>>
>> +/**
>> + * rdtgroup_assign_cntr - Update the counter assignments for the event in
>
> Could this please be renamed to rdtgroup_update_cntr()? Actually, how about
> rdtgroup_update_cntr_event() to pair with a rdtgroup_assign_cntr_event()?
>
Sure.
> After staring at this code it becomes confusing when the term "assign" is used
> for both allocating and just updating.
>
> Compare for example: rdtgroup_assign_cntrs() with this function ... the
> only difference is "cntr" vs "cntrs" in the name but instead of both functions
> doing the same just on single vs multiple counters as the name implies they do
> significantly different things. I find this very confusing.
Agree.
>
>> + * a group.
>> + * @r: Resource to which update needs to be done.
>> + * @rdtgrp: Resctrl group.
>> + * @mevt: MBM monitor event.
>> + */
>> +static int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct mon_evt *mevt)
>> +{
>> + struct rdt_mon_domain *d;
>> + int cntr_id;
>> +
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
>> + if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != mevt->evt_cfg) {
>> + d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
>
> I referred to this snippet in earlier comment
> https://lore.kernel.org/lkml/887bad33-7f4a-4b6d-95a7-fdfe0451f42b@intel.com/
>
Yes. Taken care of this.
>> + resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, true);
>> + }
>> + }
>> +
>> + return 0;
>
> Looks like this can be a void function.
Sure.
>
>> +}
>> +
>> +/**
>> + * resctrl_assign_cntr_allrdtgrp - Update the counter assignments for the event
>> + * for all the groups.
>> + * @r: Resource to which update needs to be done.
>> + * @mevt MBM Monitor event.
>> + */
>> +static void resctrl_assign_cntr_allrdtgrp(struct rdt_resource *r, struct mon_evt *mevt)
>
> resctrl_assign_cntr_allrdtgrp() -> resctrl_update_cntr_allrdtgrp()/resctrl_update_cntr_event_allrdtgrp()
resctrl_update_cntr_allrdtgrp()
>
>> +{
>> + struct rdtgroup *prgrp, *crgrp;
>> +
>> + /*
>> + * Check all the groups where the event tyoe is assigned and update
>
> I am not sure what is meant with "Check" here. Maybe "Find"?
Find.
>
> tyoe -> type?
Sure.
>
>> + * the assignment
>> + */
>> + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> + rdtgroup_assign_cntr(r, prgrp, mevt);
>> +
>> + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
>> + rdtgroup_assign_cntr(r, crgrp, mevt);
>> + }
>> +}
>> +
>> +static int resctrl_process_configs(char *tok, u32 *val)
>> +{
>> + char *evt_str;
>> + u32 temp_val;
>> + bool found;
>> + int i;
>> +
>> +next_config:
>> + if (!tok || tok[0] == '\0')
>> + return 0;
>> +
>> + /* Start processing the strings for each memory transaction type */
>> + evt_str = strim(strsep(&tok, ","));
>> + found = false;
>> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>> + if (!strcmp(mbm_config_values[i].name, evt_str)) {
>> + temp_val = mbm_config_values[i].val;
>> + found = true;
>> + break;
>> + }
>> + }
>> +
>> + if (!found) {
>> + rdt_last_cmd_printf("Invalid memory transaction type %s\n", evt_str);
>> + return -EINVAL;
>> + }
>> +
>> + *val |= temp_val;
>
> This still returns a partially initialized value on failure. Please only set
> provided parameter on success.
Yes. Changed it.
if (!tok || tok[0] == '\0') {
*val = temp_val;
return 0;
}
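
For illustration, a minimal userspace sketch of the "only set the provided parameter on success" pattern: the mask is accumulated in a local variable and copied to the caller's value once every token has been matched. This is not the kernel function. The table contents, bit positions, trim() (a stand-in for the kernel's strim()) and parse_event_filter() are assumptions made up for this sketch, and strchr() replaces strsep() to stay portable.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table; bit positions are illustrative assumptions. */
struct mbm_transaction {
	const char *name;
	uint32_t val;
};

static const struct mbm_transaction mbm_transactions[] = {
	{ "local_reads",                1u << 0 },
	{ "remote_reads",               1u << 1 },
	{ "local_non_temporal_writes",  1u << 2 },
	{ "remote_non_temporal_writes", 1u << 3 },
	{ "local_reads_slow_memory",    1u << 4 },
	{ "remote_reads_slow_memory",   1u << 5 },
	{ "dirty_victim_writes_all",    1u << 6 },
};

#define NUM_MBM_TRANSACTIONS \
	(sizeof(mbm_transactions) / sizeof(mbm_transactions[0]))

/* Trim leading and trailing whitespace in place. */
static char *trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse a comma-separated list of transaction names in buf, which is
 * modified in place. On success store the combined bitmask in *val and
 * return 0. On failure return -1 and leave *val untouched.
 */
static int parse_event_filter(char *buf, uint32_t *val)
{
	uint32_t cfg = 0;
	char *tok = buf;

	while (tok && *tok != '\0') {
		char *next = strchr(tok, ',');
		char *name;
		size_t i;

		if (next)
			*next++ = '\0';
		name = trim(tok);

		for (i = 0; i < NUM_MBM_TRANSACTIONS; i++) {
			if (!strcmp(mbm_transactions[i].name, name))
				break;
		}
		if (i == NUM_MBM_TRANSACTIONS)
			return -1;	/* unknown name: *val not written */

		cfg |= mbm_transactions[i].val;
		tok = next;
	}

	*val = cfg;
	return 0;
}
```

The point of the local accumulator is that a caller comparing its old configuration against the parsed one never sees a half-built mask when a later token turns out to be invalid.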
>
>> +
>> + goto next_config;
>> +}
>> +
>> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>> + size_t nbytes, loff_t off)
>> +{
>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>
> With mon_evt::rid available it should not be necessary to hardcode the resource?
changed it
r = resctrl_arch_get_resource(mevt->rid);
> Do any of these new functions need a struct rdt_resource parameter in addition
> to struct mon_evt?
We need to call resctrl_arch_mbm_cntr_assign_enabled(r) to proceed,
so we need struct rdt_resource.
>
>> + u32 evt_cfg = 0;
>> + int ret = 0;
>> +
>> + /* Valid input requires a trailing newline */
>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> + return -EINVAL;
>> +
>> + buf[nbytes - 1] = '\0';
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>
> Needs update to new names.
Sure.
>
>> + ret = -EINVAL;
>> + goto out_unlock;
>> + }
>> +
>> + ret = resctrl_process_configs(buf, &evt_cfg);
>> + if (!ret && mevt->evt_cfg != evt_cfg) {
>> + mevt->evt_cfg = evt_cfg;
>> + resctrl_assign_cntr_allrdtgrp(r, mevt);
>
> Could only event_filter_write() be in fs/resctrl/rdtgroup.c with the rest
> of the functions introduced here located with rest of monitoring code
> in fs/resctrl/monitor.c?
Kept event_filter_write() and resctrl_process_configs() here. Moved
other two functions to fs/resctrl/monitor.c.
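
For illustration, a simplified userspace model of the per-group update step discussed above: for every monitoring domain, find the counter assigned to the group and event, and reprogram it only when its stored configuration differs from the new one. All types and names here (struct cntr_cfg, struct mon_domain, cntr_get(), update_cntr_event()) are hypothetical stand-ins, and the "reprogrammed" field merely counts where the real code would call into the architecture layer.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CNTRS 4

/* Hypothetical, simplified model of one assignable counter. */
struct cntr_cfg {
	int in_use;		/* counter is currently assigned */
	int group_id;		/* group that owns the counter */
	int evtid;		/* event the counter tracks */
	uint32_t evt_cfg;	/* transaction bitmask currently programmed */
};

/* Hypothetical, simplified model of one monitoring domain. */
struct mon_domain {
	struct cntr_cfg cntr_cfg[MAX_CNTRS];
	unsigned int reprogrammed;	/* counts simulated hardware writes */
};

/* Return the counter index assigned to (group_id, evtid), or -1. */
static int cntr_get(const struct mon_domain *d, int group_id, int evtid)
{
	for (int i = 0; i < MAX_CNTRS; i++) {
		const struct cntr_cfg *c = &d->cntr_cfg[i];

		if (c->in_use && c->group_id == group_id && c->evtid == evtid)
			return i;
	}
	return -1;
}

/*
 * Apply a new event configuration to one group's counters in all
 * domains. Counters that already hold evt_cfg are left alone so that
 * no unnecessary reprogramming happens.
 */
static void update_cntr_event(struct mon_domain *doms, size_t ndoms,
			      int group_id, int evtid, uint32_t evt_cfg)
{
	for (size_t i = 0; i < ndoms; i++) {
		int id = cntr_get(&doms[i], group_id, evtid);

		if (id < 0 || doms[i].cntr_cfg[id].evt_cfg == evt_cfg)
			continue;

		doms[i].cntr_cfg[id].evt_cfg = evt_cfg;
		doms[i].reprogrammed++;	/* arch reconfiguration would go here */
	}
}
```

Since nothing in this step can fail, a void return type is sufficient, which matches the review comment on the original function.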
>
>> + }
>> +
>> +out_unlock:
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret ?: nbytes;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>> @@ -2054,9 +2171,10 @@ static struct rftype res_common_files[] = {
>> },
>> {
>> .name = "event_filter",
>> - .mode = 0444,
>> + .mode = 0644,
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = event_filter_show,
>> + .write = event_filter_write,
>> },
>> {
>> .name = "mbm_assign_mode",
>
> Reinette
>
thanks
Babu
* Re: [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations
2025-07-01 0:43 ` Moger, Babu
@ 2025-07-01 1:33 ` Reinette Chatre
2025-07-01 16:14 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-07-01 1:33 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/30/25 5:43 PM, Moger, Babu wrote:
> On 6/25/2025 6:21 PM, Reinette Chatre wrote:
>> On 6/13/25 2:05 PM, Babu Moger wrote:
...
>>> + * the assignment
>>> + */
>>> + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>>> + rdtgroup_assign_cntr(r, prgrp, mevt);
>>> +
>>> + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
>>> + rdtgroup_assign_cntr(r, crgrp, mevt);
>>> + }
>>> +}
>>> +
>>> +static int resctrl_process_configs(char *tok, u32 *val)
>>> +{
>>> + char *evt_str;
>>> + u32 temp_val;
>>> + bool found;
>>> + int i;
>>> +
>>> +next_config:
>>> + if (!tok || tok[0] == '\0')
>>> + return 0;
>>> +
>>> + /* Start processing the strings for each memory transaction type */
>>> + evt_str = strim(strsep(&tok, ","));
>>> + found = false;
>>> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>>> + if (!strcmp(mbm_config_values[i].name, evt_str)) {
>>> + temp_val = mbm_config_values[i].val;
>>> + found = true;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + if (!found) {
>>> + rdt_last_cmd_printf("Invalid memory transaction type %s\n", evt_str);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + *val |= temp_val;
>>
>> This still returns a partially initialized value on failure. Please only set
>> provided parameter on success.
>
> Yes. Changed it.
>
> if (!tok || tok[0] == '\0') {
> *val = temp_val;
> return 0;
> }
You may just not have included this in your snippet, but please ensure temp_val is always
initialized. Just this snippet on top of original patch risks using uninitialized variable.
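
The two requests above (only write the caller's value on success, and never read an uninitialized temporary) can be sketched in user-space C as follows. This is a minimal illustration, not the code from the patch: the table contents, the bit values, trim() and process_configs() are stand-ins chosen for this example, and strchr() replaces the kernel's strsep()/strim() pair.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative name/bit table; a stand-in for mbm_config_values[]. */
struct cfg_value {
	const char *name;
	uint32_t val;
};

static const struct cfg_value cfg_values[] = {
	{ "local_reads",               1u << 0 },
	{ "remote_reads",              1u << 1 },
	{ "local_non_temporal_writes", 1u << 2 },
};

/* Strip leading and trailing blanks in place (user-space stand-in for strim()). */
static char *trim(char *s)
{
	char *end;

	while (*s == ' ' || *s == '\t')
		s++;
	end = s + strlen(s);
	while (end > s && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\n'))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse a comma-separated list of names into a bitmask.
 * The result is accumulated in a local that starts at 0 and is copied
 * to *val only after every token has been accepted, so *val is left
 * untouched when -EINVAL is returned.
 */
static int process_configs(char *buf, uint32_t *val)
{
	uint32_t acc = 0;
	char *tok = buf;

	while (tok && *tok != '\0') {
		char *comma = strchr(tok, ',');
		char *name;
		int found = 0;
		size_t i;

		if (comma)
			*comma = '\0';
		name = trim(tok);

		for (i = 0; i < sizeof(cfg_values) / sizeof(cfg_values[0]); i++) {
			if (!strcmp(cfg_values[i].name, name)) {
				acc |= cfg_values[i].val;
				found = 1;
				break;
			}
		}
		if (!found)
			return -EINVAL;

		tok = comma ? comma + 1 : NULL;
	}

	*val = acc;
	return 0;
}
```

With this shape, a caller that compares the parsed value against the current configuration never sees a half-built mask, and an empty input yields 0 rather than whatever happened to be on the stack.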
>>> +
>>> + goto next_config;
>>> +}
>>> +
>>> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>>> + size_t nbytes, loff_t off)
>>> +{
>>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>>> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>>
>> With mon_evt::rid available it should not be necessary to hardcode the resource?
>
> changed it
>
> r = resctrl_arch_get_resource(mevt->rid);
>
>> Do any of these new functions need a struct rdt_resource parameter in addition
>> to struct mon_evt?
>
> We need to call resctrl_arch_mbm_cntr_assign_enabled(r) to proceed. So we need struct rdt_resource.
Understood, but since struct rdt_resource can be determined from mon_evt::rid
it is not obvious to me that providing both is always needed by all these functions.
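
As a rough sketch of this point, the resource can be looked up from the event's own resource ID inside the helper, so callers only pass the event. Every type and function name below is hypothetical and chosen for this example; none of them are the resctrl definitions.

```c
#include <stddef.h>

/* Hypothetical resource IDs and descriptors for illustration only. */
enum res_id { RES_L3, RES_L2, RES_NUM };

struct resource {
	enum res_id rid;
	const char *name;
};

/* Hypothetical event descriptor that records which resource it belongs to. */
struct mon_event {
	enum res_id rid;
	unsigned int evt_cfg;
};

static struct resource resources[RES_NUM] = {
	[RES_L3] = { RES_L3, "L3" },
	[RES_L2] = { RES_L2, "L2" },
};

/* Stand-in for a "get resource by ID" accessor. */
static struct resource *get_resource(enum res_id rid)
{
	return rid < RES_NUM ? &resources[rid] : NULL;
}

/*
 * Takes only the event: the owning resource is derived from evt->rid,
 * so there is no second parameter that could disagree with it.
 */
static const char *event_resource_name(const struct mon_event *evt)
{
	struct resource *r = get_resource(evt->rid);

	return r ? r->name : NULL;
}
```

The design benefit is that a caller cannot pass an event together with a resource it does not belong to.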
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations
2025-07-01 1:33 ` Reinette Chatre
@ 2025-07-01 16:14 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-01 16:14 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/30/25 20:33, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/30/25 5:43 PM, Moger, Babu wrote:
>> On 6/25/2025 6:21 PM, Reinette Chatre wrote:
>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>
> ...
>
>>>> + * the assignment
>>>> + */
>>>> + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>>>> + rdtgroup_assign_cntr(r, prgrp, mevt);
>>>> +
>>>> + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
>>>> + rdtgroup_assign_cntr(r, crgrp, mevt);
>>>> + }
>>>> +}
>>>> +
>>>> +static int resctrl_process_configs(char *tok, u32 *val)
>>>> +{
>>>> + char *evt_str;
>>>> + u32 temp_val;
>>>> + bool found;
>>>> + int i;
>>>> +
>>>> +next_config:
>>>> + if (!tok || tok[0] == '\0')
>>>> + return 0;
>>>> +
>>>> + /* Start processing the strings for each memory transaction type */
>>>> + evt_str = strim(strsep(&tok, ","));
>>>> + found = false;
>>>> + for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>>>> + if (!strcmp(mbm_config_values[i].name, evt_str)) {
>>>> + temp_val = mbm_config_values[i].val;
>>>> + found = true;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!found) {
>>>> + rdt_last_cmd_printf("Invalid memory transaction type %s\n", evt_str);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + *val |= temp_val;
>>>
>>> This still returns a partially initialized value on failure. Please only set
>>> provided parameter on success.
>>
>> Yes. Changed it.
>>
>> if (!tok || tok[0] == '\0') {
>> *val = temp_val;
>> return 0;
>> }
>
> You may just not have included this in your snippet, but please ensure temp_val is always
> initialized. Just this snippet on top of original patch risks using uninitialized variable.
Yes. Got it. I should have pasted the full change. It's taken care of already.
>
>>>> +
>>>> + goto next_config;
>>>> +}
>>>> +
>>>> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>>>> + size_t nbytes, loff_t off)
>>>> +{
>>>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>>>> + struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>>>
>>> With mon_evt::rid available it should not be necessary to hardcode the resource?
>>
>> changed it
>>
>> r = resctrl_arch_get_resource(mevt->rid);
>>
>>> Do any of these new functions need a struct rdt_resource parameter in addition
>>> to struct mon_evt?
>>
>> We need to call resctrl_arch_mbm_cntr_assign_enabled(r) to proceed. So we need struct rdt_resource.
>
> Understood, but since struct rdt_resource can be determined from mon_evt::rid
> it is not obvious to me that providing both is always needed by all these functions.
>
Yes. Got it. Taken care of this.
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (24 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 25/32] fs/resctrl: Provide interface to update the event configurations Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:24 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters " Babu Moger
` (7 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The "mbm_event" mode allows users to assign a hardware counter ID to an
RMID, event pair and monitor the bandwidth as long as it is assigned.
Introduce a user-configurable option that determines if a counter will
automatically be assigned to an RMID, event pair when its associated
monitor group is created via mkdir.
Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Added rdtgroup_mutex in resctrl_mbm_assign_on_mkdir_show().
Updated resctrl.rst for clarity.
Fixed squashing of few previous changes.
Added more code documentation.
v13: Added Suggested-by tag.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
v12: New patch. Added after the discussion on the list.
https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
---
Documentation/filesystems/resctrl.rst | 16 ++++++++++
fs/resctrl/monitor.c | 2 ++
fs/resctrl/rdtgroup.c | 43 +++++++++++++++++++++++++++
include/linux/resctrl.h | 3 ++
4 files changed, 64 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 2cd6107ca452..f94c7c387416 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -354,6 +354,22 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
local_reads, local_non_temporal_writes
+"mbm_assign_on_mkdir":
+ Determines if a counter will automatically be assigned to an RMID, event pair
+ when its associated monitor group is created via mkdir. It is enabled by default
+ on boot and users can disable by writing to the interface.
+
+ "0":
+ Auto assignment is disabled.
+ "1":
+ Auto assignment is enabled.
+
+ Example::
+
+ # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
+ 0
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 09a49029a800..1ec2efd50273 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -957,6 +957,8 @@ int resctrl_mon_resource_init(void)
resctrl_file_fflags_init("available_mbm_cntrs",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
+ resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
+ RFTYPE_RES_CACHE);
}
return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index fdea608e0796..bf5fd46bd455 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2045,6 +2045,42 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
return ret ?: nbytes;
}
+static int resctrl_mbm_assign_on_mkdir_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+
+ mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
+
+ seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir);
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
+static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+ bool value;
+ int ret;
+
+ ret = kstrtobool(buf, &value);
+ if (ret)
+ return ret;
+
+ mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
+
+ r->mon.mbm_assign_on_mkdir = value;
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return ret ?: nbytes;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -2054,6 +2090,13 @@ static struct rftype res_common_files[] = {
.seq_show = rdt_last_cmd_status_show,
.fflags = RFTYPE_TOP_INFO,
},
+ {
+ .name = "mbm_assign_on_mkdir",
+ .mode = 0644,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = resctrl_mbm_assign_on_mkdir_show,
+ .write = resctrl_mbm_assign_on_mkdir_write,
+ },
{
.name = "num_closids",
.mode = 0444,
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4b52bac5dbbc..39dd3acff372 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -281,12 +281,15 @@ enum resctrl_schema_fmt {
* monitoring events are configured.
* @num_mbm_cntrs: Number of assignable counters.
* @mbm_cntr_assignable:Is system capable of supporting counter assignment?
+ * @mbm_assign_on_mkdir:True if counters should automatically be assigned to MBM
+ * events of monitor groups created via mkdir.
*/
struct resctrl_mon {
int num_rmid;
unsigned int mbm_cfg_mask;
int num_mbm_cntrs;
bool mbm_cntr_assignable;
+ bool mbm_assign_on_mkdir;
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
2025-06-13 21:05 ` [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir Babu Moger
@ 2025-06-25 23:24 ` Reinette Chatre
2025-07-01 16:23 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:24 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes, loff_t off)
> +{
> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> + bool value;
> + int ret;
> +
> + ret = kstrtobool(buf, &value);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&rdtgroup_mutex);
> + rdt_last_cmd_clear();
> +
> + r->mon.mbm_assign_on_mkdir = value;
> +
> + mutex_unlock(&rdtgroup_mutex);
> +
> + return ret ?: nbytes;
The static checker I tried complained here that ret can only be zero here.
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
2025-06-25 23:24 ` Reinette Chatre
@ 2025-07-01 16:23 ` Moger, Babu
2025-07-01 16:37 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-01 16:23 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:24, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>> + char *buf, size_t nbytes, loff_t off)
>> +{
>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> + bool value;
>> + int ret;
>> +
>> + ret = kstrtobool(buf, &value);
>> + if (ret)
>> + return ret;
>> +
>> + mutex_lock(&rdtgroup_mutex);
>> + rdt_last_cmd_clear();
>> +
>> + r->mon.mbm_assign_on_mkdir = value;
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> +
>> + return ret ?: nbytes;
>
> The static checker I tried complained here that ret can only be zero here.
>
It should be
return 0;
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
2025-07-01 16:23 ` Moger, Babu
@ 2025-07-01 16:37 ` Reinette Chatre
0 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-07-01 16:37 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 7/1/25 9:23 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 6/25/25 18:24, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>>> + char *buf, size_t nbytes, loff_t off)
>>> +{
>>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>>> + bool value;
>>> + int ret;
>>> +
>>> + ret = kstrtobool(buf, &value);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + mutex_lock(&rdtgroup_mutex);
>>> + rdt_last_cmd_clear();
>>> +
>>> + r->mon.mbm_assign_on_mkdir = value;
>>> +
>>> + mutex_unlock(&rdtgroup_mutex);
>>> +
>>> + return ret ?: nbytes;
>>
>> The static checker I tried complained here that ret can only be zero here.
>>
>
> It should be
>
> return 0;
>
hmmm ... I think it should be "return nbytes"
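
A user-space sketch of why nbytes is the value to return: once the parse error has been handled by the early return, ret can only be 0, and a write handler that returns 0 reports that nothing was consumed. The names parse_bool() and assign_on_mkdir_write() are hypothetical stand-ins for this example; parse_bool() is far stricter than kstrtobool() and accepts only "0" or "1" with an optional trailing newline.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for kstrtobool(). */
static int parse_bool(const char *s, bool *res)
{
	if (s[0] != '0' && s[0] != '1')
		return -EINVAL;
	if (s[1] != '\0' && strcmp(s + 1, "\n") != 0)
		return -EINVAL;

	*res = (s[0] == '1');
	return 0;
}

/* Stand-in for the per-resource flag. */
static bool assign_on_mkdir = true;

static ssize_t assign_on_mkdir_write(const char *buf, size_t nbytes)
{
	bool value;
	int ret;

	ret = parse_bool(buf, &value);
	if (ret)
		return ret;

	assign_on_mkdir = value;

	/* ret is known to be 0 here: report the number of bytes consumed. */
	return (ssize_t)nbytes;
}
```

Returning nbytes directly also removes the dead "ret ?:" branch that the static checker flagged.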
Reinette
^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters on mkdir
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (25 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 26/32] fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:25 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
` (6 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Resctrl provides a user-configurable option mbm_assign_on_mkdir that
determines if a counter will automatically be assigned to an RMID, event
pair when its associated monitor group is created via mkdir.
Enable mbm_assign_on_mkdir by default and automatically assign or unassign
counters when a resctrl group is created or deleted.
By default, each group requires two counters: one for the MBM total event
and one for the MBM local event.
If the counters are exhausted, the kernel will log the error message
"Unable to allocate counter in domain" in
/sys/fs/resctrl/info/last_cmd_status when a new group is created and the
counter assignment will fail. However, the creation of a group should not
fail due to assignment failures. Users have the flexibility to modify the
assignments at a later time.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog with changed name mbm_event.
Update code comments with changed name mbm_event.
Changed the code to reflect Tony's struct mon_evt changes.
v13: Changes due to calling of resctrl_assign_cntr_event() and resctrl_unassign_cntr_event().
It only takes evtid. evt_cfg is not required anymore.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.
v12: Removed mbm_cntr_reset() as it is not required while removing the group.
Update the commit text.
Added r->mon_capable check in rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs().
v11: Moved mbm_cntr_reset() to monitor.c.
Added code to reset non-architectural state in mbm_cntr_reset().
Added missing rdtgroup_unassign_cntrs() calls on failure path.
v10: Assigned the counter before exposing the event files.
Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
This is called for both CTRL_MON and MON group creation.
Call mbm_cntr_reset() when unmounted to clear all the assignments.
Taken care of few other feedback comments.
v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
Updated couple of rdtgroup_unassign_cntrs() calls properly.
Updated function comments.
v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
Fixed the problem with unassigning the child MON groups of CTRL_MON group.
v7: Reworded the commit message.
Removed the reference of ABMC with mbm_cntr_assign.
Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
v6: Removed the redundant comments on all the calls of
rdtgroup_assign_cntrs. Updated the commit message.
Dropped printing error message on every call of rdtgroup_assign_cntrs.
v5: Removed the code to enable/disable ABMC during the mount.
That will be another patch.
Added arch callers to get the arch specific data.
Renamed functions to match the other abmc function.
Added code comments for assignment failures.
v4: Few name changes based on the upstream discussion.
Commit message update.
v3: This is a new patch. Patch addresses the upstream comment to enable
ABMC feature by default if the feature is available.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
fs/resctrl/rdtgroup.c | 71 ++++++++++++++++++++++++++-
2 files changed, 70 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ee0aa741cf6c..053f516a8e67 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -429,6 +429,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.mbm_cntr_assignable = true;
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+ r->mon.mbm_assign_on_mkdir = true;
}
r->mon_capable = true;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index bf5fd46bd455..128a9db339f3 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2945,6 +2945,55 @@ static void schemata_list_destroy(void)
}
}
+/**
+ * rdtgroup_assign_cntrs() - Assign counters to MBM events. Called when
+ * a new group is created.
+ * If "mbm_event" mode is enabled, counters are automatically assigned.
+ * Each group can accommodate two counters: one for the total event and
+ * one for the local event. Assignments may fail due to the limited number
+ * of counters. However, it is not necessary to fail the group creation
+ * and thus no failure is returned. Users have the option to modify the
+ * counter assignments after the group has been created.
+ */
+static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+ if (!r->mon_capable)
+ return;
+
+ if (resctrl_arch_mbm_cntr_assign_enabled(r) && !r->mon.mbm_assign_on_mkdir)
+ return;
+
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+ resctrl_assign_cntr_event(r, NULL, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
+
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
+ resctrl_assign_cntr_event(r, NULL, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
+}
+
+/*
+ * rdtgroup_unassign_cntrs() - Unassign the counters associated with MBM events.
+ * Called when a group is deleted.
+ */
+static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+ if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r))
+ return;
+
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+ resctrl_unassign_cntr_event(r, NULL, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
+
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
+ resctrl_unassign_cntr_event(r, NULL, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
+}
+
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -3001,6 +3050,8 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_info;
+ rdtgroup_assign_cntrs(&rdtgroup_default);
+
ret = mkdir_mondata_all(rdtgroup_default.kn,
&rdtgroup_default, &kn_mondata);
if (ret < 0)
@@ -3039,8 +3090,10 @@ static int rdt_get_tree(struct fs_context *fc)
if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);
out_mongrp:
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
kernfs_remove(kn_mongrp);
+ }
out_info:
kernfs_remove(kn_info);
out_closid_exit:
@@ -3186,6 +3239,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
head = &rdtgrp->mon.crdtgrp_list;
list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+ rdtgroup_unassign_cntrs(sentry);
free_rmid(sentry->closid, sentry->mon.rmid);
list_del(&sentry->mon.crdtgrp_list);
@@ -3226,6 +3280,8 @@ static void rmdir_all_sub(void)
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
kernfs_remove(rdtgrp->kn);
@@ -3310,6 +3366,7 @@ static void resctrl_fs_teardown(void)
return;
rmdir_all_sub();
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
mon_put_kn_priv();
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
@@ -3790,9 +3847,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
}
rdtgrp->mon.rmid = ret;
+ rdtgroup_assign_cntrs(rdtgrp);
+
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
+ rdtgroup_unassign_cntrs(rdtgrp);
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}
@@ -3802,8 +3862,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
+ rdtgroup_unassign_cntrs(rgrp);
free_rmid(rgrp->closid, rgrp->mon.rmid);
+ }
}
/*
@@ -4079,6 +4141,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);
rdtgrp->flags = RDT_DELETED;
+
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
/*
@@ -4126,6 +4191,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
--
2.34.1
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters on mkdir
2025-06-13 21:05 ` [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters " Babu Moger
@ 2025-06-25 23:25 ` Reinette Chatre
2025-07-01 19:06 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:25 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Resctrl provides a user-configurable option mbm_assign_on_mkdir that
> determines if a counter will automatically be assigned to an RMID, event
> pair when its associated monitor group is created via mkdir.
>
> Enable mbm_assign_on_mkdir by default and automatically assign or unassign
> counters when a resctrl group is created or deleted.
This is a bit confusing since I do not think mbm_assign_on_mkdir has *anything*
to do with unassign of counters. Counters are always (irrespective of mbm_assign_on_mkdir)
unassigned when a resctrl group is deleted, no?
The subject also does not seem accurate since there is no unassign on
mkdir.
>
> By default, each group requires two counters: one for the MBM total event
> and one for the MBM local event.
>
> If the counters are exhausted, the kernel will log the error message
> "Unable to allocate counter in domain" in
> /sys/fs/resctrl/info/last_cmd_status when a new group is created and the
> counter assignment will fail. However, the creation of a group should not
> fail due to assignment failures. Users have the flexibility to modify the
> assignments at a later time.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> fs/resctrl/rdtgroup.c | 71 ++++++++++++++++++++++++++-
> 2 files changed, 70 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index ee0aa741cf6c..053f516a8e67 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -429,6 +429,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> r->mon.mbm_cntr_assignable = true;
> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> + r->mon.mbm_assign_on_mkdir = true;
> }
>
> r->mon_capable = true;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index bf5fd46bd455..128a9db339f3 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2945,6 +2945,55 @@ static void schemata_list_destroy(void)
> }
> }
>
> +/**
> + * rdtgroup_assign_cntrs() - Assign counters to MBM events. Called when
> + * a new group is created.
> + * If "mbm_event" mode is enabled, counters are automatically assigned.
"counters are automatically assigned" -> "counters should be automatically assigned
if the "mbm_assign_on_mkdir" is set"?
> + * Each group can accommodate two counters: one for the total event and
> + * one for the local event. Assignments may fail due to the limited number
> + * of counters. However, it is not necessary to fail the group creation
> + * and thus no failure is returned. Users have the option to modify the
> + * counter assignments after the group has been created.
> + */
> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +
> + if (!r->mon_capable)
> + return;
> +
> + if (resctrl_arch_mbm_cntr_assign_enabled(r) && !r->mon.mbm_assign_on_mkdir)
> + return;
This check is not clear to me. It looks to me as though counter assignment
will be attempted if !resctrl_arch_mbm_cntr_assign_enabled(r)? Perhaps
something like:
if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
!r->mon.mbm_assign_on_mkdir)
return;
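
The difference between the two guards can be checked with a small truth-table sketch. The three booleans stand for r->mon_capable, resctrl_arch_mbm_cntr_assign_enabled(r) and r->mon.mbm_assign_on_mkdir; the function names are made up for this example, and each function returns true when assignment would be skipped.

```c
#include <stdbool.h>

/* Guard as posted: two separate early returns. */
static bool skip_posted(bool mon_capable, bool assign_enabled, bool on_mkdir)
{
	if (!mon_capable)
		return true;

	return assign_enabled && !on_mkdir;
}

/* Guard as suggested in review: skip unless all three conditions hold. */
static bool skip_suggested(bool mon_capable, bool assign_enabled, bool on_mkdir)
{
	return !mon_capable || !assign_enabled || !on_mkdir;
}
```

The two only disagree when monitoring is available but counter assignment mode is not enabled: the posted guard falls through and attempts an assignment, while the suggested guard returns early.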
> +
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> + resctrl_assign_cntr_event(r, NULL, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
Switching the namespace like this is confusing to me. rdtgroup_assign_cntrs()
has prefix rdtgroup_ to indicate it operates on a resource group. It is confusing
when it switches namespace to call resctrl_assign_cntr_event() that actually assigns
a specific event to a resource group. I think this will be easier to follow if:
resctrl_assign_cntr_event() -> rdtgroup_assign_cntr_event()
> +
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> + resctrl_assign_cntr_event(r, NULL, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
> +}
> +
> +/*
> + * rdtgroup_unassign_cntrs() - Unassign the counters associated with MBM events.
> + * Called when a group is deleted.
> + */
> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +
> + if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r))
> + return;
> +
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> + resctrl_unassign_cntr_event(r, NULL, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
same here, I think this will be easier to follow when namespace is
consistent:
resctrl_unassign_cntr_event() -> rdtgroup_unassign_cntr_event()
Also, the struct rdt_resource parameter should not be needed when
struct mon_evt is provided and resource can be obtained from mon_evt::rid.
> +
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> + resctrl_unassign_cntr_event(r, NULL, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
> +}
> +
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
Reinette
* Re: [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters on mkdir
2025-06-25 23:25 ` Reinette Chatre
@ 2025-07-01 19:06 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-01 19:06 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:25, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> Resctrl provides a user-configurable option mbm_assign_on_mkdir that
>> determines if a counter will automatically be assigned to an RMID, event
>> pair when its associated monitor group is created via mkdir.
>>
>> Enable mbm_assign_on_mkdir by default and automatically assign or unassign
>> counters when a resctrl group is created or deleted.
>
> This is a bit confusing since I do not think mbm_assign_on_mkdir has *anything*
> to do with unassign of counters. Counters are always (irrespective of mbm_assign_on_mkdir)
> unassigned when a resctrl group is deleted, no?
Yes. That is correct. Changed the text now.
>
> The subject also does not seem accurate since there is no unassign on
> mkdir.
Changed the subject to:
x86,fs/resctrl: Auto assign counters on mkdir and clean up on group removal
>
>>
>> By default, each group requires two counters: one for the MBM total event
>> and one for the MBM local event.
>>
>> If the counters are exhausted, the kernel will log the error message
>> "Unable to allocate counter in domain" in
>> /sys/fs/resctrl/info/last_cmd_status when a new group is created and the
>> counter assignment will fail. However, the creation of a group should not
>> fail due to assignment failures. Users have the flexibility to modify the
>> assignments at a later time.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
>> fs/resctrl/rdtgroup.c | 71 ++++++++++++++++++++++++++-
>> 2 files changed, 70 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index ee0aa741cf6c..053f516a8e67 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -429,6 +429,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> r->mon.mbm_cntr_assignable = true;
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> + r->mon.mbm_assign_on_mkdir = true;
>> }
>>
>> r->mon_capable = true;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index bf5fd46bd455..128a9db339f3 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2945,6 +2945,55 @@ static void schemata_list_destroy(void)
>> }
>> }
>>
>> +/**
>> + * rdtgroup_assign_cntrs() - Assign counters to MBM events. Called when
>> + * a new group is created.
>> + * If "mbm_event" mode is enabled, counters are automatically assigned.
>
> "counters are automatically assigned" -> "counters should be automatically assigned
> if the "mbm_assign_on_mkdir" is set"?
Sure.
>
>> + * Each group can accommodate two counters: one for the total event and
>> + * one for the local event. Assignments may fail due to the limited number
>> + * of counters. However, it is not necessary to fail the group creation
>> + * and thus no failure is returned. Users have the option to modify the
>> + * counter assignments after the group has been created.
>> + */
>> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> +
>> + if (!r->mon_capable)
>> + return;
>> +
>> + if (resctrl_arch_mbm_cntr_assign_enabled(r) && !r->mon.mbm_assign_on_mkdir)
>> + return;
>
> This check is not clear to me. It looks to me as though counter assignment
> will be attempted if !resctrl_arch_mbm_cntr_assign_enabled(r)? Perhaps
> something like:
> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
> !r->mon.mbm_assign_on_mkdir)
> return;
>
Yes. Good catch.
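For illustration only, the difference between the two guards can be modelled
in a small standalone sketch. This is not kernel code: struct guard_flags and
both function names are hypothetical, and the three booleans merely stand in
for r->mon_capable, resctrl_arch_mbm_cntr_assign_enabled(r) and
r->mon.mbm_assign_on_mkdir.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three conditions in the guard. */
struct guard_flags {
	bool mon_capable;		/* r->mon_capable */
	bool cntr_assign_enabled;	/* resctrl_arch_mbm_cntr_assign_enabled(r) */
	bool assign_on_mkdir;		/* r->mon.mbm_assign_on_mkdir */
};

/* Guard as posted in v14: true when counter assignment would be attempted. */
bool v14_attempts_assign(const struct guard_flags *f)
{
	assert(f);
	if (!f->mon_capable)
		return false;
	if (f->cntr_assign_enabled && !f->assign_on_mkdir)
		return false;
	return true;
}

/* Guard as suggested in review: all three conditions must hold. */
bool suggested_attempts_assign(const struct guard_flags *f)
{
	assert(f);
	if (!f->mon_capable || !f->cntr_assign_enabled || !f->assign_on_mkdir)
		return false;
	return true;
}
```

The two functions disagree only when counter assignment mode is disabled: the
v14 form falls through to the assignment calls, the suggested form returns
early.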
>> +
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> + resctrl_assign_cntr_event(r, NULL, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>
> Switching the namespace like this is confusing to me. rdtgroup_assign_cntrs()
> has prefix rdtgroup_ to indicate it operates on a resource group. It is confusing
> when it switches namespace to call resctrl_assign_cntr_event() that actually assigns
> a specific event to a resource group. I think this will be easier to follow if:
> resctrl_assign_cntr_event() -> rdtgroup_assign_cntr_event()
Sure.
>
>> +
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> + resctrl_assign_cntr_event(r, NULL, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> +}
>> +
>> +/*
>> + * rdtgroup_unassign_cntrs() - Unassign the counters associated with MBM events.
>> + * Called when a group is deleted.
>> + */
>> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> +
>> + if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r))
>> + return;
>> +
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> + resctrl_unassign_cntr_event(r, NULL, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>
> same here, I think this will be easier to follow when namespace is
> consistent:
> resctrl_unassign_cntr_event() -> rdtgroup_unassign_cntr_event()
>
Sure.
>
> Also, the struct rdt_resource parameter should not be needed when
> struct mon_evt is provided and resource can be obtained from mon_evt::rid.
>
>> +
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> + resctrl_unassign_cntr_event(r, NULL, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> +}
>> +
>> static int rdt_get_tree(struct fs_context *fc)
>> {
>> struct rdt_fs_context *ctx = rdt_fc2context(fc);
>
>
> Reinette
>
--
Thanks
Babu Moger
* [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (26 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 27/32] x86,fs/resctrl: Auto assign/unassign counters " Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:27 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify " Babu Moger
` (5 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Introduce the interface to display the assignment states for resctrl group
when "mbm_event" mdoe is enabled.
The list is displayed in the following format:
<Event>:<Domain id>=<Assignment state>
Event: A valid MBM event listed in
/sys/fs/resctrl/info/L3_MON/event_configs directory.
Domain ID: A valid domain ID.
The assignment state can be one of the following:
_ : No counter assigned.
e : Counter assigned exclusively.
Example:
To list the assignment states for the default group
$ cd /sys/fs/resctrl
$ cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
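A user-space consumer of this file could parse one output line roughly as
sketched below. This is an assumption-laden illustration, not part of the
patch: parse_assignments(), struct dom_state and MAX_DOMAINS are hypothetical
names, and the sketch accepts only the two states '_' and 'e' described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DOMAINS 16	/* arbitrary limit chosen for this sketch */

struct dom_state {
	int id;		/* domain ID */
	char state;	/* '_' (no counter) or 'e' (exclusive) */
};

/*
 * Parse "<Event>:<id>=<state>;<id>=<state>..." into event[] and doms[].
 * Returns the number of domains parsed, or -1 on malformed input.
 */
int parse_assignments(const char *line, char *event, size_t event_len,
		      struct dom_state *doms, int max_doms)
{
	const char *colon, *p;
	int n = 0;

	assert(line && event && doms);

	colon = strchr(line, ':');
	if (!colon || colon == line || (size_t)(colon - line) >= event_len)
		return -1;

	memcpy(event, line, colon - line);
	event[colon - line] = '\0';

	p = colon + 1;
	while (*p && *p != '\n') {
		int id, used;
		char st;

		if (n == max_doms)
			return -1;
		/* %n records how many characters this entry consumed. */
		if (sscanf(p, "%d=%c%n", &id, &st, &used) != 2)
			return -1;
		if (st != '_' && st != 'e')
			return -1;

		doms[n].id = id;
		doms[n].state = st;
		n++;

		p += used;
		if (*p == ';')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}

	return n;
}
```

A caller would read the file line by line (for example with fgets()) and pass
each line to parse_assignments().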
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Added missed rdtgroup_kn_lock_live on failure case.
Updated the user doc resctrl.rst to clarify counter assignments.
Updated the changelog.
v13: Changelog update.
Few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
Resolved conflicts caused by the recent FS/ARCH code restructure.
The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.
v12: New patch:
Assignment interface moved inside the group based on the discussion
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
---
Documentation/filesystems/resctrl.rst | 28 ++++++++++++++
fs/resctrl/monitor.c | 1 +
fs/resctrl/rdtgroup.c | 54 +++++++++++++++++++++++++++
3 files changed, 83 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index f94c7c387416..a232a0b1356c 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -516,6 +516,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
/sys/fs/resctrl/info/L3_MON/mon_features changes the input
event.
+"mbm_L3_assignments":
+ Exists when "mbm_event" mode is supported and lists the counter
+ assignment states for the group.
+
+ The assignment list is displayed in the following format:
+
+ <Event>:<Domain ID>=<Assignment state>
+
+ Event: A valid MBM event in the
+ /sys/fs/resctrl/info/L3_MON/event_configs directory.
+
+ Domain ID: A valid domain ID.
+
+ Assignment states:
+
+ _ : No counter assigned.
+
+ e : Counter assigned exclusively.
+
+ Example:
+ To display the counter assignment states for the default group.
+ ::
+
+ # cd /sys/fs/resctrl
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=e;1=e
+ mbm_local_bytes:0=e;1=e
+
Resource allocation rules
-------------------------
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 1ec2efd50273..618c94cd1ad8 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -959,6 +959,7 @@ int resctrl_mon_resource_init(void)
resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
RFTYPE_RES_CACHE);
+ resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
}
return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 128a9db339f3..18ec65801dbb 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2081,6 +2081,54 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ struct rdt_mon_domain *d;
+ struct rdtgroup *rdtgrp;
+ struct mon_evt *mevt;
+ int ret = 0;
+ bool sep;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ ret = -ENOENT;
+ goto out_assign;
+ }
+
+ rdt_last_cmd_clear();
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_event mode is not enabled\n");
+ ret = -ENOENT;
+ goto out_assign;
+ }
+
+ for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
+ if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
+ continue;
+
+ sep = false;
+ seq_printf(s, "%s:", mevt->name);
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (sep)
+ seq_putc(s, ';');
+
+ if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
+ seq_printf(s, "%d=e", d->hdr.id);
+ else
+ seq_printf(s, "%d=_", d->hdr.id);
+
+ sep = true;
+ }
+ seq_putc(s, '\n');
+ }
+
+out_assign:
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -2219,6 +2267,12 @@ static struct rftype res_common_files[] = {
.seq_show = event_filter_show,
.write = event_filter_write,
},
+ {
+ .name = "mbm_L3_assignments",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = mbm_L3_assignments_show,
+ },
{
.name = "mbm_assign_mode",
.mode = 0444,
--
2.34.1
* Re: [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
2025-06-13 21:05 ` [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
@ 2025-06-25 23:27 ` Reinette Chatre
2025-07-01 19:48 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:27 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Introduce the interface to display the assignment states for resctrl group
"Introduce the mbm_L3_assignments resctrl file associated with
CTRL_MON and MON resource groups to display the counter assignment
states of the resource group when "mbm_event" counter assignment
mode is enabled."
> when "mbm_event" mdoe is enabled.
"mdoe" -> "counter assignment mode"
>
> The list is displayed in the following format:
> <Event>:<Domain id>=<Assignment state>
Similar to previous note, please add syntax for multiple domains to avoid
it appearing that each domain is on one line.
>
> Event: A valid MBM event listed in
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> Domain ID: A valid domain ID.
>
> The assignment state can be one of the following:
>
> _ : No counter assigned.
>
> e : Counter assigned exclusively.
>
> Example:
> To list the assignment states for the default group
> $ cd /sys/fs/resctrl
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v14: Added missed rdtgroup_kn_lock_live on failure case.
> Updated the user doc resctrl.rst to clarify counter assignments.
> Updated the changelog.
>
> v13: Changelog update.
> Few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
> Resolved conflicts caused by the recent FS/ARCH code restructure.
> The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.
>
> v12: New patch:
> Assignment interface moved inside the group based the discussion
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
> ---
> Documentation/filesystems/resctrl.rst | 28 ++++++++++++++
> fs/resctrl/monitor.c | 1 +
> fs/resctrl/rdtgroup.c | 54 +++++++++++++++++++++++++++
> 3 files changed, 83 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index f94c7c387416..a232a0b1356c 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -516,6 +516,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
> /sys/fs/resctrl/info/L3_MON/mon_features changes the input
> event.
>
> +"mbm_L3_assignments":
> + Exists when "mbm_event" mode is supported and lists the counter
""mbm_event" mode" -> "mbm_event" counter assignment mode"
> + assignment states for the group.
"for the group" -> "of the group"?
> +
> + The assignment list is displayed in the following format:
> +
> + <Event>:<Domain ID>=<Assignment state>
Same comment about syntax example.
> +
> + Event: A valid MBM event in the
> + /sys/fs/resctrl/info/L3_MON/event_configs directory.
> +
> + Domain ID: A valid domain ID.
> +
> + Assignment states:
> +
> + _ : No counter assigned.
> +
> + e : Counter assigned exclusively.
> +
> + Example:
> + To display the counter assignment states for the default group.
> + ::
> +
> + # cd /sys/fs/resctrl
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> Resource allocation rules
> -------------------------
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 1ec2efd50273..618c94cd1ad8 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -959,6 +959,7 @@ int resctrl_mon_resource_init(void)
> resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
> resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
> RFTYPE_RES_CACHE);
> + resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
> }
>
> return 0;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 128a9db339f3..18ec65801dbb 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2081,6 +2081,54 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
> return ret ?: nbytes;
> }
>
> +static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> + struct rdt_mon_domain *d;
> + struct rdtgroup *rdtgrp;
> + struct mon_evt *mevt;
> + int ret = 0;
> + bool sep;
> +
> + rdtgrp = rdtgroup_kn_lock_live(of->kn);
> + if (!rdtgrp) {
> + ret = -ENOENT;
> + goto out_assign;
out_assign -> out_unlock
> + }
> +
> + rdt_last_cmd_clear();
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_event mode is not enabled\n");
> + ret = -ENOENT;
> + goto out_assign;
> + }
> +
> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
> + if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
> + continue;
(use macro and mon_evt::rid)
> +
> + sep = false;
> + seq_printf(s, "%s:", mevt->name);
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (sep)
> + seq_putc(s, ';');
> +
> + if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
> + seq_printf(s, "%d=e", d->hdr.id);
> + else
> + seq_printf(s, "%d=_", d->hdr.id);
> +
> + sep = true;
> + }
> + seq_putc(s, '\n');
> + }
> +
> +out_assign:
> + rdtgroup_kn_unlock(of->kn);
> +
> + return ret;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -2219,6 +2267,12 @@ static struct rftype res_common_files[] = {
> .seq_show = event_filter_show,
> .write = event_filter_write,
> },
> + {
> + .name = "mbm_L3_assignments",
> + .mode = 0444,
> + .kf_ops = &rdtgroup_kf_single_ops,
> + .seq_show = mbm_L3_assignments_show,
> + },
> {
> .name = "mbm_assign_mode",
> .mode = 0444,
Reinette
* Re: [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
2025-06-25 23:27 ` Reinette Chatre
@ 2025-07-01 19:48 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-01 19:48 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:27, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> Introduce the interface to display the assignment states for resctrl group
>
> "Introduce the mbm_L3_assignments resctrl file associated with
> CTRL_MON and MON resource groups to display the counter assignment
> states of the resource group when "mbm_event" counter assignment
> mode is enabled."
Sure.
>
>> when "mbm_event" mdoe is enabled.
>
> "mdoe" -> "counter assignment mode"
>
Sure.
>>
>> The list is displayed in the following format:
>> <Event>:<Domain id>=<Assignment state>
>
> Similar to previous note, please add syntax for multiple domains to avoid
> it appearing that each domain is on one line.
The list is displayed in the following format:
<Event>:<Domain id>=<Assignment state>;<Domain id>=<Assignment state>
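To make the single-line, ';'-separated layout concrete, here is a standalone
sketch that builds one such line in user space. It only mirrors the separator
logic of the show function; format_assignments() and its parameters are
hypothetical, and domain IDs are assumed to be 0..ndoms-1 for simplicity.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one event line as "<event>:<id>=<state>;<id>=<state>\n".
 * assigned[i] is nonzero when domain i has a counter assigned.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
int format_assignments(char *buf, size_t len, const char *event,
		       const int *assigned, int ndoms)
{
	size_t off;
	int i, n;

	assert(buf && event && assigned);

	n = snprintf(buf, len, "%s:", event);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = n;

	for (i = 0; i < ndoms; i++) {
		/* Emit ';' before every entry except the first. */
		n = snprintf(buf + off, len - off, "%s%d=%c",
			     i ? ";" : "", i, assigned[i] ? 'e' : '_');
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += n;
	}

	n = snprintf(buf + off, len - off, "\n");
	if (n < 0 || (size_t)n >= len - off)
		return -1;

	return (int)(off + n);
}
```

All domains of an event end up on one line, and each event gets its own line.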
>
>>
>> Event: A valid MBM event listed in
>> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>>
>> Domain ID: A valid domain ID.
>>
>> The assignment state can be one of the following:
>>
>> _ : No counter assigned.
>>
>> e : Counter assigned exclusively.
>>
>> Example:
>> To list the assignment states for the default group
>> $ cd /sys/fs/resctrl
>> $ cat /sys/fs/resctrl/mbm_L3_assignments
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v14: Added missed rdtgroup_kn_lock_live on failure case.
>> Updated the user doc resctrl.rst to clarify counter assignments.
>> Updated the changelog.
>>
>> v13: Changelog update.
>> Few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
>> Resolved conflicts caused by the recent FS/ARCH code restructure.
>> The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.
>>
>> v12: New patch:
>> Assignment interface moved inside the group based the discussion
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
>> ---
>> Documentation/filesystems/resctrl.rst | 28 ++++++++++++++
>> fs/resctrl/monitor.c | 1 +
>> fs/resctrl/rdtgroup.c | 54 +++++++++++++++++++++++++++
>> 3 files changed, 83 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index f94c7c387416..a232a0b1356c 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -516,6 +516,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
>> /sys/fs/resctrl/info/L3_MON/mon_features changes the input
>> event.
>>
>> +"mbm_L3_assignments":
>> + Exists when "mbm_event" mode is supported and lists the counter
>
> ""mbm_event" mode" -> "mbm_event" counter assignment mode"
>
>> + assignment states for the group.
>
> "for the group" -> "of the group"?
>
Sure.
>> +
>> + The assignment list is displayed in the following format:
>> +
>> + <Event>:<Domain ID>=<Assignment state>
>
> Same comment about syntax example.
>
Sure.
>> +
>> + Event: A valid MBM event in the
>> + /sys/fs/resctrl/info/L3_MON/event_configs directory.
>> +
>> + Domain ID: A valid domain ID.
>> +
>> + Assignment states:
>> +
>> + _ : No counter assigned.
>> +
>> + e : Counter assigned exclusively.
>> +
>> + Example:
>> + To display the counter assignment states for the default group.
>> + ::
>> +
>> + # cd /sys/fs/resctrl
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=e;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> Resource allocation rules
>> -------------------------
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 1ec2efd50273..618c94cd1ad8 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -959,6 +959,7 @@ int resctrl_mon_resource_init(void)
>> resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>> resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
>> RFTYPE_RES_CACHE);
>> + resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
>> }
>>
>> return 0;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 128a9db339f3..18ec65801dbb 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2081,6 +2081,54 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>> return ret ?: nbytes;
>> }
>>
>> +static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> + struct rdt_mon_domain *d;
>> + struct rdtgroup *rdtgrp;
>> + struct mon_evt *mevt;
>> + int ret = 0;
>> + bool sep;
>> +
>> + rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> + if (!rdtgrp) {
>> + ret = -ENOENT;
>> + goto out_assign;
>
> out_assign -> out_unlock
>
Sure.
>> + }
>> +
>> + rdt_last_cmd_clear();
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_event mode is not enabled\n");
>> + ret = -ENOENT;
>> + goto out_assign;
>> + }
>> +
>> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
>> + if (!mevt->enabled || !resctrl_is_mbm_event(mevt->evtid))
>> + continue;
>
> (use macro and mon_evt::rid)
>
Sure.
>> +
>> + sep = false;
>> + seq_printf(s, "%s:", mevt->name);
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + if (sep)
>> + seq_putc(s, ';');
>> +
>> + if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
>> + seq_printf(s, "%d=e", d->hdr.id);
>> + else
>> + seq_printf(s, "%d=_", d->hdr.id);
>> +
>> + sep = true;
>> + }
>> + seq_putc(s, '\n');
>> + }
>> +
>> +out_assign:
>> + rdtgroup_kn_unlock(of->kn);
>> +
>> + return ret;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>> @@ -2219,6 +2267,12 @@ static struct rftype res_common_files[] = {
>> .seq_show = event_filter_show,
>> .write = event_filter_write,
>> },
>> + {
>> + .name = "mbm_L3_assignments",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = mbm_L3_assignments_show,
>> + },
>> {
>> .name = "mbm_assign_mode",
>> .mode = 0444,
>
> Reinette
>
--
Thanks
Babu Moger
* [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify assignments in a group
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (27 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 28/32] fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:38 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled Babu Moger
` (4 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Introduce the interface to modify assignments within a group when
"mbm_event" mode is enabled.
The assignment modifications are done in the following format:
<Event>:<Domain id>=<Assignment state>
Event: A valid MBM event in the
/sys/fs/resctrl/info/L3_MON/event_configs directory.
Domain ID: A valid domain ID. When writing, '*' applies the changes
to all domains.
Assignment states:
_ : Unassign the counter.
e : Assign the counter exclusively.
Examples:
$ cd /sys/fs/resctrl
$ cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
To unassign the counter associated with the mbm_total_bytes event on
domain 0:
$ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
$ cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=e
mbm_local_bytes:0=e;1=e
To unassign the counter associated with the mbm_total_bytes event on
all the domains:
$ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
$ cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=_
mbm_local_bytes:0=e;1=e
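The effect of the writes shown above, including the '*' wildcard, can be
modelled with a small user-space sketch. It is not the kernel's write handler:
apply_assignment() and NUM_DOMAINS are hypothetical, and the per-domain state
is reduced to one character per domain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_DOMAINS 2	/* matches the two-domain examples above */

/*
 * Apply one "<Domain id>=<state>" token to states[], where states[i]
 * holds '_' or 'e' for domain i. A domain of "*" selects every domain.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int apply_assignment(const char *tok, char states[NUM_DOMAINS])
{
	const char *eq;
	char *end;
	long id;
	int i;

	assert(tok && states);

	eq = strchr(tok, '=');
	if (!eq || eq == tok || (eq[1] != '_' && eq[1] != 'e') || eq[2] != '\0')
		return -1;

	/* Wildcard: apply the requested state to all domains. */
	if (eq - tok == 1 && tok[0] == '*') {
		for (i = 0; i < NUM_DOMAINS; i++)
			states[i] = eq[1];
		return 0;
	}

	/* Otherwise the text before '=' must be a valid domain ID. */
	id = strtol(tok, &end, 10);
	if (end != eq || id < 0 || id >= NUM_DOMAINS)
		return -1;

	states[id] = eq[1];
	return 0;
}
```

Starting from "0=e;1=e", applying "0=_" yields "0=_;1=e", and applying "*=_"
afterwards yields "0=_;1=_", matching the examples above.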
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Fixed the problem reported by Peter.
Updated the changelog.
Updated the user doc resctrl.rst.
Added example section on how to use resctrl with mbm_assign_mode.
v13: Few changes in mbm_L3_assignments_write() after moving the event config to evt_list.
Resolved conflicts caused by the recent FS/ARCH code restructure.
v12: New patch:
Assignment interface moved inside the group based on the discussion
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
---
Documentation/filesystems/resctrl.rst | 146 +++++++++++++++++++++++-
fs/resctrl/internal.h | 9 ++
fs/resctrl/rdtgroup.c | 157 +++++++++++++++++++++++++-
3 files changed, 310 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index a232a0b1356c..cd82c2966ed7 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -527,7 +527,8 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
Event: A valid MBM event in the
/sys/fs/resctrl/info/L3_MON/event_configs directory.
- Domain ID: A valid domain ID.
+ Domain ID: A valid domain ID. When writing, '*' applies the changes
+ to all domains.
Assignment states:
@@ -544,6 +545,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
+ Assignments can be modified by writing to the interface.
+
+ Example:
+ To unassign the counter associated with the mbm_total_bytes event on domain 0:
+ ::
+
+ # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=_;1=e
+ mbm_local_bytes:0=e;1=e
+
+ To unassign the counter associated with the mbm_total_bytes event on all the domains:
+ ::
+
+ # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=_;1=_
+ mbm_local_bytes:0=e;1=e
+
+ To assign the counter associated with the mbm_total_bytes event on all domains in
+ exclusive mode:
+ ::
+
+ # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=e;1=e
+ mbm_local_bytes:0=e;1=e
+
Resource allocation rules
-------------------------
@@ -1579,6 +1608,121 @@ View the llc occupancy snapshot::
# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
11234000
+
+Examples on working with mbm_assign_mode
+========================================
+
+a. Check if MBM assign support is available
+::
+
+ # mount -t resctrl resctrl /sys/fs/resctrl/
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ [mbm_event]
+ default
+
+mbm_event feature is detected and it is enabled.
+
+b. Check how many assignable counters are supported.
+::
+
+ # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
+ 0=32;1=32
+
+c. Check how many assignable counters are available for assignment in each domain.
+::
+
+ # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
+ 0=30;1=30
+
+d. To list the default group's assign states:
+::
+
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=e;1=e
+ mbm_local_bytes:0=e;1=e
+
+e. To unassign the counter associated with the mbm_total_bytes event on domain 0:
+::
+
+ # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=_;1=e
+ mbm_local_bytes:0=e;1=e
+
+f. To unassign the counter associated with the mbm_total_bytes event on all domains:
+::
+
+ # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=_;1=_
+ mbm_local_bytes:0=e;1=e
+
+g. To assign a counter associated with the mbm_total_bytes event on all domains in
+exclusive mode:
+::
+
+ # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
+ # cat /sys/fs/resctrl/mbm_L3_assignments
+ mbm_total_bytes:0=e;1=e
+ mbm_local_bytes:0=e;1=e
+
+h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. There is
+no change in reading the events with the assignment. If the event is unassigned when
+reading, then the read will come back as "Unassigned".
+::
+
+ # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
+ 779247936
+ # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
+ 765207488
+
+i. Check the default event configurations.
+::
+
+ # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
+ local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
+ local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
+
+ # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
+ local_reads,local_non_temporal_writes,local_reads_slow_memory
+
+j. Change the event configuration for mbm_local_bytes.
+::
+
+ # echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
+ /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
+
+ # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
+ local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
+
+This will update all (across all domains of all monitor groups) counter assignments
+associated with the mbm_local_bytes event.
+
+k. Now read the local event again. The first read may come back with "Unavailable"
+status. The subsequent read of mbm_local_bytes will display only the read events.
+::
+
+ # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
+ Unavailable
+ # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
+ 314101
+
+l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
+done using the following command. Note that switching the mbm_assign_mode will reset all
+the MBM counters (and thus all MBM events) of all the resctrl groups.
+::
+
+ # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ mbm_event
+ [default]
+
+m. Unmount the resctrl
+::
+
+ #umount /sys/fs/resctrl/
+
Intel RDT Errata
================
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index ed0e3b695ad5..14d99c723ea5 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -51,6 +51,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
return container_of(kfc, struct rdt_fs_context, kfc);
}
+/*
+ * Assignment types for the monitor modes
+ */
+enum {
+ ASSIGN_NONE = 0,
+ ASSIGN_EXCLUSIVE,
+ ASSIGN_INVALID,
+};
+
/**
* struct mon_evt - Description of a monitor event
* @evtid: event id
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 18ec65801dbb..92bb8f3adfae 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2129,6 +2129,160 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
return ret;
}
+/**
+ * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
+ * event name.
+ */
+static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
+ char *name)
+{
+ struct mon_evt *mevt;
+
+ for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
+ if (mevt->enabled && !strcmp(mevt->name, name))
+ return mevt;
+ }
+
+ return NULL;
+}
+
+static unsigned int resctrl_get_assign_state(char *assign)
+{
+ if (!assign || strlen(assign) != 1)
+ return ASSIGN_INVALID;
+
+ switch (*assign) {
+ case 'e':
+ return ASSIGN_EXCLUSIVE;
+ case '_':
+ return ASSIGN_NONE;
+ default:
+ return ASSIGN_INVALID;
+ }
+}
+
+static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ char *event, char *tok)
+{
+ struct rdt_mon_domain *d;
+ unsigned long dom_id = 0;
+ char *dom_str, *id_str;
+ struct mon_evt *mevt;
+ int assign_state;
+ char domain[10];
+ bool found;
+ int ret;
+
+ mevt = mbm_get_mon_event_by_name(r, event);
+ if (!mevt) {
+ rdt_last_cmd_printf("Invalid event %s\n", event);
+ return -ENOENT;
+ }
+
+next:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each domain */
+ dom_str = strim(strsep(&tok, ";"));
+
+ id_str = strsep(&dom_str, "=");
+
+ /* Check for domain id '*' which means all domains */
+ if (id_str && *id_str == '*') {
+ d = NULL;
+ goto check_state;
+ } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+ rdt_last_cmd_puts("Missing domain id\n");
+ return -EINVAL;
+ }
+
+ /* Verify if the dom_id is valid */
+ found = false;
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
+ return -EINVAL;
+ }
+
+check_state:
+ assign_state = resctrl_get_assign_state(dom_str);
+
+ switch (assign_state) {
+ case ASSIGN_NONE:
+ ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
+ break;
+ case ASSIGN_EXCLUSIVE:
+ ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
+ break;
+ case ASSIGN_INVALID:
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ goto out_fail;
+
+ goto next;
+
+out_fail:
+ sprintf(domain, d ? "%ld" : "*", dom_id);
+
+ rdt_last_cmd_printf("Assign operation '%s:%s=%s' failed\n", event, domain, dom_str);
+
+ return ret;
+}
+
+static ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ struct rdtgroup *rdtgrp;
+ char *token, *event;
+ int ret = 0;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+ rdt_last_cmd_clear();
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_event mode is not enabled\n");
+ rdtgroup_kn_unlock(of->kn);
+ return -EINVAL;
+ }
+
+ while ((token = strsep(&buf, "\n")) != NULL) {
+ /*
+ * The write command follows the following format:
+ * “<Event>:<Domain ID>=<Assignment state>”
+ * Extract the event name first.
+ */
+ event = strsep(&token, ":");
+
+ ret = resctrl_process_assign(r, rdtgrp, event, token);
+ if (ret)
+ break;
+ }
+
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret ?: nbytes;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -2269,9 +2423,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_L3_assignments",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = mbm_L3_assignments_show,
+ .write = mbm_L3_assignments_write,
},
{
.name = "mbm_assign_mode",
--
2.34.1
* Re: [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify assignments in a group
2025-06-13 21:05 ` [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify " Babu Moger
@ 2025-06-25 23:38 ` Reinette Chatre
2025-07-02 2:18 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:38 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Introduce the interface to modify assignments within a group when
nit: This cannot be an introduction since the "interface" (resctrl file)
already exists at this point so this patch enables it to support modifications.
Perhaps:
"Enable the mbm_L3_assignments resctrl file to be used to modify counter
assignments of CTRL_MON and MON groups when the "mbm_event" counter
assignment mode is enabled." (Please feel free to improve)
> "mbm_event" mode is enabled.
>
> The assignment modifications are done in the following format:
> <Event>:<Domain id>=<Assignment state>
>
> Event: A valid MBM event in the
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> Domain ID: A valid domain ID. When writing, '*' applies the changes
> to all domains.
>
> Assignment states:
>
> _ : Unassign the counter.
>
> e : Assign the counter exclusively.
>
> Examples:
>
> $ cd /sys/fs/resctrl
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on
> domain 0:
>
> $ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on
> all the domains:
>
> $ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=_
> mbm_local_bytes:0=e;1=e
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 146 +++++++++++++++++++++++-
> fs/resctrl/internal.h | 9 ++
> fs/resctrl/rdtgroup.c | 157 +++++++++++++++++++++++++-
> 3 files changed, 310 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index a232a0b1356c..cd82c2966ed7 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -527,7 +527,8 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
> Event: A valid MBM event in the
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> - Domain ID: A valid domain ID.
> + Domain ID: A valid domain ID. When writing, '*' applies the changes
> + to all domains.
>
> Assignment states:
>
> @@ -544,6 +545,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> + Assignments can be modified by writing to the interface.
> +
> + Example:
> + To unassign the counter associated with the mbm_total_bytes event on domain 0:
> + ::
> +
> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=e
> + mbm_local_bytes:0=e;1=e
> +
> + To unassign the counter associated with the mbm_total_bytes event on all the domains:
> + ::
> +
> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=_
> + mbm_local_bytes:0=e;1=e
> +
> + To assign the counter associated with the mbm_total_bytes event on all domains in
> + exclusive mode:
> + ::
> +
> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> Resource allocation rules
> -------------------------
>
> @@ -1579,6 +1608,121 @@ View the llc occupancy snapshot::
> # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
> 11234000
>
> +
> +Examples on working with mbm_assign_mode
> +========================================
> +
> +a. Check if MBM assign support is available
"MBM assign support"? I do not think this term has been used so far.
> +::
> +
> + #mount -t resctrl resctrl /sys/fs/resctrl/
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_event]
> + default
> +
> +mbm_event feature is detected and it is enabled.
> +
> +b. Check how many assignable counters are supported.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> + 0=32;1=32
> +
> +c. Check how many assignable counters are available for assignment in each domain.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> + 0=30;1=30
> +
> +d. To list the default group's assign states:
> +::
> +
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +e. To unassign the counter associated with the mbm_total_bytes event on domain 0:
> +::
> +
> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +f. To unassign the counter associated with the mbm_total_bytes event on all domains:
> +::
> +
> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignment
> + mbm_total_bytes:0=_;1=_
> + mbm_local_bytes:0=e;1=e
> +
> +g. To assign a counter associated with the mbm_total_bytes event on all domains i
> +nexclusive mode:
"in exclusive"
> +::
> +
> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. There is
> +no change in reading the events with the assignment. If the event is unassigned when
> +reading, then the read will come back as "Unassigned".
> +::
> +
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> + 779247936
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + 765207488
> +
> +i. Check the default event configurations.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> + local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
> + local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> + local_reads,local_non_temporal_writes,local_reads_slow_memory
> +
> +j. Change the event configuration for mbm_local_bytes.
> +::
> +
> + # echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> + /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> +
> + # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> + local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
Please let output match code wrt spacing.
> +
> +This will update all (across all domains of all monitor groups) counter assignments
> +associated with the mbm_local_bytes event.
> +
> +k. Now read the local event again. The first read may come back with "Unavailable"
> +status. The subsequent read of mbm_local_bytes will display only the read events.
(note comment about "read events" on duplicate text)
> +::
> +
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + Unavailable
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + 314101
> +
> +l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
> +done using the following command. Note that switching the mbm_assign_mode will reset all
> +the MBM counters (and thus all MBM events) of all the resctrl groups.
hmmm ... earlier documentation about mbm_assign_mode changes was careful to use
"may reset", and here it is switched to "will reset". I am still cautious about making any
strong commitments about resctrl behavior in user documentation.
> +::
> +
> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + mbm_event
> + [default]
> +
> +m. Unmount the resctrl
> +::
> +
> + #umount /sys/fs/resctrl/
> +
> Intel RDT Errata
> ================
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index ed0e3b695ad5..14d99c723ea5 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -51,6 +51,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
> return container_of(kfc, struct rdt_fs_context, kfc);
> }
>
> +/*
> + * Assignment types for the monitor modes
> + */
> +enum {
> + ASSIGN_NONE = 0,
> + ASSIGN_EXCLUSIVE,
> + ASSIGN_INVALID,
> +};
I do not think this is necessary (more below)
> +
> /**
> * struct mon_evt - Description of a monitor event
> * @evtid: event id
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 18ec65801dbb..92bb8f3adfae 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2129,6 +2129,160 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
> return ret;
> }
>
> +/**
> + * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
> + * event name.
> + */
> +static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
struct rdt_resource parameter seems to be unused ... but should be used to match
with mon_evt::rid?
> + char *name)
> +{
> + struct mon_evt *mevt;
> +
> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
(use macro)
> + if (mevt->enabled && !strcmp(mevt->name, name))
> + return mevt;
> + }
> +
> + return NULL;
> +}
This looks to be a utility that should be close to the data structure in
fs/resctrl/monitor.c. Please check if you can move monitoring code
to fs/resctrl/monitor.c.
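For illustration only (not taken from the patch or any posted revision), a
minimal userspace sketch of a name lookup that also matches the resource ID.
The enum, the table contents and the helper name get_mon_event_by_name() are
hypothetical stand-ins; only the rid, name and enabled fields mirror what is
discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel's resource IDs and struct mon_evt. */
enum res_id { RES_L3, RES_OTHER };

struct mon_evt {
	enum res_id rid;
	const char *name;
	bool enabled;
};

/* Made-up table contents, chosen only to exercise the lookup. */
static struct mon_evt mon_event_all[] = {
	{ RES_L3,    "llc_occupancy",   true  },
	{ RES_L3,    "mbm_total_bytes", true  },
	{ RES_L3,    "mbm_local_bytes", false },
	{ RES_OTHER, "mbm_total_bytes", true  },
};

#define NUM_EVENTS (sizeof(mon_event_all) / sizeof(mon_event_all[0]))

/* Return the enabled event called @name that belongs to resource @rid, or NULL. */
static struct mon_evt *get_mon_event_by_name(enum res_id rid, const char *name)
{
	size_t i;

	for (i = 0; i < NUM_EVENTS; i++) {
		struct mon_evt *mevt = &mon_event_all[i];

		if (mevt->rid == rid && mevt->enabled && !strcmp(mevt->name, name))
			return mevt;
	}

	return NULL;
}
```

With the resource check in place, an event of the same name that belongs to a
different resource is no longer returned by mistake.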
> +
> +static unsigned int resctrl_get_assign_state(char *assign)
> +{
> + if (!assign || strlen(assign) != 1)
> + return ASSIGN_INVALID;
> +
> + switch (*assign) {
> + case 'e':
> + return ASSIGN_EXCLUSIVE;
I think this can be simplified by calling resctrl_assign_cntr_event()
(rdtgroup_assign_cntr_event()) directly.
> + case '_':
> + return ASSIGN_NONE;
Here resctrl_unassign_cntr_event() (rdtgroup_unassign_cntr_event())
can be called directly.
> + default:
> + return ASSIGN_INVALID;
> + }
With assign/unassign done here, the function can return a proper error.
> +}
> +
> +static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + char *event, char *tok)
> +{
> + struct rdt_mon_domain *d;
> + unsigned long dom_id = 0;
> + char *dom_str, *id_str;
> + struct mon_evt *mevt;
> + int assign_state;
> + char domain[10];
> + bool found;
> + int ret;
> +
> + mevt = mbm_get_mon_event_by_name(r, event);
> + if (!mevt) {
> + rdt_last_cmd_printf("Invalid event %s\n", event);
> + return -ENOENT;
> + }
> +
> +next:
> + if (!tok || tok[0] == '\0')
> + return 0;
> +
> + /* Start processing the strings for each domain */
> + dom_str = strim(strsep(&tok, ";"));
> +
> + id_str = strsep(&dom_str, "=");
> +
> + /* Check for domain id '*' which means all domains */
> + if (id_str && *id_str == '*') {
> + d = NULL;
> + goto check_state;
Instead of "goto check_state" resctrl_get_assign_state() (with
more appropriate name after changes) can be called directly, its
return handled, possibly printing to last_cmd_status without needing
any sprintf() tricks, and exit from resctrl_process_assign().
Apart from simplifying the code, an additional benefit is avoiding the
(ab)use case where a user/bot may write:
# echo "mbm_total_bytes:*=_;*=e;*=_" > /sys/fs/resctrl/mbm_L3_assignments
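As a rough illustration of that flow (a userspace sketch, not kernel code and
not the eventual implementation): the state handler applies the change
directly and returns an error code, and the '*' case returns immediately, so
anything after it is never processed. The names apply_state(),
process_assign(), dom_state[] and NUM_DOMAINS are hypothetical, and ignoring
(rather than rejecting) trailing entries after '*' is an assumption.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NUM_DOMAINS 2

/* Hypothetical per-domain assignment state: 'e' (exclusive) or '_' (unassigned). */
static char dom_state[NUM_DOMAINS] = { 'e', 'e' };

/* Apply one state string to domain @dom, or to all domains when @dom < 0. */
static int apply_state(int dom, const char *state)
{
	int i;

	if (!state || strlen(state) != 1 || (*state != 'e' && *state != '_'))
		return -EINVAL;

	for (i = 0; i < NUM_DOMAINS; i++) {
		if (dom < 0 || dom == i)
			dom_state[i] = *state;
	}

	return 0;
}

/* Parse "<id>=<state>[;<id>=<state>...]" or "*=<state>". */
static int process_assign(char *tok)
{
	char *dom_str, *id_str, *end;
	long dom_id;
	int ret;

	while (tok && *tok) {
		dom_str = strsep(&tok, ";");
		id_str = strsep(&dom_str, "=");

		/* '*' covers every domain: handle it and exit, no goto needed. */
		if (!strcmp(id_str, "*"))
			return apply_state(-1, dom_str);

		errno = 0;
		dom_id = strtol(id_str, &end, 10);
		if (errno || end == id_str || *end || dom_id < 0 || dom_id >= NUM_DOMAINS)
			return -EINVAL;

		ret = apply_state((int)dom_id, dom_str);
		if (ret)
			return ret;
	}

	return 0;
}
```

In this sketch a write of "*=_;*=e;*=_" only acts on the first entry, and no
"found" flag or sprintf() of the domain is required to report a failure.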
> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> + rdt_last_cmd_puts("Missing domain id\n");
> + return -EINVAL;
> + }
> +
> + /* Verify if the dom_id is valid */
> + found = false;
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (d->hdr.id == dom_id) {
Similarly, resctrl_get_assign_state() (new name TBD) can be
called directly and "found" can be dropped.
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found) {
> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> + return -EINVAL;
> + }
> +
> +check_state:
> + assign_state = resctrl_get_assign_state(dom_str);
> +
> + switch (assign_state) {
> + case ASSIGN_NONE:
> + ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
> + break;
> + case ASSIGN_EXCLUSIVE:
> + ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
> + break;
> + case ASSIGN_INVALID:
> + ret = -EINVAL;
> + }
> +
> + if (ret)
> + goto out_fail;
> +
> + goto next;
> +
> +out_fail:
> + sprintf(domain, d ? "%ld" : "*", dom_id);
> +
> + rdt_last_cmd_printf("Assign operation '%s:%s=%s' failed\n", event, domain, dom_str);
> +
> + return ret;
> +}
> +
> +static ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
> + size_t nbytes, loff_t off)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> + struct rdtgroup *rdtgrp;
> + char *token, *event;
> + int ret = 0;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + rdtgrp = rdtgroup_kn_lock_live(of->kn);
> + if (!rdtgrp) {
> + rdtgroup_kn_unlock(of->kn);
> + return -ENOENT;
> + }
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_event mode is not enabled\n");
> + rdtgroup_kn_unlock(of->kn);
> + return -EINVAL;
> + }
> +
> + while ((token = strsep(&buf, "\n")) != NULL) {
> + /*
> + * The write command follows the following format:
> + * “<Event>:<Domain ID>=<Assignment state>”
> + * Extract the event name first.
> + */
> + event = strsep(&token, ":");
> +
> + ret = resctrl_process_assign(r, rdtgrp, event, token);
> + if (ret)
> + break;
> + }
> +
> + rdtgroup_kn_unlock(of->kn);
> +
> + return ret ?: nbytes;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -2269,9 +2423,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_L3_assignments",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = mbm_L3_assignments_show,
> + .write = mbm_L3_assignments_write,
> },
> {
> .name = "mbm_assign_mode",
Reinette
* Re: [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify assignments in a group
2025-06-25 23:38 ` Reinette Chatre
@ 2025-07-02 2:18 ` Moger, Babu
2025-07-02 2:56 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-02 2:18 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/2025 6:38 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> Introduce the interface to modify assignments within a group when
>
> nit: This cannot be an introduction since the "interface" (resctrl file)
> already exists at this point so this patch enables it to support modifications.
> Perhaps:
> "Enable the mbm_L3_assignments resctrl file to be used to modify counter
> assignments of CTRL_MON and MON groups when the "mbm_event" counter
> assignment mode is enabled." (Please feel free to improve)
Looks good.
>
>
>> "mbm_event" mode is enabled.
>
>
>
>>
>> The assignment modifications are done in the following format:
>> <Event>:<Domain id>=<Assignment state>
>>
>> Event: A valid MBM event in the
>> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>>
>> Domain ID: A valid domain ID. When writing, '*' applies the changes
>> to all domains.
>>
>> Assignment states:
>>
>> _ : Unassign the counter.
>>
>> e : Assign the counter exclusively.
>>
>> Examples:
>>
>> $ cd /sys/fs/resctrl
>> $ cat /sys/fs/resctrl/mbm_L3_assignments
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> To unassign the counter associated with the mbm_total_bytes event on
>> domain 0:
>>
>> $ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>> $ cat /sys/fs/resctrl/mbm_L3_assignments
>> mbm_total_bytes:0=_;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> To unassign the counter associated with the mbm_total_bytes event on
>> all the domains:
>>
>> $ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>> $ cat /sys/fs/resctrl/mbm_L3_assignments
>> mbm_total_bytes:0=_;1=_
>> mbm_local_bytes:0=e;1=e
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 146 +++++++++++++++++++++++-
>> fs/resctrl/internal.h | 9 ++
>> fs/resctrl/rdtgroup.c | 157 +++++++++++++++++++++++++-
>> 3 files changed, 310 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index a232a0b1356c..cd82c2966ed7 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -527,7 +527,8 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
>> Event: A valid MBM event in the
>> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>>
>> - Domain ID: A valid domain ID.
>> + Domain ID: A valid domain ID. When writing, '*' applies the changes
>> + to all domains.
>>
>> Assignment states:
>>
>> @@ -544,6 +545,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> + Assignments can be modified by writing to the interface.
>> +
>> + Example:
>> + To unassign the counter associated with the mbm_total_bytes event on domain 0:
>> + ::
>> +
>> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=_;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> + To unassign the counter associated with the mbm_total_bytes event on all the domains:
>> + ::
>> +
>> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=_;1=_
>> + mbm_local_bytes:0=e;1=e
>> +
>> + To assign the counter associated with the mbm_total_bytes event on all domains in
>> + exclusive mode:
>> + ::
>> +
>> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=e;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> Resource allocation rules
>> -------------------------
>>
>> @@ -1579,6 +1608,121 @@ View the llc occupancy snapshot::
>> # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
>> 11234000
>>
>> +
>> +Examples on working with mbm_assign_mode
>> +========================================
>> +
>> +a. Check if MBM assign support is available
>
> "MBM assign support"? I do not think this term has been used so far.
>
Changed it to
Check if MBM counter assignment mode is supported.
>> +::
>> +
>> + #mount -t resctrl resctrl /sys/fs/resctrl/
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> + [mbm_event]
>> + default
>> +
>> +mbm_event feature is detected and it is enabled.
>> +
>> +b. Check how many assignable counters are supported.
>> +::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> + 0=32;1=32
>> +
>> +c. Check how many assignable counters are available for assignment in each domain.
>> +::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>> + 0=30;1=30
>> +
>> +d. To list the default group's assign states:
>> +::
>> +
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=e;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> +e. To unassign the counter associated with the mbm_total_bytes event on domain 0:
>> +::
>> +
>> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=_;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> +f. To unassign the counter associated with the mbm_total_bytes event on all domains:
>> +::
>> +
>> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignment
>> + mbm_total_bytes:0=_;1=_
>> + mbm_local_bytes:0=e;1=e
>> +
>> +g. To assign a counter associated with the mbm_total_bytes event on all domains i
>> +nexclusive mode:
>
> "in exclusive"
>
sure.
>> +::
>> +
>> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
>> + # cat /sys/fs/resctrl/mbm_L3_assignments
>> + mbm_total_bytes:0=e;1=e
>> + mbm_local_bytes:0=e;1=e
>> +
>> +h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. There is
>> +no change in reading the events with the assignment. If the event is unassigned when
>> +reading, then the read will come back as "Unassigned".
>> +::
>> +
>> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> + 779247936
>> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> + 765207488
>> +
>> +i. Check the default event configurations.
>> +::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
>> + local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
>> + local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
>> + local_reads,local_non_temporal_writes,local_reads_slow_memory
>> +
>> +j. Change the event configuration for mbm_local_bytes.
>> +::
>> +
>> + # echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
>> + /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> + local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
>
> Please let output match code wrt spacing.
>
Sure.
>> +
>> +This will update all (across all domains of all monitor groups) counter assignments
>> +associated with the mbm_local_bytes event.
>> +
>> +k. Now read the local event again. The first read may come back with "Unavailable"
>> +status. The subsequent read of mbm_local_bytes will display only the read events.
>
> (note comment about "read events" on duplicate text)
Changed to
k. Now read the local event again. The first read may come back with
"Unavailable" status. The subsequent read of mbm_local_bytes will
display the current value.
>
>> +::
>> +
>> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> + Unavailable
>> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> + 314101
>> +
>> +l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
>> +done using the following command. Note that switching the mbm_assign_mode will reset all
>> +the MBM counters (and thus all MBM events) of all the resctrl groups.
>
> hmmm ... earlier documentation about mbm_assign_mode changes was careful to use
> "may reset", and here it is switched to "will reset". I am still cautious about making any
> strong commitments about resctrl behavior in user documentation.
Changed to "may reset"
>
>> +::
>> +
>> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> + mbm_event
>> + [default]
>> +
>> +m. Unmount the resctrl
>> +::
>> +
>> + #umount /sys/fs/resctrl/
>> +
>> Intel RDT Errata
>> ================
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index ed0e3b695ad5..14d99c723ea5 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -51,6 +51,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>> return container_of(kfc, struct rdt_fs_context, kfc);
>> }
>>
>> +/*
>> + * Assignment types for the monitor modes
>> + */
>> +enum {
>> + ASSIGN_NONE = 0,
>> + ASSIGN_EXCLUSIVE,
>> + ASSIGN_INVALID,
>> +};
>
> I do not think this is necessary (more below)
>
Removed it.
>> +
>> /**
>> * struct mon_evt - Description of a monitor event
>> * @evtid: event id
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 18ec65801dbb..92bb8f3adfae 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2129,6 +2129,160 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
>> return ret;
>> }
>>
>> +/**
>> + * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
>> + * event name.
>> + */
>> +static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
>
> struct rdt_resource parameter seems to be unused ... but should be used to match
> with mon_evt::rid?
>
Sure.
>> + char *name)
>> +{
>> + struct mon_evt *mevt;
>> +
>> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
>
> (use macro)
>
Sure.
>> + if (mevt->enabled && !strcmp(mevt->name, name))
>> + return mevt;
>> + }
>> +
>> + return NULL;
>> +}
>
> This looks to be a utility that should be close to the data structure in
> fs/resctrl/monitor.c. Please check if you can move monitoring code
> to fs/resctrl/monitor.c.
>
Yes.
>> +
>> +static unsigned int resctrl_get_assign_state(char *assign)
>> +{
>> + if (!assign || strlen(assign) != 1)
>> + return ASSIGN_INVALID;
>> +
>> + switch (*assign) {
>> + case 'e':
>> + return ASSIGN_EXCLUSIVE;
>
> I think this can be simplified by calling resctrl_assign_cntr_event()
> (rdtgroup_assign_cntr_event()) directly.
Yes.
>
>> + case '_':
>> + return ASSIGN_NONE;
>
> Here resctrl_unassign_cntr_event() (rdtgroup_unassign_cntr_event())
> can be called directly.
>
Yes.
>> + default:
>> + return ASSIGN_INVALID;
>> + }
>
> With assign/unassign done the function can return proper error
>
>> +}
>> +
>> +static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + char *event, char *tok)
>> +{
>> + struct rdt_mon_domain *d;
>> + unsigned long dom_id = 0;
>> + char *dom_str, *id_str;
>> + struct mon_evt *mevt;
>> + int assign_state;
>> + char domain[10];
>> + bool found;
>> + int ret;
>> +
>> + mevt = mbm_get_mon_event_by_name(r, event);
>> + if (!mevt) {
>> + rdt_last_cmd_printf("Invalid event %s\n", event);
>> + return -ENOENT;
>> + }
>> +
>> +next:
>> + if (!tok || tok[0] == '\0')
>> + return 0;
>> +
>> + /* Start processing the strings for each domain */
>> + dom_str = strim(strsep(&tok, ";"));
>> +
>> + id_str = strsep(&dom_str, "=");
>> +
>> + /* Check for domain id '*' which means all domains */
>> + if (id_str && *id_str == '*') {
>> + d = NULL;
>> + goto check_state;
>
> Instead of "goto check_state" resctrl_get_assign_state() (with
> more appropriate name after changes) can be called directly, its
> return handled, possibly printing to last_cmd_status without needing
> any sprintf() tricks, and exit from resctrl_process_assign().
Changed
resctrl_get_assign_state() ->
rdtgroup_modify_assign_state().
It takes care of calling rdtgroup_assign_cntr_event() or
rdtgroup_unassign_cntr_event().
>
> Apart from simplifying the code an additional benefit is to avoid
> (ab)use case where user/bot may write:
> # echo "mbm_total_bytes:*=_;*=e;*=_" > /sys/fs/resctrl/mbm_L3_assignments
>
Why should we restrict this?
>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> + rdt_last_cmd_puts("Missing domain id\n");
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify if the dom_id is valid */
>> + found = false;
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + if (d->hdr.id == dom_id) {
>
> Similarly, resctrl_get_assign_state() (new name TBD) can be
> called directly and "found" can be dropped.
I think we still need to know if the domain id matched or not.
I think it is better to call resctrl_get_assign_state() (now
rdtgroup_modify_assign_state()) in one place. The code is easier to follow.
I have taken care of most of the stuff. You can review again in the next
version.
Thanks
Babu
* Re: [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify assignments in a group
2025-07-02 2:18 ` Moger, Babu
@ 2025-07-02 2:56 ` Reinette Chatre
0 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-07-02 2:56 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 7/1/25 7:18 PM, Moger, Babu wrote:
> On 6/25/2025 6:38 PM, Reinette Chatre wrote:
>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>
>> Apart from simplifying the code an additional benefit is to avoid
>> (ab)use case where user/bot may write:
>> # echo "mbm_total_bytes:*=_;*=e;*=_" > /sys/fs/resctrl/mbm_L3_assignments
>>
>
> Why should we restrict this?
I see it as unnecessary churn that can easily be avoided.
>
>>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>>> + rdt_last_cmd_puts("Missing domain id\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* Verify if the dom_id is valid */
>>> + found = false;
>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> + if (d->hdr.id == dom_id) {
>>
>> Similarly, resctrl_get_assign_state() (new name TBD) can be
>> called directly and "found" can be dropped.
>
> I think we still need to know if the domain id matched or not.
Of course, and when the domain ID matches, just call
resctrl_get_assign_state()/rdtgroup_modify_assign_state()
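A rough user-space sketch of that flow, for illustration only and not the
code from this series. Assumptions: struct dom, modify_assign_state() and
process_one() are hypothetical stand-ins; the real code would walk
r->mon_domains and call rdtgroup_assign_cntr_event() or
rdtgroup_unassign_cntr_event() instead of storing a character. The point is
that the match and the state change happen together, so neither a "found"
flag nor a "goto check_state" is needed:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-domain assignment state of one event. */
struct dom {
	unsigned long id;
	char state;	/* 'e' = assigned, '_' = unassigned */
};

/*
 * Apply one assignment character to one domain. Stands in for calling the
 * assign/unassign helpers directly and returning their error code.
 */
static int modify_assign_state(struct dom *d, const char *assign)
{
	if (!assign || strlen(assign) != 1)
		return -EINVAL;

	switch (*assign) {
	case 'e':
	case '_':
		d->state = *assign;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Process one "<id>=<state>" or "*=<state>" token; tok is modified in place. */
static int process_one(struct dom *doms, size_t n, char *tok)
{
	char *assign = strchr(tok, '=');
	unsigned long id;
	char *end;
	int ret;

	if (!assign)
		return -EINVAL;
	*assign++ = '\0';

	/* '*' means all domains: apply directly and return. */
	if (!strcmp(tok, "*")) {
		for (size_t i = 0; i < n; i++) {
			ret = modify_assign_state(&doms[i], assign);
			if (ret)
				return ret;
		}
		return 0;
	}

	id = strtoul(tok, &end, 10);
	if (end == tok || *end != '\0')
		return -EINVAL;

	/* On a match, act immediately; falling out of the loop means no match. */
	for (size_t i = 0; i < n; i++) {
		if (doms[i].id == id)
			return modify_assign_state(&doms[i], assign);
	}

	return -ENOENT;
}
```

With two domains (ids 0 and 1), "1=e" changes only domain 1, "*=e" changes
both, and an unknown id such as "7=e" returns -ENOENT without any extra
bookkeeping.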
>
> I think it is better to call resctrl_get_assign_state()(now rdtgroup_modify_assign_state()) at once place. Code is easy to follow.
This is not clear to me. Will surely take a look at how this turned out.
>
> I have taken care of most of the stuff. You can review again in next version.
ok, thank you.
Reinette
* [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (28 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify " Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:39 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes Babu Moger
` (3 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
work simultaneously.
When mbm_event mode is enabled, hide BMEC-related files to avoid confusion
and update the mon_features display accordingly.
The files /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config and
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config will not be visible
when mbm_event mode is enabled.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog for change in mbm_assign_modes.
Added check in rdt_mon_features_show to hide bmec related feature.
v13: New patch to hide BMEC related files.
---
fs/resctrl/rdtgroup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 92bb8f3adfae..8c67e0897f25 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1164,7 +1164,8 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
if (mevt->rid != r->rid || !mevt->enabled)
continue;
seq_printf(seq, "%s\n", mevt->name);
- if (mevt->configurable)
+ if (mevt->configurable &&
+ !resctrl_arch_mbm_cntr_assign_enabled(r))
seq_printf(seq, "%s_config\n", mevt->name);
}
@@ -1813,6 +1814,38 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/**
+ * resctrl_bmec_files_show() - Controls the visibility of BMEC-related resctrl
+ * files. When @show is true, the files are displayed; when false, the files
+ * are hidden.
+ */
+static void resctrl_bmec_files_show(struct rdt_resource *r, bool show)
+{
+ struct kernfs_node *kn_config, *l3_mon_kn;
+ char name[32];
+
+ sprintf(name, "%s_MON", r->name);
+ l3_mon_kn = kernfs_find_and_get(kn_info, name);
+ if (!l3_mon_kn)
+ return;
+
+ kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_total_bytes_config");
+ if (kn_config) {
+ kernfs_get(kn_config);
+ kernfs_show(kn_config, show);
+ kernfs_put(kn_config);
+ }
+
+ kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_local_bytes_config");
+ if (kn_config) {
+ kernfs_get(kn_config);
+ kernfs_show(kn_config, show);
+ kernfs_put(kn_config);
+ }
+
+ kernfs_put(l3_mon_kn);
+}
+
static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
@@ -2808,6 +2841,13 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
ret = resctrl_mkdir_counter_configs(r, name);
if (ret)
goto out_destroy;
+
+ /*
+ * Hide BMEC related files if mbm_event mode
+ * is enabled.
+ */
+ if (resctrl_arch_mbm_cntr_assign_enabled(r))
+ resctrl_bmec_files_show(r, false);
}
}
--
2.34.1
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-06-13 21:05 ` [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled Babu Moger
@ 2025-06-25 23:39 ` Reinette Chatre
2025-07-02 16:42 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:39 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
> work simultaneously.
Could you please elaborate why they do not work simultaneously?
>
> When mbm_event mode is enabled, hide BMEC-related files to avoid confusion
> and update the mon_features display accordingly.
>
> The files /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config and
> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config will not be visible
> when mbm_event mode is enabled.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v14: Updated the changelog for change in mbm_assign_modes.
> Added check in rdt_mon_features_show to hide bmec related feature.
>
> v13: New patch to hide BMEC related files.
> ---
> fs/resctrl/rdtgroup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 92bb8f3adfae..8c67e0897f25 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1164,7 +1164,8 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
> if (mevt->rid != r->rid || !mevt->enabled)
> continue;
> seq_printf(seq, "%s\n", mevt->name);
> - if (mevt->configurable)
> + if (mevt->configurable &&
> + !resctrl_arch_mbm_cntr_assign_enabled(r))
> seq_printf(seq, "%s_config\n", mevt->name);
> }
>
> @@ -1813,6 +1814,38 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> return ret ?: nbytes;
> }
>
> +/**
> + * resctrl_bmec_files_show() — Controls the visibility of BMEC-related resctrl
> + * files. When @show is true, the files are displayed; when false, the files
> + * are hidden.
> + */
> +static void resctrl_bmec_files_show(struct rdt_resource *r, bool show)
> +{
> + struct kernfs_node *kn_config, *l3_mon_kn;
> + char name[32];
> +
> + sprintf(name, "%s_MON", r->name);
> + l3_mon_kn = kernfs_find_and_get(kn_info, name);
Similar to comment about resctrl_mkdir_counter_configs() (resctrl_mkdir_event_configs())
I think this can be avoided by calling resctrl_bmec_files_show() from
rdtgroup_mkdir_info_resdir().
> + if (!l3_mon_kn)
> + return;
> +
> + kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_total_bytes_config");
> + if (kn_config) {
> + kernfs_get(kn_config);
Be careful ... kernfs_find_and_get() already took a reference. Additional
kernfs_get() is not needed.
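A minimal user-space sketch of the imbalance, for illustration only.
Assumptions: struct node, node_get(), node_put() and node_find_and_get() are
hypothetical stand-ins that model nothing but the reference counts of
struct kernfs_node, kernfs_get(), kernfs_put() and kernfs_find_and_get();
this is not kernel code. Each helper returns the net change in the reference
count after the visibility update:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kernfs_node: a name and a refcount. */
struct node {
	const char *name;
	int refs;
};

/* Models kernfs_get(): take one reference. */
static void node_get(struct node *n)
{
	if (n)
		n->refs++;
}

/* Models kernfs_put(): drop one reference. */
static void node_put(struct node *n)
{
	if (n)
		n->refs--;
}

/* Models kernfs_find_and_get(): the lookup returns a held reference. */
static struct node *node_find_and_get(struct node *tbl, size_t n,
				      const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(tbl[i].name, name)) {
			node_get(&tbl[i]);
			return &tbl[i];
		}
	}
	return NULL;
}

/* Pattern as posted: lookup, extra get, single put. */
static int toggle_as_posted(struct node *tbl, size_t n, const char *name)
{
	struct node *kn = node_find_and_get(tbl, n, name);
	int start;

	if (!kn)
		return 0;
	start = kn->refs - 1;	/* count before the lookup */
	node_get(kn);		/* redundant: the lookup already holds one */
	/* kernfs_show(kn, show) would be called here */
	node_put(kn);
	return kn->refs - start;	/* 1: one reference is never dropped */
}

/* Balanced pattern: the lookup's reference is the only one taken. */
static int toggle_fixed(struct node *tbl, size_t n, const char *name)
{
	struct node *kn = node_find_and_get(tbl, n, name);
	int start;

	if (!kn)
		return 0;
	start = kn->refs - 1;
	/* kernfs_show(kn, show) would be called here */
	node_put(kn);
	return kn->refs - start;	/* 0: balanced */
}
```

In this model toggle_as_posted() leaves the count one higher on every call,
while toggle_fixed() leaves it unchanged.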
> + kernfs_show(kn_config, show);
> + kernfs_put(kn_config);
> + }
> +
> + kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_local_bytes_config");
> + if (kn_config) {
> + kernfs_get(kn_config);
> + kernfs_show(kn_config, show);
> + kernfs_put(kn_config);
> + }
> +
> + kernfs_put(l3_mon_kn);
> +}
> +
> static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
> struct seq_file *s, void *v)
> {
> @@ -2808,6 +2841,13 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
> ret = resctrl_mkdir_counter_configs(r, name);
> if (ret)
> goto out_destroy;
> +
> + /*
> + * Hide BMEC related files if mbm_event mode
> + * is enabled.
> + */
> + if (resctrl_arch_mbm_cntr_assign_enabled(r))
> + resctrl_bmec_files_show(r, false);
> }
> }
>
Reinette
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-06-25 23:39 ` Reinette Chatre
@ 2025-07-02 16:42 ` Moger, Babu
2025-07-02 17:21 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-02 16:42 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:39, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>> work simultaneously.
>
> Could you please elaborate why they do not work simultaneously?
Changed the changelog.
When mbm_event counter assignment mode is enabled, events are configured
through the "event_filter" files under
/sys/fs/resctrl/info/L3_MON/event_configs/.
In the default monitoring mode with BMEC (Bandwidth Monitoring Event
Configuration) support, events are configured using the files
mbm_total_bytes_config or mbm_local_bytes_config in
/sys/fs/resctrl/info/L3_MON/.
To avoid confusion, hide BMEC-related files when mbm_event counter
assignment mode is enabled and update the mon_features display accordingly.
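The mon_features part can be pictured with a small user-space sketch.
Assumptions: struct evt, emit() and mon_features() are hypothetical,
trimmed-down stand-ins for struct mon_evt and rdt_mon_features_show(), and
the assign_enabled flag stands in for
resctrl_arch_mbm_cntr_assign_enabled(r); the rid check is left out. The
"<name>_config" line is printed only for a configurable event while
mbm_event mode is off:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct mon_evt. */
struct evt {
	const char *name;
	bool enabled;
	bool configurable;
};

/* Append "<name><suffix>\n" at offset off; returns the new offset. */
static size_t emit(char *buf, size_t len, size_t off,
		   const char *name, const char *suffix)
{
	int n;

	if (off >= len)
		return off;
	n = snprintf(buf + off, len - off, "%s%s\n", name, suffix);
	return n < 0 ? off : off + (size_t)n;
}

/*
 * Build the mon_features text into buf. The "_config" entry is emitted
 * only when the event is configurable and counter assignment mode is off.
 */
static size_t mon_features(const struct evt *evts, size_t n,
			   bool assign_enabled, char *buf, size_t len)
{
	size_t off = 0;

	if (len)
		buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		if (!evts[i].enabled)
			continue;
		off = emit(buf, len, off, evts[i].name, "");
		if (evts[i].configurable && !assign_enabled)
			off = emit(buf, len, off, evts[i].name, "_config");
	}

	return off;
}
```

With assign_enabled false the output lists mbm_total_bytes_config and
mbm_local_bytes_config after their events; with it true only the event
names remain.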
>
>>
>> When mbm_event mode is enabled, hide BMEC-related files to avoid confusion
>> and update the mon_features display accordingly.
>>
>> The files /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config and
>> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config will not be visible
>> when mbm_event mode is enabled.
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v14: Updated the changelog for change in mbm_assign_modes.
>> Added check in rdt_mon_features_show to hide bmec related feature.
>>
>> v13: New patch to hide BMEC related files.
>> ---
>> fs/resctrl/rdtgroup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 92bb8f3adfae..8c67e0897f25 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1164,7 +1164,8 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>> if (mevt->rid != r->rid || !mevt->enabled)
>> continue;
>> seq_printf(seq, "%s\n", mevt->name);
>> - if (mevt->configurable)
>> + if (mevt->configurable &&
>> + !resctrl_arch_mbm_cntr_assign_enabled(r))
>> seq_printf(seq, "%s_config\n", mevt->name);
>> }
>>
>> @@ -1813,6 +1814,38 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>> return ret ?: nbytes;
>> }
>>
>> +/**
>> + * resctrl_bmec_files_show() — Controls the visibility of BMEC-related resctrl
>> + * files. When @show is true, the files are displayed; when false, the files
>> + * are hidden.
>> + */
>> +static void resctrl_bmec_files_show(struct rdt_resource *r, bool show)
>> +{
>> + struct kernfs_node *kn_config, *l3_mon_kn;
>> + char name[32];
>> +
>> + sprintf(name, "%s_MON", r->name);
>> + l3_mon_kn = kernfs_find_and_get(kn_info, name);
>
> Similar to comment about resctrl_mkdir_counter_configs() (resctrl_mkdir_event_configs())
> I think this can be avoided by calling resctrl_bmec_files_show() from
> rdtgroup_mkdir_info_resdir().
Sure.
>
>> + if (!l3_mon_kn)
>> + return;
>> +
>> + kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_total_bytes_config");
>> + if (kn_config) {
>> + kernfs_get(kn_config);
>
> Be careful ... kernfs_find_and_get() already took a reference. Additional
> kernfs_get() is not needed.
Sure.
>
>> + kernfs_show(kn_config, show);
>> + kernfs_put(kn_config);
>> + }
>> +
>> + kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_local_bytes_config");
>> + if (kn_config) {
>> + kernfs_get(kn_config);
>> + kernfs_show(kn_config, show);
>> + kernfs_put(kn_config);
>> + }
>> +
>> + kernfs_put(l3_mon_kn);
>> +}
>> +
>> static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
>> struct seq_file *s, void *v)
>> {
>> @@ -2808,6 +2841,13 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>> ret = resctrl_mkdir_counter_configs(r, name);
>> if (ret)
>> goto out_destroy;
>> +
>> + /*
>> + * Hide BMEC related files if mbm_event mode
>> + * is enabled.
>> + */
>> + if (resctrl_arch_mbm_cntr_assign_enabled(r))
>> + resctrl_bmec_files_show(r, false);
>> }
>> }
>>
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-02 16:42 ` Moger, Babu
@ 2025-07-02 17:21 ` Reinette Chatre
2025-07-02 19:04 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-07-02 17:21 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 7/2/25 9:42 AM, Moger, Babu wrote:
> On 6/25/25 18:39, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>> work simultaneously.
>>
>> Could you please elaborate why they do not work simultaneously?
>
> Changed the changelog.
>
> When mbm_event counter assignment mode is enabled, events are configured
> through the "event_filter" files under
> /sys/fs/resctrl/info/L3_MON/event_configs/.
>
> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
> Configuration) support, events are configured using the files
> mbm_total_bytes_config or mbm_local_bytes_config in
> /sys/fs/resctrl/info/L3_MON/.
A reasonable question here may be why not just keep using the existing
(BMEC supporting) event configuration files for event configuration? Why
are new event configuration files needed?
>
> To avoid the confusion, hide BMEC-related files when mbm_event counter
> assignment mode is enabled and update the mon_features display accordingly.
>
Reinette
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-02 17:21 ` Reinette Chatre
@ 2025-07-02 19:04 ` Moger, Babu
2025-07-03 16:21 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-02 19:04 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 7/2/25 12:21, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/2/25 9:42 AM, Moger, Babu wrote:
>> On 6/25/25 18:39, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>>> work simultaneously.
>>>
>>> Could you please elaborate why they do not work simultaneously?
>>
>> Changed the changelog.
>>
>> When mbm_event counter assignment mode is enabled, events are configured
>> through the "event_filter" files under
>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>
>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>> Configuration) support, events are configured using the files
>> mbm_total_bytes_config or mbm_local_bytes_config in
>> /sys/fs/resctrl/info/L3_MON/.
>
> A reasonable question here may be why not just keep using the existing
> (BMEC supporting) event configuration files for event configuration? Why
> are new event configuration files needed?
The new interface enables users to read and write memory transaction
events using human-readable strings, simplifying configuration and
improving usability.
In the future it can be extended to create free-form event names.
>
>>
>> To avoid the confusion, hide BMEC-related files when mbm_event counter
>> assignment mode is enabled and update the mon_features display accordingly.
>>
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-02 19:04 ` Moger, Babu
@ 2025-07-03 16:21 ` Reinette Chatre
2025-07-07 22:35 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-07-03 16:21 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 7/2/25 12:04 PM, Moger, Babu wrote:
> Hi Reinette,
>
> On 7/2/25 12:21, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 7/2/25 9:42 AM, Moger, Babu wrote:
>>> On 6/25/25 18:39, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>>>> work simultaneously.
>>>>
>>>> Could you please elaborate why they do not work simultaneously?
>>>
>>> Changed the changelog.
>>>
>>> When mbm_event counter assignment mode is enabled, events are configured
>>> through the "event_filter" files under
>>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>>
>>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>>> Configuration) support, events are configured using the files
>>> mbm_total_bytes_config or mbm_local_bytes_config in
>>> /sys/fs/resctrl/info/L3_MON/.
>>
>> A reasonable question here may be why not just keep using the existing
>> (BMEC supporting) event configuration files for event configuration? Why
>> are new event configuration files needed?
>
> New interface that enables users to read and write memory transaction
> events using human-readable strings, simplifying configuration and
> improving usability.
I find the "simplifying configuration and improving usability" a bit vague
for a changelog. The cover letter already claims that ABMC and BMEC are
incompatible and links to some email discussions. I think it will be helpful
to summarize here why ABMC and BMEC are considered incompatible and then use
that as motivation to hide BMEC. The motivation in this changelog is to
"avoid confusion" but the motivation is stronger than that.
>
> In future it can be extended to create free form event names.
>
>>
>>>
>>> To avoid the confusion, hide BMEC-related files when mbm_event counter
>>> assignment mode is enabled and update the mon_features display accordingly.
Reinette
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-03 16:21 ` Reinette Chatre
@ 2025-07-07 22:35 ` Moger, Babu
2025-07-08 13:27 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-07 22:35 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 7/3/25 11:21, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/2/25 12:04 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 7/2/25 12:21, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 7/2/25 9:42 AM, Moger, Babu wrote:
>>>> On 6/25/25 18:39, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>>>>> work simultaneously.
>>>>>
>>>>> Could you please elaborate why they do not work simultaneously?
>>>>
>>>> Changed the changelog.
>>>>
>>>> When mbm_event counter assignment mode is enabled, events are configured
>>>> through the "event_filter" files under
>>>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>>>
>>>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>>>> Configuration) support, events are configured using the files
>>>> mbm_total_bytes_config or mbm_local_bytes_config in
>>>> /sys/fs/resctrl/info/L3_MON/.
>>>
>>> A reasonable question here may be why not just keep using the existing
>>> (BMEC supporting) event configuration files for event configuration? Why
>>> are new event configuration files needed?
>>
>> New interface that enables users to read and write memory transaction
>> events using human-readable strings, simplifying configuration and
>> improving usability.
>
> I find the "simplifying configuration and improving usability" a bit vague
> for a changelog. The cover letter already claims that ABMC and BMEC are
> incompatible and links to some email discussions. I think it will be helpful
> to summarize here why ABMC and BMEC are considered incompatible and then use
> that as motivation to hide BMEC. The motivation in this changelog is to
> "avoid confusion" but the motivation is stronger than that.
>
Changed the changelog. How does this look?
"In the default monitoring mode with BMEC (Bandwidth Monitoring Event
Configuration) support, events are configured using the files
mbm_total_bytes_config or mbm_local_bytes_config in
/sys/fs/resctrl/info/L3_MON/.
When the mbm_event counter assignment mode is enabled, event configuration
is handled via the event_filter files under
/sys/fs/resctrl/info/L3_MON/event_configs/. This mode allows users to read
and write memory transaction events using human-readable strings, making
the interface easier to use and more intuitive. Going forward, this
mechanism can support assigning multiple counters to RMID-event pairs and
may be extended to allow flexible, user-defined event names.
Given these changes, hide the BMEC-related files when the mbm_event
counter assignment mode is enabled. Also, update the mon_features display
accordingly."
--
Thanks
Babu Moger
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-07 22:35 ` Moger, Babu
@ 2025-07-08 13:27 ` Moger, Babu
2025-07-08 15:21 ` Reinette Chatre
0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-07-08 13:27 UTC (permalink / raw)
To: babu.moger, Reinette Chatre, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 7/7/2025 5:35 PM, Moger, Babu wrote:
> Hi Reinette,
>
>
> On 7/3/25 11:21, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 7/2/25 12:04 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 7/2/25 12:21, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 7/2/25 9:42 AM, Moger, Babu wrote:
>>>>> On 6/25/25 18:39, Reinette Chatre wrote:
>>>>>> Hi Babu,
>>>>>>
>>>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>>>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>>>>>> work simultaneously.
>>>>>>
>>>>>> Could you please elaborate why they do not work simultaneously?
>>>>>
>>>>> Changed the changelog.
>>>>>
>>>>> When mbm_event counter assignment mode is enabled, events are configured
>>>>> through the "event_filter" files under
>>>>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>>>>
>>>>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>>>>> Configuration) support, events are configured using the files
>>>>> mbm_total_bytes_config or mbm_local_bytes_config in
>>>>> /sys/fs/resctrl/info/L3_MON/.
>>>>
>>>> A reasonable question here may be why not just keep using the existing
>>>> (BMEC supporting) event configuration files for event configuration? Why
>>>> are new event configuration files needed?
>>>
>>> New interface that enables users to read and write memory transaction
>>> events using human-readable strings, simplifying configuration and
>>> improving usability.
>>
>> I find the "simplifying configuration and improving usability" a bit vague
>> for a changelog. The cover letter already claims that ABMC and BMEC are
>> incompatible and links to some email discussions. I think it will be helpful
>> to summarize here why ABMC and BMEC are considered incompatible and then use
>> that as motivation to hide BMEC. The motivation in this changelog is to
>> "avoid confusion" but the motivation is stronger than that.
>>
>
> Changed the changelog. How does this look?
>
> "The default monitoring mode and with BMEC (Bandwidth Monitoring Event
> Configuration) support, events are configured using the files
> mbm_total_bytes_config or mbm_local_bytes_config in
> /sys/fs/resctrl/info/L3_MON/.
>
> When the mbm_event counter assignment mode is enabled, event configuration
> is handled via the event_filter files under
> /sys/fs/resctrl/info/L3_MON/event_configs/. This mode allows users to read
> and write memory transaction events using human-readable strings, making
> the interface easier to use and more intuitive. Going forward, this
> mechanism can support assigning multiple counters to RMID, event pairs and
> may be extended to allow flexible, user-defined event names.
>
> Given these changes, hide the BMEC-related files when the mbm_event
> counter assignment mode is enabled. Also, update the mon_features display
> accordingly."
>
Here is another update.
fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
In the default monitoring mode with BMEC (Bandwidth Monitoring Event
Configuration) support, events are configured using the files
mbm_total_bytes_config or mbm_local_bytes_config in
/sys/fs/resctrl/info/L3_MON/.
When the mbm_event counter assignment mode is enabled, event
configuration is handled via the event_filter files under
/sys/fs/resctrl/info/L3_MON/event_configs/. This mode enables users to
configure memory transaction events using human-readable strings,
providing a more intuitive and user-friendly interface. In the future,
this mechanism could be extended to support assigning multiple counters
to RMID-event pairs, as well as customizable, user-defined event names.
Also, the presence of BMEC-related configuration files may cause
confusion when the mbm_event counter assignment mode is enabled.
To address this, these files are now hidden when the mode is active.
Additionally, the mon_features display has been updated to reflect this
change.
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-08 13:27 ` Moger, Babu
@ 2025-07-08 15:21 ` Reinette Chatre
2025-07-08 15:43 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-07-08 15:21 UTC (permalink / raw)
To: Moger, Babu, babu.moger, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 7/8/25 6:27 AM, Moger, Babu wrote:
> On 7/7/2025 5:35 PM, Moger, Babu wrote:
>> On 7/3/25 11:21, Reinette Chatre wrote:
>>> On 7/2/25 12:04 PM, Moger, Babu wrote:
>>>> On 7/2/25 12:21, Reinette Chatre wrote:
>>>>> On 7/2/25 9:42 AM, Moger, Babu wrote:
>>>>>> On 6/25/25 18:39, Reinette Chatre wrote:
>>>>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>>>>>>> BMEC (Bandwidth Monitoring Event Configuration) and mbm_event mode do not
>>>>>>>> work simultaneously.
>>>>>>>
>>>>>>> Could you please elaborate why they do not work simultaneously?
>>>>>>
>>>>>> Changed the changelog.
>>>>>>
>>>>>> When mbm_event counter assignment mode is enabled, events are configured
>>>>>> through the "event_filter" files under
>>>>>> /sys/fs/resctrl/info/L3_MON/event_configs/.
>>>>>>
>>>>>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>>>>>> Configuration) support, events are configured using the files
>>>>>> mbm_total_bytes_config or mbm_local_bytes_config in
>>>>>> /sys/fs/resctrl/info/L3_MON/.
>>>>>
>>>>> A reasonable question here may be why not just keep using the existing
>>>>> (BMEC supporting) event configuration files for event configuration? Why
>>>>> are new event configuration files needed?
>>>>
>>>> New interface that enables users to read and write memory transaction
>>>> events using human-readable strings, simplifying configuration and
>>>> improving usability.
>>>
>>> I find the "simplifying configuration and improving usability" a bit vague
>>> for a changelog. The cover letter already claims that ABMC and BMEC are
>>> incompatible and links to some email discussions. I think it will be helpful
>>> to summarize here why ABMC and BMEC are considered incompatible and then use
>>> that as motivation to hide BMEC. The motivation in this changelog is to
>>> "avoid confusion" but the motivation is stronger than that.
>>>
>>
>> Changed the changelog. How does this look?
>>
>> "The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>> Configuration) support, events are configured using the files
>> mbm_total_bytes_config or mbm_local_bytes_config in
>> /sys/fs/resctrl/info/L3_MON/.
>>
>> When the mbm_event counter assignment mode is enabled, event configuration
>> is handled via the event_filter files under
>> /sys/fs/resctrl/info/L3_MON/event_configs/. This mode allows users to read
>> and write memory transaction events using human-readable strings, making
>> the interface easier to use and more intuitive. Going forward, this
>> mechanism can support assigning multiple counters to RMID, event pairs and
>> may be extended to allow flexible, user-defined event names.
>>
>> Given these changes, hide the BMEC-related files when the mbm_event
>> counter assignment mode is enabled. Also, update the mon_features display
>> accordingly."
>>
>
> Here is another update.
>
> fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
>
> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
> Configuration) support, events are configured using the files
> mbm_total_bytes_config or mbm_local_bytes_config in
> /sys/fs/resctrl/info/L3_MON/.
>
> When the mbm_event counter assignment mode is enabled, event
> configuration is handled via the event_filter files under
> /sys/fs/resctrl/info/L3_MON/event_configs/. This mode enables users to
> configure memory transaction events using human-readable strings,
> providing a more intuitive and user-friendly interface. In the future,
> this mechanism could be extended to support assigning multiple counters
> to RMID-event pairs, as well as customizable, user-defined event names.
> Also, the presence of BMEC-related configuration files may cause
> confusion when the mbm_event counter assignment mode is enabled.
>
> To address this, these files are now hidden when the mode is active.
> Additionally, the mon_features display has been updated to reflect this
> change.
I do not find a concrete motivation in this changelog. The terms "may cause
confusion" and "providing a more intuitive and user-friendly interface." are
vague and not something that I think provides a good motivation for disabling
an entire interface.
I aim to write a draft below that I hope will help make this changelog more
convincing. Please do improve it:
fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
The BMEC (Bandwidth Monitoring Event Configuration) feature enables per-domain
event configuration. With BMEC the MBM events are configured using
the mbm_total_bytes_config or mbm_local_bytes_config files in
/sys/fs/resctrl/info/L3_MON/ and the per-domain event configuration
affects all monitor resource groups.
The mbm_event counter assignment mode enables counters to be assigned to
RMID (i.e. a monitor resource group), event pairs, with potentially unique
event configurations associated with every counter.
There may be systems that support both BMEC and mbm_event counter assignment
mode, but resctrl supporting both concurrently will present a conflicting
interface to the user with both per-domain and per RMID, event configurations
active at the same time.
mbm_event counter assignment provides the most flexibility to user space and
aligns with Arm's counter support. On systems that support both, disable BMEC
event configuration when mbm_event mode is enabled by hiding the
mbm_total_bytes_config and mbm_local_bytes_config files. Ensure mon_features
always displays accurate information about monitor features.
Reinette
* Re: [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
2025-07-08 15:21 ` Reinette Chatre
@ 2025-07-08 15:43 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-08 15:43 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, corbet, tony.luck, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 7/8/25 10:21, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/8/25 6:27 AM, Moger, Babu wrote:
>> On 7/7/2025 5:35 PM, Moger, Babu wrote:
>>> On 7/3/25 11:21, Reinette Chatre wrote:
>>>> On 7/2/25 12:04 PM, Moger, Babu wrote:
>>>>> On 7/2/25 12:21, Reinette Chatre wrote:
>>>>>> On 7/2/25 9:42 AM, Moger, Babu wrote:
>>>>>>> On 6/25/25 18:39, Reinette Chatre wrote:
>>>>>>>> On 6/13/25 2:05 PM, Babu Moger wrote:
>>> Changed the changelog. How does this look?
>>>
>>> "The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>>> Configuration) support, events are configured using the files
>>> mbm_total_bytes_config or mbm_local_bytes_config in
>>> /sys/fs/resctrl/info/L3_MON/.
>>>
>>> When the mbm_event counter assignment mode is enabled, event configuration
>>> is handled via the event_filter files under
>>> /sys/fs/resctrl/info/L3_MON/event_configs/. This mode allows users to read
>>> and write memory transaction events using human-readable strings, making
>>> the interface easier to use and more intuitive. Going forward, this
>>> mechanism can support assigning multiple counters to RMID, event pairs and
>>> may be extended to allow flexible, user-defined event names.
>>>
>>> Given these changes, hide the BMEC-related files when the mbm_event
>>> counter assignment mode is enabled. Also, update the mon_features display
>>> accordingly."
>>>
>>
>> Here is another update.
>>
>> fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled
>>
>> The default monitoring mode and with BMEC (Bandwidth Monitoring Event
>> Configuration) support, events are configured using the files
>> mbm_total_bytes_config or mbm_local_bytes_config in
>> /sys/fs/resctrl/info/L3_MON/.
>>
>> When the mbm_event counter assignment mode is enabled, event
>> configuration is handled via the event_filter files under
>> /sys/fs/resctrl/info/L3_MON/event_configs/. This mode enables users to
>> configure memory transaction events using human-readable strings,
>> providing a more intuitive and user-friendly interface. In the future,
>> this mechanism could be extended to support assigning multiple counters
>> to RMID-event pairs, as well as customizable, user-defined event names.
>> Also, the presence of BMEC-related configuration files may cause
>> confusion when the mbm_event counter assignment mode is enabled.
>>
>> To address this, these files are now hidden when the mode is active.
>> Additionally, the mon_features display has been updated to reflect this
>> change.
>
> I do not find a concrete motivation in this changelog. The terms "may cause
> confusion" and "providing a more intuitive and user-friendly interface." are
> vague and not something that I think provides a good motivation for disabling
> an entire interface.
>
> I aim to write a draft below that I hope will help make this changelog more
> convincing. Please do improve it:
>
>
> fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
>
> The BMEC (Bandwidth Monitoring Event Configuration) feature enables per-domain
> event configuration. With BMEC the MBM events are configured using
> the mbm_total_bytes_config or mbm_local_bytes_config files in
> /sys/fs/resctrl/info/L3_MON/ and the per-domain event configuration
> affects all monitor resource groups.
>
> The mbm_event counter assignment mode enables counters to be assigned to
> RMID (i.e a monitor resource group), event pairs, with potentially unique
> event configurations associated with every counter.
>
> There may be systems that support both BMEC and mbm_event counter assignment
> mode, but resctrl supporting both concurrently will present a conflicting
> interface to the user with both per-domain and per RMID, event configurations
> active at the same time.
>
> mbm_event counter assignment provides most flexibility to user space and
> aligns with Arm's counter support. On systems that support both, disable BMEC
> event configuration when mbm_event mode is enabled by hiding the
> the mbm_total_bytes_config or mbm_local_bytes_config files when mbm_event mode
> is enabled. Ensure mon_features always displays accurate information about
> monitor features.
>
Looks good.
--
Thanks
Babu Moger
* [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (29 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 30/32] fs/resctrl: Hide the BMEC related files when mbm_event mode is enabled Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:40 ` Reinette Chatre
2025-06-13 21:05 ` [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported Babu Moger
` (2 subsequent siblings)
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
The resctrl subsystem can support two monitoring modes, "mbm_event" or
"default". In mbm_event mode, a monitoring event can only accumulate data
while it is backed by a hardware counter. In "default" mode, resctrl
assumes there is a hardware counter for each event within every CTRL_MON
and MON group.
Introduce interface to switch between mbm_event and default modes.
Example:
To list the MBM monitor modes supported:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_event]
default
To enable the "mbm_event" monitoring mode:
$ echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
To enable the "default" monitoring mode:
$ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
MBM event counters are automatically reset as part of changing the mode.
Clear both architectural and non-architectural event states to prevent
overflow conditions during the next event read. Also clear assignable
counter configuration on all the domains.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog to reflect the change in monitor mode naming.
Added the call resctrl_bmec_files_show() to enable/disable files
related to BMEC.
Added resctrl_set_mon_evt_cfg() to reset event configuration values
when mode is changed.
v13: Resolved the conflicts due to FS/ARCH restructure.
Introduced the new resctrl_init_evt_configuration() to initialize
the event modes and configuration values.
Added the call to resctrl_bmec_files_show() to hide/show BMEC related
files.
v12: Fixed the documentation for consistency.
Introduced mbm_cntr_free_all() and resctrl_reset_rmid_all() to clear
counters and non-architectural states when monitor mode is changed.
https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/
v11: Changed the name of the function rdtgroup_mbm_assign_mode_write() to
resctrl_mbm_assign_mode_write().
Rewrote the commit message with context.
Added a few more details in resctrl.rst about mbm_cntr_assign mode.
Re-arranged the text in resctrl.rst file.
v10: The call mbm_cntr_reset() has been moved to earlier patch.
Minor documentation update.
v9: Fixed extra spaces in user documentation.
Fixed problem changing the mode to mbm_cntr_assign mode when it is
not supported. Added extra checks to detect if the system supports it.
Used the rdtgroup_cntr_id_init to initialize cntr_id.
v8: Reset the internal counters after mbm_cntr_assign mode is changed.
Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
Updated the documentation to make text generic.
v7: Changed the interface name to mbm_assign_mode.
Removed the references of ABMC.
Added the changes to reset global and domain bitmaps.
Added the changes to reset rmid.
v6: Changed the mode name to mbm_cntr_assign.
Moved all the FS related code here.
Added changes to reset mbm_cntr_map and resctrl group counters.
v5: Change log and mode description text correction.
v4: Minor commit text changes. Keep the default to ABMC when supported.
Fixed comments to reflect changed interface "mbm_mode".
v3: New patch to address the review comments from upstream.
---
Documentation/filesystems/resctrl.rst | 23 +++++++-
fs/resctrl/internal.h | 2 +
fs/resctrl/monitor.c | 27 ++++++++++
fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++--
4 files changed, 126 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index cd82c2966ed7..7e62c7fdcefa 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -259,7 +259,9 @@ with the following files:
"mbm_assign_mode":
The supported monitoring modes. The enclosed brackets indicate which mode
- is enabled.
+ is enabled. The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated
+ with counters may reset when "mbm_assign_mode" is changed.
+
::
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
@@ -279,6 +281,15 @@ with the following files:
of counters available is described in the "num_mbm_cntrs" file. Changing the
mode may cause all counters on the resource to reset.
+ Moving to mbm_event mode require users to assign the counters to the events.
+ Otherwise, the MBM event counters will return 'Unassigned' when read.
+
+ The mode is beneficial for AMD platforms that support more CTRL_MON
+ and MON groups than available hardware counters. By default, this
+ feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
+ Monitoring Counters) capability, ensuring counters remain assigned even
+ when the corresponding RMID is not actively used by any processor.
+
"default":
In default mode, resctrl assumes there is a hardware counter for each
@@ -288,6 +299,16 @@ with the following files:
result in misleading values or display "Unavailable" if no counter is assigned
to the event.
+ * To enable "mbm_event" monitoring mode:
+ ::
+
+ # echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+ * To enable "default" monitoring mode:
+ ::
+
+ # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
"num_mbm_cntrs":
The maximum number of counter IDs (total of available and assigned counters)
in each domain when the system supports mbm_event mode.
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 14d99c723ea5..adc9ff3efdfd 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -414,6 +414,8 @@ void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *
struct rdtgroup *rdtgrp, struct mon_evt *mevt);
int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d);
#ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 618c94cd1ad8..504b869570e6 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1045,6 +1045,33 @@ static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
}
+/**
+ * mbm_cntr_free_all() - Clear all the counter ID configuration details in the
+ * domain @d. Called when mbm_assign_mode is changed.
+ */
+void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+ memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs);
+}
+
+/**
+ * resctrl_reset_rmid_all() - Reset all non-architecture states for all the
+ * supported RMIDs.
+ */
+void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+ u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+ enum resctrl_event_id evt;
+ int idx;
+
+ for_each_mbm_event_id(evt) {
+ if (!resctrl_is_mon_event_enabled(evt))
+ continue;
+ idx = MBM_STATE_IDX(evt);
+ memset(d->mbm_states[idx], 0, sizeof(struct mbm_state) * idx_limit);
+ }
+}
+
/**
* resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
* event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8c67e0897f25..6bb61fcf8673 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1876,6 +1876,77 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
return 0;
}
+static ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+ struct rdt_mon_domain *d;
+ int ret = 0;
+ bool enable;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!strcmp(buf, "default")) {
+ enable = 0;
+ } else if (!strcmp(buf, "mbm_event")) {
+ if (r->mon.mbm_cntr_assignable) {
+ enable = 1;
+ } else {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("mbm_event mode is not supported\n");
+ goto write_exit;
+ }
+ } else {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("Unsupported assign mode\n");
+ goto write_exit;
+ }
+
+ if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
+ if (ret)
+ goto write_exit;
+
+ /* Update the visibility of BMEC related files */
+ resctrl_bmec_files_show(r, !enable);
+
+ /*
+ * Initialize the default memory transaction values for
+ * total and local events.
+ */
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+ resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID,
+ MAX_EVT_CONFIG_BITS);
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
+ resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
+ READS_TO_LOCAL_MEM |
+ READS_TO_LOCAL_S_MEM |
+ NON_TEMP_WRITE_TO_LOCAL_MEM);
+ /*
+ * Reset all the non-architectural RMID state and assignable counters.
+ */
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ mbm_cntr_free_all(r, d);
+ resctrl_reset_rmid_all(r, d);
+ }
+ }
+
+write_exit:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
@@ -2203,8 +2274,8 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
struct mon_evt *mevt;
int assign_state;
char domain[10];
+ int ret = 0;
bool found;
- int ret;
mevt = mbm_get_mon_event_by_name(r, event);
if (!mevt) {
@@ -2249,7 +2320,7 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
switch (assign_state) {
case ASSIGN_NONE:
- ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
+ resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
break;
case ASSIGN_EXCLUSIVE:
ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
@@ -2463,9 +2534,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign_mode",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = resctrl_mbm_assign_mode_show,
+ .write = resctrl_mbm_assign_mode_write,
.fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
},
{
--
2.34.1
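The input handling in resctrl_mbm_assign_mode_write() above can be modeled in
user space. What follows is a minimal sketch, not kernel code: the function
name parse_assign_mode() and its "assignable" parameter are hypothetical and
only stand in for the handler and r->mon.mbm_cntr_assignable. Locking, the
arch call and the counter reset are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of the mode parsing done by
 * resctrl_mbm_assign_mode_write().
 *
 * @buf:        writable buffer holding the bytes written to the file
 * @nbytes:     number of bytes in @buf
 * @assignable: stands in for r->mon.mbm_cntr_assignable
 * @enable:     set to true for "mbm_event", false for "default"
 *
 * Returns 0 on success, -EINVAL for malformed or unsupported input.
 */
int parse_assign_mode(char *buf, size_t nbytes, bool assignable, bool *enable)
{
	assert(buf && enable);

	/* Valid input requires a trailing newline, as in the patch. */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;

	/* Strip the newline so the mode name can be compared directly. */
	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "default")) {
		*enable = false;
		return 0;
	}

	if (!strcmp(buf, "mbm_event")) {
		/* The mode is only accepted when the resource supports it. */
		if (!assignable)
			return -EINVAL;
		*enable = true;
		return 0;
	}

	/* Any other string is an unsupported assign mode. */
	return -EINVAL;
}
```

One consequence of the trailing-newline check: a plain "echo" works because
the shell appends the newline, while a write without one (for example via
"echo -n") is rejected with -EINVAL.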
* Re: [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes
2025-06-13 21:05 ` [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2025-06-25 23:40 ` Reinette Chatre
2025-07-02 17:39 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:40 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Resctrl subsystem can support two monitoring modes, "mbm_event" or
> "default". In mbm_event mode, monitoring event can only accumulate data
> while it is backed by a hardware counter. In "default" mode, resctrl
> assumes there is a hardware counter for each event within every CTRL_MON
> and MON group.
>
> Introduce interface to switch between mbm_event and default modes.
"Introduce interface" -> "Introduce mbm_assign_mode resctrl file"
>
> Example:
> To list the MBM monitor modes supported:
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_event]
> default
>
> To enable the "mbm_event" monitoring mode:
> $ echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>
> To enable the "default" monitoring mode:
> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>
> MBM event counters are automatically reset as part of changing the mode.
> Clear both architectural and non-architectural event states to prevent
> overflow conditions during the next event read. Also clear assignable
> counter configuration on all the domains.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 23 +++++++-
> fs/resctrl/internal.h | 2 +
> fs/resctrl/monitor.c | 27 ++++++++++
> fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++--
> 4 files changed, 126 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index cd82c2966ed7..7e62c7fdcefa 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -259,7 +259,9 @@ with the following files:
>
> "mbm_assign_mode":
> The supported monitoring modes. The enclosed brackets indicate which mode
> - is enabled.
> + is enabled. The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated
Since there may be more events in future I think the "(mbm_total_bytes and/or
mbm_local_bytes)" can be dropped.
> + with counters may reset when "mbm_assign_mode" is changed.
> +
> ::
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> @@ -279,6 +281,15 @@ with the following files:
> of counters available is described in the "num_mbm_cntrs" file. Changing the
> mode may cause all counters on the resource to reset.
>
> + Moving to mbm_event mode require users to assign the counters to the events.
"Moving to mbm_event mode require" -> "mbm_event counter assignment mode requires"
> + Otherwise, the MBM event counters will return 'Unassigned' when read.
> +
> + The mode is beneficial for AMD platforms that support more CTRL_MON
> + and MON groups than available hardware counters. By default, this
> + feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
> + Monitoring Counters) capability, ensuring counters remain assigned even
> + when the corresponding RMID is not actively used by any processor.
> +
> "default":
>
> In default mode, resctrl assumes there is a hardware counter for each
> @@ -288,6 +299,16 @@ with the following files:
> result in misleading values or display "Unavailable" if no counter is assigned
> to the event.
>
> + * To enable "mbm_event" monitoring mode:
> + ::
> +
> + # echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> + * To enable "default" monitoring mode:
> + ::
> +
> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> "num_mbm_cntrs":
> The maximum number of counter IDs (total of available and assigned counters)
> in each domain when the system supports mbm_event mode.
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 14d99c723ea5..adc9ff3efdfd 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -414,6 +414,8 @@ void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *
> struct rdtgroup *rdtgrp, struct mon_evt *mevt);
> int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
> +void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>
> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 618c94cd1ad8..504b869570e6 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1045,6 +1045,33 @@ static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
> memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
> }
>
> +/**
> + * mbm_cntr_free_all() - Clear all the counter ID configuration details in the
> + * domain @d. Called when mbm_assign_mode is changed.
> + */
> +void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
> +{
> + memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs);
> +}
> +
> +/**
> + * resctrl_reset_rmid_all() - Reset all non-architecture states for all the
> + * supported RMIDs.
> + */
> +void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
struct rdt_resource *r is unused? At this time it seems unnecessary to check if
an MBM event belongs to particular resource since at this point I expect only L3
is possible. Even so, to be consistent and robust I think it would make flows
easier to understand by always ensuring that mon_evt::rid matches
expected resource.
> +{
> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> + enum resctrl_event_id evt;
> + int idx;
> +
> + for_each_mbm_event_id(evt) {
> + if (!resctrl_is_mon_event_enabled(evt))
> + continue;
> + idx = MBM_STATE_IDX(evt);
> + memset(d->mbm_states[idx], 0, sizeof(struct mbm_state) * idx_limit);
sizeof(*d->mbm_states[0])?
> + }
> +}
> +
> /**
> * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
> * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 8c67e0897f25..6bb61fcf8673 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1876,6 +1876,77 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes, loff_t off)
> +{
> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> + struct rdt_mon_domain *d;
> + int ret = 0;
> + bool enable;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!strcmp(buf, "default")) {
> + enable = 0;
> + } else if (!strcmp(buf, "mbm_event")) {
> + if (r->mon.mbm_cntr_assignable) {
> + enable = 1;
> + } else {
> + ret = -EINVAL;
> + rdt_last_cmd_puts("mbm_event mode is not supported\n");
> + goto write_exit;
write_exit -> out_unlock
> + }
> + } else {
> + ret = -EINVAL;
> + rdt_last_cmd_puts("Unsupported assign mode\n");
> + goto write_exit;
> + }
> +
> + if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
> + if (ret)
> + goto write_exit;
> +
> + /* Update the visibility of BMEC related files */
> + resctrl_bmec_files_show(r, !enable);
> +
> + /*
> + * Initialize the default memory transaction values for
> + * total and local events.
> + */
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID,
> + MAX_EVT_CONFIG_BITS);
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
> + READS_TO_LOCAL_MEM |
> + READS_TO_LOCAL_S_MEM |
> + NON_TEMP_WRITE_TO_LOCAL_MEM);
Nice, yes, this belongs in resctrl fs.
> + /*
> + * Reset all the non-architectural RMID state and assignable counters.
> + */
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + mbm_cntr_free_all(r, d);
> + resctrl_reset_rmid_all(r, d);
> + }
> + }
> +
> +write_exit:
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
> static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
> struct seq_file *s, void *v)
> {
> @@ -2203,8 +2274,8 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
> struct mon_evt *mevt;
> int assign_state;
> char domain[10];
> + int ret = 0;
> bool found;
> - int ret;
>
> mevt = mbm_get_mon_event_by_name(r, event);
> if (!mevt) {
> @@ -2249,7 +2320,7 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
>
> switch (assign_state) {
> case ASSIGN_NONE:
> - ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
> + resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
> break;
> case ASSIGN_EXCLUSIVE:
> ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
Two stray hunks?
> @@ -2463,9 +2534,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_assign_mode",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = resctrl_mbm_assign_mode_show,
> + .write = resctrl_mbm_assign_mode_write,
> .fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
> },
> {
Reinette
* Re: [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes
2025-06-25 23:40 ` Reinette Chatre
@ 2025-07-02 17:39 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-02 17:39 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:40, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>> Resctrl subsystem can support two monitoring modes, "mbm_event" or
>> "default". In mbm_event mode, monitoring event can only accumulate data
>> while it is backed by a hardware counter. In "default" mode, resctrl
>> assumes there is a hardware counter for each event within every CTRL_MON
>> and MON group.
>>
>> Introduce interface to switch between mbm_event and default modes.
>
> "Introduce interface" -> "Introduce mbm_assign_mode resctrl file"
>
Sure.
>>
>> Example:
>> To list the MBM monitor modes supported:
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_event]
>> default
>>
>> To enable the "mbm_event" monitoring mode:
>> $ echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> To enable the "default" monitoring mode:
>> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> MBM event counters are automatically reset as part of changing the mode.
>> Clear both architectural and non-architectural event states to prevent
>> overflow conditions during the next event read. Also clear assignable
>> counter configuration on all the domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> ---
>> Documentation/filesystems/resctrl.rst | 23 +++++++-
>> fs/resctrl/internal.h | 2 +
>> fs/resctrl/monitor.c | 27 ++++++++++
>> fs/resctrl/rdtgroup.c | 78 +++++++++++++++++++++++++--
>> 4 files changed, 126 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index cd82c2966ed7..7e62c7fdcefa 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -259,7 +259,9 @@ with the following files:
>>
>> "mbm_assign_mode":
>> The supported monitoring modes. The enclosed brackets indicate which mode
>> - is enabled.
>> + is enabled. The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated
>
> Since there may be more events in future I think the "(mbm_total_bytes and/or
> mbm_local_bytes)" can be dropped.
Sure.
>
>> + with counters may reset when "mbm_assign_mode" is changed.
>> +
>> ::
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> @@ -279,6 +281,15 @@ with the following files:
>> of counters available is described in the "num_mbm_cntrs" file. Changing the
>> mode may cause all counters on the resource to reset.
>>
>> + Moving to mbm_event mode require users to assign the counters to the events.
>
> "Moving to mbm_event mode require" -> "mbm_event counter assignment mode requires"
Sure.
>
>> + Otherwise, the MBM event counters will return 'Unassigned' when read.
>> +
>> + The mode is beneficial for AMD platforms that support more CTRL_MON
>> + and MON groups than available hardware counters. By default, this
>> + feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
>> + Monitoring Counters) capability, ensuring counters remain assigned even
>> + when the corresponding RMID is not actively used by any processor.
>> +
>> "default":
>>
>> In default mode, resctrl assumes there is a hardware counter for each
>> @@ -288,6 +299,16 @@ with the following files:
>> result in misleading values or display "Unavailable" if no counter is assigned
>> to the event.
>>
>> + * To enable "mbm_event" monitoring mode:
>> + ::
>> +
>> + # echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> + * To enable "default" monitoring mode:
>> + ::
>> +
>> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> "num_mbm_cntrs":
>> The maximum number of counter IDs (total of available and assigned counters)
>> in each domain when the system supports mbm_event mode.
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 14d99c723ea5..adc9ff3efdfd 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -414,6 +414,8 @@ void resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *
>> struct rdtgroup *rdtgrp, struct mon_evt *mevt);
>> int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> +void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>> +void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>>
>> #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>> int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 618c94cd1ad8..504b869570e6 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1045,6 +1045,33 @@ static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
>> memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
>> }
>>
>> +/**
>> + * mbm_cntr_free_all() - Clear all the counter ID configuration details in the
>> + * domain @d. Called when mbm_assign_mode is changed.
>> + */
>> +void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
>> +{
>> + memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs);
>> +}
>> +
>> +/**
>> + * resctrl_reset_rmid_all() - Reset all non-architecture states for all the
>> + * supported RMIDs.
>> + */
>> +void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
>
> struct rdt_resource *r is unused? At this time it seems unnecessary to check if
> an MBM event belongs to particular resource since at this point I expect only L3
> is possible. Even so, to be consistent and robust I think it would make flows
> easier to understand by always ensuring that mon_evt::rid matches
> expected resource.
Agreed, but this function does not access the mon_evt structure.
>
>> +{
>> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> + enum resctrl_event_id evt;
>> + int idx;
>> +
>> + for_each_mbm_event_id(evt) {
>> + if (!resctrl_is_mon_event_enabled(evt))
>> + continue;
>> + idx = MBM_STATE_IDX(evt);
>> + memset(d->mbm_states[idx], 0, sizeof(struct mbm_state) * idx_limit);
>
> sizeof(*d->mbm_states[0])?
>
Sure.
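To spell out the idiom, here is a minimal userspace sketch. The struct is a
hypothetical stand-in, not the kernel's struct mbm_state, and
demo_reset_states() is an illustrative name. Taking the size from the
pointee keeps the memset() correct even if the element type changes later.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct mbm_state; the real layout differs. */
struct demo_state {
	uint64_t prev_msr;
	uint64_t chunks;
};

/* Zero @count entries; the element size follows the pointee type of @states. */
static void demo_reset_states(struct demo_state *states, size_t count)
{
	memset(states, 0, sizeof(*states) * count);
}
```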
>> + }
>> +}
>> +
>> /**
>> * resctrl_alloc_config_cntr() - Allocate a counter ID and configure it for the
>> * event pointed to by @mevt and the resctrl group @rdtgrp within the domain @d.
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 8c67e0897f25..6bb61fcf8673 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1876,6 +1876,77 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of,
>> + char *buf, size_t nbytes, loff_t off)
>> +{
>> + struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> + struct rdt_mon_domain *d;
>> + int ret = 0;
>> + bool enable;
>> +
>> + /* Valid input requires a trailing newline */
>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> + return -EINVAL;
>> +
>> + buf[nbytes - 1] = '\0';
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + if (!strcmp(buf, "default")) {
>> + enable = 0;
>> + } else if (!strcmp(buf, "mbm_event")) {
>> + if (r->mon.mbm_cntr_assignable) {
>> + enable = 1;
>> + } else {
>> + ret = -EINVAL;
>> + rdt_last_cmd_puts("mbm_event mode is not supported\n");
>> + goto write_exit;
>
> write_exit -> out_unlock
>
Sure.
>> + }
>> + } else {
>> + ret = -EINVAL;
>> + rdt_last_cmd_puts("Unsupported assign mode\n");
>> + goto write_exit;
>> + }
>> +
>> + if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
>> + if (ret)
>> + goto write_exit;
>> +
>> + /* Update the visibility of BMEC related files */
>> + resctrl_bmec_files_show(r, !enable);
>> +
>> + /*
>> + * Initialize the default memory transaction values for
>> + * total and local events.
>> + */
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_TOTAL_EVENT_ID,
>> + MAX_EVT_CONFIG_BITS);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> + resctrl_set_mon_evt_cfg(QOS_L3_MBM_LOCAL_EVENT_ID,
>> + READS_TO_LOCAL_MEM |
>> + READS_TO_LOCAL_S_MEM |
>> + NON_TEMP_WRITE_TO_LOCAL_MEM);
>
> Nice, yes, this belongs in resctrl fs.
>
>> + /*
>> + * Reset all the non-architectural RMID state and assignable counters.
>> + */
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + mbm_cntr_free_all(r, d);
>> + resctrl_reset_rmid_all(r, d);
>> + }
>> + }
>> +
>> +write_exit:
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret ?: nbytes;
>> +}
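For reference, the input handling above can be sketched in userspace roughly
as follows. This is only an illustration under stated assumptions:
parse_assign_mode() and its parameters are made-up names, not kernel API,
and the locking, last_cmd_status reporting and the actual mode switch are
left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative helper (not kernel API): validate a write to mbm_assign_mode.
 * @assignable mirrors r->mon.mbm_cntr_assignable in the quoted patch.
 * Returns 0 and sets @enable on success, -EINVAL otherwise.
 */
static int parse_assign_mode(char *buf, size_t nbytes, bool assignable,
			     bool *enable)
{
	/* Valid input requires a trailing newline, as in the patch. */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;

	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "default")) {
		*enable = false;
		return 0;
	}

	if (!strcmp(buf, "mbm_event")) {
		/* mbm_event is only accepted when counters are assignable. */
		if (!assignable)
			return -EINVAL;
		*enable = true;
		return 0;
	}

	return -EINVAL;
}
```

One consequence of the newline check is that a write without the trailing
newline (for example "echo -n mbm_event") is rejected with -EINVAL.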
>> +
>> static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> struct seq_file *s, void *v)
>> {
>> @@ -2203,8 +2274,8 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
>> struct mon_evt *mevt;
>> int assign_state;
>> char domain[10];
>> + int ret = 0;
>> bool found;
>> - int ret;
>>
>> mevt = mbm_get_mon_event_by_name(r, event);
>> if (!mevt) {
>> @@ -2249,7 +2320,7 @@ static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgr
>>
>> switch (assign_state) {
>> case ASSIGN_NONE:
>> - ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
>> + resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
>> break;
>> case ASSIGN_EXCLUSIVE:
>> ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
>
> Two stray hunks?
>
Yes. Fixed it.
>> @@ -2463,9 +2534,10 @@ static struct rftype res_common_files[] = {
>> },
>> {
>> .name = "mbm_assign_mode",
>> - .mode = 0444,
>> + .mode = 0644,
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = resctrl_mbm_assign_mode_show,
>> + .write = resctrl_mbm_assign_mode_write,
>> .fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
>> },
>> {
>
> Reinette
>
--
Thanks
Babu Moger
* [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (30 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 31/32] fs/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2025-06-13 21:05 ` Babu Moger
2025-06-25 23:40 ` Reinette Chatre
2025-06-13 21:41 ` [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Luck, Tony
2025-06-24 21:25 ` Reinette Chatre
33 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-06-13 21:05 UTC (permalink / raw)
To: babu.moger, corbet, tony.luck, reinette.chatre, Dave.Martin,
james.morse, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Configure mbm_event mode on AMD platforms. On AMD platforms, it is
recommended to use the mbm_event mode, if supported, to prevent the
hardware from resetting counters between reads. Such resets can result
in misleading values, or in "Unavailable" being displayed if no counter
is assigned to the event.
The mbm_event mode, referred to as ABMC (Assignable Bandwidth Monitoring
Counters) on AMD, is enabled by default when supported by the system.
Update ABMC across all logical processors within the resctrl domain to
ensure proper functionality.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v14: Updated the changelog to reflect the change in name of the monitor mode
to mbm_event.
v13: Added the call to resctrl_init_evt_configuration() to set up the event
configuration during init.
Resolved conflicts caused by the recent FS/ARCH code restructure.
v12: Moved the resctrl_arch_mbm_cntr_assign_set_one to domain_add_cpu_mon().
Updated the commit log.
v11: Commit text in imperative tone. Added a few more details.
Moved resctrl_arch_mbm_cntr_assign_set_one() to monitor.c.
v10: Commit text in imperative tone.
v9: Minor code change due to merge. Actual code did not change.
v8: Renamed resctrl_arch_mbm_cntr_assign_configure to
resctrl_arch_mbm_cntr_assign_set_one.
Added r->mon_capable check.
Commit message update.
v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
Moved the default settings to rdt_get_mon_l3_config(). It should be
done before the hotplug handler is called. It cannot be done at
rdtgroup_init().
v6: Keeping the default enablement in arch init code for now.
This may need some discussion.
Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
v5: New patch to enable ABMC by default.
---
arch/x86/kernel/cpu/resctrl/core.c | 7 +++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 8 ++++++++
3 files changed, 16 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1df171d04bea..a8fda1b408c7 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -518,6 +518,9 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
d = container_of(hdr, struct rdt_mon_domain, hdr);
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+ /* Update the mbm_assign_mode state for the CPU if supported */
+ if (r->mon.mbm_cntr_assignable)
+ resctrl_arch_mbm_cntr_assign_set_one(r);
return;
}
@@ -537,6 +540,10 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
d->ci_id = ci->id;
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+ /* Update the mbm_mbm_assign state for the CPU if supported */
+ if (r->mon.mbm_cntr_assignable)
+ resctrl_arch_mbm_cntr_assign_set_one(r);
+
arch_mon_domain_online(r, d);
if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 77a9ce4a8403..f03abae3a2c4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -216,5 +216,6 @@ bool rdt_cpu_has(int flag);
void __init intel_rdt_mbm_apply_quirk(void);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 053f516a8e67..74282dab7c2b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -430,6 +430,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
r->mon.mbm_assign_on_mkdir = true;
+ hw_res->mbm_cntr_assign_enabled = true;
}
r->mon_capable = true;
@@ -533,3 +534,10 @@ void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
memset(am, 0, sizeof(*am));
}
}
+
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
+}
--
2.34.1
* Re: [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported
2025-06-13 21:05 ` [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported Babu Moger
@ 2025-06-25 23:40 ` Reinette Chatre
2025-07-02 17:45 ` Moger, Babu
0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-25 23:40 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> @@ -537,6 +540,10 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> d->ci_id = ci->id;
> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>
> + /* Update the mbm_mbm_assign state for the CPU if supported */
"mbm_mbm_assign" -> "mbm_assign_mode"?
> + if (r->mon.mbm_cntr_assignable)
> + resctrl_arch_mbm_cntr_assign_set_one(r);
> +
> arch_mon_domain_online(r, d);
>
> if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
Reinette
* Re: [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported
2025-06-25 23:40 ` Reinette Chatre
@ 2025-07-02 17:45 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-07-02 17:45 UTC (permalink / raw)
To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
tglx, mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Reinette,
On 6/25/25 18:40, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:05 PM, Babu Moger wrote:
>
>> @@ -537,6 +540,10 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>> d->ci_id = ci->id;
>> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>>
>> + /* Update the mbm_mbm_assign state for the CPU if supported */
>
> "mbm_mbm_assign" -> "mbm_assign_mode"?
>
Sure.
--
Thanks
Babu Moger
* RE: [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (31 preceding siblings ...)
2025-06-13 21:05 ` [PATCH v14 32/32] x86/resctrl: Configure mbm_event mode if supported Babu Moger
@ 2025-06-13 21:41 ` Luck, Tony
2025-06-16 14:47 ` Moger, Babu
2025-06-24 21:25 ` Reinette Chatre
33 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-06-13 21:41 UTC (permalink / raw)
To: Babu Moger, corbet@lwn.net, Chatre, Reinette, Dave.Martin@arm.com,
james.morse@arm.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
thomas.lendacky@amd.com, pawan.kumar.gupta@linux.intel.com,
manali.shukla@amd.com, perry.yuan@amd.com, Huang, Kai,
peterz@infradead.org, Li, Xiaoyao, kan.liang@linux.intel.com,
mario.limonciello@amd.com, Li, Xin3, gautham.shenoy@amd.com,
xin@zytor.com, Bae, Chang Seok, fenghuay@nvidia.com,
peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Babu,
Compiling with "make W=1" you have several kerneldoc comments on new functions
that do not describe their parameters.
E.g.
/**
* resctrl_config_cntr() - Configure the counter ID for the event, RMID pair in
* the domain.
*
* Assign the counter if @assign is true else unassign the counter. Reset the
* associated non-architectural state.
*/
static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
enum resctrl_event_id evtid, u32 rmid, u32 closid,
u32 cntr_id, bool assign)
Warning: fs/resctrl/monitor.c:984 function parameter 'r' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'd' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'evtid' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'rmid' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'closid' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'cntr_id' not described in 'resctrl_config_cntr'
Warning: fs/resctrl/monitor.c:984 function parameter 'assign' not described in 'resctrl_config_cntr'
-Tony
* Re: RE: [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-13 21:41 ` [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Luck, Tony
@ 2025-06-16 14:47 ` Moger, Babu
0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-16 14:47 UTC (permalink / raw)
To: Luck, Tony, corbet@lwn.net, Chatre, Reinette, Dave.Martin@arm.com,
james.morse@arm.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
rostedt@goodmis.org, paulmck@kernel.org, thuth@redhat.com,
ardb@kernel.org, gregkh@linuxfoundation.org, seanjc@google.com,
thomas.lendacky@amd.com, pawan.kumar.gupta@linux.intel.com,
manali.shukla@amd.com, perry.yuan@amd.com, Huang, Kai,
peterz@infradead.org, Li, Xiaoyao, kan.liang@linux.intel.com,
mario.limonciello@amd.com, Li, Xin3, gautham.shenoy@amd.com,
xin@zytor.com, Bae, Chang Seok, fenghuay@nvidia.com,
peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Tony,
On 6/13/25 16:41, Luck, Tony wrote:
> Babu,
>
> Compiling with "make W=1" you have several kerneldoc comments on new functions
> that do not describe their parameters.
>
> E.g.
>
> /**
> * resctrl_config_cntr() - Configure the counter ID for the event, RMID pair in
> * the domain.
> *
> * Assign the counter if @assign is true else unassign the counter. Reset the
> * associated non-architectural state.
> */
> static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> enum resctrl_event_id evtid, u32 rmid, u32 closid,
> u32 cntr_id, bool assign)
>
>
> Warning: fs/resctrl/monitor.c:984 function parameter 'r' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'd' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'evtid' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'rmid' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'closid' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'cntr_id' not described in 'resctrl_config_cntr'
> Warning: fs/resctrl/monitor.c:984 function parameter 'assign' not described in 'resctrl_config_cntr'
>
Yes, I noticed several of them.
The warnings go away after replacing "/**" with "/*".
I am not sure whether to fix this by switching to "/*" or by adding
descriptions for each of these parameters. Adding descriptions would be
very repetitive, since all these functions take r, d, evtid, and rmid as
parameters. Also, these are static functions.
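For comparison, the second option would look roughly like the sketch below:
one "@name:" line per parameter. It is shown on a simplified stand-in so it
compiles on its own; demo_config_cntr() and struct demo_cntr_cfg are
hypothetical names, not the functions in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-counter configuration entry. */
struct demo_cntr_cfg {
	int evtid;
	uint32_t rmid;
	bool in_use;
};

/**
 * demo_config_cntr() - Configure a counter ID for an event, RMID pair.
 * @cfg:	Array of counter configurations for the domain.
 * @evtid:	Event to be tracked by the counter.
 * @rmid:	RMID to be tracked by the counter.
 * @cntr_id:	Index of the counter in @cfg.
 * @assign:	True to assign the counter, false to unassign it.
 *
 * Each parameter has its own "@name:" line, which is what the kernel-doc
 * check run by "make W=1" looks for in a comment opened with "/" "**".
 */
static void demo_config_cntr(struct demo_cntr_cfg *cfg, int evtid,
			     uint32_t rmid, uint32_t cntr_id, bool assign)
{
	cfg[cntr_id].evtid = assign ? evtid : 0;
	cfg[cntr_id].rmid = assign ? rmid : 0;
	cfg[cntr_id].in_use = assign;
}
```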
--
Thanks
Babu Moger
* Re: [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2025-06-13 21:04 [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (32 preceding siblings ...)
2025-06-13 21:41 ` [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Luck, Tony
@ 2025-06-24 21:25 ` Reinette Chatre
33 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-06-24 21:25 UTC (permalink / raw)
To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx,
mingo, bp, dave.hansen
Cc: x86, hpa, akpm, rostedt, paulmck, thuth, ardb, gregkh, seanjc,
thomas.lendacky, pawan.kumar.gupta, manali.shukla, perry.yuan,
kai.huang, peterz, xiaoyao.li, kan.liang, mario.limonciello,
xin3.li, gautham.shenoy, xin, chang.seok.bae, fenghuay,
peternewman, maciej.wieczor-retman, eranian, linux-doc,
linux-kernel
Hi Babu,
On 6/13/25 2:04 PM, Babu Moger wrote:
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called the QoS RMID Pinning feature.
>
> The series is written so that it is easier to support other assignable
> features from different vendors.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> b75dc5e1399df (tip/master) Merge branch into tip/master: 'sched/core'
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, the bandwidth monitoring feature on AMD systems only guarantees
> that RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, only a limited number of groups
> can give guaranteed monitoring numbers. With ever-changing configurations
> there is no way to definitively know which of these groups are being
> tracked at a particular time. Users do not have the option
> to monitor a group or set of groups for a certain period of time without
> worrying about counter being reset in between.
"about counter" -> "about counters" ?
>
> The ABMC feature allows users to assign a hardware counter ID to an RMID,
> event pair and monitor bandwidth usage as long as it is assigned. The
> hardware continues to track the assigned counter until it is explicitly
> unassigned by the user. Additionally, the user can specify the type of
> memory transactions (e.g., reads, writes) to be tracked by the counter
> for the assigned RMID.
>
> Without ABMC enabled, monitoring will work in the current 'default' mode
> without the assignment option.
>
> # History
>
> An earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may not be compatible with ARM's MPAM.
>
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Here are the finalized requirements based on the discussion:
>
> * BMEC and ABMC are incompatible with each other. They need to be mutually exclusive.
>
> * Eliminate global assignment listing. The interface
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>
> * Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
> The configuration file names should be free-form, allowing users to create them as needed.
>
> * Perform assignment listing at the group level by introducing mbm_L3_assignments
> in each monitoring group level. The listing should provide the following details:
>
> Event Configuration: Specifies the event configuration applied. This will be crucial
> when "mkdir" on event configuration is added in the future, leading to the creation
> of mon_data/mon_l3_*/<event configuration>.
>
> Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>
> Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>
> Exclusive assignment: Assign the counter ID to the RMID, event pair exclusively.
>
> Shared assignment: A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
> "counter" (could be hardware counter or "active" RMID) as shared and that means
> the counter within that domain is shared between different monitor groups and actual
> assignment is scheduled by resctrl.
>
> Unassigned: No longer assigned.
>
> * Provide an option to enable or disable auto assignment when a new group is created.
>
> * Keep the flexibility to support future assign options like Soft-ABMC etc.
> https://lore.kernel.org/lkml/7f10fa69-d1fe-4748-b10c-fa0c9b60bd66@intel.com/
>
>
> This series tries to address all the requirements listed above.
Please drop the "tries to". Also please do not say "address all requirements" when this
is not the case. This series does not address all the requirements listed
(no dynamic event configurations via mkdir and no shared assignment). Please be specific
about what this series addresses and what it leaves for "future", but highlight that
while this series does not implement all requirements it does create a framework
to support their future implementation.
>
> # Implementation details
>
> Create a generic interface aimed to support user space assignment of scarce
drop "aimed"
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.
>
> The feature adds the following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: The maximum number of monitoring counters
> (total of available and assigned counters) in each domain when the system supports
> mbm_assign_mode.
>
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: The number of monitoring counters
> available for assignment in each domain when mbm_event mode is enabled on the system.
Why is "num_mbm_cntrs" connected to mbm_assign_mode while "available_mbm_cntrs" is
connected to mbm_event mode? Perhaps both can be "mbm_event" mode to reduce confusion?
>
> /sys/fs/resctrl/info/L3_MON/event_configs: Contains sub-directory for each MBM event
> that can be assigned to a counter.
>
> /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter: The type of
> memory transactions tracked by the event mbm_total_bytes.
>
> /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter: The type of
> memory transactions tracked by the event mbm_local_bytes.
>
> /sys/fs/resctrl/mbm_L3_assignments: Per monitor group interface to list or modify
> counters assigned to the group.
>
> # Examples
>
> a. Check if MBM assign support is available
> # mount -t resctrl resctrl /sys/fs/resctrl/
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_event]
> default
>
> mbm_event feature is detected and it is enabled.
>
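For user space consuming this file, a minimal sketch of how the bracket convention could be parsed, assuming the file lists one mode per line with the enabled mode enclosed in brackets as in the example above. The function name is hypothetical and not part of this series.

```python
def parse_assign_mode(text):
    """Return (enabled_mode, all_modes) from mbm_assign_mode contents.

    Assumes one mode per line, with the enabled mode in square brackets.
    """
    enabled = None
    modes = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        modes.append(name)
    return enabled, modes


# Contents as shown in example (a).
print(parse_assign_mode("[mbm_event]\ndefault\n"))
# -> ('mbm_event', ['mbm_event', 'default'])
```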
> b. Check how many assignable counters are supported.
>
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=32;1=32
>
> c. Check how many assignable counters are available for assignment in each domain.
>
> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> 0=30;1=30
>
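Since num_mbm_cntrs is described as the total of available and assigned counters, the number of counters in use per domain follows from the two files. A minimal sketch, assuming both files use the "<domain>=<count>" list separated by ";" shown in examples (b) and (c). The helper names are hypothetical.

```python
def parse_domain_counts(text):
    """Parse "0=32;1=32" into {0: 32, 1: 32}."""
    counts = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        domain, _, value = item.partition("=")
        counts[int(domain)] = int(value)
    return counts


def counters_in_use(num_text, avail_text):
    """Per-domain assigned counters: total minus available."""
    total = parse_domain_counts(num_text)
    avail = parse_domain_counts(avail_text)
    return {domain: total[domain] - avail[domain] for domain in total}


# Values from examples (b) and (c).
print(counters_in_use("0=32;1=32\n", "0=30;1=30\n"))
# -> {0: 2, 1: 2}
```

The result of two counters in use per domain matches the default group having both mbm_total_bytes and mbm_local_bytes assigned in each domain.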
> d. Check default event configuration.
>
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
> local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
>
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads,local_non_temporal_writes,local_reads_slow_memory
>
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
> to list and modify that group's monitoring states.
>
> The list is displayed in the following format:
>
> <Event>:<Domain ID>=<Assignment type>
Suggest adding multiple domains to example. Above creates impression that each domain
is listed on its own line (until example below clears that up).
>
> Event: A valid MBM event listed in the
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> Domain ID: A valid domain ID.
>
> Assignment types:
>
> _ : No counter assigned.
>
> e : Counter assigned exclusively.
>
> To list the default group states:
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on domain 0:
> # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on all domains:
> # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=_
> mbm_local_bytes:0=e;1=e
>
> To assign a counter associated with the mbm_total_bytes event on all domains in exclusive mode:
> # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
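To make the listing format concrete (one line per event, with all domains on that line separated by ";"), here is a minimal parsing sketch. It assumes only the "<Event>:<Domain ID>=<Assignment type>" layout shown in the examples above. The function names are hypothetical, and the request builder covers only the single-domain and "*" forms that the examples demonstrate.

```python
def parse_assignments(text):
    """Parse mbm_L3_assignments output into {event: {domain_id: state}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        event, _, domains = line.partition(":")
        states = {}
        for item in domains.split(";"):
            domain, _, state = item.partition("=")
            states[int(domain)] = state
        result[event] = states
    return result


def format_request(event, domain, state):
    """Build a string to write; domain is an ID or "*" for all domains."""
    return f"{event}:{domain}={state}"


listing = "mbm_total_bytes:0=_;1=e\nmbm_local_bytes:0=e;1=e\n"
print(parse_assignments(listing)["mbm_total_bytes"])
# -> {0: '_', 1: 'e'}
print(format_request("mbm_total_bytes", "*", "_"))
# -> mbm_total_bytes:*=_
```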
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the events with the assignment. If the event is unassigned
> when reading, then the read will come back as "Unassigned".
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> h. Check the default event configurations.
>
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
> local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
>
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads,local_non_temporal_writes,local_reads_slow_memory
>
> i. Change the event configuration for mbm_local_bytes.
>
> # echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
>
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
Note that examples are inconsistent wrt spacing in output of this file. This is expected
to match how the implementation in series does the spacing.
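Until the examples are consistent, user space comparing event_filter contents would need to tolerate both spellings. A minimal sketch of a spacing-insensitive comparison, assuming only that the file holds a comma-separated list of transaction type names. The helper names are hypothetical and this says nothing about how the kernel formats its output.

```python
def parse_event_filter(text):
    """Split a comma-separated filter into names, ignoring spaces and line wraps."""
    tokens = text.replace("\n", ",").split(",")
    return [token.strip() for token in tokens if token.strip()]


def same_filter(a, b):
    """True if both strings name the same set of memory transaction types."""
    return set(parse_event_filter(a)) == set(parse_event_filter(b))


compact = "local_reads,local_non_temporal_writes,local_reads_slow_memory"
spaced = "local_reads, local_non_temporal_writes, local_reads_slow_memory"
print(same_filter(compact, spaced))
# -> True
```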
>
> This will update all (across all domains of all monitor groups) counter assignments
> associated with the mbm_local_bytes event.
>
> j. Now read the local event again. The first read may come back with "Unavailable"
> status. The subsequent read of mbm_local_bytes will display only the read events.
Above specifies "will display only the read events" while the previous step kept
"local_non_temporal_writes" among the memory transactions. What is meant by "only the read events"?
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> Unavailable
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 314101
>
> k. Users have the option to go back to 'default' mbm_assign_mode if required.
> This can be done using the following command. Note that switching the
> mbm_assign_mode will reset all the MBM counters (and thus all MBM events) of all
> the resctrl groups.
>
> # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> mbm_event
> [default]
>
> l. Unmount the resctrl filesystem.
>
> # umount /sys/fs/resctrl/
> ---
Reinette