* [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement
@ 2024-11-14 0:17 Tony Luck
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
` (8 more replies)
0 siblings, 9 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
Add support to choose the memory monitor bandwidth event independently
for each ctrl_mon group when resctrl is mounted with the mba_MBps
option. Users may want this for applications that are not localized to
NUMA boundaries. Default behavior still uses local memory bandwidth
when that event is supported by the platform.
Side benefit[0]: Systems that do not support the local bandwidth monitor
event but do support the total bandwidth event can now use the mba_MBps
mount option.
Changes since v8:
Link: https://lore.kernel.org/all/20241029172832.93963-1-tony.luck@intel.com
Patch(es): Change
1: New to this series. Almost direct copy from Babu's patch6
in the ABMC series. Only change is to drop the __init
from resctrl_file_fflags_init() because I need to use it
at runtime for mount/unmount.
2: Was patch 1
Fenghua: Use is_mbm_local_enabled() instead of open
coded bit check.
Reinette: Fix comment for @mba_mbps_event
Move check for local event after check for any event
Move initialization of rdtgroup_default.mba_mbps_event
into rdtgroup_setup_default() and make it conditional
on is_mbm_local_enabled().
Move initialization of rdtgrp->mba_mbps_event inside
"if (resctrl_arch_mon_capable())" and make it
conditional on is_mba_sc(NULL).
Tony: Moved the fallback to total bandwidth to later in
series until all code is changed to cope with total.
3-4: Was 2-3
Reinette: Shuffled the pieces of these two patches to
flow better.
Fenghue: Don't drop the comment when refactoring, but do
update a little.
5: Was 4. No change.
6: Was 5.
Reinette: Expand commit change log
Fix rdtgroup_mba_mbps_event_show() to return -ENOENT
if rdtgrp isn't found.
Add pr_warn_once() to cover default case in switch.
File mode of mba_MBps_event 0444 in this patch.
Initialize .fflags to "RFTYPE_CTRL_BASE | RFTYPE_MON_BASE"
using Babu's resctrl_file_fflags_init() helper
instead of adding mba_mbps_event_init() function.
7: Was 6.
Reinette: Expand commit change log
Just use "Unsupported event" event for all invalid
user input instead of separate messages for
different problems.
Use this patch to switch mode from 0444 to 0644
8: New. Moved the fallback to total bandwidth to this patch.
9: No change. I punted on trying to explain the perils of
users rapidly switching between mba_sc events.
[0] My original objective!
Babu Moger (1):
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
Tony Luck (8):
x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
x86/resctrl: Compute memory bandwidth for all supported events
x86/resctrl: Relax checks for mba_MBps mount option
x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories
x86/resctrl: Add write option to "mba_MBps_event" file
x86/resctrl: Make mba_sc use total bandwidth if local is not supported
x86/resctrl: Document the new "mba_MBps_event" file
Documentation/arch/x86/resctrl.rst | 10 +++
include/linux/resctrl.h | 2 +
arch/x86/kernel/cpu/resctrl/internal.h | 9 +-
arch/x86/kernel/cpu/resctrl/core.c | 9 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 70 +++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 101 ++++++++++++----------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 36 ++++----
7 files changed, 173 insertions(+), 64 deletions(-)
base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
--
2.47.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-15 16:19 ` Moger, Babu
2024-11-20 0:38 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control Tony Luck
` (7 subsequent siblings)
8 siblings, 2 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
From: Babu Moger <babu.moger@amd.com>
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.
Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.
Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 3 +--
arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++-------------
4 files changed, 10 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..faaff9d64102 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,7 @@ void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_mon_domain *d);
void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void resctrl_file_fflags_init(const char *config, unsigned long fflags);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b681c2e07dbf..f3ee5859b69d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -234,7 +234,9 @@ static __init bool __get_mem_config_intel(struct rdt_resource *r)
r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
else
r->membw.throttle_mode = THREAD_THROTTLE_MAX;
- thread_throttle_mode_init();
+
+ resctrl_file_fflags_init("thread_throttle_mode",
+ RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
r->alloc_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 851b561850e0..7ef1a293cc13 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
mbm_total_event.configurable = true;
- mbm_config_rftype_init("mbm_total_bytes_config");
+ resctrl_file_fflags_init("mbm_total_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
mbm_local_event.configurable = true;
- mbm_config_rftype_init("mbm_local_bytes_config");
+ resctrl_file_fflags_init("mbm_local_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7163b764c62..2b198ef95e1e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,13 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
return NULL;
}
-void __init thread_throttle_mode_init(void)
-{
- struct rftype *rft;
-
- rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
- if (!rft)
- return;
-
- rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void resctrl_file_fflags_init(const char *config, unsigned long fflags)
{
struct rftype *rft;
rft = rdtgroup_get_rftype_by_name(config);
if (rft)
- rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+ rft->fflags = fflags;
}
/**
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-15 16:20 ` Moger, Babu
2024-11-20 1:08 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event Tony Luck
` (6 subsequent siblings)
8 siblings, 2 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
Resctrl uses local memory bandwidth event as input to the feedback
loop when the mba_MBps mount option is used. This means that this
mount option cannot be used on systems that only support monitoring
of total bandwidth.
Prepare to allow users to choose the input event independently for
each ctrl_mon group.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 3 +++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
4 files changed, 13 insertions(+)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d94abba1c716..fd05b937e2f4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -49,6 +49,8 @@ enum resctrl_event_id {
QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};
+extern enum resctrl_event_id mba_mbps_default_event;
+
/**
* struct resctrl_staged_config - parsed configuration to be applied
* @new_ctrl: new ctrl value to be loaded
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index faaff9d64102..485800055a7d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -283,6 +283,7 @@ struct pseudo_lock_region {
* monitor only or ctrl_mon group
* @mon: mongroup related data
* @mode: mode of resource group
+ * @mba_mbps_event: input monitoring event id when mba_sc is enabled
* @plr: pseudo-locked region
*/
struct rdtgroup {
@@ -295,6 +296,7 @@ struct rdtgroup {
enum rdt_group_type type;
struct mongroup mon;
enum rdtgrp_mode mode;
+ enum resctrl_event_id mba_mbps_event;
struct pseudo_lock_region *plr;
};
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f3ee5859b69d..94bf559966d6 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -963,6 +963,9 @@ static __init bool get_rdt_mon_resources(void)
if (!rdt_mon_features)
return false;
+ if (is_mbm_local_enabled())
+ mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
+
return !rdt_get_mon_l3_config(r);
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2b198ef95e1e..a8022bddf9f7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -65,6 +65,8 @@ static void rdtgroup_destroy_root(void);
struct dentry *debugfs_resctrl;
+enum resctrl_event_id mba_mbps_default_event;
+
static bool resctrl_debug;
void rdt_last_cmd_clear(void)
@@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
rdt_last_cmd_puts("kernfs subdir error\n");
goto out_del_list;
}
+ if (is_mba_sc(NULL))
+ rdtgrp->mba_mbps_event = mba_mbps_default_event;
}
goto out_unlock;
@@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
rdtgroup_default.type = RDTCTRL_GROUP;
+ if (supports_mba_mbps())
+ rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
2024-11-14 0:17 ` [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-15 16:21 ` Moger, Babu
2024-11-20 3:39 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events Tony Luck
` (5 subsequent siblings)
8 siblings, 2 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
Instead of hard-coding the memory bandwidth local event as the
input to the mba_sc feedback look, use the event that the user
configured for each ctrl_mon group.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7ef1a293cc13..2176e355e864 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
struct rdt_ctrl_domain *dom_mba;
+ enum resctrl_event_id evt_id;
struct rdt_resource *r_mba;
- u32 cur_bw, user_bw, idx;
struct list_head *head;
struct rdtgroup *entry;
+ u32 cur_bw, user_bw;
- if (!is_mbm_local_enabled())
+ if (!is_mbm_enabled())
return;
r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+ evt_id = rgrp->mba_mbps_event;
+
+ if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
+ return;
+ if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
+ return;
+ if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
+ return;
+
closid = rgrp->closid;
rmid = rgrp->mon.rmid;
- idx = resctrl_arch_rmid_idx_encode(closid, rmid);
- pmbm_data = &dom_mbm->mbm_local[idx];
+ pmbm_data = get_mbm_state(dom_mbm, closid, rmid, evt_id);
+ if (WARN_ON_ONCE(!pmbm_data))
+ return;
dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
if (!dom_mba) {
@@ -784,7 +795,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
*/
head = &rgrp->mon.crdtgrp_list;
list_for_each_entry(entry, head, mon.crdtgrp_list) {
- cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
+ cmbm_data = get_mbm_state(dom_mbm, entry->closid, entry->mon.rmid, evt_id);
+ if (WARN_ON_ONCE(!cmbm_data))
+ return;
cur_bw += cmbm_data->prev_bw;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (2 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-15 13:53 ` Peter Newman
2024-11-20 3:45 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option Tony Luck
` (4 subsequent siblings)
8 siblings, 2 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
Computing the bandwidth for an event is cheap, and only done once
per second. Doing so simplifies switching between events and allows
choosing different events per ctrl_mon group.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 72 ++++++++++++---------------
1 file changed, 33 insertions(+), 39 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2176e355e864..da4ae21350c8 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -663,9 +663,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
*/
static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
- u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
- struct mbm_state *m = &rr->d->mbm_local[idx];
u64 cur_bw, bytes, cur_bytes;
+ struct mbm_state *m;
+
+ m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+ if (WARN_ON_ONCE(!m))
+ return;
cur_bytes = rr->val;
bytes = cur_bytes - m->prev_bw_bytes;
@@ -826,54 +829,45 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
}
-static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
- u32 closid, u32 rmid)
+static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, enum resctrl_event_id evtid)
{
struct rmid_read rr = {0};
rr.r = r;
rr.d = d;
+ rr.evtid = evtid;
+ rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+ if (IS_ERR(rr.arch_mon_ctx)) {
+ pr_warn_ratelimited("Failed to allocate monitor context: %ld",
+ PTR_ERR(rr.arch_mon_ctx));
+ return;
+ }
+
+ __mon_event_count(closid, rmid, &rr);
/*
- * This is protected from concurrent reads from user
- * as both the user and we hold the global mutex.
+ * If the software controller is enabled, compute the
+ * bandwidth for this event id.
*/
- if (is_mbm_total_enabled()) {
- rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
- rr.val = 0;
- rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
- if (IS_ERR(rr.arch_mon_ctx)) {
- pr_warn_ratelimited("Failed to allocate monitor context: %ld",
- PTR_ERR(rr.arch_mon_ctx));
- return;
- }
-
- __mon_event_count(closid, rmid, &rr);
+ if (is_mba_sc(NULL))
+ mbm_bw_count(closid, rmid, &rr);
- resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
- }
- if (is_mbm_local_enabled()) {
- rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
- rr.val = 0;
- rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
- if (IS_ERR(rr.arch_mon_ctx)) {
- pr_warn_ratelimited("Failed to allocate monitor context: %ld",
- PTR_ERR(rr.arch_mon_ctx));
- return;
- }
-
- __mon_event_count(closid, rmid, &rr);
+ resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
+}
- /*
- * Call the MBA software controller only for the
- * control groups and when user has enabled
- * the software controller explicitly.
- */
- if (is_mba_sc(NULL))
- mbm_bw_count(closid, rmid, &rr);
+static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid)
+{
+ /*
+ * This is protected from concurrent reads from user
+ * as both the user and we hold the global mutex.
+ */
+ if (is_mbm_total_enabled())
+ mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
- resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
- }
+ if (is_mbm_local_enabled())
+ mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
}
/*
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (3 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-20 3:54 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories Tony Luck
` (3 subsequent siblings)
8 siblings, 1 reply; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
This option may be used with any memory bandwidth monitoring event.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a8022bddf9f7..3a89516e6f56 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2334,7 +2334,7 @@ static bool supports_mba_mbps(void)
struct rdt_resource *rmbm = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
- return (is_mbm_local_enabled() &&
+ return (is_mbm_enabled() &&
r->alloc_capable && is_mba_linear() &&
r->ctrl_scope == rmbm->mon_scope);
}
@@ -2759,7 +2759,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
ctx->enable_cdpl2 = true;
return 0;
case Opt_mba_mbps:
- msg = "mba_MBps requires local MBM and linear scale MBA at L3 scope";
+ msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
if (!supports_mba_mbps())
return invalfc(fc, msg);
ctx->enable_mba_mbps = true;
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (4 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-20 4:03 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 7/9] x86/resctrl: Add write option to "mba_MBps_event" file Tony Luck
` (2 subsequent siblings)
8 siblings, 1 reply; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
The "mba_MBps" mount option provides an alternate method to
control memory bandwidth. Instead of specifying allowable
bandwidth as a percentage of maximum possible, the user
provides a MiB/s limit value.
Historically the limit was enforced by a feedback loop from
the measure local bandwidth to adjust the memory bandwidth
allocation controls.
In preparation to allow the user to pick the memory bandwidth
monitoring event used as input to the feedback loop, provide
a file in each ctrl_mon group directory that shows the event
currently in use.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 30 +++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++
3 files changed, 42 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 485800055a7d..ce10a883ecf8 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -609,6 +609,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
+int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v);
bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive);
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_ctrl_domain *d,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 200d89a64027..5fa37b4ecc7a 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -518,6 +518,36 @@ static int smp_mon_event_count(void *arg)
return 0;
}
+int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+ if (rdtgrp) {
+ switch (rdtgrp->mba_mbps_event) {
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ seq_puts(s, "mbm_local_bytes\n");
+ break;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ seq_puts(s, "mbm_total_bytes\n");
+ break;
+ default:
+ pr_warn_once("Bad event %d\n", rdtgrp->mba_mbps_event);
+ ret = -EINVAL;
+ break;
+ }
+ } else {
+ ret = -ENOENT;
+ }
+
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret;
+}
+
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
cpumask_t *cpumask, int evtid, int first)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3a89516e6f56..416e1acfad9f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1943,6 +1943,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdtgroup_schemata_show,
.fflags = RFTYPE_CTRL_BASE,
},
+ {
+ .name = "mba_MBps_event",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mba_mbps_event_show,
+ },
{
.name = "mode",
.mode = 0644,
@@ -2348,6 +2354,7 @@ static int set_mba_sc(bool mba_sc)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
u32 num_closid = resctrl_arch_get_num_closid(r);
struct rdt_ctrl_domain *d;
+ unsigned long fflags;
int i;
if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
@@ -2360,6 +2367,9 @@ static int set_mba_sc(bool mba_sc)
d->mbps_val[i] = MBA_MAX_MBPS;
}
+ fflags = mba_sc ? RFTYPE_CTRL_BASE | RFTYPE_MON_BASE : 0;
+ resctrl_file_fflags_init("mba_MBps_event", fflags);
+
return 0;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 7/9] x86/resctrl: Add write option to "mba_MBps_event" file
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (5 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-14 0:17 ` [PATCH v9 8/9] x86/resctrl: Make mba_sc use total bandwidth if local is not supported Tony Luck
2024-11-14 0:17 ` [PATCH v9 9/9] x86/resctrl: Document the new "mba_MBps_event" file Tony Luck
8 siblings, 0 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
The "mba_MBps" mount option provides an alternate method to control memory
bandwidth. Instead of specifying allowable bandwidth as a percentage of
maximum possible, the user provides a MiB/s limit value.
Historically the limit was enforced by a feedback loop from the measure
local bandwidth to adjust the memory bandwidth allocation controls.
There is a file in each ctrl_mon group directory that shows the event
currently in use (always says "mbm_local_bytes").
Allow writing that file to choose a different event.
A user can choose any of the memory bandwidth monitoring events listed in
/sys/fs/resctrl/info/L3_mon/mon_features independently for each ctrl_mon
group by writing to the "mba_MBps_event" file.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 40 +++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 3 +-
3 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ce10a883ecf8..6345ab3e0890 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -609,6 +609,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
+ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off);
int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 5fa37b4ecc7a..ee983b68c046 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -518,6 +518,46 @@ static int smp_mon_event_count(void *arg)
return 0;
}
+ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+ buf[nbytes - 1] = '\0';
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+ rdt_last_cmd_clear();
+
+ if (!strcmp(buf, "mbm_local_bytes")) {
+ if (is_mbm_local_enabled())
+ rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
+ else
+ ret = -EINVAL;
+ } else if (!strcmp(buf, "mbm_total_bytes")) {
+ if (is_mbm_total_enabled())
+ rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
+ else
+ ret = -EINVAL;
+ } else {
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ rdt_last_cmd_printf("Unsupported event id '%s'\n", buf);
+
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret ?: nbytes;
+}
+
int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 416e1acfad9f..3f2e035208e6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1945,8 +1945,9 @@ static struct rftype res_common_files[] = {
},
{
.name = "mba_MBps_event",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
+ .write = rdtgroup_mba_mbps_event_write,
.seq_show = rdtgroup_mba_mbps_event_show,
},
{
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 8/9] x86/resctrl: Make mba_sc use total bandwidth if local is not supported
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (6 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 7/9] x86/resctrl: Add write option to "mba_MBps_event" file Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
2024-11-14 0:17 ` [PATCH v9 9/9] x86/resctrl: Document the new "mba_MBps_event" file Tony Luck
8 siblings, 0 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
The default input measurement to the mba_sc feedback loop for
memory bandwidth control when the user mounts with the "mba_MBps"
option has historically been the local bandwidth event.
But some systems may not support a local bandwidth event.
When local bandwidth event is not supported, check for support
of total bandwidth and use that instead.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 94bf559966d6..3d1735ed8d1f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -965,6 +965,8 @@ static __init bool get_rdt_mon_resources(void)
if (is_mbm_local_enabled())
mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
+ else if (is_mbm_total_enabled())
+ mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
return !rdt_get_mon_l3_config(r);
}
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 9/9] x86/resctrl: Document the new "mba_MBps_event" file
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
` (7 preceding siblings ...)
2024-11-14 0:17 ` [PATCH v9 8/9] x86/resctrl: Make mba_sc use total bandwidth if local is not supported Tony Luck
@ 2024-11-14 0:17 ` Tony Luck
8 siblings, 0 replies; 32+ messages in thread
From: Tony Luck @ 2024-11-14 0:17 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches,
Tony Luck
Add a section to document a new read/write file that shows/sets the memory
bandwidth event used to control bandwidth used by each ctrl_mon group.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Documentation/arch/x86/resctrl.rst | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..6768fc1fad16 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -384,6 +384,16 @@ When monitoring is enabled all MON groups will also contain:
Available only with debug option. The identifier used by hardware
for the monitor group. On x86 this is the RMID.
+When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
+
+"mba_MBps_event":
+ Reading this file shows which memory bandwidth event is used
+ as input to the software feedback loop that keeps memory bandwidth
+ below the value specified in the schemata file. Writing the
+ name of one of the supported memory bandwidth events found in
+ /sys/fs/resctrl/info/L3_MON/mon_features changes the input
+ event.
+
Resource allocation rules
-------------------------
--
2.47.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events
2024-11-14 0:17 ` [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events Tony Luck
@ 2024-11-15 13:53 ` Peter Newman
2024-11-15 16:59 ` Luck, Tony
2024-11-20 3:45 ` Reinette Chatre
1 sibling, 1 reply; 32+ messages in thread
From: Peter Newman @ 2024-11-15 13:53 UTC (permalink / raw)
To: Tony Luck
Cc: Fenghua Yu, Reinette Chatre, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
Hi Tony,
On Thu, Nov 14, 2024 at 1:17 AM Tony Luck <tony.luck@intel.com> wrote:
>
> Computing the bandwidth for an event is cheap, and only done once
> per second. Doing so simplifies switching between events and allows
> choosing different events per ctrl_mon group.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 72 ++++++++++++---------------
> 1 file changed, 33 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 2176e355e864..da4ae21350c8 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -663,9 +663,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> */
> static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
> {
> - u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> - struct mbm_state *m = &rr->d->mbm_local[idx];
> u64 cur_bw, bytes, cur_bytes;
> + struct mbm_state *m;
> +
> + m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> + if (WARN_ON_ONCE(!m))
> + return;
>
> cur_bytes = rr->val;
> bytes = cur_bytes - m->prev_bw_bytes;
> @@ -826,54 +829,45 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> }
>
> -static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> - u32 closid, u32 rmid)
> +static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + u32 closid, u32 rmid, enum resctrl_event_id evtid)
> {
> struct rmid_read rr = {0};
>
> rr.r = r;
> rr.d = d;
> + rr.evtid = evtid;
> + rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> + if (IS_ERR(rr.arch_mon_ctx)) {
> + pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> + PTR_ERR(rr.arch_mon_ctx));
> + return;
> + }
> +
> + __mon_event_count(closid, rmid, &rr);
>
> /*
> - * This is protected from concurrent reads from user
> - * as both the user and we hold the global mutex.
> + * If the software controller is enabled, compute the
> + * bandwidth for this event id.
> */
> - if (is_mbm_total_enabled()) {
> - rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
> - rr.val = 0;
> - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> - if (IS_ERR(rr.arch_mon_ctx)) {
> - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> - PTR_ERR(rr.arch_mon_ctx));
> - return;
> - }
> -
> - __mon_event_count(closid, rmid, &rr);
> + if (is_mba_sc(NULL))
> + mbm_bw_count(closid, rmid, &rr);
>
> - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> - }
> - if (is_mbm_local_enabled()) {
> - rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
> - rr.val = 0;
> - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> - if (IS_ERR(rr.arch_mon_ctx)) {
> - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> - PTR_ERR(rr.arch_mon_ctx));
> - return;
> - }
> -
> - __mon_event_count(closid, rmid, &rr);
> + resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> +}
>
> - /*
> - * Call the MBA software controller only for the
> - * control groups and when user has enabled
> - * the software controller explicitly.
> - */
> - if (is_mba_sc(NULL))
> - mbm_bw_count(closid, rmid, &rr);
> +static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> + u32 closid, u32 rmid)
> +{
> + /*
> + * This is protected from concurrent reads from user
> + * as both the user and we hold the global mutex.
> + */
> + if (is_mbm_total_enabled())
> + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
>
> - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> - }
> + if (is_mbm_local_enabled())
> + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
> }
>
> /*
> --
> 2.47.0
>
I experimented with all-groups, per-domain counter aggregation files
prototype using this change as a starting point.
I'm happy to report that the values reported looked fairly reasonable.
Tested-by: Peter Newman <peternewman@google.com>
I also did a quick experiment to see how close to 1 second apart the
readings were:
@@ -664,6 +664,7 @@ static int __mon_event_count(u32 closid, u32 rmid,
struct rmid_read *rr)
static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
u64 cur_bw, bytes, cur_bytes;
struct mbm_state *m;
+ u64 ts;
m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
@@ -680,6 +681,10 @@ static void mbm_bw_count(u32 closid, u32 rmid,
struct rmid_read *rr)
cur_bw = bytes / SZ_1M;
m->prev_bw = cur_bw;
+
+ ts = jiffies;
+ m->latency = ts - m->last_update_jiffies;
+ m->last_update_jiffies = ts;
}
On an AMD EPYC 7B12 64-Core Processor, I saw a consistent 1.021-1.026
second period. Is this enough error that you would want to divide by
the actual period instead of assuming a denominator of 1 exactly?
We're mainly concerned with the relative bandwidth of jobs, so this
error isn't much concern as long as it doesn't favor any group.
The only thing I'd worry about is if the user is using setitimer() to
keep a consistent 1 second period for reading the bandwidth rate, the
window of the resctrl updates would drift away from the userspace
consumer over time.
Thanks!
-Peter
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
@ 2024-11-15 16:19 ` Moger, Babu
2024-11-20 0:38 ` Reinette Chatre
1 sibling, 0 replies; 32+ messages in thread
From: Moger, Babu @ 2024-11-15 16:19 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Reinette Chatre, Peter Newman,
Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Tony, Looks good. Thanks
On 11/13/2024 6:17 PM, Tony Luck wrote:
> From: Babu Moger <babu.moger@amd.com>
>
> thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
> fflags for resctrl files.
>
> Adding new files will involve adding another function to initialize
> the fflags. This can be simplified by adding a new function
> resctrl_file_fflags_init() and passing the file name and flags
> to be initialized.
>
> Consolidate fflags initialization into resctrl_file_fflags_init() and
> remove thread_throttle_mode_init() and mbm_config_rftype_init().
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 3 +--
> arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
> arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++--
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++-------------
> 4 files changed, 10 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 955999aecfca..faaff9d64102 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -647,8 +647,7 @@ void cqm_handle_limbo(struct work_struct *work);
> bool has_busy_rmid(struct rdt_mon_domain *d);
> void __check_limbo(struct rdt_mon_domain *d, bool force_free);
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> -void __init thread_throttle_mode_init(void);
> -void __init mbm_config_rftype_init(const char *config);
> +void resctrl_file_fflags_init(const char *config, unsigned long fflags);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index b681c2e07dbf..f3ee5859b69d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -234,7 +234,9 @@ static __init bool __get_mem_config_intel(struct rdt_resource *r)
> r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
> else
> r->membw.throttle_mode = THREAD_THROTTLE_MAX;
> - thread_throttle_mode_init();
> +
> + resctrl_file_fflags_init("thread_throttle_mode",
> + RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
>
> r->alloc_capable = true;
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 851b561850e0..7ef1a293cc13 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> mbm_total_event.configurable = true;
> - mbm_config_rftype_init("mbm_total_bytes_config");
> + resctrl_file_fflags_init("mbm_total_bytes_config",
> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> }
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> mbm_local_event.configurable = true;
> - mbm_config_rftype_init("mbm_local_bytes_config");
> + resctrl_file_fflags_init("mbm_local_bytes_config",
> + RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> }
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d7163b764c62..2b198ef95e1e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2020,24 +2020,13 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
> return NULL;
> }
>
> -void __init thread_throttle_mode_init(void)
> -{
> - struct rftype *rft;
> -
> - rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
> - if (!rft)
> - return;
> -
> - rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
> -}
> -
> -void __init mbm_config_rftype_init(const char *config)
> +void resctrl_file_fflags_init(const char *config, unsigned long fflags)
> {
> struct rftype *rft;
>
> rft = rdtgroup_get_rftype_by_name(config);
> if (rft)
> - rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
> + rft->fflags = fflags;
> }
>
> /**
--
- Babu Moger
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-14 0:17 ` [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control Tony Luck
@ 2024-11-15 16:20 ` Moger, Babu
2024-11-18 23:47 ` Luck, Tony
2024-11-20 1:08 ` Reinette Chatre
1 sibling, 1 reply; 32+ messages in thread
From: Moger, Babu @ 2024-11-15 16:20 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Reinette Chatre, Peter Newman,
Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/2024 6:17 PM, Tony Luck wrote:
> Resctrl uses local memory bandwidth event as input to the feedback
> loop when the mba_MBps mount option is used. This means that this
> mount option cannot be used on systems that only support monitoring
> of total bandwidth.
>
> Prepare to allow users to choose the input event independently for
> each ctrl_mon group.
How about this?
Provide users with the ability to select the input event independently
for each ctrl_mon group.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl.h | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/core.c | 3 +++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
> 4 files changed, 13 insertions(+)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index d94abba1c716..fd05b937e2f4 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -49,6 +49,8 @@ enum resctrl_event_id {
> QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
> };
>
> +extern enum resctrl_event_id mba_mbps_default_event;
> +
> /**
> * struct resctrl_staged_config - parsed configuration to be applied
> * @new_ctrl: new ctrl value to be loaded
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index faaff9d64102..485800055a7d 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -283,6 +283,7 @@ struct pseudo_lock_region {
> * monitor only or ctrl_mon group
> * @mon: mongroup related data
> * @mode: mode of resource group
> + * @mba_mbps_event: input monitoring event id when mba_sc is enabled
> * @plr: pseudo-locked region
> */
> struct rdtgroup {
> @@ -295,6 +296,7 @@ struct rdtgroup {
> enum rdt_group_type type;
> struct mongroup mon;
> enum rdtgrp_mode mode;
> + enum resctrl_event_id mba_mbps_event;
> struct pseudo_lock_region *plr;
> };
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index f3ee5859b69d..94bf559966d6 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -963,6 +963,9 @@ static __init bool get_rdt_mon_resources(void)
> if (!rdt_mon_features)
> return false;
>
> + if (is_mbm_local_enabled())
> + mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
Any reason to separate this patch and patch 8? I feel it can be combined.
> +
> return !rdt_get_mon_l3_config(r);
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 2b198ef95e1e..a8022bddf9f7 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -65,6 +65,8 @@ static void rdtgroup_destroy_root(void);
>
> struct dentry *debugfs_resctrl;
>
> +enum resctrl_event_id mba_mbps_default_event;
> +
> static bool resctrl_debug;
>
> void rdt_last_cmd_clear(void)
> @@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> rdt_last_cmd_puts("kernfs subdir error\n");
> goto out_del_list;
> }
> + if (is_mba_sc(NULL))
> + rdtgrp->mba_mbps_event = mba_mbps_default_event;
> }
>
> goto out_unlock;
> @@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
> rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
> rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
> rdtgroup_default.type = RDTCTRL_GROUP;
> + if (supports_mba_mbps())
> + rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
> INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>
> list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
--
- Babu Moger
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-14 0:17 ` [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event Tony Luck
@ 2024-11-15 16:21 ` Moger, Babu
2024-11-19 0:01 ` Luck, Tony
2024-11-20 3:39 ` Reinette Chatre
1 sibling, 1 reply; 32+ messages in thread
From: Moger, Babu @ 2024-11-15 16:21 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Reinette Chatre, Peter Newman,
Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/2024 6:17 PM, Tony Luck wrote:
> Instead of hard-coding the memory bandwidth local event as the
> input to the mba_sc feedback look, use the event that the user
> configured for each ctrl_mon group.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7ef1a293cc13..2176e355e864 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> u32 closid, rmid, cur_msr_val, new_msr_val;
> struct mbm_state *pmbm_data, *cmbm_data;
> struct rdt_ctrl_domain *dom_mba;
> + enum resctrl_event_id evt_id;
> struct rdt_resource *r_mba;
> - u32 cur_bw, user_bw, idx;
> struct list_head *head;
> struct rdtgroup *entry;
> + u32 cur_bw, user_bw;
>
> - if (!is_mbm_local_enabled())
> + if (!is_mbm_enabled())
> return;
>
> r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> + evt_id = rgrp->mba_mbps_event;
> +
> + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
> + return;
I feel this check is enough.
> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
> + return;
> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
> + return;
These two checks are not necessary. You are already validating it while
initializing(in patch 7).
> +
>
> closid = rgrp->closid;
> rmid = rgrp->mon.rmid;
> - idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> - pmbm_data = &dom_mbm->mbm_local[idx];
> + pmbm_data = get_mbm_state(dom_mbm, closid, rmid, evt_id);
> + if (WARN_ON_ONCE(!pmbm_data))
> + return;
>
> dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
> if (!dom_mba) {
> @@ -784,7 +795,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> */
> head = &rgrp->mon.crdtgrp_list;
> list_for_each_entry(entry, head, mon.crdtgrp_list) {
> - cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> + cmbm_data = get_mbm_state(dom_mbm, entry->closid, entry->mon.rmid, evt_id);
> + if (WARN_ON_ONCE(!cmbm_data))
> + return;
> cur_bw += cmbm_data->prev_bw;
> }
>
--
- Babu Moger
^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events
2024-11-15 13:53 ` Peter Newman
@ 2024-11-15 16:59 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-15 16:59 UTC (permalink / raw)
To: Peter Newman
Cc: Yu, Fenghua, Chatre, Reinette, Jonathan Corbet, x86@kernel.org,
James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, patches@lists.linux.dev
> I experimented with all-groups, per-domain counter aggregation files
> prototype using this change as a starting point.
>
> I'm happy to report that the values reported looked fairly reasonable.
>
> Tested-by: Peter Newman <peternewman@google.com>
Thanks for the test report.
> On an AMD EPYC 7B12 64-Core Processor, I saw a consistent 1.021-1.026
> second period. Is this enough error that you would want to divide by
> the actual period instead of assuming a denominator of 1 exactly?
> We're mainly concerned with the relative bandwidth of jobs, so this
> error isn't much concern as long as it doesn't favor any group.
I see pretty much the same delta_t on Intel Icelake. We could
use jiffies to get a bit more precision (depending on HZ value).
> The only thing I'd worry about is if the user is using setitimer() to
> keep a consistent 1 second period for reading the bandwidth rate, the
> window of the resctrl updates would drift away from the userspace
> consumer over time.
One other thing I did in my resctrl2 summary code was to patch
the modification time of the summary file to when the kernel ran
mbm_handle_overflow(). That would allow users to check the
update time to stay in sync with kernel updates.
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-15 16:20 ` Moger, Babu
@ 2024-11-18 23:47 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-18 23:47 UTC (permalink / raw)
To: babu.moger
Cc: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86,
James Morse, Jamie Iles, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Fri, Nov 15, 2024 at 10:20:34AM -0600, Moger, Babu wrote:
Thanks for looking. Comments below.
> Hi Tony,
>
> On 11/13/2024 6:17 PM, Tony Luck wrote:
> > Resctrl uses local memory bandwidth event as input to the feedback
> > loop when the mba_MBps mount option is used. This means that this
> > mount option cannot be used on systems that only support monitoring
> > of total bandwidth.
> >
> > Prepare to allow users to choose the input event independently for
> > each ctrl_mon group.
>
> How about this?
>
> Provide users with the ability to select the input event independently for
> each ctrl_mon group.
That's a description for the series as a whole. This patch doesn't
do all the things in that sentence.
>
> >
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > include/linux/resctrl.h | 2 ++
> > arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> > arch/x86/kernel/cpu/resctrl/core.c | 3 +++
> > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
> > 4 files changed, 13 insertions(+)
> >
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index d94abba1c716..fd05b937e2f4 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -49,6 +49,8 @@ enum resctrl_event_id {
> > QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
> > };
> > +extern enum resctrl_event_id mba_mbps_default_event;
> > +
> > /**
> > * struct resctrl_staged_config - parsed configuration to be applied
> > * @new_ctrl: new ctrl value to be loaded
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index faaff9d64102..485800055a7d 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -283,6 +283,7 @@ struct pseudo_lock_region {
> > * monitor only or ctrl_mon group
> > * @mon: mongroup related data
> > * @mode: mode of resource group
> > + * @mba_mbps_event: input monitoring event id when mba_sc is enabled
> > * @plr: pseudo-locked region
> > */
> > struct rdtgroup {
> > @@ -295,6 +296,7 @@ struct rdtgroup {
> > enum rdt_group_type type;
> > struct mongroup mon;
> > enum rdtgrp_mode mode;
> > + enum resctrl_event_id mba_mbps_event;
> > struct pseudo_lock_region *plr;
> > };
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index f3ee5859b69d..94bf559966d6 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -963,6 +963,9 @@ static __init bool get_rdt_mon_resources(void)
> > if (!rdt_mon_features)
> > return false;
> > + if (is_mbm_local_enabled())
> > + mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
>
>
> Any reason to separate this patch and patch 8? I feel it can be combined.
patch 8 will set mba_mbps_default_event to QOS_L3_MBM_TOTAL_EVENT_ID
on systems witout support for local memory bandwidth monitoring.
The rest of the code isn't ready for that until midway through this
series when other code has been updated to handle total bandwidth
correctly.
I may have gone to extremes moving that part out all the way to patch
8. It could potentially happen earlier in the series.
>
> > +
> > return !rdt_get_mon_l3_config(r);
> > }
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 2b198ef95e1e..a8022bddf9f7 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -65,6 +65,8 @@ static void rdtgroup_destroy_root(void);
> > struct dentry *debugfs_resctrl;
> > +enum resctrl_event_id mba_mbps_default_event;
> > +
> > static bool resctrl_debug;
> > void rdt_last_cmd_clear(void)
> > @@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> > rdt_last_cmd_puts("kernfs subdir error\n");
> > goto out_del_list;
> > }
> > + if (is_mba_sc(NULL))
> > + rdtgrp->mba_mbps_event = mba_mbps_default_event;
> > }
> > goto out_unlock;
> > @@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
> > rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
> > rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
> > rdtgroup_default.type = RDTCTRL_GROUP;
> > + if (supports_mba_mbps())
> > + rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
> > INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
> > list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
>
> --
> - Babu Moger
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-15 16:21 ` Moger, Babu
@ 2024-11-19 0:01 ` Luck, Tony
2024-11-19 0:51 ` Reinette Chatre
0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2024-11-19 0:01 UTC (permalink / raw)
To: babu.moger
Cc: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, x86,
James Morse, Jamie Iles, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Fri, Nov 15, 2024 at 10:21:01AM -0600, Moger, Babu wrote:
> Hi Tony,
Thanks for looking at this patch.
>
> On 11/13/2024 6:17 PM, Tony Luck wrote:
> > Instead of hard-coding the memory bandwidth local event as the
> > input to the mba_sc feedback look, use the event that the user
> > configured for each ctrl_mon group.
> >
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
> > 1 file changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 7ef1a293cc13..2176e355e864 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> > u32 closid, rmid, cur_msr_val, new_msr_val;
> > struct mbm_state *pmbm_data, *cmbm_data;
> > struct rdt_ctrl_domain *dom_mba;
> > + enum resctrl_event_id evt_id;
> > struct rdt_resource *r_mba;
> > - u32 cur_bw, user_bw, idx;
> > struct list_head *head;
> > struct rdtgroup *entry;
> > + u32 cur_bw, user_bw;
> > - if (!is_mbm_local_enabled())
> > + if (!is_mbm_enabled())
> > return;
> > r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> > + evt_id = rgrp->mba_mbps_event;
> > +
> > + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
> > + return;
>
> I feel this check is enough.
>
> > + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
> > + return;
> > + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
> > + return;
>
> These two checks are not necessary. You are already validating it while
> initializing(in patch 7).
I added this in response to a comment on v7 from Reinette that evt_id
wasn't properly validated here - in conjuction with the change a few
lines earlier that relaxed the check for is_mbm_local_enabled() to
just is_mbm_enabled().
See: https://lore.kernel.org/r/bb30835f-5be9-44b4-8544-2f528e7fc573@intel.com
In theory all of these tests could be dropped. As you point out the
sanity checks are done higher in the call sequence. But some folks
like the "belt and braces" approach to such sanity checks.
> > +
> > closid = rgrp->closid;
> > rmid = rgrp->mon.rmid;
> > - idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> > - pmbm_data = &dom_mbm->mbm_local[idx];
> > + pmbm_data = get_mbm_state(dom_mbm, closid, rmid, evt_id);
> > + if (WARN_ON_ONCE(!pmbm_data))
> > + return;
> > dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
> > if (!dom_mba) {
> > @@ -784,7 +795,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> > */
> > head = &rgrp->mon.crdtgrp_list;
> > list_for_each_entry(entry, head, mon.crdtgrp_list) {
> > - cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> > + cmbm_data = get_mbm_state(dom_mbm, entry->closid, entry->mon.rmid, evt_id);
> > + if (WARN_ON_ONCE(!cmbm_data))
> > + return;
> > cur_bw += cmbm_data->prev_bw;
> > }
>
> --
> - Babu Moger
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-19 0:01 ` Luck, Tony
@ 2024-11-19 0:51 ` Reinette Chatre
2024-11-19 1:44 ` Luck, Tony
0 siblings, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-19 0:51 UTC (permalink / raw)
To: Luck, Tony, babu.moger
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Randy Dunlap, Shaopeng Tan (Fujitsu), linux-kernel,
linux-doc, patches
Hi Tony,
On 11/18/24 4:01 PM, Luck, Tony wrote:
> On Fri, Nov 15, 2024 at 10:21:01AM -0600, Moger, Babu wrote:
>> Hi Tony,
>
> Thanks for looking at this patch.
>
>>
>> On 11/13/2024 6:17 PM, Tony Luck wrote:
>>> Instead of hard-coding the memory bandwidth local event as the
>>> input to the mba_sc feedback look, use the event that the user
>>> configured for each ctrl_mon group.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>> arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 7ef1a293cc13..2176e355e864 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
>>> u32 closid, rmid, cur_msr_val, new_msr_val;
>>> struct mbm_state *pmbm_data, *cmbm_data;
>>> struct rdt_ctrl_domain *dom_mba;
>>> + enum resctrl_event_id evt_id;
>>> struct rdt_resource *r_mba;
>>> - u32 cur_bw, user_bw, idx;
>>> struct list_head *head;
>>> struct rdtgroup *entry;
>>> + u32 cur_bw, user_bw;
>>> - if (!is_mbm_local_enabled())
>>> + if (!is_mbm_enabled())
>>> return;
>>> r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>>> + evt_id = rgrp->mba_mbps_event;
>>> +
>>> + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
>>> + return;
>>
>> I feel this check is enough.
>>
>>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
>>> + return;
>>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
>>> + return;
>>
>> These two checks are not necessary. You are already validating it while
>> initializing(in patch 7).
>
> I added this in response to a comment on v7 from Reinette that evt_id
> wasn't properly validated here - in conjuction with the change a few
> lines earlier that relaxed the check for is_mbm_local_enabled() to
> just is_mbm_enabled().
right that patch had an issue ... the "initialize" code hardcoded support to be
r->membw.mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
without any checking and then the handler used a relaxed check of
is_mbm_enabled()
On a system that only supports total MBM the is_mbm_enabled() check will
pass while the event used will be local MBM.
>
> See: https://lore.kernel.org/r/bb30835f-5be9-44b4-8544-2f528e7fc573@intel.com
>
> In theory all of these tests could be dropped. As you point out the
> sanity checks are done higher in the call sequence. But some folks
> like the "belt and braces" approach to such sanity checks.
If that "higher in the call sequence" can be trusted, yes. That was not the
case when I made those statements. Sprinkling WARN() that continues execution
in a known bad state does not seem safe to me either.
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-19 0:51 ` Reinette Chatre
@ 2024-11-19 1:44 ` Luck, Tony
2024-11-19 17:36 ` Reinette Chatre
0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2024-11-19 1:44 UTC (permalink / raw)
To: Reinette Chatre
Cc: babu.moger, Fenghua Yu, Peter Newman, Jonathan Corbet, x86,
James Morse, Jamie Iles, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Mon, Nov 18, 2024 at 04:51:38PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/18/24 4:01 PM, Luck, Tony wrote:
> > On Fri, Nov 15, 2024 at 10:21:01AM -0600, Moger, Babu wrote:
> >> Hi Tony,
> >
> > Thanks for looking at this patch.
> >
> >>
> >> On 11/13/2024 6:17 PM, Tony Luck wrote:
> >>> Instead of hard-coding the memory bandwidth local event as the
> >>> input to the mba_sc feedback look, use the event that the user
> >>> configured for each ctrl_mon group.
> >>>
> >>> Signed-off-by: Tony Luck <tony.luck@intel.com>
> >>> ---
> >>> arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
> >>> 1 file changed, 18 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> >>> index 7ef1a293cc13..2176e355e864 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> >>> @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> >>> u32 closid, rmid, cur_msr_val, new_msr_val;
> >>> struct mbm_state *pmbm_data, *cmbm_data;
> >>> struct rdt_ctrl_domain *dom_mba;
> >>> + enum resctrl_event_id evt_id;
> >>> struct rdt_resource *r_mba;
> >>> - u32 cur_bw, user_bw, idx;
> >>> struct list_head *head;
> >>> struct rdtgroup *entry;
> >>> + u32 cur_bw, user_bw;
> >>> - if (!is_mbm_local_enabled())
> >>> + if (!is_mbm_enabled())
> >>> return;
> >>> r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> >>> + evt_id = rgrp->mba_mbps_event;
> >>> +
> >>> + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
> >>> + return;
> >>
> >> I feel this check is enough.
> >>
> >>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
> >>> + return;
> >>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
> >>> + return;
> >>
> >> These two checks are not necessary. You are already validating it while
> >> initializing(in patch 7).
> >
> > I added this in response to a comment on v7 from Reinette that evt_id
> > wasn't properly validated here - in conjuction with the change a few
> > lines earlier that relaxed the check for is_mbm_local_enabled() to
> > just is_mbm_enabled().
>
> right that patch had an issue ... the "initialize" code hardcoded support to be
> r->membw.mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
> without any checking and then the handler used a relaxed check of
> is_mbm_enabled()
>
> On a system that only supports total MBM the is_mbm_enabled() check will
> pass while the event used will be local MBM.
In the v9 series I believe all the necessary checks are made outside
of the update_mba_bw() function itself.
update_mba_bw() is only called when is_mba_sc() returns true. Which
is the value of:
rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc
which can only be set if mbm is enabled.
So instead of changing the check from is_mbm_local_enabled() to
is_mbm_enabled() it could be deleted.
rgrp->mba_mbps_event can only be set to QOS_L3_MBM_LOCAL_EVENT_ID
until patch 7 when the user can select QOS_L3_MBM_TOTAL_EVENT_ID
or patch 8 when the initiialization code can pick TOTAL on systems
that don't support LOCAL.
So all three of the WARN_ON_ONCE() calls are unnecessary.
Should I drop all these checks in v10?
>
> >
> > See: https://lore.kernel.org/r/bb30835f-5be9-44b4-8544-2f528e7fc573@intel.com
> >
> > In theory all of these tests could be dropped. As you point out the
> > sanity checks are done higher in the call sequence. But some folks
> > like the "belt and braces" approach to such sanity checks.
>
> If that "higher in the call sequence" can be trusted, yes. That was not the
> case when I made those statements. Sprinkling WARN() that continues execution
> in a known bad state does not seem safe to me either.
>
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-19 1:44 ` Luck, Tony
@ 2024-11-19 17:36 ` Reinette Chatre
0 siblings, 0 replies; 32+ messages in thread
From: Reinette Chatre @ 2024-11-19 17:36 UTC (permalink / raw)
To: Luck, Tony
Cc: babu.moger, Fenghua Yu, Peter Newman, Jonathan Corbet, x86,
James Morse, Jamie Iles, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
Hi Tony,
On 11/18/24 5:44 PM, Luck, Tony wrote:
> On Mon, Nov 18, 2024 at 04:51:38PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/18/24 4:01 PM, Luck, Tony wrote:
>>> On Fri, Nov 15, 2024 at 10:21:01AM -0600, Moger, Babu wrote:
>>>> Hi Tony,
>>>
>>> Thanks for looking at this patch.
>>>
>>>>
>>>> On 11/13/2024 6:17 PM, Tony Luck wrote:
>>>>> Instead of hard-coding the memory bandwidth local event as the
>>>>> input to the mba_sc feedback look, use the event that the user
>>>>> configured for each ctrl_mon group.
>>>>>
>>>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>>>> ---
>>>>> arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
>>>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index 7ef1a293cc13..2176e355e864 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
>>>>> u32 closid, rmid, cur_msr_val, new_msr_val;
>>>>> struct mbm_state *pmbm_data, *cmbm_data;
>>>>> struct rdt_ctrl_domain *dom_mba;
>>>>> + enum resctrl_event_id evt_id;
>>>>> struct rdt_resource *r_mba;
>>>>> - u32 cur_bw, user_bw, idx;
>>>>> struct list_head *head;
>>>>> struct rdtgroup *entry;
>>>>> + u32 cur_bw, user_bw;
>>>>> - if (!is_mbm_local_enabled())
>>>>> + if (!is_mbm_enabled())
>>>>> return;
>>>>> r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>>>>> + evt_id = rgrp->mba_mbps_event;
>>>>> +
>>>>> + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
>>>>> + return;
>>>>
>>>> I feel this check is enough.
>>>>
>>>>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
>>>>> + return;
>>>>> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
>>>>> + return;
>>>>
>>>> These two checks are not necessary. You are already validating it while
>>>> initializing(in patch 7).
>>>
>>> I added this in response to a comment on v7 from Reinette that evt_id
>>> wasn't properly validated here - in conjuction with the change a few
>>> lines earlier that relaxed the check for is_mbm_local_enabled() to
>>> just is_mbm_enabled().
>>
>> right that patch had an issue ... the "initialize" code hardcoded support to be
>> r->membw.mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
>> without any checking and then the handler used a relaxed check of
>> is_mbm_enabled()
>>
>> On a system that only supports total MBM the is_mbm_enabled() check will
>> pass while the event used will be local MBM.
>
> In the v9 series I believe all the necessary checks are made outside
> of the update_mba_bw() function itself.
>
> update_mba_bw() is only called when is_mba_sc() returns true. Which
> is the value of:
> rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc
> which can only be set if mbm is enabled.
At this point in series mba_sc can still only be enabled if local MBM
is enabled.
>
> So instead of changing the check from is_mbm_local_enabled() to
> is_mbm_enabled() it could be deleted.
Perhaps ... at this point in series without guidance from changelogs I am forced
to dig through layers to deciper what patches aim to do, how they go about it, and
how complete solution is built using these individual cryptic pieces.
> rgrp->mba_mbps_event can only be set to QOS_L3_MBM_LOCAL_EVENT_ID
> until patch 7 when the user can select QOS_L3_MBM_TOTAL_EVENT_ID
> or patch 8 when the initiialization code can pick TOTAL on systems
> that don't support LOCAL.
>
> So all three of the WARN_ON_ONCE() calls are unnecessary.
>
> Should I drop all these checks in v10?
I am still trying to decipher this code and will aim to address this.
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
2024-11-15 16:19 ` Moger, Babu
@ 2024-11-20 0:38 ` Reinette Chatre
2024-11-21 17:21 ` Luck, Tony
1 sibling, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 0:38 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> From: Babu Moger <babu.moger@amd.com>
>
> thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
> fflags for resctrl files.
>
> Adding new files will involve adding another function to initialize
> the fflags. This can be simplified by adding a new function
> resctrl_file_fflags_init() and passing the file name and flags
> to be initialized.
>
> Consolidate fflags initialization into resctrl_file_fflags_init() and
> remove thread_throttle_mode_init() and mbm_config_rftype_init().
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
Please place "Signed-off-by" tag before "Reviewed-by" tag. For reference,
"Ordering of commit tags" in Documentation/process/maintainer-tip.rst
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-14 0:17 ` [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control Tony Luck
2024-11-15 16:20 ` Moger, Babu
@ 2024-11-20 1:08 ` Reinette Chatre
2024-11-21 17:33 ` Luck, Tony
1 sibling, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 1:08 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> Resctrl uses local memory bandwidth event as input to the feedback
> loop when the mba_MBps mount option is used. This means that this
> mount option cannot be used on systems that only support monitoring
> of total bandwidth.
>
> Prepare to allow users to choose the input event independently for
> each ctrl_mon group.
The lack of detail on design and implementation leaves a lot for the
reader to decipher. For example,
* the change appears to create a contract that rdtgroup.mba_mbps_event
is only valid if mba_sc is enabled, this is "documented" in the
structure member comment but not connected to the rest of implementation, not
here nor later in series.
* the patch uses *three* different checks to manage new variables:
is_mbm_local_enabled(), is_mba_sc(), and supports_mba_mbps(). Reader is
left to decipher that all checks are built on is_mbm_local_enabled()
and thus it is ok to use these checks before using the value that is only
assigned when is_mbm_local_enabled().
* clearly mba_mbps_default_event cannot always have a value making reader wonder
if enum resctrl_event_id needs a "0", takes some deciphering to get confidence
that its assignment when is_mbm_local_enabled() fits under the contract
that values are only value when is_mba_sc() and thus any code following contract by
first checking for mba_sc should never encounter a 0.
* based on premise of this work reader may consider what happens if
system does not support local MBM. more deciphering needed to get confidence
that while mba_mbps_default_event will not be set, since is_mba_sc() still
depends on local MBM this still fits under contract that mba_mbps_default_event
cannot be used in this case.
Of course, it may just me that needs more help to understand what a patch is doing
while having little insight into what it intends to do. I thought by sharing some of
the questions I felt needed to investigated may give some insight into the difficulty
a cryptic changelog creates. Review could be helped significantly if the changelog
provides insight into the design decisions.
...
> @@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> rdt_last_cmd_puts("kernfs subdir error\n");
> goto out_del_list;
> }
> + if (is_mba_sc(NULL))
> + rdtgrp->mba_mbps_event = mba_mbps_default_event;
> }
>
> goto out_unlock;
> @@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
> rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
> rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
> rdtgroup_default.type = RDTCTRL_GROUP;
> + if (supports_mba_mbps())
> + rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
> INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>
> list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
I do not see the default resource group's mba_mbps_event ever being reset. This means
that if the user mounts resctrl, changes mba_mbps_event, umount resctrl, remount
resctrl, then the default resource group will not have the default mba_mbps_event
but whatever was set on previous mount. Is this intended? No mention of this behavior in
changelog.
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event
2024-11-14 0:17 ` [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event Tony Luck
2024-11-15 16:21 ` Moger, Babu
@ 2024-11-20 3:39 ` Reinette Chatre
1 sibling, 0 replies; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 3:39 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> Instead of hard-coding the memory bandwidth local event as the
> input to the mba_sc feedback look, use the event that the user
"feedback look" -> "feedback loop"
> configured for each ctrl_mon group.
From "Changelog" in Documentation/process/maintainer-tip.rst:
"It's also useful to structure the changelog into several paragraphs and not
lump everything together into a single one. A good structure is to explain
the context, the problem and the solution in separate paragraphs and this
order."
I do not find there to be a context nor problem description in the
changelog. Do you believe this changelog is appropriate for tip? Am I missing
something?
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7ef1a293cc13..2176e355e864 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -752,20 +752,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> u32 closid, rmid, cur_msr_val, new_msr_val;
> struct mbm_state *pmbm_data, *cmbm_data;
> struct rdt_ctrl_domain *dom_mba;
> + enum resctrl_event_id evt_id;
> struct rdt_resource *r_mba;
> - u32 cur_bw, user_bw, idx;
> struct list_head *head;
> struct rdtgroup *entry;
> + u32 cur_bw, user_bw;
>
> - if (!is_mbm_local_enabled())
> + if (!is_mbm_enabled())
This change in the check is unexpected because at this point the event is still enforced to be
local MBM. This change is also undocumented so difficult to reason about.
> return;
>
> r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> + evt_id = rgrp->mba_mbps_event;
(To also answer the question in https://lore.kernel.org/all/Zzvtj8n1_ukhnRWT@agluck-desk3/ )
One key point from previous patch is that there is a new "contract" that rgrp->mba_mbps_event
is valid if mba_sc is enabled. If that contract is respected with appropriate initialization
and change of rgrp->mba_mbps_event then I do not believe that the three checks below
nor the "is_mbm_enabled()" above in reader of rgrp->mba_mbps_event are necessary since the
access of rgrp->mba_mbps_event is within "contract".
Note that caller does the checking if mba_sc is enabled:
if (is_mba_sc(NULL))
update_mba_bw(prgrp, d);
Thus doing same check within update_mba_bw() should not be necessary.
It does take a lot of digging to understand so it would really be helpful to document these types
of design decisions and reinforce them through the series.
> +
> + if (WARN_ON_ONCE(!is_mbm_event(evt_id)))
> + return;
> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_LOCAL_EVENT_ID && !is_mbm_local_enabled()))
> + return;
> + if (WARN_ON_ONCE(evt_id == QOS_L3_MBM_TOTAL_EVENT_ID && !is_mbm_total_enabled()))
> + return;
> +
>
> closid = rgrp->closid;
> rmid = rgrp->mon.rmid;
> - idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> - pmbm_data = &dom_mbm->mbm_local[idx];
> + pmbm_data = get_mbm_state(dom_mbm, closid, rmid, evt_id);
> + if (WARN_ON_ONCE(!pmbm_data))
> + return;
>
> dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
> if (!dom_mba) {
> @@ -784,7 +795,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> */
> head = &rgrp->mon.crdtgrp_list;
> list_for_each_entry(entry, head, mon.crdtgrp_list) {
> - cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> + cmbm_data = get_mbm_state(dom_mbm, entry->closid, entry->mon.rmid, evt_id);
> + if (WARN_ON_ONCE(!cmbm_data))
> + return;
> cur_bw += cmbm_data->prev_bw;
> }
>
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events
2024-11-14 0:17 ` [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events Tony Luck
2024-11-15 13:53 ` Peter Newman
@ 2024-11-20 3:45 ` Reinette Chatre
2024-11-21 17:36 ` Luck, Tony
1 sibling, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 3:45 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> Computing the bandwidth for an event is cheap, and only done once
> per second. Doing so simplifies switching between events and allows
> choosing different events per ctrl_mon group.
This just reads like some a general statement. There surely can be
some context, problem and *some* description about how this patch goes
about addressing the problem?
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 72 ++++++++++++---------------
> 1 file changed, 33 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 2176e355e864..da4ae21350c8 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -663,9 +663,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> */
> static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
> {
> - u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> - struct mbm_state *m = &rr->d->mbm_local[idx];
> u64 cur_bw, bytes, cur_bytes;
> + struct mbm_state *m;
> +
> + m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> + if (WARN_ON_ONCE(!m))
> + return;
>
> cur_bytes = rr->val;
> bytes = cur_bytes - m->prev_bw_bytes;
> @@ -826,54 +829,45 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> }
>
> -static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> - u32 closid, u32 rmid)
> +static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> + u32 closid, u32 rmid, enum resctrl_event_id evtid)
> {
> struct rmid_read rr = {0};
>
> rr.r = r;
> rr.d = d;
> + rr.evtid = evtid;
> + rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> + if (IS_ERR(rr.arch_mon_ctx)) {
> + pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> + PTR_ERR(rr.arch_mon_ctx));
> + return;
> + }
> +
> + __mon_event_count(closid, rmid, &rr);
>
> /*
> - * This is protected from concurrent reads from user
> - * as both the user and we hold the global mutex.
> + * If the software controller is enabled, compute the
> + * bandwidth for this event id.
> */
> - if (is_mbm_total_enabled()) {
> - rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
> - rr.val = 0;
> - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> - if (IS_ERR(rr.arch_mon_ctx)) {
> - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> - PTR_ERR(rr.arch_mon_ctx));
> - return;
> - }
> -
> - __mon_event_count(closid, rmid, &rr);
> + if (is_mba_sc(NULL))
> + mbm_bw_count(closid, rmid, &rr);
>
> - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> - }
> - if (is_mbm_local_enabled()) {
> - rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
> - rr.val = 0;
> - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> - if (IS_ERR(rr.arch_mon_ctx)) {
> - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> - PTR_ERR(rr.arch_mon_ctx));
> - return;
> - }
> -
> - __mon_event_count(closid, rmid, &rr);
> + resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> +}
>
> - /*
> - * Call the MBA software controller only for the
> - * control groups and when user has enabled
> - * the software controller explicitly.
> - */
> - if (is_mba_sc(NULL))
> - mbm_bw_count(closid, rmid, &rr);
> +static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> + u32 closid, u32 rmid)
> +{
> + /*
> + * This is protected from concurrent reads from user
> + * as both the user and we hold the global mutex.
I understand that you are just copy&pasting a comment here but could you please
help to avoid any obstacles by removing the code impersonation? Perhaps something like:
* This is protected from concurrent reads from user
* as both the user and overflow handler hold the global mutex.
(please feel free to improve)
> + */
> + if (is_mbm_total_enabled())
> + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
>
> - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> - }
> + if (is_mbm_local_enabled())
> + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
> }
>
> /*
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option
2024-11-14 0:17 ` [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option Tony Luck
@ 2024-11-20 3:54 ` Reinette Chatre
2024-11-21 17:39 ` Luck, Tony
0 siblings, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 3:54 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> This option may be used with any memory bandwidth monitoring event.
Needs a changelog.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index a8022bddf9f7..3a89516e6f56 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2334,7 +2334,7 @@ static bool supports_mba_mbps(void)
> struct rdt_resource *rmbm = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>
> - return (is_mbm_local_enabled() &&
> + return (is_mbm_enabled() &&
> r->alloc_capable && is_mba_linear() &&
> r->ctrl_scope == rmbm->mon_scope);
> }
I *thought* I had a handle on things with the understanding that rdtgroup.mba_mbps_event
is only valid when mba_sc is enabled. This understanding falls apart with this change since
at this point in series if a system only supports total MBM then mba_sc may be true
but rdtgroup.mba_mbps_event will be zero.
The expectation is that patches build on each other to create a solution but this series
does not respect this making it difficult to reason about this work. I think this series
will be easier to understand if "x86/resctrl: Make mba_sc use total bandwidth if local
is not supported" is moved before this change.
> @@ -2759,7 +2759,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
> ctx->enable_cdpl2 = true;
> return 0;
> case Opt_mba_mbps:
> - msg = "mba_MBps requires local MBM and linear scale MBA at L3 scope";
> + msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
> if (!supports_mba_mbps())
> return invalfc(fc, msg);
> ctx->enable_mba_mbps = true;
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories
2024-11-14 0:17 ` [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories Tony Luck
@ 2024-11-20 4:03 ` Reinette Chatre
2024-11-21 17:42 ` Luck, Tony
0 siblings, 1 reply; 32+ messages in thread
From: Reinette Chatre @ 2024-11-20 4:03 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Peter Newman, Jonathan Corbet, x86
Cc: James Morse, Jamie Iles, Babu Moger, Randy Dunlap,
Shaopeng Tan (Fujitsu), linux-kernel, linux-doc, patches
Hi Tony,
On 11/13/24 4:17 PM, Tony Luck wrote:
> The "mba_MBps" mount option provides an alternate method to
> control memory bandwidth. Instead of specifying allowable
> bandwidth as a percentage of maximum possible, the user
> provides a MiB/s limit value.
>
> Historically the limit was enforced by a feedback loop from
"Historically the limit was enforced" no history needed since
this is still the case at the time of this patch.
> the measure local bandwidth to adjust the memory bandwidth
> allocation controls.
I am not sure what is meant by "a feedback loop from the measure
local bandwidth" (that was copy&pasted to next patch).
>
> In preparation to allow the user to pick the memory bandwidth
> monitoring event used as input to the feedback loop, provide
> a file in each ctrl_mon group directory that shows the event
In the documentation the custom is to use CTRL_MON.
> currently in use.
Much better changelog. I think it will help to add a snippet that
mentions the file is only visible to user space if resctrl
was mounted with mba_MBps, and thus only visible when mba_sc is
enabled, and thus reinforcing that this maintains the contract
that rdtgrp->mba_mbps_event is only accessed when mba_sc is enabled.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-11-20 0:38 ` Reinette Chatre
@ 2024-11-21 17:21 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-21 17:21 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Tue, Nov 19, 2024 at 04:38:15PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/13/24 4:17 PM, Tony Luck wrote:
> > From: Babu Moger <babu.moger@amd.com>
> >
> > thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
> > fflags for resctrl files.
> >
> > Adding new files will involve adding another function to initialize
> > the fflags. This can be simplified by adding a new function
> > resctrl_file_fflags_init() and passing the file name and flags
> > to be initialized.
> >
> > Consolidate fflags initialization into resctrl_file_fflags_init() and
> > remove thread_throttle_mode_init() and mbm_config_rftype_init().
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
>
> Please place "Signed-off-by" tag before "Reviewed-by" tag. For reference,
> "Ordering of commit tags" in Documentation/process/maintainer-tip.rst
Hi Reinette,
I had misremembered what that Documentation file said. There is a
section that describes keeping the Signed-off-by: tags in the order
they were applied so that the commit comment describes the path that
the patch took before final acceptance. I thought that ordering applied
to other tags.
But while I was there refreshing my memory I also noted the section
about documenting changes made when adopting patches from other
people. So I've added:
[Tony: Drop __init attribute so resctrl_file_fflags_init() can be used
at run time]
>
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-20 1:08 ` Reinette Chatre
@ 2024-11-21 17:33 ` Luck, Tony
2024-11-22 21:33 ` Reinette Chatre
0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2024-11-21 17:33 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Tue, Nov 19, 2024 at 05:08:42PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/13/24 4:17 PM, Tony Luck wrote:
> > Resctrl uses local memory bandwidth event as input to the feedback
> > loop when the mba_MBps mount option is used. This means that this
> > mount option cannot be used on systems that only support monitoring
> > of total bandwidth.
> >
> > Prepare to allow users to choose the input event independently for
> > each ctrl_mon group.
>
Hi Reinette,
> The lack of detail on design and implementation leaves a lot for the
> reader to decipher. For example,
> * the change appears to create a contract that rdtgroup.mba_mbps_event
> is only valid if mba_sc is enabled, this is "documented" in the
> structure member comment but not connected to the rest of implementation, not
> here nor later in series.
I'll add text documenting this to the commit comment here, and also a
comment in the code that defines mba_mbps_default_event.
> * the patch uses *three* different checks to manage new variables:
> is_mbm_local_enabled(), is_mba_sc(), and supports_mba_mbps(). Reader is
> left to decipher that all checks are built on is_mbm_local_enabled()
> and thus it is ok to use these checks before using the value that is only
> assigned when is_mbm_local_enabled().
With some refactoring I've got that down to just one new additon of
"is_mba_sc()" (protecting the assignment of rdtgrp->mba_mbps_event
in rdtgroup_mkdir_ctrl_mon().
> * clearly mba_mbps_default_event cannot always have a value making reader wonder
> if enum resctrl_event_id needs a "0", takes some deciphering to get confidence
> that its assignment when is_mbm_local_enabled() fits under the contract
> that values are only value when is_mba_sc() and thus any code following contract by
> first checking for mba_sc should never encounter a 0.
Documented that mba_mbps_default_event remaining "0" is not an issue in
the comment at the definition of mba_mbps_default_event.
> * based on premise of this work reader may consider what happens if
> system does not support local MBM. more deciphering needed to get confidence
> that while mba_mbps_default_event will not be set, since is_mba_sc() still
> depends on local MBM this still fits under contract that mba_mbps_default_event
> cannot be used in this case.
I think I've cleared up this mystery with commit and code comments.
> Of course, it may just me that needs more help to understand what a patch is doing
> while having little insight into what it intends to do. I thought by sharing some of
> the questions I felt needed to investigated may give some insight into the difficulty
> a cryptic changelog creates. Review could be helped significantly if the changelog
> provides insight into the design decisions.
Good point. I've also re-worked the cover letter to provide more
information on the problem and solution implemented by this series.
> ...
>
> > @@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> > rdt_last_cmd_puts("kernfs subdir error\n");
> > goto out_del_list;
> > }
> > + if (is_mba_sc(NULL))
> > + rdtgrp->mba_mbps_event = mba_mbps_default_event;
> > }
> >
> > goto out_unlock;
> > @@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
> > rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
> > rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
> > rdtgroup_default.type = RDTCTRL_GROUP;
> > + if (supports_mba_mbps())
> > + rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
> > INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
> >
> > list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
>
> I do not see the default resource group's mba_mbps_event ever being reset. This means
> that if the user mounts resctrl, changes mba_mbps_event, umount resctrl, remount
> resctrl, then the default resource group will not have the default mba_mbps_event
> but whatever was set on previous mount. Is this intended? No mention of this behavior in
> changelog.
Good catch. You are correct that a changed event value in the default
CTRL_MON group is preserved across unmount/remount. This was not
intentional. Moving the initialization of rdtgroup_default.mba_mbps_event
into set_mba_sc() fixes this (with the added benefit of removing the
supports_mba_mbps() check).
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events
2024-11-20 3:45 ` Reinette Chatre
@ 2024-11-21 17:36 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-21 17:36 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Tue, Nov 19, 2024 at 07:45:01PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/13/24 4:17 PM, Tony Luck wrote:
> > Computing the bandwidth for an event is cheap, and only done once
> > per second. Doing so simplifies switching between events and allows
> > choosing different events per ctrl_mon group.
>
> This just reads like some a general statement. There surely can be
> some context, problem and *some* description about how this patch goes
> about addressing the problem?
I've rewritten this in the problem ... solution format.
> >
> > Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > arch/x86/kernel/cpu/resctrl/monitor.c | 72 ++++++++++++---------------
> > 1 file changed, 33 insertions(+), 39 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 2176e355e864..da4ae21350c8 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -663,9 +663,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> > */
> > static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
> > {
> > - u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> > - struct mbm_state *m = &rr->d->mbm_local[idx];
> > u64 cur_bw, bytes, cur_bytes;
> > + struct mbm_state *m;
> > +
> > + m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> > + if (WARN_ON_ONCE(!m))
> > + return;
> >
> > cur_bytes = rr->val;
> > bytes = cur_bytes - m->prev_bw_bytes;
> > @@ -826,54 +829,45 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> > resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> > }
> >
> > -static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> > - u32 closid, u32 rmid)
> > +static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> > + u32 closid, u32 rmid, enum resctrl_event_id evtid)
> > {
> > struct rmid_read rr = {0};
> >
> > rr.r = r;
> > rr.d = d;
> > + rr.evtid = evtid;
> > + rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> > + if (IS_ERR(rr.arch_mon_ctx)) {
> > + pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> > + PTR_ERR(rr.arch_mon_ctx));
> > + return;
> > + }
> > +
> > + __mon_event_count(closid, rmid, &rr);
> >
> > /*
> > - * This is protected from concurrent reads from user
> > - * as both the user and we hold the global mutex.
> > + * If the software controller is enabled, compute the
> > + * bandwidth for this event id.
> > */
> > - if (is_mbm_total_enabled()) {
> > - rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
> > - rr.val = 0;
> > - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> > - if (IS_ERR(rr.arch_mon_ctx)) {
> > - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> > - PTR_ERR(rr.arch_mon_ctx));
> > - return;
> > - }
> > -
> > - __mon_event_count(closid, rmid, &rr);
> > + if (is_mba_sc(NULL))
> > + mbm_bw_count(closid, rmid, &rr);
> >
> > - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> > - }
> > - if (is_mbm_local_enabled()) {
> > - rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
> > - rr.val = 0;
> > - rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> > - if (IS_ERR(rr.arch_mon_ctx)) {
> > - pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> > - PTR_ERR(rr.arch_mon_ctx));
> > - return;
> > - }
> > -
> > - __mon_event_count(closid, rmid, &rr);
> > + resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> > +}
> >
> > - /*
> > - * Call the MBA software controller only for the
> > - * control groups and when user has enabled
> > - * the software controller explicitly.
> > - */
> > - if (is_mba_sc(NULL))
> > - mbm_bw_count(closid, rmid, &rr);
> > +static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> > + u32 closid, u32 rmid)
> > +{
> > + /*
> > + * This is protected from concurrent reads from user
> > + * as both the user and we hold the global mutex.
>
> I understand that you are just copy&pasting a comment here but could you please
> help to avoid any obstacles by removing the code impersonation? Perhaps something like:
>
> * This is protected from concurrent reads from user
> * as both the user and overflow handler hold the global mutex.
>
> (please feel free to improve)
No improvement to the text needed. But I did move some words from 2nd
line to the first to look better (IMHO).
> > + */
> > + if (is_mbm_total_enabled())
> > + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
> >
> > - resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
> > - }
> > + if (is_mbm_local_enabled())
> > + mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
> > }
> >
> > /*
>
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option
2024-11-20 3:54 ` Reinette Chatre
@ 2024-11-21 17:39 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-21 17:39 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Tue, Nov 19, 2024 at 07:54:10PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/13/24 4:17 PM, Tony Luck wrote:
> > This option may be used with any memory bandwidth monitoring event.
>
> Needs a changelog.
Added one.
> >
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index a8022bddf9f7..3a89516e6f56 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -2334,7 +2334,7 @@ static bool supports_mba_mbps(void)
> > struct rdt_resource *rmbm = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> > struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> >
> > - return (is_mbm_local_enabled() &&
> > + return (is_mbm_enabled() &&
> > r->alloc_capable && is_mba_linear() &&
> > r->ctrl_scope == rmbm->mon_scope);
> > }
>
> I *thought* I had a handle on things with the understanding that rdtgroup.mba_mbps_event
> is only valid when mba_sc is enabled. This understanding falls apart with this change since
> at this point in series if a system only supports total MBM then mba_sc may be true
> but rdtgroup.mba_mbps_event will be zero.
>
> The expectation is that patches build on each other to create a solution but this series
> does not respect this making it difficult to reason about this work. I think this series
> will be easier to understand if "x86/resctrl: Make mba_sc use total bandwidth if local
> is not supported" is moved before this change.
Rather than moving that patch before this one, I've merged it into this
patch since both are small and intimately connected.
> > @@ -2759,7 +2759,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > ctx->enable_cdpl2 = true;
> > return 0;
> > case Opt_mba_mbps:
> > - msg = "mba_MBps requires local MBM and linear scale MBA at L3 scope";
> > + msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
> > if (!supports_mba_mbps())
> > return invalfc(fc, msg);
> > ctx->enable_mba_mbps = true;
>
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories
2024-11-20 4:03 ` Reinette Chatre
@ 2024-11-21 17:42 ` Luck, Tony
0 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2024-11-21 17:42 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
On Tue, Nov 19, 2024 at 08:03:06PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/13/24 4:17 PM, Tony Luck wrote:
> > The "mba_MBps" mount option provides an alternate method to
> > control memory bandwidth. Instead of specifying allowable
> > bandwidth as a percentage of maximum possible, the user
> > provides a MiB/s limit value.
> >
> > Historically the limit was enforced by a feedback loop from
>
> "Historically the limit was enforced" no history needed since
> this is still the case at the time of this patch.
>
> > the measure local bandwidth to adjust the memory bandwidth
> > allocation controls.
>
> I am not sure what is meant by "a feedback loop from the measure
> local bandwidth" (that was copy&pasted to next patch).
Dropped the whole of the "Historically" paragraph.
>
> >
> > In preparation to allow the user to pick the memory bandwidth
> > monitoring event used as input to the feedback loop, provide
> > a file in each ctrl_mon group directory that shows the event
>
> In the documentation the custom is to use CTRL_MON.
Fixed here (and in several other places in the series).
> > currently in use.
>
> Much better changelog. I think it will help to add a snippet that
> mentions the file is only visible to user space if resctrl
> was mounted with mba_MBps, and thus only visible when mba_sc is
> enabled, and thus reinforcing that this maintains the contract
> that rdtgrp->mba_mbps_event is only accessed when mba_sc is enabled.
Added text to commit message about this file only being visible
when the "mba_MBps" mount option is in use.
> >
> > Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
>
> Reinette
-Tony
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control
2024-11-21 17:33 ` Luck, Tony
@ 2024-11-22 21:33 ` Reinette Chatre
0 siblings, 0 replies; 32+ messages in thread
From: Reinette Chatre @ 2024-11-22 21:33 UTC (permalink / raw)
To: Luck, Tony
Cc: Fenghua Yu, Peter Newman, Jonathan Corbet, x86, James Morse,
Jamie Iles, Babu Moger, Randy Dunlap, Shaopeng Tan (Fujitsu),
linux-kernel, linux-doc, patches
Hi Tony,
On 11/21/24 9:33 AM, Luck, Tony wrote:
> On Tue, Nov 19, 2024 at 05:08:42PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/13/24 4:17 PM, Tony Luck wrote:
>>> Resctrl uses local memory bandwidth event as input to the feedback
>>> loop when the mba_MBps mount option is used. This means that this
>>> mount option cannot be used on systems that only support monitoring
>>> of total bandwidth.
>>>
>>> Prepare to allow users to choose the input event independently for
>>> each ctrl_mon group.
>>
> Hi Reinette,
>
>> The lack of detail on design and implementation leaves a lot for the
>> reader to decipher. For example,
>> * the change appears to create a contract that rdtgroup.mba_mbps_event
>> is only valid if mba_sc is enabled, this is "documented" in the
>> structure member comment but not connected to the rest of implementation, not
>> here nor later in series.
>
> I'll add text documenting this to the commit comment here, and also a
> comment in the code that defines mba_mbps_default_event.
>
>> * the patch uses *three* different checks to manage new variables:
>> is_mbm_local_enabled(), is_mba_sc(), and supports_mba_mbps(). Reader is
>> left to decipher that all checks are built on is_mbm_local_enabled()
>> and thus it is ok to use these checks before using the value that is only
>> assigned when is_mbm_local_enabled().
>
> With some refactoring I've got that down to just one new additon of
> "is_mba_sc()" (protecting the assignment of rdtgrp->mba_mbps_event
> in rdtgroup_mkdir_ctrl_mon().
>
Just to clarify, I am not stating that there should not be three different checks. Clearly
if some initialization is done before resctrl mount then the mba_sc checks may not apply.
My goal was to highlight the investigations a reader is forced to do without
guidance from the changelog.
...
>>> @@ -3611,6 +3613,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>> rdt_last_cmd_puts("kernfs subdir error\n");
>>> goto out_del_list;
>>> }
>>> + if (is_mba_sc(NULL))
>>> + rdtgrp->mba_mbps_event = mba_mbps_default_event;
>>> }
>>>
>>> goto out_unlock;
>>> @@ -3970,6 +3974,8 @@ static void __init rdtgroup_setup_default(void)
>>> rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
>>> rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
>>> rdtgroup_default.type = RDTCTRL_GROUP;
>>> + if (supports_mba_mbps())
>>> + rdtgroup_default.mba_mbps_event = mba_mbps_default_event;
>>> INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>>>
>>> list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
>>
>> I do not see the default resource group's mba_mbps_event ever being reset. This means
>> that if the user mounts resctrl, changes mba_mbps_event, umount resctrl, remount
>> resctrl, then the default resource group will not have the default mba_mbps_event
>> but whatever was set on previous mount. Is this intended? No mention of this behavior in
>> changelog.
>
> Good catch. You are correct that a changed event value in the default
> CTRL_MON group is preserved across unmount/remount. This was not
> intentional. Moving the initialization of rdtgroup_default.mba_mbps_event
> into set_mba_sc() fixes this (with the added benefit of removing the
> supports_mba_mbps() check).
I am curious how this will turn out considering the fragmentation of the rdtgroup_default
initialization.
Reinette
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2024-11-22 21:33 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-14 0:17 [PATCH v9 0/9] x86/resctrl: mba_MBps enhancement Tony Luck
2024-11-14 0:17 ` [PATCH v9 1/9] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Tony Luck
2024-11-15 16:19 ` Moger, Babu
2024-11-20 0:38 ` Reinette Chatre
2024-11-21 17:21 ` Luck, Tony
2024-11-14 0:17 ` [PATCH v9 2/9] x86/resctrl: Prepare for per-ctrl_mon group mba_MBps control Tony Luck
2024-11-15 16:20 ` Moger, Babu
2024-11-18 23:47 ` Luck, Tony
2024-11-20 1:08 ` Reinette Chatre
2024-11-21 17:33 ` Luck, Tony
2024-11-22 21:33 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 3/9] x86/resctrl: Modify update_mba_bw() to use per ctrl_mon group event Tony Luck
2024-11-15 16:21 ` Moger, Babu
2024-11-19 0:01 ` Luck, Tony
2024-11-19 0:51 ` Reinette Chatre
2024-11-19 1:44 ` Luck, Tony
2024-11-19 17:36 ` Reinette Chatre
2024-11-20 3:39 ` Reinette Chatre
2024-11-14 0:17 ` [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events Tony Luck
2024-11-15 13:53 ` Peter Newman
2024-11-15 16:59 ` Luck, Tony
2024-11-20 3:45 ` Reinette Chatre
2024-11-21 17:36 ` Luck, Tony
2024-11-14 0:17 ` [PATCH v9 5/9] x86/resctrl: Relax checks for mba_MBps mount option Tony Luck
2024-11-20 3:54 ` Reinette Chatre
2024-11-21 17:39 ` Luck, Tony
2024-11-14 0:17 ` [PATCH v9 6/9] x86/resctrl: Add "mba_MBps_event" file to ctrl_mon directories Tony Luck
2024-11-20 4:03 ` Reinette Chatre
2024-11-21 17:42 ` Luck, Tony
2024-11-14 0:17 ` [PATCH v9 7/9] x86/resctrl: Add write option to "mba_MBps_event" file Tony Luck
2024-11-14 0:17 ` [PATCH v9 8/9] x86/resctrl: Make mba_sc use total bandwidth if local is not supported Tony Luck
2024-11-14 0:17 ` [PATCH v9 9/9] x86/resctrl: Document the new "mba_MBps_event" file Tony Luck
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).