From: Xiaochen Shen <xiaochen.shen@intel.com>
To: Tony Luck <tony.luck@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Peter Newman <peternewman@google.com>,
x86@kernel.org
Cc: James Morse <james.morse@arm.com>,
Jamie Iles <quic_jiles@quicinc.com>,
Babu Moger <babu.moger@amd.com>,
linux-kernel@vger.kernel.org, patches@lists.linux.dev,
xiaochen.shen@intel.com
Subject: Re: [PATCH] x86/resctrl: Implement new MBA_mbps throttling heuristic
Date: Wed, 17 Jan 2024 11:40:13 +0800 [thread overview]
Message-ID: <063f4e49-94c2-4e2e-9ddf-70f0afc3ab05@intel.com> (raw)
In-Reply-To: <20240109220003.29225-1-tony.luck@intel.com>
Tested-by: Xiaochen Shen <xiaochen.shen@intel.com>
On 1/10/2024 6:00, Tony Luck wrote:
> The MBA_mbps feedback loop increases throttling when a group is using
> more bandwidth than the target set by the user in the schemata file, and
> decreases throttling when below target.
>
> To avoid possibly stepping throttling up and down on every poll a
> flag "delta_comp" is set whenever throttling is changed to indicate
> that the actual change in bandwidth should be recorded on the next
> poll in "delta_bw". Throttling is only reduced if the current bandwidth
> plus delta_bw is below the user target.
>
> This algorithm works well if the workload has steady bandwidth needs.
> But it can go badly wrong if the workload moves to a different phase
> just as the throttling level changed. E.g. if the workload becomes
> essentially idle right as throttling level is increased, the value
> calculated for delta_bw will be more or less the old bandwidth level.
> If the workload then resumes, Linux may never reduce throttling because
> current bandwidth plu delta_bw is above the target set by the user.
>
> Implement a simpler heuristic by assuming that in the worst case the
> currently measured bandwidth is being controlled by the current level of
> throttling. Compute how much it may increase if throttling is relaxed to
> the next higher level. If that is still below the user target, then it
> is ok to reduce the amount of throttling.
>
> Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
> Reported-by: Xiaochen Shen <xiaochen.shen@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> This patch was previously posted in:
>
> https://lore.kernel.org/lkml/ZVU+L92LRBbJXgXn@agluck-desk3/#t
>
> as part of a proposal to allow the mba_MBps mount option to base its
> feedback loop input on total bandwidth instead of local bandwidth.
> Sending it now as a standalone patch because Xiaochen reported that
> real systems have experienced problems when delta_bw is incorrectly
> calculated.
>
> arch/x86/kernel/cpu/resctrl/internal.h | 4 ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 42 ++++++--------------------
> 2 files changed, 10 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a4f1aa15f0a2..71bbd2245cc7 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -296,14 +296,10 @@ struct rftype {
> * struct mbm_state - status for each MBM counter in each domain
> * @prev_bw_bytes: Previous bytes value read for bandwidth calculation
> * @prev_bw: The most recent bandwidth in MBps
> - * @delta_bw: Difference between the current and previous bandwidth
> - * @delta_comp: Indicates whether to compute the delta_bw
> */
> struct mbm_state {
> u64 prev_bw_bytes;
> u32 prev_bw;
> - u32 delta_bw;
> - bool delta_comp;
> };
>
> /**
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f136ac046851..1961823b555b 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -440,9 +440,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>
> cur_bw = bytes / SZ_1M;
>
> - if (m->delta_comp)
> - m->delta_bw = abs(cur_bw - m->prev_bw);
> - m->delta_comp = false;
> m->prev_bw = cur_bw;
> }
>
> @@ -520,7 +517,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
> {
> u32 closid, rmid, cur_msr_val, new_msr_val;
> struct mbm_state *pmbm_data, *cmbm_data;
> - u32 cur_bw, delta_bw, user_bw;
> + u32 cur_bw, user_bw;
> struct rdt_resource *r_mba;
> struct rdt_domain *dom_mba;
> struct list_head *head;
> @@ -543,7 +540,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>
> cur_bw = pmbm_data->prev_bw;
> user_bw = dom_mba->mbps_val[closid];
> - delta_bw = pmbm_data->delta_bw;
>
> /* MBA resource doesn't support CDP */
> cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
> @@ -555,49 +551,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
> list_for_each_entry(entry, head, mon.crdtgrp_list) {
> cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> cur_bw += cmbm_data->prev_bw;
> - delta_bw += cmbm_data->delta_bw;
> }
>
> /*
> * Scale up/down the bandwidth linearly for the ctrl group. The
> * bandwidth step is the bandwidth granularity specified by the
> * hardware.
> - *
> - * The delta_bw is used when increasing the bandwidth so that we
> - * dont alternately increase and decrease the control values
> - * continuously.
> - *
> - * For ex: consider cur_bw = 90MBps, user_bw = 100MBps and if
> - * bandwidth step is 20MBps(> user_bw - cur_bw), we would keep
> - * switching between 90 and 110 continuously if we only check
> - * cur_bw < user_bw.
> + * Always increase throttling if current bandwidth is above the
> + * target set by user.
> + * But avoid thrashing up and down on every poll by checking
> + * whether a decrease in throttling is likely to push the group
> + * back over target. E.g. if currently throttling to 30% of bandwidth
> + * on a system with 10% granularity steps, check whether moving to
> + * 40% would go past the limit by multiplying current bandwidth by
> + * "(30 + 10) / 30".
> */
> if (cur_msr_val > r_mba->membw.min_bw && user_bw < cur_bw) {
> new_msr_val = cur_msr_val - r_mba->membw.bw_gran;
> } else if (cur_msr_val < MAX_MBA_BW &&
> - (user_bw > (cur_bw + delta_bw))) {
> + (user_bw > (cur_bw * (cur_msr_val + r_mba->membw.min_bw) / cur_msr_val))) {
> new_msr_val = cur_msr_val + r_mba->membw.bw_gran;
> } else {
> return;
> }
>
> resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> -
> - /*
> - * Delta values are updated dynamically package wise for each
> - * rdtgrp every time the throttle MSR changes value.
> - *
> - * This is because (1)the increase in bandwidth is not perfectly
> - * linear and only "approximately" linear even when the hardware
> - * says it is linear.(2)Also since MBA is a core specific
> - * mechanism, the delta values vary based on number of cores used
> - * by the rdtgrp.
> - */
> - pmbm_data->delta_comp = true;
> - list_for_each_entry(entry, head, mon.crdtgrp_list) {
> - cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> - cmbm_data->delta_comp = true;
> - }
> }
>
> static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
>
> base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
Best regards,
Xiaochen
next prev parent reply other threads:[~2024-01-17 3:40 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-24 18:16 [PATCH] x86/resctrl: mba_MBps: Fall back to total b/w if local b/w unavailable Tony Luck
2023-10-24 18:24 ` Luck, Tony
2023-10-24 23:20 ` Moger, Babu
2023-10-24 23:43 ` Luck, Tony
2023-10-25 16:01 ` Moger, Babu
2023-10-25 12:46 ` Peter Newman
2023-10-25 19:38 ` Tony Luck
2023-10-25 20:39 ` Moger, Babu
2023-10-25 20:42 ` Moger, Babu
2023-10-25 20:52 ` Tony Luck
2023-10-25 23:41 ` Moger, Babu
2023-10-26 0:07 ` Luck, Tony
2023-10-25 21:06 ` Peter Newman
2023-10-26 13:55 ` Moger, Babu
2023-10-26 16:09 ` Luck, Tony
2023-10-26 17:19 ` Moger, Babu
2023-10-26 19:54 ` Tony Luck
2023-10-25 23:50 ` [PATCH v2] " Tony Luck
2023-10-26 20:02 ` [PATCH v3] " Tony Luck
2023-10-26 22:40 ` Moger, Babu
2023-10-26 22:59 ` Luck, Tony
2023-11-03 21:43 ` Reinette Chatre
2023-11-03 21:50 ` Reinette Chatre
2023-11-07 21:15 ` Tony Luck
2023-11-08 21:49 ` Reinette Chatre
2023-11-09 21:27 ` Luck, Tony
2023-11-15 16:09 ` Reinette Chatre
2023-11-15 21:54 ` Tony Luck
2023-11-16 19:48 ` Reinette Chatre
2023-11-28 23:14 ` [PATCH v4] x86/resctrl: Add mount option to pick total MBM event Tony Luck
2023-11-29 23:48 ` Reinette Chatre
2023-12-01 20:45 ` Tony Luck
2023-12-01 21:47 ` [PATCH v5] x86/resctrl: Add event choices for mba_MBps Tony Luck
2023-12-04 16:24 ` Moger, Babu
2023-12-04 18:16 ` Tony Luck
2023-12-04 19:04 ` Moger, Babu
2023-12-04 19:45 ` Luck, Tony
2023-12-04 20:03 ` Reinette Chatre
2023-12-04 21:08 ` Tony Luck
2023-12-04 22:15 ` Reinette Chatre
2023-12-04 22:51 ` Reinette Chatre
2023-12-07 19:56 ` [PATCH v6 0/3] x86/resctrl: mba_MBps enhancements Tony Luck
2023-12-07 19:56 ` [PATCH v6 1/3] x86/resctrl: Add mount option "mba_MBps_event" Tony Luck
2023-12-08 18:17 ` Peter Newman
2023-12-08 21:57 ` Tony Luck
2023-12-08 22:09 ` Peter Newman
2023-12-08 22:37 ` Luck, Tony
2023-12-12 17:54 ` Reinette Chatre
2023-12-12 20:02 ` Tony Luck
2023-12-12 21:42 ` Reinette Chatre
2023-12-13 1:07 ` Luck, Tony
2023-12-08 18:29 ` Moger, Babu
2023-12-08 21:50 ` Tony Luck
2023-12-12 18:59 ` Reinette Chatre
2023-12-07 19:56 ` [PATCH v6 2/3] x86/resctrl: Use total bandwidth for mba_MBps option when local isn't present Tony Luck
2023-12-08 18:26 ` Peter Newman
2023-12-07 19:56 ` [PATCH v6 3/3] x86/resctrl: Add new "mba_MBps_event" mount option to documentation Tony Luck
2023-12-08 19:22 ` Peter Newman
2023-12-12 18:59 ` Reinette Chatre
2024-01-09 22:00 ` [PATCH] x86/resctrl: Implement new MBA_mbps throttling heuristic Tony Luck
2024-01-16 19:55 ` Reinette Chatre
2024-01-17 3:36 ` Xiaochen Shen
2024-01-17 3:40 ` Xiaochen Shen [this message]
2024-01-18 0:26 ` Reinette Chatre
2024-01-18 21:42 ` [PATCH v2] x86/resctrl: Implement new mba_MBps " Tony Luck
2024-01-22 17:34 ` Reinette Chatre
2024-01-22 18:07 ` Luck, Tony
2024-01-22 18:18 ` Reinette Chatre
2024-01-22 18:21 ` Borislav Petkov
2024-01-22 18:41 ` Reinette Chatre
2024-01-22 18:47 ` Borislav Petkov
2024-01-22 20:58 ` Luck, Tony
2024-01-23 12:12 ` James Morse
2024-01-23 17:07 ` Luck, Tony
2024-01-24 0:29 ` Tony Luck
2024-01-25 17:29 ` Tony Luck
2024-01-22 18:08 ` [PATCH v3] " Tony Luck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=063f4e49-94c2-4e2e-9ddf-70f0afc3ab05@intel.com \
--to=xiaochen.shen@intel.com \
--cc=babu.moger@amd.com \
--cc=fenghua.yu@intel.com \
--cc=james.morse@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=peternewman@google.com \
--cc=quic_jiles@quicinc.com \
--cc=reinette.chatre@intel.com \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).