patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Xiaochen Shen <xiaochen.shen@intel.com>
To: Tony Luck <tony.luck@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Peter Newman <peternewman@google.com>,
	x86@kernel.org
Cc: James Morse <james.morse@arm.com>,
	Jamie Iles <quic_jiles@quicinc.com>,
	Babu Moger <babu.moger@amd.com>,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	xiaochen.shen@intel.com
Subject: Re: [PATCH] x86/resctrl: Implement new MBA_mbps throttling heuristic
Date: Wed, 17 Jan 2024 11:40:13 +0800	[thread overview]
Message-ID: <063f4e49-94c2-4e2e-9ddf-70f0afc3ab05@intel.com> (raw)
In-Reply-To: <20240109220003.29225-1-tony.luck@intel.com>

Tested-by: Xiaochen Shen <xiaochen.shen@intel.com>

On 1/10/2024 6:00, Tony Luck wrote:
> The MBA_mbps feedback loop increases throttling when a group is using
> more bandwidth than the target set by the user in the schemata file, and
> decreases throttling when below target.
>
> To avoid possibly stepping throttling up and down on every poll a
> flag "delta_comp" is set whenever throttling is changed to indicate
> that the actual change in bandwidth should be recorded on the next
> poll in "delta_bw". Throttling is only reduced if the current bandwidth
> plus delta_bw is below the user target.
>
> This algorithm works well if the workload has steady bandwidth needs.
> But it can go badly wrong if the workload moves to a different phase
> just as the throttling level changed. E.g. if the workload becomes
> essentially idle right as throttling level is increased, the value
> calculated for delta_bw will be more or less the old bandwidth level.
> If the workload then resumes, Linux may never reduce throttling because
> current bandwidth plu delta_bw is above the target set by the user.
>
> Implement a simpler heuristic by assuming that in the worst case the
> currently measured bandwidth is being controlled by the current level of
> throttling. Compute how much it may increase if throttling is relaxed to
> the next higher level. If that is still below the user target, then it
> is ok to reduce the amount of throttling.
>
> Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
> Reported-by: Xiaochen Shen <xiaochen.shen@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> This patch was previously posted in:
>
>    https://lore.kernel.org/lkml/ZVU+L92LRBbJXgXn@agluck-desk3/#t
>
> as part of a proposal to allow the mba_MBps mount option to base its
> feedback loop input on total bandwidth instead of local bandwidth.
> Sending it now as a standalone patch because Xiaochen reported that
> real systems have experienced problems when delta_bw is incorrectly
> calculated.
>
>   arch/x86/kernel/cpu/resctrl/internal.h |  4 ---
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 42 ++++++--------------------
>   2 files changed, 10 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a4f1aa15f0a2..71bbd2245cc7 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -296,14 +296,10 @@ struct rftype {
>    * struct mbm_state - status for each MBM counter in each domain
>    * @prev_bw_bytes: Previous bytes value read for bandwidth calculation
>    * @prev_bw:	The most recent bandwidth in MBps
> - * @delta_bw:	Difference between the current and previous bandwidth
> - * @delta_comp:	Indicates whether to compute the delta_bw
>    */
>   struct mbm_state {
>   	u64	prev_bw_bytes;
>   	u32	prev_bw;
> -	u32	delta_bw;
> -	bool	delta_comp;
>   };
>   
>   /**
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f136ac046851..1961823b555b 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -440,9 +440,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>   
>   	cur_bw = bytes / SZ_1M;
>   
> -	if (m->delta_comp)
> -		m->delta_bw = abs(cur_bw - m->prev_bw);
> -	m->delta_comp = false;
>   	m->prev_bw = cur_bw;
>   }
>   
> @@ -520,7 +517,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   {
>   	u32 closid, rmid, cur_msr_val, new_msr_val;
>   	struct mbm_state *pmbm_data, *cmbm_data;
> -	u32 cur_bw, delta_bw, user_bw;
> +	u32 cur_bw, user_bw;
>   	struct rdt_resource *r_mba;
>   	struct rdt_domain *dom_mba;
>   	struct list_head *head;
> @@ -543,7 +540,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   
>   	cur_bw = pmbm_data->prev_bw;
>   	user_bw = dom_mba->mbps_val[closid];
> -	delta_bw = pmbm_data->delta_bw;
>   
>   	/* MBA resource doesn't support CDP */
>   	cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
> @@ -555,49 +551,31 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   	list_for_each_entry(entry, head, mon.crdtgrp_list) {
>   		cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
>   		cur_bw += cmbm_data->prev_bw;
> -		delta_bw += cmbm_data->delta_bw;
>   	}
>   
>   	/*
>   	 * Scale up/down the bandwidth linearly for the ctrl group.  The
>   	 * bandwidth step is the bandwidth granularity specified by the
>   	 * hardware.
> -	 *
> -	 * The delta_bw is used when increasing the bandwidth so that we
> -	 * dont alternately increase and decrease the control values
> -	 * continuously.
> -	 *
> -	 * For ex: consider cur_bw = 90MBps, user_bw = 100MBps and if
> -	 * bandwidth step is 20MBps(> user_bw - cur_bw), we would keep
> -	 * switching between 90 and 110 continuously if we only check
> -	 * cur_bw < user_bw.
> +	 * Always increase throttling if current bandwidth is above the
> +	 * target set by user.
> +	 * But avoid thrashing up and down on every poll by checking
> +	 * whether a decrease in throttling is likely to push the group
> +	 * back over target. E.g. if currently throttling to 30% of bandwidth
> +	 * on a system with 10% granularity steps, check whether moving to
> +	 * 40% would go past the limit by multiplying current bandwidth by
> +	 * "(30 + 10) / 30".
>   	 */
>   	if (cur_msr_val > r_mba->membw.min_bw && user_bw < cur_bw) {
>   		new_msr_val = cur_msr_val - r_mba->membw.bw_gran;
>   	} else if (cur_msr_val < MAX_MBA_BW &&
> -		   (user_bw > (cur_bw + delta_bw))) {
> +		   (user_bw > (cur_bw * (cur_msr_val + r_mba->membw.min_bw) / cur_msr_val))) {
>   		new_msr_val = cur_msr_val + r_mba->membw.bw_gran;
>   	} else {
>   		return;
>   	}
>   
>   	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> -
> -	/*
> -	 * Delta values are updated dynamically package wise for each
> -	 * rdtgrp every time the throttle MSR changes value.
> -	 *
> -	 * This is because (1)the increase in bandwidth is not perfectly
> -	 * linear and only "approximately" linear even when the hardware
> -	 * says it is linear.(2)Also since MBA is a core specific
> -	 * mechanism, the delta values vary based on number of cores used
> -	 * by the rdtgrp.
> -	 */
> -	pmbm_data->delta_comp = true;
> -	list_for_each_entry(entry, head, mon.crdtgrp_list) {
> -		cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
> -		cmbm_data->delta_comp = true;
> -	}
>   }
>   
>   static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
>
> base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a

Best regards,
Xiaochen


  parent reply	other threads:[~2024-01-17  3:40 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-24 18:16 [PATCH] x86/resctrl: mba_MBps: Fall back to total b/w if local b/w unavailable Tony Luck
2023-10-24 18:24 ` Luck, Tony
2023-10-24 23:20 ` Moger, Babu
2023-10-24 23:43   ` Luck, Tony
2023-10-25 16:01     ` Moger, Babu
2023-10-25 12:46 ` Peter Newman
2023-10-25 19:38   ` Tony Luck
2023-10-25 20:39     ` Moger, Babu
2023-10-25 20:42       ` Moger, Babu
2023-10-25 20:52         ` Tony Luck
2023-10-25 23:41           ` Moger, Babu
2023-10-26  0:07             ` Luck, Tony
2023-10-25 21:06     ` Peter Newman
2023-10-26 13:55       ` Moger, Babu
2023-10-26 16:09         ` Luck, Tony
2023-10-26 17:19           ` Moger, Babu
2023-10-26 19:54             ` Tony Luck
2023-10-25 23:50 ` [PATCH v2] " Tony Luck
2023-10-26 20:02   ` [PATCH v3] " Tony Luck
2023-10-26 22:40     ` Moger, Babu
2023-10-26 22:59       ` Luck, Tony
2023-11-03 21:43     ` Reinette Chatre
2023-11-03 21:50       ` Reinette Chatre
2023-11-07 21:15       ` Tony Luck
2023-11-08 21:49         ` Reinette Chatre
2023-11-09 21:27           ` Luck, Tony
2023-11-15 16:09             ` Reinette Chatre
2023-11-15 21:54               ` Tony Luck
2023-11-16 19:48                 ` Reinette Chatre
2023-11-28 23:14     ` [PATCH v4] x86/resctrl: Add mount option to pick total MBM event Tony Luck
2023-11-29 23:48       ` Reinette Chatre
2023-12-01 20:45         ` Tony Luck
2023-12-01 21:47       ` [PATCH v5] x86/resctrl: Add event choices for mba_MBps Tony Luck
2023-12-04 16:24         ` Moger, Babu
2023-12-04 18:16           ` Tony Luck
2023-12-04 19:04             ` Moger, Babu
2023-12-04 19:45               ` Luck, Tony
2023-12-04 20:03                 ` Reinette Chatre
2023-12-04 21:08                   ` Tony Luck
2023-12-04 22:15                     ` Reinette Chatre
2023-12-04 22:51                       ` Reinette Chatre
2023-12-07 19:56         ` [PATCH v6 0/3] x86/resctrl: mba_MBps enhancements Tony Luck
2023-12-07 19:56           ` [PATCH v6 1/3] x86/resctrl: Add mount option "mba_MBps_event" Tony Luck
2023-12-08 18:17             ` Peter Newman
2023-12-08 21:57               ` Tony Luck
2023-12-08 22:09                 ` Peter Newman
2023-12-08 22:37                   ` Luck, Tony
2023-12-12 17:54                   ` Reinette Chatre
2023-12-12 20:02                     ` Tony Luck
2023-12-12 21:42                       ` Reinette Chatre
2023-12-13  1:07                         ` Luck, Tony
2023-12-08 18:29             ` Moger, Babu
2023-12-08 21:50               ` Tony Luck
2023-12-12 18:59             ` Reinette Chatre
2023-12-07 19:56           ` [PATCH v6 2/3] x86/resctrl: Use total bandwidth for mba_MBps option when local isn't present Tony Luck
2023-12-08 18:26             ` Peter Newman
2023-12-07 19:56           ` [PATCH v6 3/3] x86/resctrl: Add new "mba_MBps_event" mount option to documentation Tony Luck
2023-12-08 19:22             ` Peter Newman
2023-12-12 18:59             ` Reinette Chatre
2024-01-09 22:00         ` [PATCH] x86/resctrl: Implement new MBA_mbps throttling heuristic Tony Luck
2024-01-16 19:55           ` Reinette Chatre
2024-01-17  3:36             ` Xiaochen Shen
2024-01-17  3:40           ` Xiaochen Shen [this message]
2024-01-18  0:26           ` Reinette Chatre
2024-01-18 21:42         ` [PATCH v2] x86/resctrl: Implement new mba_MBps " Tony Luck
2024-01-22 17:34           ` Reinette Chatre
2024-01-22 18:07             ` Luck, Tony
2024-01-22 18:18               ` Reinette Chatre
2024-01-22 18:21                 ` Borislav Petkov
2024-01-22 18:41                   ` Reinette Chatre
2024-01-22 18:47                     ` Borislav Petkov
2024-01-22 20:58                       ` Luck, Tony
2024-01-23 12:12                         ` James Morse
2024-01-23 17:07                           ` Luck, Tony
2024-01-24  0:29                             ` Tony Luck
2024-01-25 17:29                               ` Tony Luck
2024-01-22 18:08           ` [PATCH v3] " Tony Luck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=063f4e49-94c2-4e2e-9ddf-70f0afc3ab05@intel.com \
    --to=xiaochen.shen@intel.com \
    --cc=babu.moger@amd.com \
    --cc=fenghua.yu@intel.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=peternewman@google.com \
    --cc=quic_jiles@quicinc.com \
    --cc=reinette.chatre@intel.com \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).