From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Xiaochen Shen <xiaochen.shen@intel.com>,
Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: [PATCH 5.4 34/34] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
Date: Sat, 19 Dec 2020 14:03:31 +0100
Message-ID: <20201219125343.067191922@linuxfoundation.org>
In-Reply-To: <20201219125341.384025953@linuxfoundation.org>
From: Xiaochen Shen <xiaochen.shen@intel.com>
commit 06c5fe9b12dde1b62821f302f177c972bb1c81f9 upstream
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It piggybacks on the MBM counter overflow
handler, doing its updates at a 1s interval in mbm_update() and
update_mba_bw().
The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. In that case, __mon_event_count() is not
called for local bandwidth updating in the MBM counter overflow handler,
but it is still called when reading the MBM local bandwidth counter file
'mbm_local_bytes'. The call path is:
  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()
In __mon_event_count(), m->chunks is updated by the delta chunks
calculated from the previous MSR value (m->prev_msr) and the current MSR
value. When mba_sc is enabled, m->chunks is also updated, by mistake, on
the mbm_update() path with delta chunks calculated from m->prev_bw_msr
instead of m->prev_msr, even though m->chunks is not used by
update_mba_bw() in the mba_sc feedback loop.
As a result, by the time the MBM local bandwidth counter file is read,
m->chunks has already been changed unexpectedly by mbm_bw_count(), and the
local bandwidth counter shown to the user is calculated from an incorrect
m->chunks.
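The double accounting described above can be illustrated with a small
user-space model. This is only a sketch, not the kernel code: the struct
and function names (mbm_state_sketch, overflow_count(), event_count(),
bw_count(), simulate()) are hypothetical stand-ins, the 24-bit counter
width is an assumption, and each simulated tick is one overflow-handler
run followed by one user read of mbm_local_bytes.

```c
#include <stdint.h>

#define CNTR_WIDTH 24 /* assumed hardware counter width for this sketch */

/* Hypothetical stand-in for struct mbm_state; only the fields discussed. */
struct mbm_state_sketch {
	uint64_t chunks;      /* running total behind mbm_local_bytes */
	uint64_t prev_msr;    /* last sample seen by the read path */
	uint64_t prev_bw_msr; /* last sample seen by the mba_sc path */
};

/* Delta between two counter samples, tolerating a single wrap-around. */
static uint64_t overflow_count(uint64_t prev, uint64_t cur)
{
	uint64_t shift = 64 - CNTR_WIDTH;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Models what __mon_event_count() does to m->chunks. */
static void event_count(struct mbm_state_sketch *m, uint64_t msr)
{
	m->chunks += overflow_count(m->prev_msr, msr);
	m->prev_msr = msr;
}

/* Models mbm_bw_count(); buggy != 0 keeps the erroneous chunks update. */
static uint64_t bw_count(struct mbm_state_sketch *m, uint64_t msr, int buggy)
{
	uint64_t chunks = overflow_count(m->prev_bw_msr, msr);

	if (buggy)
		m->chunks += chunks; /* the update this patch removes */
	m->prev_bw_msr = msr;
	return chunks;
}

/* Run 'ticks' handler invocations with mba_sc enabled, 'step' chunks each. */
static uint64_t simulate(int buggy, int ticks, uint64_t step)
{
	struct mbm_state_sketch m = {0};
	uint64_t msr = 0;

	for (int i = 0; i < ticks; i++) {
		msr = (msr + step) & ((1ULL << CNTR_WIDTH) - 1);
		if (!buggy)
			event_count(&m, msr); /* fixed handler: always called */
		bw_count(&m, msr, buggy);     /* mba_sc feedback path */
		event_count(&m, msr);         /* user reads mbm_local_bytes */
	}
	return m.chunks;
}
```

In this model, simulate(1, 5, 1000) returns 10000 while simulate(0, 5, 1000)
returns 5000: the buggy path counts every delta twice, which is in line with
the roughly doubled bandwidth reported in the test steps.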
Fix this by removing the incorrect m->chunks update from mbm_bw_count() in
the MBM counter overflow handler, and by always calling __mon_event_count()
in mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.
Test steps:
# Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat && make
./tools/membw/membw -c 0 -b 10000 --read
# Enable MBA software controller
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
# Create control group c1
mkdir /sys/fs/resctrl/c1
# Set MB throttle to 6 GB/s
echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
# Write PID of the workload to tasks file
echo `pidof membw` > /sys/fs/resctrl/c1/tasks
# Read the local bytes counter twice at a 1s interval. The calculated
# local bandwidth is not as expected (it should approach 6 GB/s):
local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
sleep 1
local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
Before fix:
local b/w (bytes/s): 11076796416
After fix:
local b/w (bytes/s): 5465014272
Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -280,7 +280,6 @@ static void mbm_bw_count(u32 rmid, struc
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -450,15 +449,14 @@ static void mbm_update(struct rdt_domain
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }