From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Babu Moger <babu.moger@amd.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
Reinette Chatre <reinette.chatre@intel.com>
Subject: [PATCH 6.6 80/84] x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
Date: Mon, 27 Oct 2025 19:37:09 +0100 [thread overview]
Message-ID: <20251027183440.942556856@linuxfoundation.org> (raw)
In-Reply-To: <20251027183438.817309828@linuxfoundation.org>
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Babu Moger <babu.moger@amd.com>
commit 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 upstream.
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$mount -t resctrl resctrl /sys/fs/resctrl
$mkdir /sys/fs/resctrl/mon_groups/test1/
$echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
17592178699059 <- Overflow on domain 0
3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
Bandwidth (MBM).
[ bp: Split commit message into smaller paragraph chunks for better
consumption. ]
Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
[babu.moger@amd.com: Fix conflict for v6.6 stable]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -241,11 +241,15 @@ int resctrl_arch_rmid_read(struct rdt_re
if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
return -EINVAL;
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+
ret = __rmid_read(rmid, eventid, &msr_val);
- if (ret)
+ if (ret) {
+ if (am && ret == -EINVAL)
+ am->prev_msr = 0;
return ret;
+ }
- am = get_arch_mbm_state(hw_dom, rmid, eventid);
if (am) {
am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
hw_res->mbm_width);
next prev parent reply other threads:[~2025-10-27 19:19 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-27 18:35 [PATCH 6.6 00/84] 6.6.115-rc1 review Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 01/84] exec: Fix incorrect type for ret Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 02/84] nios2: ensure that memblock.current_limit is set when setting pfn limits Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 03/84] hfs: clear offset and space out of valid records in b-tree node Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 04/84] hfs: make proper initalization of struct hfs_find_data Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 05/84] hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent() Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 06/84] hfs: validate record offset in hfsplus_bmap_alloc Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 07/84] hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat() Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 08/84] dlm: check for defined force value in dlm_lockspace_release Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 09/84] hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits() Greg Kroah-Hartman
2025-10-27 18:35 ` [PATCH 6.6 10/84] hfsplus: return EIO when type of hidden directory mismatch in hfsplus_fill_super() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 11/84] lkdtm: fortify: Fix potential NULL dereference on kmalloc failure Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 12/84] m68k: bitops: Fix find_*_bit() signatures Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 13/84] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 14/84] drivers/perf: hisi: Relax the event ID check in the framework Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 15/84] smb: server: let smb_direct_flush_send_list() invalidate a remote key first Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 16/84] Unbreak make tools/* for user-space targets Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 17/84] net/mlx5e: Return 1 instead of 0 in invalid case in mlx5e_mpwrq_umr_entry_size() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 18/84] rtnetlink: Allow deleting FDB entries in user namespace Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 19/84] net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 20/84] net: enetc: fix the deadlock of enetc_mdio_lock Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 21/84] net: enetc: correct the value of ENETC_RXB_TRUESIZE Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 22/84] dpaa2-eth: fix the pointer passed to PTR_ALIGN on Tx path Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 23/84] can: bxcan: bxcan_start_xmit(): use can_dev_dropped_skb() instead of can_dropped_invalid_skb() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 24/84] selftests/net: convert sctp_vrf.sh to run it in unique namespace Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 25/84] selftests: net: fix server bind failure in sctp_vrf.sh Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 26/84] net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 27/84] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 28/84] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 29/84] arm64, mm: avoid always making PTE dirty in pte_mkwrite() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 30/84] sctp: avoid NULL dereference when chunk data buffer is missing Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 31/84] net: bonding: fix possible peer notify event loss or dup issue Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 32/84] dma-debug: dont report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 33/84] arch_topology: Fix incorrect error check in topology_parse_cpu_capacity() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 34/84] gpio: pci-idio-16: Define maximum valid register address offset Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 35/84] gpio: 104-idio-16: " Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 36/84] Revert "cpuidle: menu: Avoid discarding useful information" Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 37/84] ACPICA: Work around bogus -Wstringop-overread warning since GCC 11 Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 38/84] can: netlink: can_changelink(): allow disabling of automatic restart Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 39/84] cifs: Fix TCP_Server_Info::credits to be signed Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 40/84] MIPS: Malta: Fix keyboard resource preventing i8042 driver from registering Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 41/84] ocfs2: clear extent cache after moving/defragmenting extents Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 42/84] vsock: fix lock inversion in vsock_assign_transport() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 43/84] net: stmmac: dwmac-rk: Fix disabling set_clock_selection Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 44/84] net: usb: rtl8150: Fix frame padding Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 45/84] net: ravb: Enforce descriptor type ordering Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 46/84] net: ravb: Ensure memory write completes before ringing TX doorbell Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 47/84] selftests: mptcp: join: mark flush re-add as skipped if not supported Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 48/84] selftests: mptcp: join: mark implicit tests " Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 49/84] spi: spi-nxp-fspi: add extra delay after dll locked Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 50/84] firmware: arm_scmi: Account for failed debug initialization Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 51/84] firmware: arm_scmi: Fix premature SCMI_XFER_FLAG_IS_RAW clearing in raw mode Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 52/84] RISC-V: Define pgprot_dmacoherent() for non-coherent devices Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 53/84] RISC-V: Dont print details of CPUs disabled in DT Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 54/84] hwmon: (sht3x) Fix error handling Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 55/84] gpio: update Intel LJCA USB GPIO driver Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 56/84] gpio: ljca: Fix duplicated IRQ mapping Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 57/84] io_uring: correct __must_hold annotation in io_install_fixed_file Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 58/84] sched: Remove never used code in mm_cid_get() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 59/84] USB: serial: option: add UNISOC UIS7720 Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 60/84] USB: serial: option: add Quectel RG255C Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 61/84] USB: serial: option: add Telit FN920C04 ECM compositions Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 62/84] usb/core/quirks: Add Huawei ME906S to wakeup quirk Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 63/84] usb: raw-gadget: do not limit transfer length Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 64/84] xhci: dbc: enable back DbC in resume if it was enabled before suspend Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 65/84] x86/microcode: Fix Entrysign revision check for Zen1/Naples Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 66/84] binder: remove "invalid inc weak" check Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 67/84] comedi: fix divide-by-zero in comedi_buf_munge() Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 68/84] mei: me: add wildcat lake P DID Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 69/84] misc: fastrpc: Fix dma_buf object leak in fastrpc_map_lookup Greg Kroah-Hartman
2025-10-27 18:36 ` [PATCH 6.6 70/84] most: usb: Fix use-after-free in hdm_disconnect Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 71/84] most: usb: hdm_probe: Fix calling put_device() before device initialization Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 72/84] tcpm: switch check for role_sw device with fw_node Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 73/84] dt-bindings: usb: dwc3-imx8mp: dma-range is required only for imx8mp Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 74/84] serial: 8250_dw: handle reset control deassert error Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 75/84] serial: 8250_exar: add support for Advantech 2 port card with Device ID 0x0018 Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 76/84] serial: 8250_mtk: Enable baud clock and manage in runtime PM Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 77/84] devcoredump: Fix circular locking dependency with devcd->mutex Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 78/84] xfs: always warn about deprecated mount options Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 79/84] fs/notify: call exportfs_encode_fid with s_umount Greg Kroah-Hartman
2025-10-27 18:37 ` Greg Kroah-Hartman [this message]
2025-10-28 16:00 ` [PATCH 6.6 80/84] x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID Babu Moger
2025-10-27 18:37 ` [PATCH 6.6 81/84] s390/cio: Update purge function to unregister the unused subchannels Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 82/84] fuse: allocate ff->release_args only if release is needed Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 83/84] fuse: fix livelock in synchronous file put from fuseblk workers Greg Kroah-Hartman
2025-10-27 18:37 ` [PATCH 6.6 84/84] gpio: ljca: Initialize num before accessing item in ljca_gpio_config Greg Kroah-Hartman
2025-10-27 21:50 ` [PATCH 6.6 00/84] 6.6.115-rc1 review Florian Fainelli
2025-10-28 3:45 ` Peter Schneider
2025-10-28 11:28 ` Jon Hunter
2025-10-28 11:51 ` Ron Economos
2025-10-28 13:42 ` Naresh Kamboju
2025-10-28 13:53 ` Brett A C Sheffield
2025-10-28 19:25 ` Shuah Khan
2025-10-28 22:11 ` Slade Watkins
2025-10-29 11:21 ` Miguel Ojeda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251027183440.942556856@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=babu.moger@amd.com \
--cc=bp@alien8.de \
--cc=patches@lists.linux.dev \
--cc=reinette.chatre@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).