From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Dave Hansen <dave.hansen@linux.intel.com>,
Rik van Riel <riel@surriel.com>,
Stephen Dolan <sdolan@janestreet.com>,
Ingo Molnar <mingo@kernel.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.15 23/54] x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
Date: Mon, 12 May 2025 19:29:35 +0200 [thread overview]
Message-ID: <20250512172016.578668469@linuxfoundation.org> (raw)
In-Reply-To: <20250512172015.643809034@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Hansen <dave.hansen@linux.intel.com>
commit fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a upstream.
tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.
Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ [1]
Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/mm/tlb.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -616,7 +616,11 @@ void switch_mm_irqs_off(struct mm_struct
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
- /* Let nmi_uaccess_okay() know that we're changing CR3. */
+ /*
+ * Indicate that CR3 is about to change. nmi_uaccess_okay()
+ * and others are sensitive to the window where mm_cpumask(),
+ * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+ */
this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
barrier();
}
@@ -856,8 +860,16 @@ done:
static bool should_flush_tlb(int cpu, void *data)
{
+ struct mm_struct *loaded_mm = per_cpu(cpu_tlbstate.loaded_mm, cpu);
struct flush_tlb_info *info = data;
+ /*
+ * Order the 'loaded_mm' and 'is_lazy' against their
+ * write ordering in switch_mm_irqs_off(). Ensure
+ * 'is_lazy' is at least as new as 'loaded_mm'.
+ */
+ smp_rmb();
+
/* Lazy TLB will get flushed at the next context switch. */
if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
return false;
@@ -866,8 +878,15 @@ static bool should_flush_tlb(int cpu, vo
if (!info->mm)
return true;
+ /*
+ * While switching, the remote CPU could have state from
+ * either the prev or next mm. Assume the worst and flush.
+ */
+ if (loaded_mm == LOADED_MM_SWITCHING)
+ return true;
+
/* The target mm is loaded, and the CPU is not lazy. */
- if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ if (loaded_mm == info->mm)
return true;
/* In cpumask, but not the loaded mm? Periodically remove by flushing. */
next prev parent reply other threads:[~2025-05-12 17:32 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-12 17:29 [PATCH 5.15 00/54] 5.15.183-rc1 review Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 01/54] can: mcan: m_can_class_unregister(): fix order of unregistration calls Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 02/54] can: mcp251xfd: mcp251xfd_remove(): " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 03/54] openvswitch: Fix unsafe attribute parsing in output_userspace() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 04/54] gre: Fix again IPv6 link-local address generation Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 05/54] can: gw: use call_rcu() instead of costly synchronize_rcu() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 06/54] rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 07/54] can: gw: fix RCU/BH usage in cgw_create_job() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 08/54] netfilter: ipset: fix region locking in hash types Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 09/54] net: dsa: b53: allow leaky reserved multicast Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 10/54] net: dsa: b53: fix clearing PVID of a port Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 11/54] net: dsa: b53: fix flushing old pvid VLAN on pvid change Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 12/54] net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 13/54] net: dsa: b53: always rejoin default untagged VLAN " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 14/54] net: dsa: b53: fix learning on VLAN unaware bridges Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 15/54] Input: synaptics - enable InterTouch on Dynabook Portege X30-D Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 16/54] Input: synaptics - enable InterTouch on Dynabook Portege X30L-G Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 17/54] Input: synaptics - enable InterTouch on Dell Precision M3800 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 18/54] Input: synaptics - enable SMBus for HP Elitebook 850 G1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 19/54] Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 20/54] staging: iio: adc: ad7816: Correct conditional logic for store mode Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 21/54] staging: axis-fifo: Remove hardware resets for user errors Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 22/54] staging: axis-fifo: Correct handling of tx_fifo_depth for size validation Greg Kroah-Hartman
2025-05-12 17:29 ` Greg Kroah-Hartman [this message]
2025-05-12 17:29 ` [PATCH 5.15 24/54] iio: adc: ad7606: fix serial register access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 25/54] iio: adis16201: Correct inclinometer channel resolution Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 26/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 27/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_tagged_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 28/54] drm/amd/display: Fix wrong handling for AUX_DEFER case Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 29/54] usb: uhci-platform: Make the clock really optional Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 30/54] xenbus: Use kref to track req lifetime Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 31/54] module: ensure that kobject_put() is safe for module type kobjects Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 32/54] ocfs2: switch osb->disable_recovery to enum Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 33/54] ocfs2: implement handshaking with ocfs2 recovery thread Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 34/54] ocfs2: stop quota recovery before disabling quotas Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 35/54] usb: cdnsp: Fix issue with resuming from L1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 36/54] usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 37/54] usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 38/54] usb: host: tegra: Prevent host controller crash when OTG port is used Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 39/54] usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 40/54] usb: typec: ucsi: displayport: Fix NULL pointer access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 41/54] USB: usbtmc: use interruptible sleep in usbtmc_read Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 42/54] usb: usbtmc: Fix erroneous get_stb ioctl error returns Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 43/54] usb: usbtmc: Fix erroneous wait_srq ioctl return Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 44/54] usb: usbtmc: Fix erroneous generic_read " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 45/54] types: Complement the aligned types with signed 64-bit one Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 46/54] iio: adc: dln2: Use aligned_s64 for timestamp Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 47/54] MIPS: Fix MAX_REG_OFFSET Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 48/54] drm/panel: simple: Update timings for AUO G101EVN010 Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 49/54] nvme: unblock ctrl state transition for firmware update Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 50/54] do_umount(): add missing barrier before refcount checks in sync case Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 51/54] Revert "net: phy: microchip: force IRQ polling mode for lan88xx" Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 52/54] x86/bpf: Call branch history clearing sequence on exit Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 53/54] x86/bpf: Add IBHF call at end of classic BPF Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 54/54] x86/bhi: Do not set BHI_DIS_S in 32-bit mode Greg Kroah-Hartman
2025-05-12 20:56 ` [PATCH 5.15 00/54] 5.15.183-rc1 review Jon Hunter
2025-05-13 9:29 ` Florian Fainelli
2025-05-13 9:48 ` Mark Brown
2025-05-13 10:14 ` Ron Economos
2025-05-13 16:17 ` Naresh Kamboju
2025-05-13 17:33 ` Shuah Khan
2025-05-14 5:23 ` Vijayendra Suman
2025-05-14 17:14 ` Hardik Garg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250512172016.578668469@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=mingo@kernel.org \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=sdolan@janestreet.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox