All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rik van Riel <riel@surriel.com>,
	Stephen Dolan <sdolan@janestreet.com>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.15 23/54] x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
Date: Mon, 12 May 2025 19:29:35 +0200	[thread overview]
Message-ID: <20250512172016.578668469@linuxfoundation.org> (raw)
In-Reply-To: <20250512172015.643809034@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave.hansen@linux.intel.com>

commit fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a upstream.

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm.  But
should_flush_tlb() has a bug and suppresses the flush.  Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier.  But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask().  So code was added to cull
mm_cpumask() periodically[2].  But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them.  So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

	// Turn on IPIs for this CPU/mm combination, but only
	// if should_flush_tlb() agrees:
	cpumask_set_cpu(cpu, mm_cpumask(next));

	next_tlb_gen = atomic64_read(&next->context.tlb_gen);
	choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
	load_new_mm_cr3(need_flush);
	// ^ After 'need_flush' is set to false, IPIs *MUST*
	// be sent to this CPU and not be ignored.

        this_cpu_write(cpu_tlbstate.loaded_mm, next);
	// ^ Not until this point does should_flush_tlb()
	// become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed.  Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.

This will cause more TLB flush IPIs.  But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them.  Add a barrier to ensure that they are observed in the
order they are written.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ [1]
Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/tlb.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -616,7 +616,11 @@ void switch_mm_irqs_off(struct mm_struct
 
 		choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
 
-		/* Let nmi_uaccess_okay() know that we're changing CR3. */
+		/*
+		 * Indicate that CR3 is about to change. nmi_uaccess_okay()
+		 * and others are sensitive to the window where mm_cpumask(),
+		 * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+ 		 */
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 	}
@@ -856,8 +860,16 @@ done:
 
 static bool should_flush_tlb(int cpu, void *data)
 {
+	struct mm_struct *loaded_mm = per_cpu(cpu_tlbstate.loaded_mm, cpu);
 	struct flush_tlb_info *info = data;
 
+	/*
+	 * Order the 'loaded_mm' and 'is_lazy' against their
+	 * write ordering in switch_mm_irqs_off(). Ensure
+	 * 'is_lazy' is at least as new as 'loaded_mm'.
+	 */
+	smp_rmb();
+
 	/* Lazy TLB will get flushed at the next context switch. */
 	if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
 		return false;
@@ -866,8 +878,15 @@ static bool should_flush_tlb(int cpu, vo
 	if (!info->mm)
 		return true;
 
+	/*
+	 * While switching, the remote CPU could have state from
+	 * either the prev or next mm. Assume the worst and flush.
+	 */
+	if (loaded_mm == LOADED_MM_SWITCHING)
+		return true;
+
 	/* The target mm is loaded, and the CPU is not lazy. */
-	if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+	if (loaded_mm == info->mm)
 		return true;
 
 	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */



  parent reply	other threads:[~2025-05-12 17:32 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 17:29 [PATCH 5.15 00/54] 5.15.183-rc1 review Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 01/54] can: mcan: m_can_class_unregister(): fix order of unregistration calls Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 02/54] can: mcp251xfd: mcp251xfd_remove(): " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 03/54] openvswitch: Fix unsafe attribute parsing in output_userspace() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 04/54] gre: Fix again IPv6 link-local address generation Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 05/54] can: gw: use call_rcu() instead of costly synchronize_rcu() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 06/54] rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 07/54] can: gw: fix RCU/BH usage in cgw_create_job() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 08/54] netfilter: ipset: fix region locking in hash types Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 09/54] net: dsa: b53: allow leaky reserved multicast Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 10/54] net: dsa: b53: fix clearing PVID of a port Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 11/54] net: dsa: b53: fix flushing old pvid VLAN on pvid change Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 12/54] net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 13/54] net: dsa: b53: always rejoin default untagged VLAN " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 14/54] net: dsa: b53: fix learning on VLAN unaware bridges Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 15/54] Input: synaptics - enable InterTouch on Dynabook Portege X30-D Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 16/54] Input: synaptics - enable InterTouch on Dynabook Portege X30L-G Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 17/54] Input: synaptics - enable InterTouch on Dell Precision M3800 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 18/54] Input: synaptics - enable SMBus for HP Elitebook 850 G1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 19/54] Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 20/54] staging: iio: adc: ad7816: Correct conditional logic for store mode Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 21/54] staging: axis-fifo: Remove hardware resets for user errors Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 22/54] staging: axis-fifo: Correct handling of tx_fifo_depth for size validation Greg Kroah-Hartman
2025-05-12 17:29 ` Greg Kroah-Hartman [this message]
2025-05-12 17:29 ` [PATCH 5.15 24/54] iio: adc: ad7606: fix serial register access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 25/54] iio: adis16201: Correct inclinometer channel resolution Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 26/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 27/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_tagged_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 28/54] drm/amd/display: Fix wrong handling for AUX_DEFER case Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 29/54] usb: uhci-platform: Make the clock really optional Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 30/54] xenbus: Use kref to track req lifetime Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 31/54] module: ensure that kobject_put() is safe for module type kobjects Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 32/54] ocfs2: switch osb->disable_recovery to enum Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 33/54] ocfs2: implement handshaking with ocfs2 recovery thread Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 34/54] ocfs2: stop quota recovery before disabling quotas Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 35/54] usb: cdnsp: Fix issue with resuming from L1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 36/54] usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 37/54] usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 38/54] usb: host: tegra: Prevent host controller crash when OTG port is used Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 39/54] usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 40/54] usb: typec: ucsi: displayport: Fix NULL pointer access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 41/54] USB: usbtmc: use interruptible sleep in usbtmc_read Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 42/54] usb: usbtmc: Fix erroneous get_stb ioctl error returns Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 43/54] usb: usbtmc: Fix erroneous wait_srq ioctl return Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 44/54] usb: usbtmc: Fix erroneous generic_read " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 45/54] types: Complement the aligned types with signed 64-bit one Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 46/54] iio: adc: dln2: Use aligned_s64 for timestamp Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 47/54] MIPS: Fix MAX_REG_OFFSET Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 48/54] drm/panel: simple: Update timings for AUO G101EVN010 Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 49/54] nvme: unblock ctrl state transition for firmware update Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 50/54] do_umount(): add missing barrier before refcount checks in sync case Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 51/54] Revert "net: phy: microchip: force IRQ polling mode for lan88xx" Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 52/54] x86/bpf: Call branch history clearing sequence on exit Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 53/54] x86/bpf: Add IBHF call at end of classic BPF Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 54/54] x86/bhi: Do not set BHI_DIS_S in 32-bit mode Greg Kroah-Hartman
2025-05-12 20:56 ` [PATCH 5.15 00/54] 5.15.183-rc1 review Jon Hunter
2025-05-13  9:29 ` Florian Fainelli
2025-05-13  9:48 ` Mark Brown
2025-05-13 10:14 ` Ron Economos
2025-05-13 16:17 ` Naresh Kamboju
2025-05-13 17:33 ` Shuah Khan
2025-05-14  5:23 ` Vijayendra Suman
2025-05-14 17:14 ` Hardik Garg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250512172016.578668469@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=mingo@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=sdolan@janestreet.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.