public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rik van Riel <riel@surriel.com>,
	Stephen Dolan <sdolan@janestreet.com>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.15 23/54] x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
Date: Mon, 12 May 2025 19:29:35 +0200	[thread overview]
Message-ID: <20250512172016.578668469@linuxfoundation.org> (raw)
In-Reply-To: <20250512172015.643809034@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave.hansen@linux.intel.com>

commit fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a upstream.

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm.  But
should_flush_tlb() has a bug and suppresses the flush.  Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier.  But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask().  So code was added to cull
mm_cpumask() periodically[2].  But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them.  So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

	// Turn on IPIs for this CPU/mm combination, but only
	// if should_flush_tlb() agrees:
	cpumask_set_cpu(cpu, mm_cpumask(next));

	next_tlb_gen = atomic64_read(&next->context.tlb_gen);
	choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
	load_new_mm_cr3(need_flush);
	// ^ After 'need_flush' is set to false, IPIs *MUST*
	// be sent to this CPU and not be ignored.

        this_cpu_write(cpu_tlbstate.loaded_mm, next);
	// ^ Not until this point does should_flush_tlb()
	// become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed.  Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.

This will cause more TLB flush IPIs.  But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them.  Add a barrier to ensure that they are observed in the
order they are written.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ [1]
Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/tlb.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -616,7 +616,11 @@ void switch_mm_irqs_off(struct mm_struct
 
 		choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
 
-		/* Let nmi_uaccess_okay() know that we're changing CR3. */
+		/*
+		 * Indicate that CR3 is about to change. nmi_uaccess_okay()
+		 * and others are sensitive to the window where mm_cpumask(),
+		 * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+ 		 */
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 	}
@@ -856,8 +860,16 @@ done:
 
 static bool should_flush_tlb(int cpu, void *data)
 {
+	struct mm_struct *loaded_mm = per_cpu(cpu_tlbstate.loaded_mm, cpu);
 	struct flush_tlb_info *info = data;
 
+	/*
+	 * Order the 'loaded_mm' and 'is_lazy' against their
+	 * write ordering in switch_mm_irqs_off(). Ensure
+	 * 'is_lazy' is at least as new as 'loaded_mm'.
+	 */
+	smp_rmb();
+
 	/* Lazy TLB will get flushed at the next context switch. */
 	if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
 		return false;
@@ -866,8 +878,15 @@ static bool should_flush_tlb(int cpu, vo
 	if (!info->mm)
 		return true;
 
+	/*
+	 * While switching, the remote CPU could have state from
+	 * either the prev or next mm. Assume the worst and flush.
+	 */
+	if (loaded_mm == LOADED_MM_SWITCHING)
+		return true;
+
 	/* The target mm is loaded, and the CPU is not lazy. */
-	if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+	if (loaded_mm == info->mm)
 		return true;
 
 	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */



  parent reply	other threads:[~2025-05-12 17:32 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 17:29 [PATCH 5.15 00/54] 5.15.183-rc1 review Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 01/54] can: mcan: m_can_class_unregister(): fix order of unregistration calls Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 02/54] can: mcp251xfd: mcp251xfd_remove(): " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 03/54] openvswitch: Fix unsafe attribute parsing in output_userspace() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 04/54] gre: Fix again IPv6 link-local address generation Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 05/54] can: gw: use call_rcu() instead of costly synchronize_rcu() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 06/54] rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 07/54] can: gw: fix RCU/BH usage in cgw_create_job() Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 08/54] netfilter: ipset: fix region locking in hash types Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 09/54] net: dsa: b53: allow leaky reserved multicast Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 10/54] net: dsa: b53: fix clearing PVID of a port Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 11/54] net: dsa: b53: fix flushing old pvid VLAN on pvid change Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 12/54] net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 13/54] net: dsa: b53: always rejoin default untagged VLAN " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 14/54] net: dsa: b53: fix learning on VLAN unaware bridges Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 15/54] Input: synaptics - enable InterTouch on Dynabook Portege X30-D Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 16/54] Input: synaptics - enable InterTouch on Dynabook Portege X30L-G Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 17/54] Input: synaptics - enable InterTouch on Dell Precision M3800 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 18/54] Input: synaptics - enable SMBus for HP Elitebook 850 G1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 19/54] Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 20/54] staging: iio: adc: ad7816: Correct conditional logic for store mode Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 21/54] staging: axis-fifo: Remove hardware resets for user errors Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 22/54] staging: axis-fifo: Correct handling of tx_fifo_depth for size validation Greg Kroah-Hartman
2025-05-12 17:29 ` Greg Kroah-Hartman [this message]
2025-05-12 17:29 ` [PATCH 5.15 24/54] iio: adc: ad7606: fix serial register access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 25/54] iio: adis16201: Correct inclinometer channel resolution Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 26/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 27/54] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_tagged_fifo Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 28/54] drm/amd/display: Fix wrong handling for AUX_DEFER case Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 29/54] usb: uhci-platform: Make the clock really optional Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 30/54] xenbus: Use kref to track req lifetime Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 31/54] module: ensure that kobject_put() is safe for module type kobjects Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 32/54] ocfs2: switch osb->disable_recovery to enum Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 33/54] ocfs2: implement handshaking with ocfs2 recovery thread Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 34/54] ocfs2: stop quota recovery before disabling quotas Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 35/54] usb: cdnsp: Fix issue with resuming from L1 Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 36/54] usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 37/54] usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 38/54] usb: host: tegra: Prevent host controller crash when OTG port is used Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 39/54] usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 40/54] usb: typec: ucsi: displayport: Fix NULL pointer access Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 41/54] USB: usbtmc: use interruptible sleep in usbtmc_read Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 42/54] usb: usbtmc: Fix erroneous get_stb ioctl error returns Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 43/54] usb: usbtmc: Fix erroneous wait_srq ioctl return Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 44/54] usb: usbtmc: Fix erroneous generic_read " Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 45/54] types: Complement the aligned types with signed 64-bit one Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 46/54] iio: adc: dln2: Use aligned_s64 for timestamp Greg Kroah-Hartman
2025-05-12 17:29 ` [PATCH 5.15 47/54] MIPS: Fix MAX_REG_OFFSET Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 48/54] drm/panel: simple: Update timings for AUO G101EVN010 Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 49/54] nvme: unblock ctrl state transition for firmware update Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 50/54] do_umount(): add missing barrier before refcount checks in sync case Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 51/54] Revert "net: phy: microchip: force IRQ polling mode for lan88xx" Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 52/54] x86/bpf: Call branch history clearing sequence on exit Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 53/54] x86/bpf: Add IBHF call at end of classic BPF Greg Kroah-Hartman
2025-05-12 17:30 ` [PATCH 5.15 54/54] x86/bhi: Do not set BHI_DIS_S in 32-bit mode Greg Kroah-Hartman
2025-05-12 20:56 ` [PATCH 5.15 00/54] 5.15.183-rc1 review Jon Hunter
2025-05-13  9:29 ` Florian Fainelli
2025-05-13  9:48 ` Mark Brown
2025-05-13 10:14 ` Ron Economos
2025-05-13 16:17 ` Naresh Kamboju
2025-05-13 17:33 ` Shuah Khan
2025-05-14  5:23 ` Vijayendra Suman
2025-05-14 17:14 ` Hardik Garg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250512172016.578668469@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=mingo@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=sdolan@janestreet.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox