public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rik van Riel <riel@surriel.com>,
	Stephen Dolan <sdolan@janestreet.com>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 6.1 32/92] x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
Date: Mon, 12 May 2025 19:45:07 +0200	[thread overview]
Message-ID: <20250512172024.444793053@linuxfoundation.org> (raw)
In-Reply-To: <20250512172023.126467649@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave.hansen@linux.intel.com>

commit fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a upstream.

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm.  But
should_flush_tlb() has a bug and suppresses the flush.  Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier.  But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask().  So code was added to cull
mm_cpumask() periodically[2].  But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them.  So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

	// Turn on IPIs for this CPU/mm combination, but only
	// if should_flush_tlb() agrees:
	cpumask_set_cpu(cpu, mm_cpumask(next));

	next_tlb_gen = atomic64_read(&next->context.tlb_gen);
	choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
	load_new_mm_cr3(need_flush);
	// ^ After 'need_flush' is set to false, IPIs *MUST*
	// be sent to this CPU and not be ignored.

        this_cpu_write(cpu_tlbstate.loaded_mm, next);
	// ^ Not until this point does should_flush_tlb()
	// become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed.  Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.

This will cause more TLB flush IPIs.  But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them.  Add a barrier to ensure that they are observed in the
order they are written.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ [1]
Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/tlb.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -617,7 +617,11 @@ void switch_mm_irqs_off(struct mm_struct
 
 		choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
 
-		/* Let nmi_uaccess_okay() know that we're changing CR3. */
+		/*
+		 * Indicate that CR3 is about to change. nmi_uaccess_okay()
+		 * and others are sensitive to the window where mm_cpumask(),
+		 * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+ 		 */
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 	}
@@ -880,8 +884,16 @@ done:
 
 static bool should_flush_tlb(int cpu, void *data)
 {
+	struct mm_struct *loaded_mm = per_cpu(cpu_tlbstate.loaded_mm, cpu);
 	struct flush_tlb_info *info = data;
 
+	/*
+	 * Order the 'loaded_mm' and 'is_lazy' against their
+	 * write ordering in switch_mm_irqs_off(). Ensure
+	 * 'is_lazy' is at least as new as 'loaded_mm'.
+	 */
+	smp_rmb();
+
 	/* Lazy TLB will get flushed at the next context switch. */
 	if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
 		return false;
@@ -890,8 +902,15 @@ static bool should_flush_tlb(int cpu, vo
 	if (!info->mm)
 		return true;
 
+	/*
+	 * While switching, the remote CPU could have state from
+	 * either the prev or next mm. Assume the worst and flush.
+	 */
+	if (loaded_mm == LOADED_MM_SWITCHING)
+		return true;
+
 	/* The target mm is loaded, and the CPU is not lazy. */
-	if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+	if (loaded_mm == info->mm)
 		return true;
 
 	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */



  parent reply	other threads:[~2025-05-12 17:53 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 17:44 [PATCH 6.1 00/92] 6.1.139-rc1 review Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 01/92] dm: add missing unlock on in dm_keyslot_evict() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 02/92] arm64: dts: imx8mm-verdin: Link reg_usdhc2_vqmmc to usdhc2 Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 03/92] can: mcan: m_can_class_unregister(): fix order of unregistration calls Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 04/92] can: mcp251xfd: mcp251xfd_remove(): " Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 05/92] ksmbd: prevent out-of-bounds stream writes by validating *pos Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 06/92] openvswitch: Fix unsafe attribute parsing in output_userspace() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 07/92] ksmbd: fix memory leak in parse_lease_state() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 08/92] sch_htb: make htb_deactivate() idempotent Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 09/92] gre: Fix again IPv6 link-local address generation Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 10/92] can: mcp251xfd: fix TDC setting for low data bit rates Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 11/92] rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 12/92] can: gw: fix RCU/BH usage in cgw_create_job() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 13/92] ipv4: Drop tos parameter from flowi4_update_output() Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 14/92] ipvs: fix uninit-value for saddr in do_output_route4 Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 15/92] netfilter: ipset: fix region locking in hash types Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 16/92] bpf: Scrub packet on bpf_redirect_peer Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 17/92] net: dsa: b53: allow leaky reserved multicast Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 18/92] net: dsa: b53: fix clearing PVID of a port Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 19/92] net: dsa: b53: fix flushing old pvid VLAN on pvid change Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 20/92] net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 21/92] net: dsa: b53: always rejoin default untagged VLAN " Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 22/92] net: dsa: b53: fix learning on VLAN unaware bridges Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 23/92] Input: mtk-pmic-keys - fix possible null pointer dereference Greg Kroah-Hartman
2025-05-12 17:44 ` [PATCH 6.1 24/92] Input: synaptics - enable InterTouch on Dynabook Portege X30-D Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 25/92] Input: synaptics - enable InterTouch on Dynabook Portege X30L-G Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 26/92] Input: synaptics - enable InterTouch on Dell Precision M3800 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 27/92] Input: synaptics - enable SMBus for HP Elitebook 850 G1 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 28/92] Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 29/92] staging: iio: adc: ad7816: Correct conditional logic for store mode Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 30/92] staging: axis-fifo: Remove hardware resets for user errors Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 31/92] staging: axis-fifo: Correct handling of tx_fifo_depth for size validation Greg Kroah-Hartman
2025-05-12 17:45 ` Greg Kroah-Hartman [this message]
2025-05-12 17:45 ` [PATCH 6.1 33/92] drm/amd/display: Shift DMUB AUX reply command if necessary Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 34/92] iio: adc: ad7606: fix serial register access Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 35/92] iio: adis16201: Correct inclinometer channel resolution Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 36/92] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_fifo Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 37/92] iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_tagged_fifo Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 38/92] drm/v3d: Add job to pending list if the reset was skipped Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 39/92] drm/amd/display: Fix the checking condition in dmub aux handling Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 40/92] drm/amd/display: Remove incorrect checking in dmub aux handler Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 41/92] drm/amd/display: Fix wrong handling for AUX_DEFER case Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 42/92] drm/amd/display: Copy AUX read reply data whenever length > 0 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 43/92] drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 44/92] usb: uhci-platform: Make the clock really optional Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 45/92] xenbus: Use kref to track req lifetime Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 46/92] module: ensure that kobject_put() is safe for module type kobjects Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 47/92] ocfs2: switch osb->disable_recovery to enum Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 48/92] ocfs2: implement handshaking with ocfs2 recovery thread Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 49/92] ocfs2: stop quota recovery before disabling quotas Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 50/92] usb: cdnsp: Fix issue with resuming from L1 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 51/92] usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 52/92] usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 53/92] usb: host: tegra: Prevent host controller crash when OTG port is used Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 54/92] usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 55/92] usb: typec: ucsi: displayport: Fix NULL pointer access Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 56/92] USB: usbtmc: use interruptible sleep in usbtmc_read Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 57/92] usb: usbtmc: Fix erroneous get_stb ioctl error returns Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 58/92] usb: usbtmc: Fix erroneous wait_srq ioctl return Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 59/92] usb: usbtmc: Fix erroneous generic_read " Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 60/92] iio: accel: adxl367: fix setting odr for activity time update Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 61/92] iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 62/92] types: Complement the aligned types with signed 64-bit one Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 63/92] iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 64/92] iio: adc: dln2: Use aligned_s64 for timestamp Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 65/92] MIPS: Fix MAX_REG_OFFSET Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 66/92] drm/panel: simple: Update timings for AUO G101EVN010 Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 67/92] nvme: unblock ctrl state transition for firmware update Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 68/92] do_umount(): add missing barrier before refcount checks in sync case Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 69/92] io_uring: always arm linked timeouts prior to issue Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 70/92] io_uring: ensure deferred completions are posted for multishot Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 71/92] Revert "net: phy: microchip: force IRQ polling mode for lan88xx" Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 72/92] arm64: insn: Add support for encoding DSB Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 73/92] arm64: proton-pack: Expose whether the platform is mitigated by firmware Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 74/92] arm64: proton-pack: Expose whether the branchy loop k value Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 75/92] arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 76/92] arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 77/92] arm64: proton-pack: Add new CPUs k values for branch mitigation Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 78/92] x86/bpf: Call branch history clearing sequence on exit Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 79/92] x86/bpf: Add IBHF call at end of classic BPF Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 80/92] x86/bhi: Do not set BHI_DIS_S in 32-bit mode Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 81/92] x86/speculation: Simplify and make CALL_NOSPEC consistent Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 82/92] x86/speculation: Add a conditional CS prefix to CALL_NOSPEC Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 83/92] x86/speculation: Remove the extra #ifdef around CALL_NOSPEC Greg Kroah-Hartman
2025-05-12 17:45 ` [PATCH 6.1 84/92] Documentation: x86/bugs/its: Add ITS documentation Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 85/92] x86/its: Enumerate Indirect Target Selection (ITS) bug Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 86/92] x86/its: Add support for ITS-safe indirect thunk Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 87/92] x86/its: Add support for ITS-safe return thunk Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 88/92] x86/its: Enable Indirect Target Selection mitigation Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 89/92] x86/its: Add "vmexit" option to skip mitigation on some CPUs Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 90/92] x86/its: Align RETs in BHB clear sequence to avoid thunking Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 91/92] x86/ibt: Keep IBT disabled during alternative patching Greg Kroah-Hartman
2025-05-12 17:46 ` [PATCH 6.1 92/92] x86/its: Use dynamic thunks for indirect branches Greg Kroah-Hartman
2025-05-12 20:56 ` [PATCH 6.1 00/92] 6.1.139-rc1 review Jon Hunter
2025-05-13  6:45 ` Pavel Machek
2025-05-13  9:43 ` Florian Fainelli
2025-05-13  9:48 ` Mark Brown
2025-05-13 10:04 ` Ron Economos
2025-05-13 11:39 ` Peter Schneider
2025-05-13 17:19 ` Naresh Kamboju
2025-05-13 17:32 ` Shuah Khan
2025-05-14 17:11 ` Hardik Garg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250512172024.444793053@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=mingo@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=sdolan@janestreet.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox