public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Boqun Feng <boqun.feng@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Will Deacon <will@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 59/64] arm64: cmpxchg_double*: hazard against entire exchange variable
Date: Mon, 16 Jan 2023 16:52:06 +0100	[thread overview]
Message-ID: <20230116154745.631943317@linuxfoundation.org> (raw)
In-Reply-To: <20230116154743.577276578@linuxfoundation.org>

From: Mark Rutland <mark.rutland@arm.com>

[ Upstream commit 031af50045ea97ed4386eb3751ca2c134d0fc911 ]

The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960bed5e857eb ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // #18
|   10:   d2800681        mov     x1, #0x34                       // #52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // #86
|   20:   d2800f03        mov     x3, #0x78                       // #120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // #18
|   44:   d2800681        mov     x1, #0x34                       // #52
|   48:   d2800ac2        mov     x2, #0x56                       // #86
|   4c:   d2800f03        mov     x3, #0x78                       // #120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, #8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // #18
|   14:   d2800681        mov     x1, #0x34                       // #52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // #86
|   24:   d2800f03        mov     x3, #0x78                       // #120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, #8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // #18
|   4c:   d2800681        mov     x1, #0x34                       // #52
|   50:   d2800ac2        mov     x2, #0x56                       // #86
|   54:   d2800f03        mov     x3, #0x78                       // #120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, #8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm64/include/asm/atomic_ll_sc.h | 2 +-
 arch/arm64/include/asm/atomic_lse.h   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 906e2d8c254c..abd302e521c0 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -315,7 +315,7 @@ __ll_sc__cmpxchg_double##name(unsigned long old1,			\
 	"	cbnz	%w0, 1b\n"					\
 	"	" #mb "\n"						\
 	"2:"								\
-	: "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr)	\
+	: "=&r" (tmp), "=&r" (ret), "+Q" (*(__uint128_t *)ptr)		\
 	: "r" (old1), "r" (old2), "r" (new1), "r" (new2)		\
 	: cl);								\
 									\
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index ab661375835e..28e96118c1e5 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -403,7 +403,7 @@ __lse__cmpxchg_double##name(unsigned long old1,				\
 	"	eor	%[old2], %[old2], %[oldval2]\n"			\
 	"	orr	%[old1], %[old1], %[old2]"			\
 	: [old1] "+&r" (x0), [old2] "+&r" (x1),				\
-	  [v] "+Q" (*(unsigned long *)ptr)				\
+	  [v] "+Q" (*(__uint128_t *)ptr)				\
 	: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),		\
 	  [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)		\
 	: cl);								\
-- 
2.35.1




  parent reply	other threads:[~2023-01-16 16:14 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16 15:51 [PATCH 5.10 00/64] 5.10.164-rc1 review Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 01/64] netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bits Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 02/64] ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 03/64] KVM: arm64: Fix S1PTW handling on RO memslots Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 04/64] efi: tpm: Avoid READ_ONCE() for accessing the event log Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 05/64] docs: Fix the docs build with Sphinx 6.0 Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 06/64] perf auxtrace: Fix address filter duplicate symbol selection Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 07/64] s390/kexec: fix ipl report address for kdump Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 08/64] ASoC: qcom: lpass-cpu: Fix fallback SD line index handling Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 09/64] s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 10/64] s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 11/64] cifs: Fix uninitialized memory read for smb311 posix symlink create Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 12/64] drm/msm/adreno: Make adreno quirks not overwrite each other Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 13/64] drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 14/64] platform/x86: sony-laptop: Dont turn off 0x153 keyboard backlight during probe Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 15/64] ixgbe: fix pci device refcount leak Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 16/64] ipv6: raw: Deduct extension header length in rawv6_push_pending_frames Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 17/64] bus: mhi: host: Fix race between channel preparation and M0 event Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 18/64] iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 19/64] iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 20/64] clk: imx8mp: Add DISP2 pixel clock Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 21/64] clk: imx8mp: add clkout1/2 support Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 22/64] dt-bindings: clocks: imx8mp: Add ID for usb suspend clock Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 23/64] clk: imx: imx8mp: add shared clk gate for usb suspend clk Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 24/64] xhci: Avoid parsing transfer events several times Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 25/64] xhci: get isochronous ring directly from endpoint structure Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 26/64] xhci: adjust parameters passed to cleanup_halted_endpoint() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 27/64] xhci: Add xhci_reset_halted_ep() helper function Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 28/64] xhci: move xhci_td_cleanup so it can be called by more functions Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 29/64] xhci: store TD status in the td struct instead of passing it along Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 30/64] xhci: move and rename xhci_cleanup_halted_endpoint() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 31/64] xhci: Prevent infinite loop in transaction errors recovery for streams Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 32/64] usb: ulpi: defer ulpi_register on ulpi_read_id timeout Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 33/64] ext4: fix uninititialized value in ext4_evict_inode Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 34/64] xfrm: fix rcu lock in xfrm_notify_userpolicy() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 35/64] netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 36/64] powerpc/imc-pmu: Fix use of mutex in IRQs disabled section Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 37/64] x86/boot: Avoid using Intel mnemonics in AT&T syntax asm Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 38/64] EDAC/device: Fix period calculation in edac_device_reset_delay_period() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 39/64] regulator: da9211: Use irq handler when ready Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 40/64] ASoC: wm8904: fix wrong outputs volume after power reactivation Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 41/64] tipc: fix unexpected link reset due to discovery messages Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 42/64] octeontx2-af: Update get/set resource count functions Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 43/64] octeontx2-af: Map NIX block from CGX connection Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 44/64] octeontx2-af: Fix LMAC config in cgx_lmac_rx_tx_enable Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 45/64] hvc/xen: lock console list traversal Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 46/64] nfc: pn533: Wait for out_urbs completion in pn533_usb_send_frame() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 47/64] net/sched: act_mpls: Fix warning during failed attribute validation Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 48/64] net/mlx5: Fix ptp max frequency adjustment range Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 49/64] net/mlx5e: Dont support encap rules with gbp option Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 50/64] mm: Always release pages to the buddy allocator in memblock_free_late() Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 51/64] iommu/mediatek-v1: Add error handle for mtk_iommu_probe Greg Kroah-Hartman
2023-01-16 15:51 ` [PATCH 5.10 52/64] iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 53/64] Documentation: KVM: add API issues section Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 54/64] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 55/64] x86/resctrl: Use task_curr() instead of task_struct->on_cpu to prevent unnecessary IPI Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 56/64] x86/resctrl: Fix task CLOSID/RMID update race Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 57/64] arm64: atomics: format whitespace consistently Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 58/64] arm64: atomics: remove LL/SC trampolines Greg Kroah-Hartman
2023-01-16 15:52 ` Greg Kroah-Hartman [this message]
2023-01-16 15:52 ` [PATCH 5.10 60/64] efi: fix NULL-deref in init error path Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 61/64] drm/virtio: Fix GEM handle creation UAF Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 62/64] io_uring/io-wq: free worker if task_work creation is canceled Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 63/64] io_uring/io-wq: only free worker if it was allocated for creation Greg Kroah-Hartman
2023-01-16 15:52 ` [PATCH 5.10 64/64] Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout" Greg Kroah-Hartman
2023-01-16 18:58 ` [PATCH 5.10 00/64] 5.10.164-rc1 review Daniel Díaz
2023-01-16 21:30   ` Pavel Machek
2023-01-17  9:32   ` Greg Kroah-Hartman
2023-01-16 23:58 ` Shuah Khan
2023-01-17 12:35 ` Sudip Mukherjee
2023-01-17 14:20   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230116154745.631943317@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=arnd@arndb.de \
    --cc=boqun.feng@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=mark.rutland@arm.com \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=steve.capper@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox