From: Catalin Marinas <catalin.marinas@arm.com>
To: Sasha Levin <sashal@kernel.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Mark Rutland <mark.rutland@arm.com>,
Will Deacon <will@kernel.org>,
peterz@infradead.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH AUTOSEL 6.0 17/46] arm64: atomics: remove LL/SC trampolines
Date: Wed, 12 Oct 2022 09:55:59 +0100 [thread overview]
Message-ID: <Y0aBH1cWdx9oh0BV@arm.com> (raw)
In-Reply-To: <20221011145015.1622882-17-sashal@kernel.org>
On Tue, Oct 11, 2022 at 10:49:45AM -0400, Sasha Levin wrote:
> From: Mark Rutland <mark.rutland@arm.com>
>
> [ Upstream commit b2c3ccbd0011bb3b51d0fec24cb3a5812b1ec8ea ]
>
> When CONFIG_ARM64_LSE_ATOMICS=y, each use of an LL/SC atomic results in
> a fragment of code being generated in a subsection without a clear
> association with its caller. A trampoline in the caller branches to the
> LL/SC atomic with with a direct branch, and the atomic directly branches
> back into its trampoline.
>
> This breaks backtracing, as any PC within the out-of-line fragment will
> be symbolized as an offset from the nearest prior symbol (which may not
> be the function using the atomic), and since the atomic returns with a
> direct branch, the caller's PC may be missing from the backtrace.
>
> For example, with secondary_start_kernel() hacked to contain
> atomic_inc(NULL), the resulting exception can be reported as being taken
> from cpus_are_stuck_in_kernel():
>
> | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> | Mem abort info:
> | ESR = 0x0000000096000004
> | EC = 0x25: DABT (current EL), IL = 32 bits
> | SET = 0, FnV = 0
> | EA = 0, S1PTW = 0
> | FSC = 0x04: level 0 translation fault
> | Data abort info:
> | ISV = 0, ISS = 0x00000004
> | CM = 0, WnR = 0
> | [0000000000000000] user address but active_mm is swapper
> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
> | Modules linked in:
> | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-11219-geb555cb5b794-dirty #3
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> | pc : cpus_are_stuck_in_kernel+0xa4/0x120
> | lr : secondary_start_kernel+0x164/0x170
> | sp : ffff80000a4cbe90
> | x29: ffff80000a4cbe90 x28: 0000000000000000 x27: 0000000000000000
> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> | x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
> | x20: 0000000000000001 x19: 0000000000000001 x18: 0000000000000008
> | x17: 3030383832343030 x16: 3030303030307830 x15: ffff80000a4cbab0
> | x14: 0000000000000001 x13: 5d31666130663133 x12: 3478305b20313030
> | x11: 3030303030303078 x10: 3020726f73736563 x9 : 726f737365636f72
> | x8 : ffff800009ff2ef0 x7 : 0000000000000003 x6 : 0000000000000000
> | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000100
> | x2 : 0000000000000000 x1 : ffff0000029bd880 x0 : 0000000000000000
> | Call trace:
> | cpus_are_stuck_in_kernel+0xa4/0x120
> | __secondary_switched+0xb0/0xb4
> | Code: 35ffffa3 17fffc6c d53cd040 f9800011 (885f7c01)
> | ---[ end trace 0000000000000000 ]---
>
> This is confusing and hinders debugging, and will be problematic for
> CONFIG_LIVEPATCH as these cases cannot be unwound reliably.
>
> This is very similar to recent issues with out-of-line exception fixups,
> which were removed in commits:
>
> 35d67794b8828333 ("arm64: lib: __arch_clear_user(): fold fixups into body")
> 4012e0e22739eef9 ("arm64: lib: __arch_copy_from_user(): fold fixups into body")
> 139f9ab73d60cf76 ("arm64: lib: __arch_copy_to_user(): fold fixups into body")
>
> When the trampolines were introduced in commit:
>
> addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
>
> The rationale was to improve icache performance by grouping the LL/SC
> atomics together. This has never been measured, and this theoretical
> benefit is outweighed by other factors:
>
> * As the subsections are collapsed into sections at object file
> granularity, these are spread out throughout the kernel and can share
> cachelines with unrelated code regardless.
>
> * GCC 12.1.0 has been observed to place the trampoline out-of-line in
> specialised __ll_sc_*() functions, introducing more branching than was
> intended.
>
> * Removing the trampolines has been observed to shrink a defconfig
> kernel Image by 64KiB when building with GCC 12.1.0.
>
> This patch removes the LL/SC trampolines, meaning that the LL/SC atomics
> will be inlined into their callers (or placed in out-of line functions
> using regular BL/RET pairs). When CONFIG_ARM64_LSE_ATOMICS=y, the LL/SC
> atomics are always called in an unlikely branch, and will be placed in a
> cold portion of the function, so this should have minimal impact to the
> hot paths.
>
> Other than the improved backtracing, there should be no functional
> change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Link: https://lore.kernel.org/r/20220817155914.3975112-2-mark.rutland@arm.com
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
Please also drop this here. Thanks.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Sasha Levin <sashal@kernel.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Mark Rutland <mark.rutland@arm.com>,
Will Deacon <will@kernel.org>,
peterz@infradead.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH AUTOSEL 6.0 17/46] arm64: atomics: remove LL/SC trampolines
Date: Wed, 12 Oct 2022 09:55:59 +0100 [thread overview]
Message-ID: <Y0aBH1cWdx9oh0BV@arm.com> (raw)
In-Reply-To: <20221011145015.1622882-17-sashal@kernel.org>
On Tue, Oct 11, 2022 at 10:49:45AM -0400, Sasha Levin wrote:
> From: Mark Rutland <mark.rutland@arm.com>
>
> [ Upstream commit b2c3ccbd0011bb3b51d0fec24cb3a5812b1ec8ea ]
>
> When CONFIG_ARM64_LSE_ATOMICS=y, each use of an LL/SC atomic results in
> a fragment of code being generated in a subsection without a clear
> association with its caller. A trampoline in the caller branches to the
> LL/SC atomic with with a direct branch, and the atomic directly branches
> back into its trampoline.
>
> This breaks backtracing, as any PC within the out-of-line fragment will
> be symbolized as an offset from the nearest prior symbol (which may not
> be the function using the atomic), and since the atomic returns with a
> direct branch, the caller's PC may be missing from the backtrace.
>
> For example, with secondary_start_kernel() hacked to contain
> atomic_inc(NULL), the resulting exception can be reported as being taken
> from cpus_are_stuck_in_kernel():
>
> | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> | Mem abort info:
> | ESR = 0x0000000096000004
> | EC = 0x25: DABT (current EL), IL = 32 bits
> | SET = 0, FnV = 0
> | EA = 0, S1PTW = 0
> | FSC = 0x04: level 0 translation fault
> | Data abort info:
> | ISV = 0, ISS = 0x00000004
> | CM = 0, WnR = 0
> | [0000000000000000] user address but active_mm is swapper
> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
> | Modules linked in:
> | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-11219-geb555cb5b794-dirty #3
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> | pc : cpus_are_stuck_in_kernel+0xa4/0x120
> | lr : secondary_start_kernel+0x164/0x170
> | sp : ffff80000a4cbe90
> | x29: ffff80000a4cbe90 x28: 0000000000000000 x27: 0000000000000000
> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> | x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
> | x20: 0000000000000001 x19: 0000000000000001 x18: 0000000000000008
> | x17: 3030383832343030 x16: 3030303030307830 x15: ffff80000a4cbab0
> | x14: 0000000000000001 x13: 5d31666130663133 x12: 3478305b20313030
> | x11: 3030303030303078 x10: 3020726f73736563 x9 : 726f737365636f72
> | x8 : ffff800009ff2ef0 x7 : 0000000000000003 x6 : 0000000000000000
> | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000100
> | x2 : 0000000000000000 x1 : ffff0000029bd880 x0 : 0000000000000000
> | Call trace:
> | cpus_are_stuck_in_kernel+0xa4/0x120
> | __secondary_switched+0xb0/0xb4
> | Code: 35ffffa3 17fffc6c d53cd040 f9800011 (885f7c01)
> | ---[ end trace 0000000000000000 ]---
>
> This is confusing and hinders debugging, and will be problematic for
> CONFIG_LIVEPATCH as these cases cannot be unwound reliably.
>
> This is very similar to recent issues with out-of-line exception fixups,
> which were removed in commits:
>
> 35d67794b8828333 ("arm64: lib: __arch_clear_user(): fold fixups into body")
> 4012e0e22739eef9 ("arm64: lib: __arch_copy_from_user(): fold fixups into body")
> 139f9ab73d60cf76 ("arm64: lib: __arch_copy_to_user(): fold fixups into body")
>
> When the trampolines were introduced in commit:
>
> addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
>
> The rationale was to improve icache performance by grouping the LL/SC
> atomics together. This has never been measured, and this theoretical
> benefit is outweighed by other factors:
>
> * As the subsections are collapsed into sections at object file
> granularity, these are spread out throughout the kernel and can share
> cachelines with unrelated code regardless.
>
> * GCC 12.1.0 has been observed to place the trampoline out-of-line in
> specialised __ll_sc_*() functions, introducing more branching than was
> intended.
>
> * Removing the trampolines has been observed to shrink a defconfig
> kernel Image by 64KiB when building with GCC 12.1.0.
>
> This patch removes the LL/SC trampolines, meaning that the LL/SC atomics
> will be inlined into their callers (or placed in out-of line functions
> using regular BL/RET pairs). When CONFIG_ARM64_LSE_ATOMICS=y, the LL/SC
> atomics are always called in an unlikely branch, and will be placed in a
> cold portion of the function, so this should have minimal impact to the
> hot paths.
>
> Other than the improved backtracing, there should be no functional
> change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Link: https://lore.kernel.org/r/20220817155914.3975112-2-mark.rutland@arm.com
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
Please also drop this here. Thanks.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2022-10-12 8:56 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-11 14:49 [PATCH AUTOSEL 6.0 01/46] arm64: dts: qcom: sdm845: narrow LLCC address space Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 02/46] ARM: dts: imx6: delete interrupts property if interrupts-extended is set Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 03/46] ARM: dts: imx7d-sdb: config the max pressure for tsc2046 Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 04/46] soc: mediatek: Let PMIC Wrapper and SCPSYS depend on OF Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 22:49 ` Jean Delvare
2022-10-11 22:49 ` Jean Delvare
2022-10-16 14:46 ` Sasha Levin
2022-10-16 14:46 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 05/46] arm64: dts: qcom: sc7280-idp: correct ADC channel node name and unit address Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 06/46] ARM: dts: imx6q: add missing properties for sram Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 07/46] ARM: dts: imx6dl: " Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 08/46] ARM: dts: imx6qp: " Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 09/46] ARM: dts: imx6sl: " Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 10/46] ARM: dts: imx6sll: " Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 11/46] ARM: dts: imx6sx: " Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 12/46] ARM: dts: imx6sl: use tabs for code indent Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 13/46] ARM: dts: imx6sx-udoo-neo: don't use multiple blank lines Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 14/46] kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 15/46] kselftest/arm64: Allow larger buffers in get_signal_context() Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 15:04 ` Mark Brown
2022-10-11 15:04 ` Mark Brown
2022-10-13 17:58 ` Sasha Levin
2022-10-13 17:58 ` Sasha Levin
2022-10-13 18:02 ` Mark Brown
2022-10-13 18:02 ` Mark Brown
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 16/46] sparc: Fix the generic IO helpers Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 17/46] arm64: atomics: remove LL/SC trampolines Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-12 8:55 ` Catalin Marinas [this message]
2022-10-12 8:55 ` Catalin Marinas
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 18/46] arm64: run softirqs on the per-CPU IRQ stack Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 19/46] arm64: dts: imx8mm-kontron: Use the VSELECT signal to switch SD card IO voltage Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 20/46] arm64: dts: imx8ulp: no executable source file permission Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 21/46] arm64: dts: imx8mq-librem5: Add bq25895 as max17055's power supply Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 22/46] ARM: orion: fix include path Sasha Levin
2022-10-11 14:49 ` Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 23/46] btrfs: dump extra info if one free space cache has more bitmaps than it should Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 24/46] btrfs: add macros for annotating wait events with lockdep Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 25/46] btrfs: add lockdep annotations for num_writers wait event Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 26/46] btrfs: add lockdep annotations for num_extwriters " Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 27/46] btrfs: add lockdep annotations for transaction states wait events Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 28/46] btrfs: add lockdep annotations for pending_ordered wait event Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 29/46] btrfs: change the lockdep class of free space inode's invalidate_lock Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 30/46] btrfs: add lockdep annotations for the ordered extents wait event Sasha Levin
2022-10-11 14:49 ` [PATCH AUTOSEL 6.0 31/46] btrfs: scrub: properly report super block errors in system log Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 32/46] btrfs: scrub: try to fix super block errors Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 33/46] btrfs: get rid of block group caching progress logic Sasha Levin
2022-10-11 23:46 ` Omar Sandoval
2022-10-13 17:55 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 34/46] btrfs: don't print information about space cache or tree every remount Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 35/46] btrfs: call __btrfs_remove_free_space_cache_locked on cache load failure Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 36/46] btrfs: check superblock to ensure the fs was not modified at thaw time Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 37/46] btrfs: remove the unnecessary result variables Sasha Levin
2022-10-12 11:54 ` David Sterba
2022-10-13 17:56 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 38/46] btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN Sasha Levin
2022-10-12 12:56 ` David Sterba
2022-10-12 23:12 ` Qu Wenruo
2022-10-13 17:56 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 39/46] btrfs: add KCSAN annotations for unlocked access to block_rsv->full Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 40/46] btrfs: separate out the eb and extent state leak helpers Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 41/46] btrfs: relax block-group-tree feature dependency checks Sasha Levin
2022-10-12 13:01 ` David Sterba
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 42/46] arm64: dts: uniphier: Add USB-device support for PXs3 reference board Sasha Levin
2022-10-11 14:50 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 43/46] ARM: 9233/1: stacktrace: Skip frame pointer boundary check for call_with_stack() Sasha Levin
2022-10-11 14:50 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 44/46] ARM: 9234/1: stacktrace: Avoid duplicate saving of exception PC value Sasha Levin
2022-10-11 14:50 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 45/46] ARM: 9242/1: kasan: Only map modules if CONFIG_KASAN_VMALLOC=n Sasha Levin
2022-10-11 14:50 ` Sasha Levin
2022-10-11 14:50 ` [PATCH AUTOSEL 6.0 46/46] selftests/cpu-hotplug: Use return instead of exit Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y0aBH1cWdx9oh0BV@arm.com \
--to=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=peterz@infradead.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.