From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 0/3] arm64: use subsections instead of function calls for LL/SC fallbacks
Date: Tue, 27 Nov 2018 19:30:55 +0000
Message-ID: <20181127193054.GF5641@arm.com>
In-Reply-To: <20181113233923.20098-1-ard.biesheuvel@linaro.org>
Hi Ard,
On Tue, Nov 13, 2018 at 03:39:20PM -0800, Ard Biesheuvel wrote:
> Refactor the LL/SC atomics code so we can emit the LL/SC fallbacks for the
> LSE atomics as subsections that get instantiated at each call site rather
> than as out of line functions that get called from inline asm (without the
> awareness of the compiler)
>
> This should allow slightly better LSE code, and removes stack spilling and
> potential PLT indirection for the LL/SC fallbacks.
Thanks, I much prefer using subsections to the current approach. However,
a downside of your patches is that some of the asm operands passed
to the LSE implementation are redundant, for example, in the fetch-ops:
" " #lse_op #ac #rl " %w[i], %w[res], %[v]") \
: [res]"=&r" (result), [val]"=&r" (val), [tmp]"=&r" (tmp), \
[v]"+Q" (v->counter) \
I'd have thought we could avoid this by splitting up the asms and using
a static key to dispatch them. For example, the really crude hacking
below resulted in reasonable code generation:
000000000000040 <will_atomic_add>:
40: 14000004 b 50 <will_atomic_add+0x10> // Patched with NOP once features are determined
44: 14000007 b 60 <will_atomic_add+0x20> // Patched with NOP if LSE
48: b820003f stadd w0, [x1]
4c: d65f03c0 ret
50: 90000002 adrp x2, 0 <cpu_hwcaps>
54: f9400042 ldr x2, [x2]
58: 721b005f tst w2, #0x20
5c: 54ffff61 b.ne 48 <will_atomic_add+0x8> // b.any
60: 14000002 b 68 <will_atomic_add+0x28>
64: d65f03c0 ret
68: f9800031 prfm pstl1strm, [x1]
6c: 885f7c22 ldxr w2, [x1]
70: 0b000042 add w2, w2, w0
74: 88037c22 stxr w3, w2, [x1]
78: 35ffffa3 cbnz w3, 6c <will_atomic_add+0x2c>
7c: 17fffffa b 64 <will_atomic_add+0x24>
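For anyone decoding the listing: until the branch at 0x40 is patched, the code at 0x50-0x5c loads the first word of cpu_hwcaps and tests the bit selected by the 0x20 mask, taking the stadd path if it is set. A minimal C sketch of that test is below. The names sketch_caps_test and SKETCH_CAP_BIT are made up for illustration and are not kernel identifiers, and bit 5 is inferred only from the mask in the disassembly.

```c
#include <stdbool.h>

/* Hypothetical name; the value 5 is read off the 0x20 mask in "tst w2, #0x20". */
#define SKETCH_CAP_BIT 5

/*
 * Hypothetical helper mirroring the ldr/tst pair: report whether the
 * given bit is set in one word of a capability bitmap.
 */
static inline bool sketch_caps_test(unsigned long caps_word, unsigned int bit)
{
	return (caps_word >> bit) & 1UL;
}
```

A non-zero result corresponds to the b.ne at 0x5c being taken.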
So if we tweaked the existing code so that we can generate the LL/SC
versions either in a subsection or not depending on LSE, then we could
probably play this sort of trick using a static key.
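To illustrate only the shape of that dispatch outside the kernel, here is a rough user-space sketch using C11 atomics. Everything in it is an assumption made for illustration: a plain bool stands in for the static key (so the branch is evaluated at run time rather than patched), a compare-exchange retry loop stands in for the ldxr/stxr sequence, and atomic_fetch_add stands in for stadd. All of the names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key; set once at startup. */
static bool have_single_insn_atomics;

/* Retry-loop add, shaped like the ldxr/add/stxr/cbnz sequence. */
static inline void retry_loop_add(int i, atomic_int *v)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}

/* Single read-modify-write operation, standing in for stadd. */
static inline void single_insn_add(int i, atomic_int *v)
{
	atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}

/* Each path is a separate body with only the operands it needs. */
void sketch_atomic_add(int i, atomic_int *v)
{
	if (have_single_insn_atomics)
		single_insn_add(i, v);
	else
		retry_loop_add(i, v);
}
```

The point of the split is that the fast path no longer has to name the temporaries that only the retry loop uses.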
What do you think?
Will
--->8
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7e2ec64aa414..ec7bfa40ee85 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -369,7 +369,7 @@ static inline bool __cpus_have_const_cap(int num)
 {
 	if (num >= ARM64_NCAPS)
 		return false;
-	return static_branch_unlikely(&cpu_hwcap_keys[num]);
+	return static_branch_likely(&cpu_hwcap_keys[num]);
 }
static inline bool cpus_have_cap(unsigned int num)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f4fc1e0544b7..f44080ef7188 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -405,3 +405,36 @@ static int __init register_kernel_offset_dumper(void)
 	return 0;
 }
 __initcall(register_kernel_offset_dumper);
+
+static inline void ll_sc_atomic_add(int i, atomic_t *v)
+{
+	unsigned long tmp;
+	int result;
+
+	asm volatile(
+" b 3f\n"
+" .subsection 1\n"
+"3: prfm pstl1strm, %2\n"
+"1: ldxr %w0, %2\n"
+" add %w0, %w0, %w3\n"
+" stxr %w1, %w0, %2\n"
+" cbnz %w1, 1b\n"
+" b 4f\n"
+" .previous\n"
+"4:"
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
+	: "Ir" (i));
+}
+
+void will_atomic_add(int i, atomic_t *v)
+{
+	if (!cpus_have_const_cap(ARM64_HAS_LSE_ATOMICS)) {
+		ll_sc_atomic_add(i, v);
+	} else {
+		asm volatile("stadd %w[i], %[v]"
+		: [v] "+Q" (v->counter)
+		: [i] "r" (i));
+	}
+
+	return;
+}