From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 0/3] arm64: use subsections instead of function calls for LL/SC fallbacks
Date: Tue, 27 Nov 2018 19:30:55 +0000
Message-ID: <20181127193054.GF5641@arm.com>
In-Reply-To: <20181113233923.20098-1-ard.biesheuvel@linaro.org>

Hi Ard,

On Tue, Nov 13, 2018 at 03:39:20PM -0800, Ard Biesheuvel wrote:
> Refactor the LL/SC atomics code so we can emit the LL/SC fallbacks for the
> LSE atomics as subsections that get instantiated at each call site rather
> than as out of line functions that get called from inline asm (without the
> awareness of the compiler)
> 
> This should allow slightly better LSE code, and removes stack spilling and
> potential PLT indirection for the LL/SC fallbacks.
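
For context, the approach being replaced looks roughly like the sketch
below (simplified, with the real ARM64_LSE_ATOMIC_INSN/__LL_SC_CALL
macro machinery elided): the LL/SC fallback is an out-of-line function
reached by a bl from inside the asm, so the compiler never sees the
call, the arguments have to be marshalled into fixed registers, and the
registers the callee may clobber have to be listed by hand.

/* Simplified sketch of the pre-series code generation strategy. */
static inline void old_atomic_add(int i, atomic_t *v)
{
	register int w0 asm ("w0") = i;
	register atomic_t *x1 asm ("x1") = v;

	asm volatile(
	/* In the real code this is an alternative, patched to
	 * "stadd w0, [x1]" once LSE support is detected. */
"	bl	__ll_sc_atomic_add\n"
	: "+r" (w0), "+Q" (v->counter)
	: "r" (x1)
	: "x16", "x17", "x30");	/* registers the callee may clobber */
}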

Thanks, I much prefer using subsections to the current approach. However,
a downside of your patches is that some of the asm operands passed to
the LSE implementation are redundant; for example, in the fetch-ops:

"	" #lse_op #ac #rl " %w[i], %w[res], %[v]")			\
	: [res]"=&r" (result), [val]"=&r" (val), [tmp]"=&r" (tmp),	\
	  [v]"+Q" (v->counter)						\

I'd have thought we could avoid this by splitting up the asms and using
a static key to dispatch them. For example, the really crude hacking
below resulted in reasonable code generation:

0000000000000040 <will_atomic_add>:
  40:   14000004        b       50 <will_atomic_add+0x10>	// Patched with NOP once features are determined
  44:   14000007        b       60 <will_atomic_add+0x20>	// Patched with NOP if LSE
  48:   b820003f        stadd   w0, [x1]
  4c:   d65f03c0        ret
  50:   90000002        adrp    x2, 0 <cpu_hwcaps>
  54:   f9400042        ldr     x2, [x2]
  58:   721b005f        tst     w2, #0x20
  5c:   54ffff61        b.ne    48 <will_atomic_add+0x8>  // b.any
  60:   14000002        b       68 <will_atomic_add+0x28>
  64:   d65f03c0        ret
  68:   f9800031        prfm    pstl1strm, [x1]
  6c:   885f7c22        ldxr    w2, [x1]
  70:   0b000042        add     w2, w2, w0
  74:   88037c22        stxr    w3, w2, [x1]
  78:   35ffffa3        cbnz    w3, 6c <will_atomic_add+0x2c>
  7c:   17fffffa        b       64 <will_atomic_add+0x24>

So if we tweaked the existing code so that we can generate the LL/SC
versions either in a subsection or not depending on LSE, then we could
probably play this sort of trick using a static key.
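
One hypothetical shape for that tweak (the macro names here are
invented, purely to illustrate): keep a single LL/SC asm body and
parameterise the directives around it, so the !LSE static-key branch
can emit it inline while the LSE branch keeps it in a subsection as
the fallback:

/* Hypothetical illustration only: one LL/SC body, two emission modes. */
#define LL_SC_ADD_BODY							\
"	prfm	pstl1strm, %2\n"					\
"1:	ldxr	%w0, %2\n"						\
"	add	%w0, %w0, %w3\n"					\
"	stxr	%w1, %w0, %2\n"						\
"	cbnz	%w1, 1b\n"

#define LL_SC_ADD_INLINE	LL_SC_ADD_BODY
#define LL_SC_ADD_SUBSECTION						\
"	b	3f\n"							\
"	.subsection 1\n"						\
"3:"	LL_SC_ADD_BODY							\
"	b	4f\n"							\
"	.previous\n"							\
"4:"

The static-key helper would then use LL_SC_ADD_INLINE directly, and the
LSE helper would keep LL_SC_ADD_SUBSECTION as its fallback.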

What do you think?

Will

--->8

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7e2ec64aa414..ec7bfa40ee85 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -369,7 +369,7 @@ static inline bool __cpus_have_const_cap(int num)
 {
 	if (num >= ARM64_NCAPS)
 		return false;
-	return static_branch_unlikely(&cpu_hwcap_keys[num]);
+	return static_branch_likely(&cpu_hwcap_keys[num]);
 }
 
 static inline bool cpus_have_cap(unsigned int num)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f4fc1e0544b7..f44080ef7188 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -405,3 +405,36 @@ static int __init register_kernel_offset_dumper(void)
 	return 0;
 }
 __initcall(register_kernel_offset_dumper);
+
+static inline void ll_sc_atomic_add(int i, atomic_t *v)
+{
+	unsigned long tmp;
+	int result;
+
+	asm volatile(
+"	b	3f\n"
+"	.subsection 1\n"
+"3:	prfm	pstl1strm, %2\n"
+"1:	ldxr	%w0, %2\n"
+"	add	%w0, %w0, %w3\n"
+"	stxr	%w1, %w0, %2\n"
+"	cbnz	%w1, 1b\n"
+"	b	4f\n"
+"	.previous\n"
+"4:"
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
+	: "Ir" (i));
+}
+
+void will_atomic_add(int i, atomic_t *v)
+{
+	if (!cpus_have_const_cap(ARM64_HAS_LSE_ATOMICS)) {
+		ll_sc_atomic_add(i, v);
+	} else {
+		asm volatile("stadd	%w[i], %[v]"
+		: [v] "+Q" (v->counter)
+		: [i] "r" (i));
+	}
+
+	return;
+}

Thread overview:
2018-11-13 23:39 [PATCH 0/3] arm64: use subsections instead of function calls for LL/SC fallbacks Ard Biesheuvel
2018-11-13 23:39 ` [PATCH 1/3] arm64/atomics: refactor LL/SC base asm templates Ard Biesheuvel
2018-11-13 23:39 ` [PATCH 2/3] arm64/atomics: use subsections for out of line LL/SC alternatives Ard Biesheuvel
2018-11-13 23:39 ` [PATCH 3/3] arm64/atomics: remove out of line LL/SC alternatives Ard Biesheuvel
2018-11-27 19:30 ` Will Deacon [this message]
2018-11-28  9:16   ` [PATCH 0/3] arm64: use subsections instead of function calls for LL/SC fallbacks Ard Biesheuvel
2018-11-28  9:33     ` Ard Biesheuvel
