From mboxrd@z Thu Jan 1 00:00:00 1970
From: will.deacon@arm.com (Will Deacon)
Date: Tue, 7 Aug 2018 17:56:34 +0100
Subject: [RFC PATCH] arm64: lse: provide additional GPR to 'fetch' LL/SC fallback variants
In-Reply-To: <20180804095553.16358-1-ard.biesheuvel@linaro.org>
References: <20180804095553.16358-1-ard.biesheuvel@linaro.org>
Message-ID: <20180807165634.GA21809@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Ard,

On Sat, Aug 04, 2018 at 11:55:53AM +0200, Ard Biesheuvel wrote:
> When support for ARMv8.1 LSE atomics is compiled in, the original
> LL/SC implementations are demoted to fallbacks that are invoked
> via function calls on systems that do not implement the new instructions.
>
> Due to the fact that these function calls may occur from modules that
> are located further than 128 MB away from their targets in the core
> kernel, such calls may be indirected via PLT entries, which are permitted
> to clobber registers x16 and x17. Since we must assume that those
> registers do not retain their value across a function call to such a
> LL/SC fallback, and given that those function calls are hidden from the
> compiler entirely, we must assume that calling any of the LSE atomics
> routines clobbers x16 and x17 (and x30, for that matter).
>
> Fortunately, there is an upside: having two scratch registers available
> permits the compiler to emit many of the LL/SC fallbacks without having
> to preserve/restore registers on the stack, which would penalise the
> users of the LL/SC fallbacks even more, given that they are already
> putting up with the function call overhead.
>
> However, the 'fetch' variants need an additional scratch register in
> order to execute without the need to preserve registers on the stack.
>
> So let's give those routines an additional scratch register 'x15' when
> emitted as a LL/SC fallback, and ensure that the register is marked as
> clobbered at the associated LSE call sites (but not anywhere else)

Hmm, doesn't this mean that we'll needlessly spill/reload in the case
that we have LSE atomics in the CPU? I'd rather keep the LSE code as
fast as possible if ARM64_LSE_ATOMICS=y, and allow people to disable the
config option if they want to get the best performance for the LL/SC
variants.

Will
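
[Editor's note: for readers without the kernel tree at hand, the call sites
under discussion are the runtime-patched wrappers in
arch/arm64/include/asm/atomic_lse.h. A simplified sketch, modelled on the
mainline headers of that era (names and exact wrapping are from memory, and
this only compiles for arm64), shows where the clobber list sits:]

```c
/* Sketch based on arch/arm64/include/asm/{lse,atomic_lse}.h (~v4.18).
 * Simplified for illustration; not the proposed x15 change itself. */

/* Registers a PLT-indirected call to the out-of-line LL/SC fallback
 * may clobber, so every LSE call site must list them. */
#define __LL_SC_CLOBBERS	"x16", "x17", "x30"

static inline void atomic_add(int i, atomic_t *v)
{
	register int w0 asm ("w0") = i;
	register atomic_t *x1 asm ("x1") = v;

	asm volatile(ARM64_LSE_ATOMIC_INSN(
	/* LL/SC path: out-of-line call, patched away when LSE exists */
	__LL_SC_ATOMIC(add),
	/* LSE path: a single instruction, no call, no spills */
	"	stadd	%w[i], %[v]\n")
	: [i] "+r" (w0), [v] "+Q" (v->counter)
	: "r" (x1)
	: __LL_SC_CLOBBERS);
}
```

[The clobber list applies to the whole asm statement regardless of which
alternative gets patched in, which is the basis of Will's objection: adding
"x15" there would constrain register allocation even when the sequence
resolves to the single stadd instruction.]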