[PATCH] arm64: build CONFIG_ARM64_LSE_ATOMICS without -ffixed- flags

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: build CONFIG_ARM64_LSE_ATOMICS without -ffixed- flags
Date: Sat, 4 Aug 2018 09:21:05 +0200	[thread overview]
Message-ID: <CAKv+Gu-oWw9qa-U=XB8YPhUtSGcGwtEddafpsOY7XW9L6aPNdQ@mail.gmail.com> (raw)
In-Reply-To: <CANA+-vDMckt3WN+7E59vd_EKnUS4r_iuPNgyp2cs6-uWiAg3qA@mail.gmail.com>

On 4 August 2018 at 03:05, Tri Vo <trong@android.com> wrote:
> On Fri, Aug 3, 2018 at 12:34 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 3 August 2018 at 21:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 3 August 2018 at 21:08, Tri Vo <trong@android.com> wrote:
>>>> On Fri, Aug 3, 2018 at 10:27 AM, Ard Biesheuvel
>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>> On 3 August 2018 at 19:17, Nick Desaulniers <ndesaulniers@google.com> wrote:
>>>>>> On Fri, Aug 3, 2018 at 1:43 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>>>>>
>>>>>>> On 3 August 2018 at 01:17, Tri Vo <trong@android.com> wrote:
>>>>>>> > Remove -ffixed- compile flags as they are not required for correct
>>>>>>> > behavior of out-of-line ll/sc implementations of atomic functions.
>>>>>>> > Registers (except x0 which contains return value) only need to be
>>>>>>> > callee-saved, not necessarily fixed.
>>>>>>> >
>>>>>>> > For readability purposes, represent "callee-saves-all-registers" calling
>>>>>>> > convention with a combination of -fcall-used- and -fcall-saved- flags only.
>>>>>>> >
>>>>>>> > Signed-off-by: Tri Vo <trong@android.com>
>>>>>>>
>>>>>>> Wouldn't this permit the compiler to stack/unstack these registers in
>>>>>>> the implementations of the out of line LL/SC fallbacks.
>>>>
>>>> Yes. However, the number of registers that get stacked/unstacked
>>>> should remain the same since compiler should only preserve registers
>>>> that are used by callee (assuming callee saves all).
>>>>>>
>>>>>> I'm curious to understand more about how this LSE patching works, and
>>>>>> what the constraints exactly are.  Are there some docs I should read
>>>>>> that you can point me to?  I see
>>>>>> arch/arm64/include/asm/atomic_ll_sc.h, but are those are the in line
>>>>>> implementations?  Does that make arch/arm64/include/asm/atomic.h the
>>>>>> out of line definitions?  What are the restrictions around preventing
>>>>>> the compiler to push/pop those registers?
>>>>>>
>>>>>
>>>>> The idea is that LSE atomics can each be patched into a bl instruction
>>>>> that calls the appropriate LL/SC fallback, but without the overhead of
>>>>> an ordinary function call. (actually the bl is there by default, but
>>>>> patched into LSE ops on h/w that supports it)
>>>>>
>>>>> So that is why the 'bl' is in an asm() block, and why x30 is used as a
>>>>> temporary/clobber in the LSE sequences: the function call is
>>>>> completely hidden from the compiler. Obviously, for performance, you
>>>>> want to prevent these fallbacks from touching the stack.
>>>>
>>>> The function call is hidden from the compiler on the caller side
>>>> using 'bl' in the asm block. I suspect that is done to make
>>>> alternative instructions the same byte length (constraints of dynamic
>>>> patching). However, on the callee side, LL/SC fallbacks
>>>> (in atomic_ll_sc.h) are implemented as C functions. So the compiler
>>>> still handles the function call details and register allocation
>>>> within those functions.
>>>>
>>>> Since the caller doesn't save any registers, callee stacks/unstacks
>>>> all the registers it uses. So, for example, LL/SC fallback
>>>> implementation of atomic_add preserves 2 registers with or without
>>>> this change.
>>>>
>>>
>>> Did you look at the disassembly?
>
> Yes, I did.

0000000000000000 <__ll_sc_atomic_add>:
   0: f9800031 prfm pstl1strm, [x1]
   4: 885f7c31 ldxr w17, [x1]
   8: 0b000231 add w17, w17, w0
   c: 88107c31 stxr w16, w17, [x1]
  10: 35ffffb0 cbnz w16, 4 <__ll_sc_atomic_add+0x4>
  14: d65f03c0 ret

Now where does it preserve 2 registers?

>>
>> Actually, I only just spotted that you only modify x1 .. x7 and that
>> the other registers already use -fcall-saved-xNN
>>
>> Will should comment on why the distinction exists in the first place.
>> Note that a strict reading of the GCC man page suggests that using
>> -fcall-saved-x8 is a bad idea, given that it is the indirect return
>> register. However, given the limited scope of the option here, that
>> probably does not matter in practice.
>>
>> So I guess you need this patch for Clang? Would be better to mention
>> that in the commit log if this is the case.
>
> Yes, I am planning to fold whatever compiler support is needed for LSE
> atomics into "callee-saves-all-registers" calling convention (without
> ffixed flags). I'll then implement that calling convention in Clang.
> I'll mention that in the commit message.

Not *all* registers. x16 and x17 are excluded in this case, and I am
not sure whether you would to encode that in an option.

What is wrong with using -ffixed or -fcall-saved?

next prev parent reply	other threads:[~2018-08-04  7:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-02 23:17 [PATCH] arm64: build CONFIG_ARM64_LSE_ATOMICS without -ffixed- flags Tri Vo
2018-08-03  8:43 ` Ard Biesheuvel
2018-08-03 17:17   ` Nick Desaulniers
2018-08-03 17:27     ` Ard Biesheuvel
2018-08-03 19:08       ` Tri Vo
2018-08-03 19:13         ` Ard Biesheuvel
2018-08-03 19:34           ` Ard Biesheuvel
2018-08-04  1:05             ` Tri Vo
2018-08-04  7:21               ` Ard Biesheuvel [this message]
2018-08-06 21:54                 ` Tri Vo
2018-08-06 22:00                   ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKv+Gu-oWw9qa-U=XB8YPhUtSGcGwtEddafpsOY7XW9L6aPNdQ@mail.gmail.com' \
    --to=ard.biesheuvel@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).