All of lore.kernel.org
 help / color / mirror / Atom feed
From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] arm64: lse: deal with clobbered x16 register after branch via PLT
Date: Fri, 26 Feb 2016 10:14:35 +0000	[thread overview]
Message-ID: <20160226101435.GC29125@arm.com> (raw)
In-Reply-To: <CAKv+Gu-AAu7LpbBMuEQSx7Kfg+gbD4eUf0yjP3oPwfx7V5ZiAQ@mail.gmail.com>

On Fri, Feb 26, 2016 at 11:04:38AM +0100, Ard Biesheuvel wrote:
> On 26 February 2016 at 11:03, Will Deacon <will.deacon@arm.com> wrote:
> > Hey Ard,
> >
> > On Thu, Feb 25, 2016 at 08:48:53PM +0100, Ard Biesheuvel wrote:
> >> The LSE atomics implementation uses runtime patching to patch in calls
> >> to out of line non-LSE atomics implementations on cores that lack hardware
> >> support for LSE. To avoid paying the overhead cost of a function call even
> >> if no call ends up being made, the bl instruction is kept invisible to the
> >> compiler, and the out of line implementations preserve all registers, not
> >> just the ones that they are required to preserve as per the AAPCS64.
> >>
> >> However, commit fd045f6cd98e ("arm64: add support for module PLTs") added
> >> support for routing branch instructions via veneers if the branch target
> >> offset exceeds the range of the ordinary relative branch instructions.
> >> Since this deals with jump and call instructions that are exposed to ELF
> >> relocations, the PLT code uses x16 to hold the address of the branch target
> >> when it performs an indirect branch-to-register, something which is
> >> explicitly allowed by the AAPCS64 (and ordinary compiler generated code
> >> does not expect register x16 or x17 to retain their values across a bl
> >> instruction).
> >>
> >> Since the lse runtime patched bl instructions don't adhere to the AAPCS64,
> >> they don't deal with this clobbering of registers x16 and x17. So add them
> >> to the clobber list of the asm() statements that perform the call
> >> instructions, and drop x16 and x17 from the list of registers that are
> >> caller saved in the out of line non-LSE implementations.
> >>
> >> In addition, since we have given these functions two scratch registers,
> >> they no longer need to stack/unstack temp registers, and the only remaining
> >> stack accesses are for the frame pointer. So pass -fomit-frame-pointer as
> >> well, this eliminates all stack accesses from these functions.
> >
> > [...]
> >
> >> diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
> >> index 197e06afbf71..7af60139f718 100644
> >> --- a/arch/arm64/include/asm/atomic_lse.h
> >> +++ b/arch/arm64/include/asm/atomic_lse.h
> >> @@ -36,7 +36,7 @@ static inline void atomic_andnot(int i, atomic_t *v)
> >>       "       stclr   %w[i], %[v]\n")
> >>       : [i] "+r" (w0), [v] "+Q" (v->counter)
> >>       : "r" (x1)
> >> -     : "x30");
> >> +     : "x16", "x17", "x30");
> >>  }
> >
> > The problem with this is that we potentially end up spilling/reloading
> > x16 and x17 even when we patch in the LSE atomic. That's why I opted for
> > the explicit stack accesses in my patch, so that they get overwritten
> > with NOPs when we switch to the LSE version.
> >
> 
> I see. But is that really an issue in practice?

You'd need to ask somebody with silicon ;)

> And the fact that the non-LSE code is a lot more efficient has to count for
> something, I suppose?
> (/me thinks enterprise, distro kernels etc etc)

Right, but with my patch, we also get the nice code in the out-of-line
case:

0000000000000000 <__ll_sc_atomic_add>:
   0:   f9800031        prfm    pstl1strm, [x1]
   4:   885f7c31        ldxr    w17, [x1]
   8:   0b000231        add     w17, w17, w0
   c:   88107c31        stxr    w16, w17, [x1]
  10:   35ffffb0        cbnz    w16, 4 <__ll_sc_atomic_add+0x4>
  14:   d65f03c0        ret

and in the inline LSE case we have NOPs instead of stack accesses.

Will

  reply	other threads:[~2016-02-26 10:14 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-25 19:48 [PATCH v2] arm64: lse: deal with clobbered x16 register after branch via PLT Ard Biesheuvel
2016-02-25 22:09 ` Ard Biesheuvel
2016-02-26 10:03 ` Will Deacon
2016-02-26 10:04   ` Ard Biesheuvel
2016-02-26 10:14     ` Will Deacon [this message]
2016-02-26 10:25       ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160226101435.GC29125@arm.com \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.