* compiler code model for the kernel
@ 2018-11-19 17:41 Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-19 17:41 UTC (permalink / raw)
To: linux-arm-kernel
Hello all,
Some of us (on cc) have discussed this a bit on various occasions, so
perhaps it's time to sit down and do something about it :-)
The Clang work that Nick is involved in has made explicit some
assumptions that we are currently making in the kernel with respect to
the behavior of GCC, and it would be good to formalise this so we can
keep relying on it in the future, and to clarify to other compiler
developers what is needed.
GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
introduce the same for AArch64. Under this model, we should be able to
make the following assumptions:
- symbol references are emitted as ADRP/ADD pairs
- no absolute references in generated code like jump tables etc (like
-fpic/-fpie) [*]
- no shared library semantics (no GOT indirections to support symbol
preemption, or to reduce the CoW footprint and/or avoid text
relocations)
- resulting objects can be linked in -pie mode by ld.bfd
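
To make the ADRP/ADD and no-GOT points concrete, here is a minimal sketch. The symbol and function names are made up for illustration, and the instruction sequences in the comment are the expected small-model and -fpic forms, not output captured from a build:

```c
/* Hypothetical global, standing in for any kernel symbol. */
int boot_count = 3;

/*
 * With the small code model and no shared library semantics, the
 * address of boot_count is expected to be formed PC-relatively:
 *
 *     adrp x0, boot_count
 *     add  x0, x0, :lo12:boot_count
 *
 * With -fpic and default visibility the compiler may instead load the
 * address from the GOT, which is the indirection we want to avoid:
 *
 *     adrp x0, :got:boot_count
 *     ldr  x0, [x0, :got_lo12:boot_count]
 */
int *boot_count_address(void)
{
	return &boot_count;
}
```

Either sequence yields the same pointer at run time; the difference is only in which relocations the object file carries.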
Another thing that came up is that we currently rely on the stack
pointer never to assume a value that is not 16-byte aligned, even
transiently.
Other things I've missed?
[*] This is only strictly required in parts of the code that may
execute at a different offset than the linked offset, even after
processing dynamic relocations at boot time (e.g., KVM hyp code
running in a different exception level) but avoiding those altogether
is reasonable. Note that GCC does the right thing for us here already,
but Clang currently needs -fno-jump-tables to build the KVM hyp code.
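
For reference, the kind of code that is affected looks like the sketch below. The function and its values are hypothetical; whether a given compiler actually builds a table for it depends on version and options:

```c
/*
 * A dense switch like this is a typical candidate for a jump table.
 * If the table holds absolute addresses, every entry needs a
 * relocation, and the code misbehaves when it runs at an offset other
 * than the one the relocations were processed for. A table of
 * PC-relative offsets, or no table at all (-fno-jump-tables), does
 * not have that problem.
 */
int exit_reason_to_errno(unsigned int reason)
{
	switch (reason) {
	case 0:
		return 0;
	case 1:
		return -22;
	case 2:
		return -12;
	case 3:
		return -16;
	case 4:
		return -5;
	case 5:
		return -14;
	default:
		return -1;
	}
}
```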
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
@ 2018-11-19 17:48 ` Nick Desaulniers
2018-11-22 11:53 ` Dave Martin
2018-11-19 23:54 ` Nick Desaulniers
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Nick Desaulniers @ 2018-11-19 17:48 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.
Just following up on the KVM hyp code. At plumbers Mark and I took a
look at that case and were no longer able to reproduce. Mark was able
to boot a Clang compiled kernel at EL2. IIRC, the code in question
had been changed. Note: Clang still probably does the wrong thing for
jump tables, so we should pursue this code model, I just don't think
this previous example is still the case. Since it could always come
back, we should fix the issue on the compiler side though.
--
Thanks,
~Nick Desaulniers
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
@ 2018-11-19 23:54 ` Nick Desaulniers
2018-11-20 11:20 ` Peter Smith
2018-11-20 13:23 ` Will Deacon
2018-11-20 13:25 ` Ramana Radhakrishnan
3 siblings, 1 reply; 12+ messages in thread
From: Nick Desaulniers @ 2018-11-19 23:54 UTC (permalink / raw)
To: linux-arm-kernel
+ some more folks to take a look
On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.
--
Thanks,
~Nick Desaulniers
* compiler code model for the kernel
2018-11-19 23:54 ` Nick Desaulniers
@ 2018-11-20 11:20 ` Peter Smith
2018-11-20 13:16 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Peter Smith @ 2018-11-20 11:20 UTC (permalink / raw)
To: linux-arm-kernel
________________________________________
From: Nick Desaulniers <ndesaulniers@google.com>
Sent: 19 November 2018 23:54
To: Ard Biesheuvel
Cc: Will Deacon; Catalin Marinas; Ramana Radhakrishnan; Linux ARM; Mark Rutland; Arnd Bergmann; Marc Zyngier; Peter Smith; Kristof Beyls; Stephen Hines
Subject: Re: compiler code model for the kernel
+ some more folks to take a look
On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.
Not got a lot to add at the moment. Some observations from the clang/llvm side.
Clang already has -mcmodel=kernel for fuchsia targets so there is some prior art. My understanding is that it uses the small code-model with some small tweaks. There will probably need to be some renaming in the back-end to distinguish between fuchsia kernel code model and linux kernel code model.
I think that the assumptions above should be describable without too much difficulty. ADRP/ADD is covered by the small code model. I think the existing -fpie where every global has hidden visibility and hence cannot be found in a shared object will produce code with no GOT indirections.
My limited understanding of the frame lowering code suggests that clang maintains a minimum 16-byte alignment of the stack pointer at all times. It would need to be looked over by an expert to make sure though.
Peter
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
* compiler code model for the kernel
2018-11-20 11:20 ` Peter Smith
@ 2018-11-20 13:16 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:16 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 20 Nov 2018 at 12:20, Peter Smith <Peter.Smith@arm.com> wrote:
>
>
> ________________________________________
> From: Nick Desaulniers <ndesaulniers@google.com>
> Sent: 19 November 2018 23:54
> To: Ard Biesheuvel
> Cc: Will Deacon; Catalin Marinas; Ramana Radhakrishnan; Linux ARM; Mark Rutland; Arnd Bergmann; Marc Zyngier; Peter Smith; Kristof Beyls; Stephen Hines
> Subject: Re: compiler code model for the kernel
>
> + some more folks to take a look
>
> On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
> Not got a lot to add at the moment. Some observations from the clang/llvm side.
>
> Clang already has -mcmodel=kernel for fuchsia targets so there is some prior art. My understanding is that it uses the small code-model with some small tweaks. There will probably need to be some renaming in the back-end to distinguish between fuchsia kernel code model and linux kernel code model.
>
> I think that the assumptions above should be describable without too much difficulty. ADRP/ADD is covered by the small code model. I think the existing -fpie where every global has hidden visibility and hence cannot be found in a shared object will produce code with no GOT indirections.
>
Indeed. On GCC, using the visibility 'hidden' pragma (which affects
extern declarations as well as definitions, unlike the -fvisibility
command line option) with -fpie results in a working kernel image
without any artificial absolute references like GOT entries.
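
A minimal sketch of the pragma approach follows. The symbol and function names are hypothetical, and the definition is kept in the same file only so that the example is self-contained:

```c
/*
 * Everything declared between push and pop gets hidden visibility,
 * including the extern declaration. -fvisibility=hidden on the
 * command line only affects definitions, so under -fpic a reference
 * to an extern symbol could still be routed through the GOT.
 */
#pragma GCC visibility push(hidden)

extern unsigned long kimage_offset_demo;
unsigned long read_kimage_offset_demo(void);

#pragma GCC visibility pop

/* Normally this definition would live in another translation unit. */
unsigned long kimage_offset_demo = 0x80000;

unsigned long read_kimage_offset_demo(void)
{
	return kimage_offset_demo;
}
```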
> My limited understanding of the frame lowering code suggests that clang maintains a minimum 16-byte alignment of the stack pointer at all times. It would need to be looked over by an expert to make sure though.
>
Excellent. That is basically the same situation as GCC: the backend
only ever adjusts the stack pointer in 16 byte increments, but it
would be good to have agreement that it should remain that way.
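
Stated as code, the invariant is simply the following. This is a sketch with made-up helper names, not something taken from either backend:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 16u

/*
 * Round a frame size up to the next multiple of 16, as a backend that
 * only ever moves sp in 16-byte steps has to do.
 */
size_t frame_size_aligned(size_t bytes)
{
	return (bytes + STACK_ALIGN - 1) & ~(size_t)(STACK_ALIGN - 1);
}

/* Non-zero if a stack pointer value satisfies the requirement. */
int sp_is_aligned(uintptr_t sp)
{
	return (sp & (STACK_ALIGN - 1)) == 0;
}
```

The "even transiently" part is the stronger claim: it rules out splitting one adjustment into two smaller steps that pass through a misaligned value.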
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
2018-11-19 23:54 ` Nick Desaulniers
@ 2018-11-20 13:23 ` Will Deacon
2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 13:25 ` Ramana Radhakrishnan
3 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2018-11-20 13:23 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Nov 19, 2018 at 09:41:54AM -0800, Ard Biesheuvel wrote:
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
It would be great if we could disable idiom recognition by the compiler as
part of this option, since this ends up with us failing to inline code
because the compiler ends up wanting to use vector registers but can't, so
replaces the idiom with a call to a libgcc function which we're forced to
implement out-of-line.
https://bugzilla.kernel.org/show_bug.cgi?id=200671
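
As an illustration of the kind of idiom meant here (a sketch with a hypothetical function name, and not necessarily the exact case from the bug report):

```c
/*
 * Kernighan-style bit count. A compiler that recognises this loop as
 * a population count may replace it with its popcount builtin. On
 * AArch64 the inline expansion of that builtin uses vector registers,
 * so when those are unavailable (-mgeneral-regs-only) the compiler may
 * emit a call to a libgcc helper such as __popcountdi2 instead.
 */
unsigned int count_set_bits(unsigned long w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```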
Another desirable feature would be having a way to force the assembler
to accept arbitrary instructions, rather than have to use .arch_extension
all over the place.
Will
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
` (2 preceding siblings ...)
2018-11-20 13:23 ` Will Deacon
@ 2018-11-20 13:25 ` Ramana Radhakrishnan
2018-11-20 13:37 ` Ard Biesheuvel
3 siblings, 1 reply; 12+ messages in thread
From: Ramana Radhakrishnan @ 2018-11-20 13:25 UTC (permalink / raw)
To: linux-arm-kernel
Hi Ard,
CC'ing Richard E and James Greenhalgh as well on this thread.
On 19/11/2018 17:41, Ard Biesheuvel wrote:
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
I would prefer this is -mcmodel=kernel-small so that it's explicit that
this is like the small memory model.
On AArch64 this is what GCC calls the small absolute model in code i.e.
really all global data accesses from code are really PC relative and
thus position independent. The problem is data initializers but I
understand you are able to deal with it by producing relative
relocations by using hidden attributes.
Thus I wonder if we should go as far as implying -fvisibility=hidden in
the compiler for all global data. Is that what you do today ?
In fact, the patch stack I am carrying for the sp_el0 based canary toyed
with the idea of adding such a flag and only putting out the symbol name
for the offset as I really don't see anyone but the kernel using that
kind of code generation.
> - symbol references are emitted as ADRP/ADD pairs
Ok.
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
Ok. -fPIC/pie with default hidden visibility just achieves the same
thing with -mcmodel=small ?
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
Ok.
> - resulting objects can be linked in -pie mode by ld.bfd
Ok, but never producing a GOT. Further you probably want to have checks
on the final binary once a build is completed to ensure this ?
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
I think we've discussed that in great detail as far as GCC is concerned.
I would prefer it is documented in the kernel documentation as a
requirement for this standard.
TL;DR
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/611445.html
regards
Ramana
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
* compiler code model for the kernel
2018-11-20 13:25 ` Ramana Radhakrishnan
@ 2018-11-20 13:37 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:37 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 20 Nov 2018 at 14:25, Ramana Radhakrishnan
<Ramana.Radhakrishnan@arm.com> wrote:
>
> Hi Ard,
>
> CC'ing Richard E and James Greenhalgh as well on this thread.
>
>
> On 19/11/2018 17:41, Ard Biesheuvel wrote:
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
>
> I would prefer this is -mcmodel=kernel-small so that it's explicit that
> this is like the small memory model.
>
I don't have a strong preference regarding the name, but I will point
out that other arches have a 'kernel' code model already, also on
Clang (as Peter reports).
> On AArch64 this is what GCC calls the small absolute model in code i.e.
> really all global data accesses from code are really PC relative and
> thus position independent. The problem is data initializers but I
> understand you are able to deal with it by producing relative
> relocations by using hidden attributes.
>
> Thus I wonder if we should go as far as implying -fvisibility=hidden in
> the compiler for all global data. Is that what you do today ?
>
Fortunately, we don't have to do this currently since the non-PIC
small model already gives us what we need.
-fvisibility=hidden isn't quite sufficient though: when building with
-fpic, only the visibility pragma affects extern declarations as well
as definitions.
> In fact, the patch stack I am carrying for the sp_el0 based canary toyed
> with the idea of adding such a flag and only putting out the symbol name
> for the offset as I really don't see anyone but the kernel using that
> kind of code generation.
>
> > - symbol references are emitted as ADRP/ADD pairs
>
> Ok.
>
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
>
> Ok. -fPIC/pie with default hidden visibility just achieves the same
> thing with -mcmodel=small ?
>
Yes, but only if you use the pragma.
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
>
> Ok.
>
> > - resulting objects can be linked in -pie mode by ld.bfd
>
> Ok, but never producing a GOT. Further you probably want to have checks
> on the final binary once a build is completed to ensure this ?
>
In general, having no GOT whatsoever is not a hard requirement. It is
a hard requirement for /some/ pieces of the code (EFI stub, KVM hyp
code) that we have no absolute references whatsoever, but in general,
we simply fix up the R_AARCH64_RELATIVE relocations wherever we
encounter them. Having all symbol references go through the GOT just
bloats the binary needlessly given that we don't require the shared
library semantics.
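
Roughly, the fixup amounts to the loop below. This is a simplified C sketch, not the kernel's actual relocation code: the structure mirrors Elf64_Rela, the function name is made up, and the image is assumed to have been linked at address 0:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_RELATIVE 1027	/* value from the AArch64 ELF ABI */

/* Same layout as Elf64_Rela; redefined so no <elf.h> is needed. */
struct rela64 {
	uint64_t r_offset;	/* link-time address of the word to patch */
	uint64_t r_info;	/* relocation type in the low 32 bits */
	int64_t  r_addend;	/* link-time value of the target */
};

/*
 * Patch every RELATIVE entry: the word at r_offset becomes
 * load_addr + r_addend. Other relocation types are skipped.
 * Returns the number of entries applied.
 */
size_t apply_relative_relocs(unsigned char *image, uint64_t load_addr,
			     const struct rela64 *rela, size_t count)
{
	size_t applied = 0;

	for (size_t i = 0; i < count; i++) {
		uint64_t value;

		if ((rela[i].r_info & 0xffffffffu) != R_AARCH64_RELATIVE)
			continue;
		value = load_addr + (uint64_t)rela[i].r_addend;
		memcpy(image + rela[i].r_offset, &value, sizeof(value));
		applied++;
	}
	return applied;
}
```

Code that has to run before or without this pass (EFI stub, KVM hyp code) cannot depend on any of the patched words, which is why it must contain no absolute references at all.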
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
>
> I think we've discussed that in great detail as far as GCC is concerned.
> I would prefer it is documented in the kernel documentation as a
> requirement for this standard.
>
>
> TL;DR
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/611445.html
>
>
> regards
> Ramana
>
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
> >
>
* compiler code model for the kernel
2018-11-20 13:23 ` Will Deacon
@ 2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 17:02 ` Ramana Radhakrishnan
0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:49 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 20 Nov 2018 at 14:23, Will Deacon <will.deacon@arm.com> wrote:
>
> On Mon, Nov 19, 2018 at 09:41:54AM -0800, Ard Biesheuvel wrote:
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
>
> It would be great if we could disable idiom recognition by the compiler as
> part of this option, since this ends up with us failing to inline code
> because the compiler ends up wanting to use vector registers but can't, so
> replaces the idiom with a call to a libgcc function which we're forced to
> implement out-of-line.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=200671
>
This is another thing that we may have been getting away with by
accident: currently, we don't include libgcc at all, and since we
build the kernel with -mgeneral-regs-only, there is no way we could
without rebuilding it (unless libgcc is somehow guaranteed not to use
FP registers)
So I think idiom recognition is fine in general: I guess this is also
the thing that replaces memset() calls with stp xzr,xzr,[]
instructions? It is the unanticipated libgcc dependency that is a
problem, and I guess this may trigger in other parts of the compiler
as well.
Note that we do provide the 'intrinsic' memcpy() and memset()
routines, simply because they have the same name in the kernel (and
they are libc not libgcc).
> Another desirable feature would be having a way to force the assembler
> to accept arbitrary instructions, rather than have to use .arch_extension
> all over the place.
>
IIRC this is a recent change in GCC where it repeats the .arch/.cpu
directive for each function?
* compiler code model for the kernel
2018-11-20 13:49 ` Ard Biesheuvel
@ 2018-11-20 17:02 ` Ramana Radhakrishnan
0 siblings, 0 replies; 12+ messages in thread
From: Ramana Radhakrishnan @ 2018-11-20 17:02 UTC (permalink / raw)
To: linux-arm-kernel
On 20/11/2018 13:49, Ard Biesheuvel wrote:
>>
>> It would be great if we could disable idiom recognition by the compiler as
>> part of this option, since this ends up with us failing to inline code
>> because the compiler ends up wanting to use vector registers but can't, so
>> replaces the idiom with a call to a libgcc function which we're forced to
>> implement out-of-line.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=200671
>>
I think idiom recognition is fine. Is this still an issue? I thought the
commit on 13th November "fixed" it.
We shouldn't look to turn off every single optimization feature, these
are put into the compiler and we should look to make this work better
between the kernel and the toolchain in general and not just punt
always. Case in point, see the work that's being attempted with the ipa
optimizations and live patching.
I don't like the idea of pushing in such things into a -mcmodel option
as it is really not sustainable. I think those sort of errors should be
fixed generically and not really as part of this backend option.
>
> This is another thing that we may have been getting away with by
> accident: currently, we don't include libgcc at all, and since we
> build the kernel with -mgeneral-regs-only, there is no way we could
> without rebuilding it (unless libgcc is somehow guaranteed not to use
> FP registers)
>
> So I think idiom recognition is fine in general: I guess this is also
> the thing that replaces memset() calls with stp xzr,xzr,[]
> instructions? It is the unanticipated libgcc dependency that is a
> problem, and I guess this may trigger in other parts of the compiler
> as well.
>
> Note that we do provide the 'intrinsic' memcpy() and memset()
> routines, simply because they have the same name in the kernel (and
> they are libc not libgcc).
>
>
>> Another desirable feature would be having a way to force the assembler
>> to accept arbitrary instructions, rather than have to use .arch_extension
>> all over the place.
>>
>
> IIRC this is a recent change in GCC where it repeats the .arch/.cpu
> directive for each function?
>
That is required IIRC for LTO and target function attributes to work
reliably. You could have different functions compiled at different
architecture levels with __attribute__((target("arch=..."))).
And part of it is due to the static checking that gas gives you with
respect to the architecture.
The .arch_extension is really an arm (AArch32) feature; on AArch64
architectural features should really get added to the .arch string
separated by a `+'.
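
A small sketch of a per-function architecture level follows. The macro and function names are hypothetical, the "+crc" extension is only an example, and the body is plain C so that it builds on any target:

```c
#include <stddef.h>

/*
 * On AArch64 with a GNU-compatible compiler, build one function for
 * armv8-a plus the CRC extension; elsewhere, expand to nothing so the
 * sketch still compiles.
 */
#if defined(__aarch64__) && defined(__GNUC__)
#define BUILD_WITH_CRC __attribute__((target("arch=armv8-a+crc")))
#else
#define BUILD_WITH_CRC
#endif

/*
 * For a function marked like this the compiler is expected to emit a
 * matching ".arch armv8-a+crc" directive in its assembly output, so
 * that gas accepts CRC instructions here while still checking other
 * functions against their own architecture level.
 */
BUILD_WITH_CRC unsigned int byte_sum(const char *buf, size_t len)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += (unsigned char)buf[i];
	return sum;
}
```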
Ramana
* compiler code model for the kernel
2018-11-19 17:48 ` Nick Desaulniers
@ 2018-11-22 11:53 ` Dave Martin
2018-11-22 12:06 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Dave Martin @ 2018-11-22 11:53 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Nov 19, 2018 at 09:48:16AM -0800, Nick Desaulniers wrote:
> On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
> Just following up on the KVM hyp code. At plumbers Mark and I took a
> look at that case and were no longer able to reproduce. Mark was able
> to boot a Clang compiled kernel at EL2. IIRC, the code in question
> had been changed. Note: Clang still probably does the wrong thing for
> jump tables, so we should pursue this code model, I just don't think
> this previous example is still the case. Since it could always come
> back, we should fix the issue on the compiler side though.
We should absolutely document it as a requirement either way,
and the putative gcc -mcmodel=kernel for arm64 should guarantee it.
Is this a requirement for non-hyp code though?
KASLR doesn't strictly need it, since we could keep the relocation info
around and fix the kernel up at boot time (though obviously life is
easier if we don't need to do this).
Currently the lack of relocations seems to be an accidental side-effect
of -mcmodel=small. It's not clear to me whether we rely on this because
it's advantageous, or whether we rely on it by accident because it
happens to be the default.
My question is: should there be just one kernel memory model, or do we
want position-independent (for hyp-like stuff) and non-position-
independent variants (for everything else)?
Cheers
---Dave
* compiler code model for the kernel
2018-11-22 11:53 ` Dave Martin
@ 2018-11-22 12:06 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-22 12:06 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 22 Nov 2018 at 12:53, Dave Martin <Dave.Martin@arm.com> wrote:
>
> On Mon, Nov 19, 2018 at 09:48:16AM -0800, Nick Desaulniers wrote:
> > On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > >
> > > Hello all,
> > >
> > > Some of us (on cc) have discussed this a bit on various occasions, so
> > > perhaps it's time to sit down and do something about it :-)
> > >
> > > The Clang work that Nick is involved in has made explicit some
> > > assumptions that we are currently making in the kernel with respect to
> > > the behavior of GCC, and it would be good to formalise this so we can
> > > keep relying on it in the future, and to clarify to other compiler
> > > developers what is needed.
> > >
> > > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > > introduce the same for AArch64. Under this model, we should be able to
> > > make the following assumptions:
> > > - symbol references are emitted as ADRP/ADD pairs
> > > - no absolute references in generated code like jump tables etc (like
> > > -fpic/-fpie) [*]
> > > - no shared library semantics (no GOT indirections to support symbol
> > > preemption, or to reduce the CoW footprint and/or avoid text
> > > relocations)
> > > - resulting objects can be linked in -pie mode by ld.bfd
> > >
> > > Another thing that came up is that we currently rely on the stack
> > > pointer never to assume a value that is not 16-byte aligned, even
> > > transiently.
> > >
> > > Other things I've missed?
> > >
> > > [*] This is only strictly required in parts of the code that may
> > > execute at a different offset than the linked offset, even after
> > > processing dynamic relocations at boot time (e.g., KVM hyp code
> > > running in a different exception level) but avoiding those altogether
> > > is reasonable. Note that GCC does the right thing for us here already,
> > > > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
> >
> > Just following up on the KVM hyp code. At plumbers Mark and I took a
> > look at that case and were no longer able to reproduce. Mark was able
> > to boot a Clang compiled kernel at EL2. IIRC, the code in question
> > had been changed. Note: Clang still probably does the wrong thing for
> > jump tables, so we should pursue this code model, I just don't think
> > this previous example is still the case. Since it could always come
> > back, we should fix the issue on the compiler side though.
>
> We should absolutely document it as a requirement either way,
> and the putative gcc -mcmodel=kernel for arm64 should guarantee it.
>
> Is this a requirement for non-hyp code though?
>
No. Only code that may execute at a different offset than it was
linked at (build time or runtime) needs this, and then we still need
to apply our own checks on top to ensure that we don't rely on things
like statically initialized function pointers or other quantities that
can only be fixed up through absolute relocations.
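
To make the distinction concrete, here is a minimal C sketch (not kernel
code) of why an absolute reference breaks when something runs at a
different offset than it was set up for, while a place-relative reference
keeps working. The names struct image, image_init() and resolve_rel() are
hypothetical and exist only for this illustration; copying the struct with
memcpy() stands in for executing an image at a different address without
applying any fixups.

```c
#include <stdint.h>
#include <string.h>

/* A toy "image" holding one symbol and two ways of referring to it. */
struct image {
    int payload;      /* the "symbol" being referenced */
    int *abs_ref;     /* absolute reference: stores a full address */
    int32_t rel_ref;  /* place-relative: target address minus field address */
};

/* Set up both references as a linker would for the original location. */
void image_init(struct image *img, int value)
{
    img->payload = value;
    img->abs_ref = &img->payload;
    img->rel_ref = (int32_t)((char *)&img->payload - (char *)&img->rel_ref);
}

/* Resolve the place-relative reference from wherever the image lives now. */
int *resolve_rel(struct image *img)
{
    return (int *)((char *)&img->rel_ref + img->rel_ref);
}
```

If an initialized struct image is copied elsewhere with memcpy(), the
copy's abs_ref still points into the original, whereas resolve_rel() on
the copy yields the copy's own payload. That is the property the hyp code
and the EFI stub need.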
> KASLR doesn't strictly need it, since we could keep the relocation info
> around and fix the kernel up at boot time (though obviously life is
> easier if we don't need to do this).
>
We already do that anyway. Every R_AARCH64_ABS64 relocation results in a
dynamic R_AARCH64_RELATIVE relocation that is fixed up by the early
boot code. With the default small code model (on GCC) we don't get
R_AARCH64_ABS64 relocations in .text but we have lots and lots of them
in .rodata/.data for statically initialized structs with function
pointers or other symbol references.
The problem is that each 8-byte quantity results in a 24-byte RELA
entry in the .rela section, and so absolute references should not be
emitted needlessly.
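
As a rough sketch of what that fixup amounts to, and of where the 24 bytes
come from, consider the following C. It is an illustration under stated
assumptions, not the kernel's actual early boot code: struct rela64 mirrors
the standard ELF64 RELA layout, the value 1027 for R_AARCH64_RELATIVE is
taken from the AArch64 ELF ABI, and apply_relative_relocs() and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as an ELF64 RELA entry: three 8-byte fields, 24 bytes. */
struct rela64 {
    uint64_t r_offset;  /* where in the image the 8-byte quantity lives */
    uint64_t r_info;    /* relocation type in the low 32 bits */
    int64_t  r_addend;  /* link-time address of the target */
};

#define DEMO_R_AARCH64_RELATIVE 1027u  /* value from the AArch64 ELF ABI */
#define DEMO_RELA_TYPE(info)    ((uint32_t)((info) & 0xffffffffu))

/*
 * Apply every RELATIVE entry: the slot at r_offset receives the load
 * address plus the addend. Entries of other types are skipped.
 * Returns the number of entries applied.
 */
size_t apply_relative_relocs(uint8_t *image, size_t image_size,
                             uint64_t load_base,
                             const struct rela64 *rela, size_t count)
{
    size_t applied = 0;

    for (size_t i = 0; i < count; i++) {
        if (DEMO_RELA_TYPE(rela[i].r_info) != DEMO_R_AARCH64_RELATIVE)
            continue;
        assert(rela[i].r_offset + sizeof(uint64_t) <= image_size);
        uint64_t value = load_base + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof(value));
        applied++;
    }
    return applied;
}
```

Each statically initialized pointer therefore costs its own 8 bytes plus a
24-byte entry like the one above, which is why needless absolute
references are worth avoiding.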
> Currently the lack of relocations seems to be an accidental side-effect
> of -mcmodel=small. It's not clear to me whether we rely on this because
> it's advantageous, or whether we rely on it by accident because it
> happens to be the default.
>
> My question is: should there be just one kernel memory model, or do we
> want position-independent (for hyp-like stuff) and non-position-
> independent variants (for everything else)?
>
The current situation is that the non-PIC small code model already
gives us what we need, given that symbol references and branch target
references are always relative by default. Switching to -fpic, which
seems more appropriate semantically, gives us worse code, since all
symbol references are then indirected via a GOT entry (i.e., a memory
access) to load an absolute address (which needs to be fixed up and
results in a 24-byte RELA entry).
This -fpic behavior is geared towards shared libraries, which have to
support symbol preemption for externally visible symbols, and try to
avoid dynamic relocations in .text so that .so mappings can share
clean pages between processes except for the GOT entries.
So if -mcmodel=small -fpic -fvisibility=hidden would give us the
right behavior, perhaps we wouldn't need this code model. But it
doesn't: -fvisibility is weaker than #pragma visibility, and so we
still end up with lots of GOT entries.
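
For reference, the difference can be sketched in C as follows. The GCC
documentation states that extern declarations are not affected by
-fvisibility, whereas the pragma applies to every declaration it encloses.
The symbol names demo_counter and demo_bump are hypothetical; what code a
given compiler version actually emits for them under -fpic is something to
check in the generated assembly, not something this sketch guarantees.

```c
/*
 * Everything declared between push and pop gets hidden visibility,
 * including plain extern declarations, which -fvisibility=hidden
 * leaves at default visibility.
 */
#pragma GCC visibility push(hidden)
extern int demo_counter;  /* hypothetical symbol, defined below */
int demo_bump(void);      /* hypothetical function */
#pragma GCC visibility pop

int demo_counter;

/* A hidden symbol cannot be preempted, so no GOT indirection is needed. */
int demo_bump(void)
{
    return ++demo_counter;
}
```

With the declarations marked hidden, the compiler is free to reference
demo_counter PC-relatively even under -fpic, which is the behavior we
would want a kernel code model to provide without the pragma.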
IOW, if we define a code model for the kernel, there is no point in
having different behavior between EFI stub/HYP code and other parts,
since the code that runs on the former is optimal for the latter
anyway.
Thread overview: 12+ messages
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
2018-11-22 11:53 ` Dave Martin
2018-11-22 12:06 ` Ard Biesheuvel
2018-11-19 23:54 ` Nick Desaulniers
2018-11-20 11:20 ` Peter Smith
2018-11-20 13:16 ` Ard Biesheuvel
2018-11-20 13:23 ` Will Deacon
2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 17:02 ` Ramana Radhakrishnan
2018-11-20 13:25 ` Ramana Radhakrishnan
2018-11-20 13:37 ` Ard Biesheuvel