* compiler code model for the kernel
@ 2018-11-19 17:41 Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-19 17:41 UTC (permalink / raw)
To: linux-arm-kernel
Hello all,
Some of us (on cc) have discussed this a bit on various occasions, so
perhaps it's time to sit down and do something about it :-)
The Clang work that Nick is involved in has made explicit some
assumptions that we are currently making in the kernel with respect to
the behavior of GCC, and it would be good to formalise this so we can
keep relying on it in the future, and to clarify to other compiler
developers what is needed.
GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
introduce the same for AArch64. Under this model, we should be able to
make the following assumptions:
- symbol references are emitted as ADRP/ADD pairs
- no absolute references in generated code like jump tables etc (like
-fpic/-fpie) [*]
- no shared library semantics (no GOT indirections to support symbol
preemption, or to reduce the CoW footprint and/or avoid text
relocations)
- resulting objects can be linked in -pie mode by ld.bfd
Another thing that came up is that we currently rely on the stack
pointer never to assume a value that is not 16-byte aligned, even
transiently.
Other things I've missed?
[*] This is only strictly required in parts of the code that may
execute at a different offset than the linked offset, even after
processing dynamic relocations at boot time (e.g., KVM hyp code
running in a different exception level) but avoiding those altogether
is reasonable. Note that GCC does the right thing for us here already,
but Clang currently needs -fno-jump-tables to build the KVM hyp code.
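As an illustration of the jump-table point in [*]: a dense switch whose cases do different work is the kind of construct a compiler may lower to a jump table. Whether that table holds absolute addresses or relative offsets is a code generation choice, and -fno-jump-tables makes the compiler emit compare-and-branch sequences instead. The function name and case bodies below are invented for the example.

```c
/*
 * Hypothetical example: eight consecutive cases with distinct bodies.
 * A compiler may (but need not) dispatch this through a jump table;
 * building with -fno-jump-tables forces a branch sequence instead.
 */
int apply_op(unsigned int op, int x)
{
	switch (op) {
	case 0: return x + 1;
	case 1: return x - 1;
	case 2: return x * 3;
	case 3: return x / 2;
	case 4: return x ^ 0x55;
	case 5: return x & 0x0f;
	case 6: return -x;
	case 7: return x | 1;
	default: return 0;
	}
}
```

Comparing the output of `-S` with and without -fno-jump-tables is one way to see which form a given compiler picked.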
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
@ 2018-11-19 17:48 ` Nick Desaulniers
2018-11-22 11:53 ` Dave Martin
2018-11-19 23:54 ` Nick Desaulniers
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Nick Desaulniers @ 2018-11-19 17:48 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.

Just following up on the KVM hyp code. At plumbers Mark and I took a
look at that case and were no longer able to reproduce. Mark was able
to boot a Clang compiled kernel at EL2. IIRC, the code in question
had been changed. Note: Clang still probably does the wrong thing for
jump tables, so we should pursue this code model, I just don't think
this previous example is still the case. Since it could always come
back, we should fix the issue on the compiler side though.

--
Thanks,
~Nick Desaulniers
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-19 17:48 ` Nick Desaulniers
@ 2018-11-22 11:53 ` Dave Martin
2018-11-22 12:06 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Dave Martin @ 2018-11-22 11:53 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Nov 19, 2018 at 09:48:16AM -0800, Nick Desaulniers wrote:
> On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
> Just following up on the KVM hyp code. At plumbers Mark and I took a
> look at that case and were no longer able to reproduce. Mark was able
> to boot a Clang compiled kernel at EL2. IIRC, the code in question
> had been changed. Note: Clang still probably does the wrong thing for
> jump tables, so we should pursue this code model, I just don't think
> this previous example is still the case. Since it could always come
> back, we should fix the issue on the compiler side though.

We should absolutely document it as a requirement either way,
and the putative gcc -mcmodel=kernel for arm64 should guarantee it.

Is this a requirement for non-hyp code though?

KASLR doesn't strictly need it, since we could keep the relocation info
around and fix the kernel up at boot time (though obviously life is
easier if we don't need to do this).

Currently the lack of relocations seems to be an accidental side-effect
of -mcmodel=small. It's not clear to me whether we rely on this because
it's advantageous, or whether we rely on it by accident because it
happens to be the default.

My question is: should there be just one kernel memory model, or do we
want position-independent (for hyp-like stuff) and
non-position-independent variants (for everything else)?

Cheers
---Dave
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-22 11:53 ` Dave Martin
@ 2018-11-22 12:06 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-22 12:06 UTC (permalink / raw)
To: linux-arm-kernel

On Thu, 22 Nov 2018 at 12:53, Dave Martin <Dave.Martin@arm.com> wrote:
>
> On Mon, Nov 19, 2018 at 09:48:16AM -0800, Nick Desaulniers wrote:
> > On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > >
> > > Hello all,
> > >
> > > Some of us (on cc) have discussed this a bit on various occasions, so
> > > perhaps it's time to sit down and do something about it :-)
> > >
> > > The Clang work that Nick is involved in has made explicit some
> > > assumptions that we are currently making in the kernel with respect to
> > > the behavior of GCC, and it would be good to formalise this so we can
> > > keep relying on it in the future, and to clarify to other compiler
> > > developers what is needed.
> > >
> > > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > > introduce the same for AArch64. Under this model, we should be able to
> > > make the following assumptions:
> > > - symbol references are emitted as ADRP/ADD pairs
> > > - no absolute references in generated code like jump tables etc (like
> > > -fpic/-fpie) [*]
> > > - no shared library semantics (no GOT indirections to support symbol
> > > preemption, or to reduce the CoW footprint and/or avoid text
> > > relocations)
> > > - resulting objects can be linked in -pie mode by ld.bfd
> > >
> > > Another thing that came up is that we currently rely on the stack
> > > pointer never to assume a value that is not 16-byte aligned, even
> > > transiently.
> > >
> > > Other things I've missed?
> > >
> > > [*] This is only strictly required in parts of the code that may
> > > execute at a different offset than the linked offset, even after
> > > processing dynamic relocations at boot time (e.g., KVM hyp code
> > > running in a different exception level) but avoiding those altogether
> > > is reasonable. Note that GCC does the right thing for us here already,
> > > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
> >
> > Just following up on the KVM hyp code. At plumbers Mark and I took a
> > look at that case and were no longer able to reproduce. Mark was able
> > to boot a Clang compiled kernel at EL2. IIRC, the code in question
> > had been changed. Note: Clang still probably does the wrong thing for
> > jump tables, so we should pursue this code model, I just don't think
> > this previous example is still the case. Since it could always come
> > back, we should fix the issue on the compiler side though.
>
> We should absolutely document it as a requirement either way,
> and the putative gcc -mcmodel=kernel for arm64 should guarantee it.
>
> Is this a requirement for non-hyp code though?
>

No. Only code that may execute at a different offset than it was
linked at (build time or runtime) needs this, and then we still need
to apply our own checks on top to ensure that we don't rely on things
like statically aligned function pointers or other quantities that can
only be fixed up through absolute relocations.

> KASLR doesn't strictly need it, since we could keep the relocation info
> around and fix the kernel up at boot time (though obviously life is
> easier if we don't need to do this).
>

We already do that anyway. Every R_AARCH64_ABS64 relocation results in
a dynamic R_AARCH64_RELATIVE relocation that is fixed up by the early
boot code. With the default small code model (on GCC) we don't get
R_AARCH64_ABS64 relocations in .text but we have lots and lots of them
in .rodata/.data for statically initialized structs with function
pointers or other symbol references. The problem is that each 8-byte
quantity results in a 24-byte RELA entry in the .rela section, and so
absolute references should not be emitted needlessly.

> Currently the lack of relocations seems to be an accidental side-effect
> of -mcmodel=small. It's not clear to me whether we rely on this because
> it's advantageous, or whether we rely on it by accident because it
> happens to be the default.
>
> My question is: should there be just one kernel memory model, or do we
> want position-independent (for hyp-like stuff) and non-position-
> independent variants (for everything else)?
>

The current situation is that the non-PIC small code model already
gives us what we need, given that symbol references and branch target
references are always relative by default. Switching to -fpic, which
seems more appropriate semantically, gives us worse code since all
symbol references are now indirected via a GOT entry [i.e., a memory
access] to load an absolute address [which needs to be fixed up and
results in a 24-byte RELA entry].

This -fpic behavior is geared towards shared libraries, which have to
support symbol preemption for externally visible symbols, and try to
avoid dynamic relocations in .text so that .so mappings can share
clean pages between processes except for the GOT entries.

So if -mcmodel=small -fpic -fvisibility=hidden would give us the right
behavior, perhaps we wouldn't need this code model. But it doesn't:
-fvisibility is weaker than #pragma visibility, and so we still end up
with lots of GOT entries.

IOW, if we define a code model for the kernel, there is no point in
having different behavior between EFI stub/HYP code and other parts,
since the code that runs on the former is optimal for the latter
anyway.
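To make the 24-byte figure concrete, here is a minimal sketch (not the kernel's actual boot code). The struct repeats the layout of Elf64_Rela from <elf.h>, and 1027 is the R_AARCH64_RELATIVE type number from the AArch64 ELF ABI. The helper names `rela_overhead` and `apply_relative`, and the idea of modelling the image as a byte buffer, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Rela from <elf.h>, repeated here so the sketch
 * does not depend on that header. */
struct rela64 {
	uint64_t r_offset;	/* location to patch, relative to the image start */
	uint64_t r_info;	/* symbol index (upper 32 bits), type (lower 32 bits) */
	int64_t r_addend;	/* link-time address of the target */
};

/* Three 8-byte fields: 24 bytes of metadata per 8-byte pointer. */
_Static_assert(sizeof(struct rela64) == 24, "RELA entry is 24 bytes");

#define R_AARCH64_RELATIVE 1027	/* type number from the AArch64 ELF ABI */

/* Hypothetical helper: .rela bytes needed for n absolute references. */
size_t rela_overhead(size_t n)
{
	return n * sizeof(struct rela64);
}

/*
 * Hypothetical fixup loop: for every R_AARCH64_RELATIVE entry, store
 * delta + addend at the given offset in the image, where delta is the
 * difference between the runtime and link-time addresses. Entries of
 * any other type are skipped.
 */
void apply_relative(unsigned char *image, uint64_t delta,
		    const struct rela64 *rela, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		uint64_t val;

		if ((rela[i].r_info & 0xffffffffu) != R_AARCH64_RELATIVE)
			continue;
		val = delta + (uint64_t)rela[i].r_addend;
		memcpy(image + rela[i].r_offset, &val, sizeof(val));
	}
}
```

By this arithmetic a table of 1000 statically initialized function pointers occupies 8000 bytes of data and needs another 24000 bytes of relocation entries, which is the cost argued against above.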
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
@ 2018-11-19 23:54 ` Nick Desaulniers
2018-11-20 11:20 ` Peter Smith
2018-11-20 13:23 ` Will Deacon
2018-11-20 13:25 ` Ramana Radhakrishnan
3 siblings, 1 reply; 12+ messages in thread
From: Nick Desaulniers @ 2018-11-19 23:54 UTC (permalink / raw)
To: linux-arm-kernel

+ some more folks to take a look

On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.

--
Thanks,
~Nick Desaulniers
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-19 23:54 ` Nick Desaulniers
@ 2018-11-20 11:20 ` Peter Smith
2018-11-20 13:16 ` Ard Biesheuvel
0 siblings, 1 reply; 12+ messages in thread
From: Peter Smith @ 2018-11-20 11:20 UTC (permalink / raw)
To: linux-arm-kernel

________________________________________
From: Nick Desaulniers <ndesaulniers@google.com>
Sent: 19 November 2018 23:54
To: Ard Biesheuvel
Cc: Will Deacon; Catalin Marinas; Ramana Radhakrishnan; Linux ARM; Mark Rutland; Arnd Bergmann; Marc Zyngier; Peter Smith; Kristof Beyls; Stephen Hines
Subject: Re: compiler code model for the kernel

+ some more folks to take a look

On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Hello all,
>
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.

Not got a lot to add at the moment. Some observations from the
clang/llvm side.

Clang already has -mcmodel=kernel for fuchsia targets so there is some
prior art. My understanding is that it uses the small code-model with
some small tweaks. There will probably need to be some renaming in the
back-end to distinguish between fuchsia kernel code model and linux
kernel code model.

I think that the assumptions above should be describable without too
much difficulty. ADRP/ADD is covered by the small code model. I think
the existing -fpie where every global has hidden visibility and hence
cannot be found in a shared object will produce code with no GOT
indirections.

My limited understanding of the frame lowering code suggests that clang
maintains a minimum 16-byte alignment of the stack pointer at all
times. It would need to be looked over by an expert to make sure
though.

Peter

IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy
the information in any medium. Thank you.
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-20 11:20 ` Peter Smith
@ 2018-11-20 13:16 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:16 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, 20 Nov 2018 at 12:20, Peter Smith <Peter.Smith@arm.com> wrote:
>
>
> ________________________________________
> From: Nick Desaulniers <ndesaulniers@google.com>
> Sent: 19 November 2018 23:54
> To: Ard Biesheuvel
> Cc: Will Deacon; Catalin Marinas; Ramana Radhakrishnan; Linux ARM; Mark Rutland; Arnd Bergmann; Marc Zyngier; Peter Smith; Kristof Beyls; Stephen Hines
> Subject: Re: compiler code model for the kernel
>
> + some more folks to take a look
>
> On Mon, Nov 19, 2018 at 9:42 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
> Not got a lot to add at the moment. Some observations from the
> clang/llvm side.
>
> Clang already has -mcmodel=kernel for fuchsia targets so there is some
> prior art. My understanding is that it uses the small code-model with
> some small tweaks. There will probably need to be some renaming in the
> back-end to distinguish between fuchsia kernel code model and linux
> kernel code model.
>
> I think that the assumptions above should be describable without too
> much difficulty. ADRP/ADD is covered by the small code model. I think
> the existing -fpie where every global has hidden visibility and hence
> cannot be found in a shared object will produce code with no GOT
> indirections.
>

Indeed. On GCC, using the visibility 'hidden' pragma (which affects
extern declarations as well as definitions, unlike the -fvisibility
command line option) with -fpie results in a working kernel image
without any artificial absolute references like GOT entries.

> My limited understanding of the frame lowering code suggests that clang
> maintains a minimum 16-byte alignment of the stack pointer at all
> times. It would need to be looked over by an expert to make sure
> though.
>

Excellent. That is basically the same situation as GCC: the backend
only ever adjusts the stack pointer in 16 byte increments, but it
would be good to have agreement that it should remain that way.
^ permalink raw reply [flat|nested] 12+ messages in thread
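A small sketch of the pragma mentioned in the message above. `#pragma GCC visibility push(hidden)` is the real GCC spelling (Clang accepts it too) and it covers extern declarations as well as definitions. The variable and function names are invented, and the ADRP/ADD outcome noted in the comment is the behavior reported in this thread for -fpie on AArch64, not something the snippet itself checks.

```c
/* Everything declared between push and pop gets hidden visibility,
 * extern declarations included. */
#pragma GCC visibility push(hidden)

extern int boot_mode;		/* hypothetical kernel variable */

int *boot_mode_ptr(void)
{
	/* Built with -fpie for AArch64, the expectation described above
	 * is a PC-relative ADRP/ADD pair here, not a load from the GOT. */
	return &boot_mode;
}

int boot_mode = 3;

#pragma GCC visibility pop
```

Disassembling the object built with and without the pragma (for example with objdump -dr) is one way to see whether a GOT relocation is still generated for `boot_mode`.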
* compiler code model for the kernel
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
2018-11-19 23:54 ` Nick Desaulniers
@ 2018-11-20 13:23 ` Will Deacon
2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 13:25 ` Ramana Radhakrishnan
3 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2018-11-20 13:23 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Nov 19, 2018 at 09:41:54AM -0800, Ard Biesheuvel wrote:
> Some of us (on cc) have discussed this a bit on various occasions, so
> perhaps it's time to sit down and do something about it :-)
>
> The Clang work that Nick is involved in has made explicit some
> assumptions that we are currently making in the kernel with respect to
> the behavior of GCC, and it would be good to formalise this so we can
> keep relying on it in the future, and to clarify to other compiler
> developers what is needed.
>
> GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> introduce the same for AArch64. Under this model, we should be able to
> make the following assumptions:
> - symbol references are emitted as ADRP/ADD pairs
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]
> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)
> - resulting objects can be linked in -pie mode by ld.bfd
>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.
>
> Other things I've missed?

It would be great if we could disable idiom recognition by the compiler as
part of this option, since this ends up with us failing to inline code
because the compiler ends up wanting to use vector registers but can't, so
replaces the idiom with a call to a libgcc function which we're forced to
implement out-of-line.

https://bugzilla.kernel.org/show_bug.cgi?id=200671

Another desirable feature would be having a way to force the assembler
to accept arbitrary instructions, rather than have to use .arch_extension
all over the place.

Will
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel
2018-11-20 13:23 ` Will Deacon
@ 2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 17:02 ` Ramana Radhakrishnan
0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:49 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, 20 Nov 2018 at 14:23, Will Deacon <will.deacon@arm.com> wrote:
>
> On Mon, Nov 19, 2018 at 09:41:54AM -0800, Ard Biesheuvel wrote:
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
> > - symbol references are emitted as ADRP/ADD pairs
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
> > - resulting objects can be linked in -pie mode by ld.bfd
> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
> >
> > Other things I've missed?
>
> It would be great if we could disable idiom recognition by the compiler as
> part of this option, since this ends up with us failing to inline code
> because the compiler ends up wanting to use vector registers but can't, so
> replaces the idiom with a call to a libgcc function which we're forced to
> implement out-of-line.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=200671
>

This is another thing that we may have been getting away with by
accident: currently, we don't include libgcc at all, and since we
build the kernel with -mgeneral-regs-only, there is no way we could
without rebuilding it (unless libgcc is somehow guaranteed not to use
FP registers)

So I think idiom recognition is fine in general: I guess this is also
the thing that replaces memset() calls with stp xzr,xzr,[]
instructions? It is the unanticipated libgcc dependency that is a
problem, and I guess this may trigger in other parts of the compiler
as well.

Note that we do provide the 'intrinsic' memcpy() and memset()
routines, simply because they have the same name in the kernel (and
they are libc not libgcc).

> Another desirable feature would be having a way to force the assembler
> to accept arbitrary instructions, rather than have to use .arch_extension
> all over the place.
>

IIRC this is a recent change in GCC where it repeats the .arch/.cpu
directive for each function?
^ permalink raw reply [flat|nested] 12+ messages in thread
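As a sketch of what idiom recognition can mean in practice: the thread does not say which idiom the bug report involves, so the population-count loop below is an assumed example. A compiler that recognizes this loop may replace it with a popcount operation. On AArch64 that is normally done with vector (SIMD) instructions, and when those are unavailable because of -mgeneral-regs-only the compiler may fall back to calling a libgcc helper such as __popcountdi2, which the kernel would then have to supply itself.

```c
#include <stdint.h>

/*
 * Kernighan-style bit count (function name invented for the example).
 * Each iteration clears the lowest set bit, so the loop runs once per
 * set bit. Compilers with loop idiom recognition may turn the whole
 * loop into a single popcount operation or a library call.
 */
unsigned int count_set_bits(uint64_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

Checking the generated assembly for an unexpected `bl` to a libgcc symbol is a simple way to spot this kind of hidden dependency.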
* compiler code model for the kernel
2018-11-20 13:49 ` Ard Biesheuvel
@ 2018-11-20 17:02 ` Ramana Radhakrishnan
0 siblings, 0 replies; 12+ messages in thread
From: Ramana Radhakrishnan @ 2018-11-20 17:02 UTC (permalink / raw)
To: linux-arm-kernel

On 20/11/2018 13:49, Ard Biesheuvel wrote:
>>
>> It would be great if we could disable idiom recognition by the compiler as
>> part of this option, since this ends up with us failing to inline code
>> because the compiler ends up wanting to use vector registers but can't, so
>> replaces the idiom with a call to a libgcc function which we're forced to
>> implement out-of-line.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=200671
>>

I think idiom recognition is fine. Is this still an issue? I thought
the commit on 13th November "fixed" it.

We shouldn't look to turn off every single optimization feature; these
are put into the compiler and we should look to make this work better
between the kernel and the toolchain in general and not just punt
always. Case in point, see the work that's being attempted with the IPA
optimizations and live patching. I don't like the idea of pushing such
things into a -mcmodel option as it is really not sustainable. I think
those sorts of errors should be fixed generically and not really as
part of this backend option.

>
> This is another thing that we may have been getting away with by
> accident: currently, we don't include libgcc at all, and since we
> build the kernel with -mgeneral-regs-only, there is no way we could
> without rebuilding it (unless libgcc is somehow guaranteed not to use
> FP registers)
>
> So I think idiom recognition is fine in general: I guess this is also
> the thing that replaces memset() calls with stp xzr,xzr,[]
> instructions? It is the unanticipated libgcc dependency that is a
> problem, and I guess this may trigger in other parts of the compiler
> as well.
>
> Note that we do provide the 'intrinsic' memcpy() and memset()
> routines, simply because they have the same name in the kernel (and
> they are libc not libgcc).
>
>
>> Another desirable feature would be having a way to force the assembler
>> to accept arbitrary instructions, rather than have to use .arch_extension
>> all over the place.
>>
>
> IIRC this is a recent change in GCC where it repeats the .arch/.cpu
> directive for each function?
>

That is required IIRC for LTO and target function attributes to work
reliably. You could have different functions compiled at different
architecture levels with __attribute__((target("arch=..."))). And part
of it is due to the static checking that gas gives you with respect to
the architecture.

The .arch_extension is really an arm (AArch32) feature; on AArch64
architectural features should really get added to the .arch string
separated by a `+'.

Ramana
^ permalink raw reply [flat|nested] 12+ messages in thread
* compiler code model for the kernel 2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel ` (2 preceding siblings ...) 2018-11-20 13:23 ` Will Deacon @ 2018-11-20 13:25 ` Ramana Radhakrishnan 2018-11-20 13:37 ` Ard Biesheuvel 3 siblings, 1 reply; 12+ messages in thread From: Ramana Radhakrishnan @ 2018-11-20 13:25 UTC (permalink / raw) To: linux-arm-kernel Hi Ard, CC'ing Richard E and James Greenhalgh as well on this thread. On 19/11/2018 17:41, Ard Biesheuvel wrote: > Hello all, > > Some of us (on cc) have discussed this a bit on various occasions, so > perhaps it's time to sit down and do something about it :-) > > The Clang work that Nick is involved in has made explicit some > assumptions that we are currently making in the kernel with respect to > the behavior of GCC, and it would be good to formalise this so we can > keep relying on it in the future, and to clarify to other compiler > developers what is needed. > > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should > introduce the same for AArch64. Under this model, we should be able to > make the following assumptions: I would prefer this is -mcmodel=kernel-small so that it's explicit that this is like the small memory model. On AArch64 this is what GCC calls the small absolute model in code i.e. really all global data accesses from code are really PC relative and thus position independent. The problem is data initializers but I understand you are able to deal with it by producing relative relocations by using hidden attributes. Thus I wonder if we should go as far as implying -fvisibility=hidden in the compiler for all global data. Is that what you do today ? In-fact the patch stack I am carrying for the sp_el0 based canary toyed with the idea of adding such a flag and only putting out the symbol name for the offset as I really dont' see anyone but the kernel using that kind of code generation. > - symbol references are emitted as ADRP/ADD pairs Ok. 
> - no absolute references in generated code like jump tables etc (like
> -fpic/-fpie) [*]

Ok. -fPIC/pie with default hidden visibility just achieves the same
thing with -mcmodel=small?

> - no shared library semantics (no GOT indirections to support symbol
> preemption, or to reduce the CoW footprint and/or avoid text
> relocations)

Ok.

> - resulting objects can be linked in -pie mode by ld.bfd

Ok, but never producing a GOT. Further, you probably want to have checks
on the final binary once a build is completed to ensure this?

>
> Another thing that came up is that we currently rely on the stack
> pointer never to assume a value that is not 16-byte aligned, even
> transiently.

I think we've discussed that in great detail as far as GCC is concerned.
I would prefer it is documented in the kernel documentation as a
requirement for this standard.


TL;DR

http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/611445.html


regards
Ramana

>
> Other things I've missed?
>
> [*] This is only strictly required in parts of the code that may
> execute at a different offset than the linked offset, even after
> processing dynamic relocations at boot time (e.g., KVM hyp code
> running in a different exception level) but avoiding those altogether
> is reasonable. Note that GCC does the right thing for us here already,
> but Clang currently needs -fno-jump-tables to build the KVM hyp code.
>
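The hidden-visibility idea raised in this message can be sketched in C as follows, assuming a GNU-compatible compiler that accepts `#pragma GCC visibility`. All symbol names (boot_flags, boot_flags_ptr, read_boot_flags) are hypothetical. The pointer variable is the interesting case: it is a data initializer that holds an address, so a position-independent link has to emit a relocation for it, and with a hidden (non-preemptible) target that is expected to be a relative relocation rather than a GOT entry.

```c
#include <assert.h>

/* Everything declared between push and pop gets hidden visibility,
 * extern declarations included. */
#pragma GCC visibility push(hidden)

extern int boot_flags;       /* hypothetical global */
extern int *boot_flags_ptr;  /* hypothetical pointer to it */
int read_boot_flags(void);

int boot_flags = 42;

/* Data initializer holding an address: this cannot be made PC-relative
 * in code, so the linker records a relocation for this slot. */
int *boot_flags_ptr = &boot_flags;

int read_boot_flags(void)
{
    /* By the time C code runs, the slot must hold the run-time address. */
    assert(boot_flags_ptr == &boot_flags);
    return *boot_flags_ptr;
}

#pragma GCC visibility pop
```

Building this with -fpie and inspecting the object with a tool such as readelf -r is one way to see which relocation types the compiler and linker actually produce for the pointer slot.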
* compiler code model for the kernel
2018-11-20 13:25 ` Ramana Radhakrishnan
@ 2018-11-20 13:37 ` Ard Biesheuvel
0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2018-11-20 13:37 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, 20 Nov 2018 at 14:25, Ramana Radhakrishnan
<Ramana.Radhakrishnan@arm.com> wrote:
>
> Hi Ard,
>
> CC'ing Richard E and James Greenhalgh as well on this thread.
>
>
> On 19/11/2018 17:41, Ard Biesheuvel wrote:
> > Hello all,
> >
> > Some of us (on cc) have discussed this a bit on various occasions, so
> > perhaps it's time to sit down and do something about it :-)
> >
> > The Clang work that Nick is involved in has made explicit some
> > assumptions that we are currently making in the kernel with respect to
> > the behavior of GCC, and it would be good to formalise this so we can
> > keep relying on it in the future, and to clarify to other compiler
> > developers what is needed.
> >
> > GCC for x86 has a -mcmodel=kernel parameter, so perhaps we should
> > introduce the same for AArch64. Under this model, we should be able to
> > make the following assumptions:
>
> I would prefer this is -mcmodel=kernel-small so that it's explicit that
> this is like the small memory model.
>

I don't have a strong preference regarding the name, but I will point
out that other arches have a 'kernel' code model already, also on
Clang (as Peter reports).

> On AArch64 this is what GCC calls the small absolute model in code i.e.
> really all global data accesses from code are really PC relative and
> thus position independent. The problem is data initializers but I
> understand you are able to deal with it by producing relative
> relocations by using hidden attributes.
>
> Thus I wonder if we should go as far as implying -fvisibility=hidden in
> the compiler for all global data. Is that what you do today ?
>

Fortunately, we don't have to do this currently since the non-PIC
small model already gives us what we need.
-fvisibility=hidden isn't quite sufficient, though: when building with
-fpic, only the visibility pragma affects extern declarations as well
as definitions.

> In fact the patch stack I am carrying for the sp_el0 based canary toyed
> with the idea of adding such a flag and only putting out the symbol name
> for the offset as I really don't see anyone but the kernel using that
> kind of code generation.
>
> > - symbol references are emitted as ADRP/ADD pairs
>
> Ok.
>
> > - no absolute references in generated code like jump tables etc (like
> > -fpic/-fpie) [*]
>
> Ok. -fPIC/pie with default hidden visibility just achieves the same
> thing with -mcmodel=small ?
>

Yes, but only if you use the pragma.

> > - no shared library semantics (no GOT indirections to support symbol
> > preemption, or to reduce the CoW footprint and/or avoid text
> > relocations)
>
> Ok.
>
> > - resulting objects can be linked in -pie mode by ld.bfd
>
> Ok, but never producing a GOT. Further you probably want to have checks
> on the final binary once a build is completed to ensure this ?
>

In general, having no GOT whatsoever is not a hard requirement. It is
a hard requirement for /some/ pieces of the code (EFI stub, KVM hyp
code) that we have no absolute references whatsoever, but in general,
we simply fix up the R_AARCH64_RELATIVE relocations wherever we
encounter them. Having all symbol references go through the GOT just
bloats the binary needlessly given that we don't require the shared
library semantics.

> >
> > Another thing that came up is that we currently rely on the stack
> > pointer never to assume a value that is not 16-byte aligned, even
> > transiently.
>
> I think we've discussed that to great detail as far as GCC is concerned.
> I would prefer it is documented in the kernel documentation as a
> requirement for this standard.
>
>
> TL;DR
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/611445.html
>
>
> regards
> Ramana
>
> >
> > Other things I've missed?
> >
> > [*] This is only strictly required in parts of the code that may
> > execute at a different offset than the linked offset, even after
> > processing dynamic relocations at boot time (e.g., KVM hyp code
> > running in a different exception level) but avoiding those altogether
> > is reasonable. Note that GCC does the right thing for us here already,
> > but Clang currently needs -fno-jump-tables to build the KVM hyp code.
> >
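The fix-up of R_AARCH64_RELATIVE relocations described above can be sketched as a small loop. This is an illustration under stated assumptions, not the kernel's actual boot code: struct rela64 mirrors the Elf64_Rela layout but is declared locally, the function names are hypothetical, and the image is treated as a plain byte buffer with no bounds checking. For a RELATIVE entry, the 64-bit slot at r_offset receives r_addend plus the difference between the load address and the link address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type number taken from the AArch64 ELF ABI. */
#define R_AARCH64_RELATIVE 1027u

/* Same layout as Elf64_Rela; declared locally so that the sketch does
 * not depend on <elf.h>. */
struct rela64 {
    uint64_t r_offset;  /* link-time address of the slot to patch */
    uint64_t r_info;    /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend;  /* link-time address the slot should refer to */
};

/* Apply every R_AARCH64_RELATIVE entry to an image that was linked at
 * link_base but runs at load_base. Returns the number of slots patched.
 * Assumption: every r_offset lies inside the image buffer. */
size_t apply_relative_relocs(unsigned char *image,
                             uint64_t link_base, uint64_t load_base,
                             const struct rela64 *rela, size_t count)
{
    assert(image != NULL && rela != NULL);

    uint64_t delta = load_base - link_base;
    size_t applied = 0;

    for (size_t i = 0; i < count; i++) {
        if ((uint32_t)rela[i].r_info != R_AARCH64_RELATIVE)
            continue;  /* other relocation types are out of scope here */

        uint64_t value = (uint64_t)rela[i].r_addend + delta;
        memcpy(image + (rela[i].r_offset - link_base), &value, sizeof(value));
        applied++;
    }
    return applied;
}

/* Usage example: a 32-byte image linked at 0x1000 and "loaded" at 0x5000,
 * with one RELATIVE entry pointing the slot at 0x1008 to address 0x1010. */
uint64_t demo_relocated_slot(void)
{
    unsigned char image[32] = {0};
    const struct rela64 rela[] = {
        { 0x1008, R_AARCH64_RELATIVE, 0x1010 },
        { 0x1010, 257, 0 },  /* 257 is R_AARCH64_ABS64: skipped */
    };
    uint64_t slot;

    apply_relative_relocs(image, 0x1000, 0x5000, rela, 2);
    memcpy(&slot, image + 8, sizeof(slot));
    return slot;  /* 0x1010 + (0x5000 - 0x1000) */
}
```

Because each entry only needs the load offset and no symbol lookup, this kind of loop stays small, which fits the point above that routing every reference through the GOT adds size without buying anything when shared library semantics are not needed.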
end of thread, other threads:[~2018-11-22 12:06 UTC | newest]

Thread overview: 12+ messages
2018-11-19 17:41 compiler code model for the kernel Ard Biesheuvel
2018-11-19 17:48 ` Nick Desaulniers
2018-11-22 11:53 ` Dave Martin
2018-11-22 12:06 ` Ard Biesheuvel
2018-11-19 23:54 ` Nick Desaulniers
2018-11-20 11:20 ` Peter Smith
2018-11-20 13:16 ` Ard Biesheuvel
2018-11-20 13:23 ` Will Deacon
2018-11-20 13:49 ` Ard Biesheuvel
2018-11-20 17:02 ` Ramana Radhakrishnan
2018-11-20 13:25 ` Ramana Radhakrishnan
2018-11-20 13:37 ` Ard Biesheuvel