* [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
@ 2019-01-29 11:08 Alex Bennée
  2019-01-29 11:46 ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Bennée @ 2019-01-29 11:08 UTC
To: qemu-devel, open list:ARM, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: Mark Rutland, Richard Henderson, Marc Zyngier

Hi,

Following up on yesterday's discussion on IRC I thought I'd better
report on my findings in the permanent record so things don't get lost.

As I tend to periodically rebuild my test kernels from the current
state of linux.git I occasionally run into these things. My test
invocation is:

  qemu-system-aarch64 -machine type=virt,virtualization=on \
    -display none -m 4096 -serial mon:stdio \
    -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
    -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max

The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
device drivers built-in for when I actually boot a more complex setup
with disks and drives. However this is a boot test so doesn't really
matter.

The -machine type=virt,virtualization=on enables our virt machine model
with EL2 turned on. As there is no BIOS involved the kernel is invoked
directly at EL2.

The -cpu max enables a cortex-a57 + whatever extra features we've
enabled in QEMU so far. It won't match any "real" CPU but it should be
architecturally correct insofar as we implement prerequisite features for
any given feature. The cpuid feature bits should also be correct as we
test them internally in QEMU to enable features.

The breakage is the kernel never boots (no output on serial port) and on
attaching with gdb I found it stuck in:

  (gdb) bt
  #0  0xffffff8010a9e480 in overflow_stack ()
  Backtrace stopped: not enough registers or memory available to unwind further

If I turn on exception tracing it looks like we go into an exception
loop.
On the QEMU side this breakage comes in at:

  commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   Mon Jan 21 10:23:13 2019 +0000

      target/arm: Enable PAuth for -cpu max

      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      Message-id: 20190108223129.5570-30-richard.henderson@linaro.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

and as you would expect the system boots fine with -cpu cortex-a57

On the kernel side it breaks at:

  commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
  Author: Mark Rutland <mark.rutland@arm.com>
  Date:   Fri Dec 7 18:39:30 2018 +0000

      arm64: enable pointer authentication

      Now that all the necessary bits are in place for userspace, add the
      necessary Kconfig logic to allow this to be enabled.

      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>

So predictably we failed at enabling PAuth somewhere between the kernel
and QEMU.

I'm guessing the kernel so far has been tested on the fast model with a
full chain of TF, UEFI and kernel?

I think Richard's tests were without EL2 enabled.

So in the case that the kernel boots in EL2 is it expecting anyone else
to deal with PAuth exceptions or should it be able to cope with an
enabled PAuth but no firmware underneath it?

Either we've got something wrong or we'll need to rethink what features
the user can have enabled by -cpu max on a direct kernel boot.

--
Alex Bennée
* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:08 [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled Alex Bennée
@ 2019-01-29 11:46 ` Mark Rutland
  2019-01-29 11:54   ` Peter Maydell
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2019-01-29 11:46 UTC
To: Alex Bennée
Cc: qemu-devel, open list:ARM, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, Richard Henderson, Marc Zyngier, Kristina Martsenko

Hi,

[adding Kristina, who is in charge of Linux pointer authentication]

On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> Hi,
>
> Following up on yesterday's discussion on IRC I thought I'd better
> report on my findings in the permanent record so things don't get lost.
>
> As I tend to periodically rebuild my test kernels from the current
> state of linux.git I occasionally run into these things. My test
> invocation is:
>
>   qemu-system-aarch64 -machine type=virt,virtualization=on \
>     -display none -m 4096 -serial mon:stdio \
>     -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
>     -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max
>
> The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
> device drivers built-in for when I actually boot a more complex setup
> with disks and drives. However this is a boot test so doesn't really
> matter.
>
> The -machine type=virt,virtualization=on enables our virt machine model
> with EL2 turned on. As there is no BIOS involved the kernel is invoked
> directly at EL2.
>
> The -cpu max enables a cortex-a57 + whatever extra features we've
> enabled in QEMU so far. It won't match any "real" CPU but it should be
> architecturally correct insofar as we implement prerequisite features for
> any given feature. The cpuid feature bits should also be correct as we
> test them internally in QEMU to enable features.

Just to check, does this enable VHE?
> The breakage is the kernel never boots (no output on serial port) and on
> attaching with gdb I found it stuck in:
>
>   (gdb) bt
>   #0  0xffffff8010a9e480 in overflow_stack ()
>   Backtrace stopped: not enough registers or memory available to unwind further
>
> If I turn on exception tracing it looks like we go into an exception
> loop.

As mentioned on IRC, this looks very odd, since overflow_stack is a data
pointer, not code. I can't presently see how we could branch here.

If you pass the kernel 'earlycon keep_bootcon', do you get any output?

> On the QEMU side this breakage comes in at:
>
>   commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@linaro.org>
>   Date:   Mon Jan 21 10:23:13 2019 +0000
>
>       target/arm: Enable PAuth for -cpu max
>
>       Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>       Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>       Message-id: 20190108223129.5570-30-richard.henderson@linaro.org
>       Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>
> and as you would expect the system boots fine with -cpu cortex-a57
>
> On the kernel side it breaks at:
>
>   commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
>   Author: Mark Rutland <mark.rutland@arm.com>
>   Date:   Fri Dec 7 18:39:30 2018 +0000
>
>       arm64: enable pointer authentication
>
>       Now that all the necessary bits are in place for userspace, add the
>       necessary Kconfig logic to allow this to be enabled.
>
>       Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>       Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
>       Cc: Catalin Marinas <catalin.marinas@arm.com>
>       Cc: Will Deacon <will.deacon@arm.com>
>       Signed-off-by: Will Deacon <will.deacon@arm.com>
>
> So predictably we failed at enabling PAuth somewhere between the kernel
> and QEMU.
>
> I'm guessing the kernel so far has been tested on the fast model with a
> full chain of TF, UEFI and kernel?
The kernel has been tested on a fast model with the Linux bootwrapper:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/

Kristina, could you confirm whether or not it's been tested with
ATF+UEFI?

> I think Richard's tests were without EL2 enabled.
>
> So in the case that the kernel boots in EL2 is it expecting anyone else
> to deal with PAuth exceptions or should it be able to cope with an
> enabled PAuth but no firmware underneath it?

So long as the highest implemented exception level is EL2, the kernel
should handle that itself.

During boot we'll configure HCR_EL2.{API,APK} in el2_setup().

From that point onwards, there should be no traps for pointer
authentication functionality from EL1, AFAICT.

> Either we've got something wrong or we'll need to rethink what features
> the user can have enabled by -cpu max on a direct kernel boot.

It's not immediately clear to me when precisely things are going wrong,
so I think we need to narrow that down first. For example, it's not
clear whether a trap is being taken, or something is unexpectedly
behaving as UNDEF.

Is it possible to watch the exception vectors to see if/when an
exception is taken, and from where?

Thanks,
Mark.
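[Editorial sketch, not part of the original message.] The HCR_EL2.{API,APK} point above can be checked against a register dump with a few lines of Python. The bit positions (APK at bit 40, API at bit 41) are taken from the Arm architecture's definition of HCR_EL2; the function name and the way the register value is obtained (for example, read by hand from a debugger) are assumptions made purely for illustration.

```python
# Bit positions of the pointer authentication trap controls in HCR_EL2.
HCR_EL2_APK = 1 << 40  # 1: key register accesses are not trapped to EL2
HCR_EL2_API = 1 << 41  # 1: PAuth instructions are not trapped to EL2


def pauth_traps_disabled(hcr_el2: int) -> bool:
    """Return True if both API and APK are set in the given HCR_EL2 value.

    Hypothetical helper: hcr_el2 is whatever value was read from the
    register, e.g. copied from a debugger session.
    """
    mask = HCR_EL2_API | HCR_EL2_APK
    return (hcr_el2 & mask) == mask


if __name__ == "__main__":
    # Example value with only RW (bit 31) set: PAuth use would still trap.
    example = 1 << 31
    print(hex(example), pauth_traps_disabled(example))
```

If the helper returns False for the value seen after el2_setup() has run, a trap to EL2 on the first PAuth instruction or key access would be one candidate explanation worth checking.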
* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:46 ` Mark Rutland
@ 2019-01-29 11:54   ` Peter Maydell
  2019-01-29 12:51     ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Maydell @ 2019-01-29 11:54 UTC
To: Mark Rutland
Cc: Alex Bennée, Marc Zyngier, Richard Henderson, qemu-devel, Kristina Martsenko, open list:ARM, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org

On Tue, 29 Jan 2019 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> > The -cpu max enables a cortex-a57 + whatever extra features we've
> > enabled in QEMU so far. It won't match any "real" CPU but it should be
> > architecturally correct insofar as we implement prerequisite features for
> > any given feature. The cpuid feature bits should also be correct as we
> > test them internally in QEMU to enable features.
>
> Just to check, does this enable VHE?

It does not. (We have no implementation of VHE yet -- as and when
we do implement that "-cpu max" will turn it on, but today it doesn't.)

thanks
-- PMM
* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:54   ` Peter Maydell
@ 2019-01-29 12:51     ` Mark Rutland
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2019-01-29 12:51 UTC
To: Peter Maydell
Cc: Alex Bennée, Marc Zyngier, Richard Henderson, qemu-devel, Kristina Martsenko, open list:ARM, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org

On Tue, Jan 29, 2019 at 11:54:13AM +0000, Peter Maydell wrote:
> On Tue, 29 Jan 2019 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> > > The -cpu max enables a cortex-a57 + whatever extra features we've
> > > enabled in QEMU so far. It won't match any "real" CPU but it should be
> > > architecturally correct insofar as we implement prerequisite features for
> > > any given feature. The cpuid feature bits should also be correct as we
> > > test them internally in QEMU to enable features.
> >
> > Just to check, does this enable VHE?
>
> It does not. (We have no implementation of VHE yet -- as and when
> we do implement that "-cpu max" will turn it on, but today it doesn't.)

Ok. In that case, the kernel should drop to EL1N and stay there after it
configures EL2.

There are no pointer authentication traps taken to EL1N, so if something
is going wrong, I suspect either:

* a trap is taken to EL2, and we hyp_panic()

* an UNDEF is being taken to EL1N

Thanks,
Mark.
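[Editorial sketch, not part of the original message.] The two suspected cases above can be told apart by the exception class (EC) field of the syndrome register, ESR_ELx bits [31:26]. The Python below is a minimal decoder covering only the EC values relevant to this thread, as defined by the Arm architecture: 0x00 (unknown reason, which is what an UNDEF reports), 0x09 (trapped pointer authentication instruction) and 0x18 (trapped MSR/MRS/system instruction, which covers trapped key register accesses). The function names are hypothetical, and the ESR values would have to come from an exception trace or a debugger.

```python
EC_SHIFT = 26   # ESR_ELx.EC occupies bits [31:26]
EC_MASK = 0x3F

# Only the exception classes relevant to this thread; not a full table.
EC_NAMES = {
    0x00: "unknown reason (UNDEF)",
    0x09: "trapped pointer authentication instruction",
    0x18: "trapped MSR/MRS/system instruction",
}


def exception_class(esr: int) -> int:
    """Extract the EC field from an ESR_ELx value."""
    return (esr >> EC_SHIFT) & EC_MASK


def classify(esr: int) -> str:
    """Return a short description of the EC, or its raw value if unlisted."""
    ec = exception_class(esr)
    return EC_NAMES.get(ec, f"other (EC={ec:#04x})")


if __name__ == "__main__":
    # Illustrative syndrome values only, not taken from the failing boot.
    for esr in (0x02000000, 0x26000000, 0x62000000):
        print(f"{esr:#010x}: {classify(esr)}")
```

An EC of 0x09 or 0x18 reported in ESR_EL2 would point at the first case (a trap taken to EL2), while an EC of 0x00 in ESR_EL1 would point at the second (an UNDEF taken at EL1).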