qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
@ 2019-01-29 11:08 Alex Bennée
  2019-01-29 11:46 ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Bennée @ 2019-01-29 11:08 UTC (permalink / raw)
  To: qemu-devel, open list:ARM, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu
  Cc: Mark Rutland, Richard Henderson, Marc Zyngier


Hi,

Following up on yesterday's discussion on IRC I thought I'd better
report on my findings in the permanent record so things don't get lost.

As I tend to periodically rebuild my test kernels from the current
state of linux.git I occasionally run into these things. My test
invocation is:

  qemu-system-aarch64 -machine type=virt,virtualization=on \
                      -display none -m 4096 -serial mon:stdio \
                      -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
                      -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max

The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
device drivers built-in for when I actually boot a more complex setup
with disks and drives. However this is a boot test so doesn't really
matter.

The -machine type=virt,virtualization=on enables our virt machine model
with EL2 turned on. As there is no BIOS involved the kernel is invoked
directly at EL2.

The -cpu max enables a cortex-a57 + whatever extra features we've
enabled in QEMU so far. It won't match any "real" CPU but it should be
architecturally correct in so far as we implement prerequisite features
for any given feature. The cpuid feature bits should also be correct as
we test them internally in QEMU to enable features.
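
For readers following along, here is a minimal sketch of how the
pointer-authentication ID fields could be pulled out of a raw
ID_AA64ISAR1_EL1 value. The field offsets (APA, API, GPA, GPI) are taken
from the Arm ARM; the helper names and the example register value are
hypothetical and are not taken from QEMU or the kernel.

```python
# Bit offsets of the 4-bit PAuth fields in ID_AA64ISAR1_EL1 (per the Arm ARM).
PAUTH_FIELDS = {"APA": 4, "API": 8, "GPA": 24, "GPI": 28}


def pauth_fields(isar1: int) -> dict:
    """Extract the four pointer-authentication ID fields from a raw value."""
    return {name: (isar1 >> shift) & 0xF for name, shift in PAUTH_FIELDS.items()}


def has_address_auth(isar1: int) -> bool:
    """Address authentication is advertised if APA or API is non-zero."""
    fields = pauth_fields(isar1)
    return fields["APA"] != 0 or fields["API"] != 0


def has_generic_auth(isar1: int) -> bool:
    """Generic authentication is advertised if GPA or GPI is non-zero."""
    fields = pauth_fields(isar1)
    return fields["GPA"] != 0 or fields["GPI"] != 0


# Hypothetical register value with APA=1 and GPA=1, for illustration only.
example = 0x01000010
print(pauth_fields(example))
print(has_address_auth(example), has_generic_auth(example))
```

A guest that sees non-zero values in these fields will assume the
feature is fully usable, which is why the ID bits and the actual
behaviour need to agree.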

The breakage is the kernel never boots (no output on serial port) and on
attaching with gdb I found it stuck in:

  (gdb) bt
  #0  0xffffff8010a9e480 in overflow_stack ()
  Backtrace stopped: not enough registers or memory available to unwind further

If I turn on exception tracing it looks like we go into an exception
loop.

On the QEMU side this breakage comes in at:

  commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   Mon Jan 21 10:23:13 2019 +0000

      target/arm: Enable PAuth for -cpu max

      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      Message-id: 20190108223129.5570-30-richard.henderson@linaro.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

and as you would expect the system boots fine with -cpu cortex-a57

On the kernel side it breaks at:

  commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
  Author: Mark Rutland <mark.rutland@arm.com>
  Date:   Fri Dec 7 18:39:30 2018 +0000

      arm64: enable pointer authentication

      Now that all the necessary bits are in place for userspace, add the
      necessary Kconfig logic to allow this to be enabled.

      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>

So predictably we failed at enabling PAuth somewhere between the kernel
and QEMU.

I'm guessing the kernel so far has been tested on the fast model with a
full chain of TF, UEFI and kernel?

I think Richard's tests were without EL2 enabled.

So in the case that the kernel boots at EL2, is it expecting anyone else
to deal with PAuth exceptions, or should it be able to cope with an
enabled PAuth but no firmware underneath it?

Either we've got something wrong or we'll need to rethink what features
the user can have enabled by -cpu max on a direct kernel boot.

--
Alex Bennée


* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:08 [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled Alex Bennée
@ 2019-01-29 11:46 ` Mark Rutland
  2019-01-29 11:54   ` Peter Maydell
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2019-01-29 11:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, open list:ARM, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, Richard Henderson, Marc Zyngier,
	Kristina Martsenko

Hi,

[adding Kristina, who is in charge of Linux pointer authentication]

On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> Hi,
> 
> Following up on yesterday's discussion on IRC I thought I'd better
> report on my findings in the permanent record so things don't get lost.
> 
> As I tend to periodically rebuild my test kernels from the current
> state of linux.git I occasionally run into these things. My test
> invocation is:
> 
>   qemu-system-aarch64 -machine type=virt,virtualization=on \
>                       -display none -m 4096 -serial mon:stdio \
>                       -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
>                       -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max
>
> The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
> device drivers built-in for when I actually boot a more complex setup
> with disks and drives. However this is a boot test so doesn't really
> matter.
> 
> The -machine type=virt,virtualization=on enables our virt machine model
> with EL2 turned on. As there is no BIOS involved the kernel is invoked
> directly at EL2.
> 
> The -cpu max enables a cortex-a57 + whatever extra features we've
> enabled in QEMU so far. It won't match any "real" CPU but it should be
> architecturally correct in so far as we implement prerequisite features
> for any given feature. The cpuid feature bits should also be correct as
> we test them internally in QEMU to enable features.

Just to check, does this enable VHE?

> The breakage is the kernel never boots (no output on serial port) and on
> attaching with gdb I found it stuck in:
> 
>   (gdb) bt
>   #0  0xffffff8010a9e480 in overflow_stack ()
>   Backtrace stopped: not enough registers or memory available to unwind further
> 
> If I turn on exception tracing it looks like we go into an exception
> loop.

As mentioned on IRC, this looks very odd, since overflow_stack is a data
pointer, not code. I can't presently see how we could branch here.

If you pass the kernel 'earlycon keep_bootcon', do you get any output?

> On the QEMU side this breakage comes in at:
> 
>   commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@linaro.org>
>   Date:   Mon Jan 21 10:23:13 2019 +0000
> 
>       target/arm: Enable PAuth for -cpu max
> 
>       Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>       Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>       Message-id: 20190108223129.5570-30-richard.henderson@linaro.org
>       Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> 
> and as you would expect the system boots fine with -cpu cortex-a57
> 
> On the kernel side it breaks at:
> 
>   commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
>   Author: Mark Rutland <mark.rutland@arm.com>
>   Date:   Fri Dec 7 18:39:30 2018 +0000
> 
>       arm64: enable pointer authentication
> 
>       Now that all the necessary bits are in place for userspace, add the
>       necessary Kconfig logic to allow this to be enabled.
> 
>       Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>       Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
>       Cc: Catalin Marinas <catalin.marinas@arm.com>
>       Cc: Will Deacon <will.deacon@arm.com>
>       Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> So predictably we failed at enabling PAuth somewhere between the kernel
> and QEMU.
> 
> I'm guessing the kernel so far has been tested on the fast model with a
> full chain of TF, UEFI and kernel?

The kernel has been tested on a fast model with the Linux bootwrapper:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/

Kristina, could you confirm whether or not it's been tested with
ATF+UEFI?

> I think Richard's tests were without EL2 enabled.
> 
> So in the case that the kernel boots at EL2, is it expecting anyone else
> to deal with PAuth exceptions, or should it be able to cope with an
> enabled PAuth but no firmware underneath it?

So long as the highest implemented exception level is EL2, the kernel
should handle that itself. During boot we'll configure HCR_EL2.{API,APK}
in el2_setup().

From that point onwards, there should be no traps for pointer
authentication functionality from EL1, AFAICT.
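
As a rough illustration of the condition described above, the sketch
below checks whether a raw HCR_EL2 value has both trap-disable bits set.
The bit positions (APK at bit 40, API at bit 41) come from the Arm ARM;
the function name and the sample values are hypothetical, and this is
not the kernel's el2_setup() code.

```python
# HCR_EL2 bit positions per the Arm ARM.
HCR_EL2_RW = 1 << 31   # EL1 executes in AArch64 state
HCR_EL2_APK = 1 << 40  # when set, key register accesses are not trapped
HCR_EL2_API = 1 << 41  # when set, PAuth instructions are not trapped


def pauth_traps_disabled(hcr_el2: int) -> bool:
    """Return True if neither PAuth instructions nor key accesses trap to EL2."""
    mask = HCR_EL2_API | HCR_EL2_APK
    return (hcr_el2 & mask) == mask


# Hypothetical values: one before and one after the bits are configured.
before = HCR_EL2_RW
after = HCR_EL2_RW | HCR_EL2_API | HCR_EL2_APK
print(pauth_traps_disabled(before), pauth_traps_disabled(after))
```

Reading HCR_EL2 from a debugger once the kernel has dropped to EL1 and
feeding it through a check like this would show whether the expected
configuration actually took effect.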

> Either we've got something wrong or we'll need to rethink what features
> the user can have enabled by -cpu max on a direct kernel boot.

It's not immediately clear to me when precisely things are going wrong,
so I think we need to narrow that down first. For example, it's not
clear whether a trap is being taken, or something is unexpectedly
behaving as UNDEF.

Is it possible to watch the exception vectors to see if/when an
exception is taken, and from where?

Thanks,
Mark.


* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:46 ` Mark Rutland
@ 2019-01-29 11:54   ` Peter Maydell
  2019-01-29 12:51     ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Maydell @ 2019-01-29 11:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alex Bennée, Marc Zyngier, Richard Henderson, qemu-devel,
	Kristina Martsenko, open list:ARM, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org

On Tue, 29 Jan 2019 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> > The -cpu max enables a cortex-a57 + whatever extra features we've
> > enabled in QEMU so far. It won't match any "real" CPU but it should be
> > architecturally correct in so far as we implement prerequisite features
> > for any given feature. The cpuid feature bits should also be correct as
> > we test them internally in QEMU to enable features.
>
> Just to check, does this enable VHE?

It does not. (We have no implementation of VHE yet -- as and when
we do implement that "-cpu max" will turn it on, but today it doesn't.)

thanks
-- PMM


* Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
  2019-01-29 11:54   ` Peter Maydell
@ 2019-01-29 12:51     ` Mark Rutland
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2019-01-29 12:51 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, Marc Zyngier, Richard Henderson, qemu-devel,
	Kristina Martsenko, open list:ARM, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org

On Tue, Jan 29, 2019 at 11:54:13AM +0000, Peter Maydell wrote:
> On Tue, 29 Jan 2019 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> > > The -cpu max enables a cortex-a57 + whatever extra features we've
> > > enabled in QEMU so far. It won't match any "real" CPU but it should be
> > > architecturally correct in so far as we implement prerequisite features
> > > for any given feature. The cpuid feature bits should also be correct as
> > > we test them internally in QEMU to enable features.
> >
> > Just to check, does this enable VHE?
> 
> It does not. (We have no implementation of VHE yet -- as and when
> we do implement that "-cpu max" will turn it on, but today it doesn't.)

Ok. In that case, the kernel should drop to EL1N and stay there after it
configures EL2.

There are no pointer authentication traps taken to EL1N, so if something
is going wrong, I suspect either:

* a trap is taken to EL2, and we hyp_panic()
* an UNDEF is being taken to EL1N
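
One way to tell these two cases apart is to look at the exception class
in the syndrome register once an exception has been captured. The sketch
below decodes ESR_ELx.EC (bits [31:26]) and recognises the two classes
relevant here: 0x00 (unknown reason, which is how an UNDEF is reported)
and 0x09 (trapped pointer authentication instruction). The encodings are
from the Arm ARM; the function names and sample syndrome values are
hypothetical.

```python
EC_UNKNOWN = 0x00     # "Unknown reason", reported for UNDEFINED instructions
EC_PAUTH_TRAP = 0x09  # trapped use of a pointer authentication instruction


def exception_class(esr: int) -> int:
    """Return the EC field, bits [31:26], of an ESR_ELx value."""
    return (esr >> 26) & 0x3F


def classify(esr: int) -> str:
    """Give a short description for the exception classes of interest."""
    ec = exception_class(esr)
    if ec == EC_UNKNOWN:
        return "unknown reason (UNDEF)"
    if ec == EC_PAUTH_TRAP:
        return "trapped PAuth instruction"
    return f"other exception class 0x{ec:02x}"


# Hypothetical syndrome values, each with the IL bit (bit 25) set.
for esr in (0x02000000, 0x26000000):
    print(hex(esr), "->", classify(esr))
```

The syndrome value itself would have to come from somewhere else, for
example from exception tracing or from reading the register in gdb at
the exception vector.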

Thanks,
Mark.

