[Qemu-devel] QEMU TCG issue when executing UEFI

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] QEMU TCG issue when executing UEFI
@ 2016-08-16 12:08 Ard Biesheuvel
  2016-08-18 10:40 ` Peter Maydell
  2016-08-18 14:10 ` Peter Maydell
  0 siblings, 2 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-16 12:08 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Peter Maydell

Hello all,

I am hitting this strange issue when executing the UEFI firmware for
QEMU mach-virt/AArch64. This only occurs when building the firmware
with GCC5 in RELEASE mode, but the failure mode suggests that this may
not be relevant.

Running an aarch64-softmmu QEMU built from today's master, I get

$ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd

add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore/DEBUG/ArmPlatformPrePeiCore.dll 0x1800
add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x7980
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000001000, size is 0x001FF000, handle is 0x1000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll 0x16B80
Loading PEIM at 0x00000016AA0 EntryPoint=0x0000001789C PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Bad ram pointer 0x54
Aborted (core dumped)

UEFI build is here
http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz

Thanks,
Ard.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-16 12:08 [Qemu-devel] QEMU TCG issue when executing UEFI Ard Biesheuvel
@ 2016-08-18 10:40 ` Peter Maydell
  2016-08-18 10:43   ` Ard Biesheuvel
  2016-08-18 14:10 ` Peter Maydell
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 10:40 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: QEMU Developers

On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> I am hitting this strange issue when executing the UEFI firmware for
> QEMU mach-virt/AArch64. This only occurs when building the firmware
> with GCC5 in RELEASE mode, but the failure mode suggests that this may
> not be relevant.

Yeah, we shouldn't dump core even if the guest binary is doing
weird stuff...

> Running an aarch64-softmmu QEMU built from today's master, I get
>
> $ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd

> Bad ram pointer 0x54
> Aborted (core dumped)
>
> UEFI build is here
> http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz

Thanks for the bug report -- I have reproduced it and will have a look.

This bug is also present in QEMU 2.6, so this isn't a recent regression
and likely not a blocker for 2.7 release (unless the bug turns out to
have a simple fix and be of the "how did this ever work" flavour ;-))

thanks
-- PMM

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 10:40 ` Peter Maydell
@ 2016-08-18 10:43   ` Ard Biesheuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 10:43 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 18 August 2016 at 12:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> I am hitting this strange issue when executing the UEFI firmware for
>> QEMU mach-virt/AArch64. This only occurs when building the firmware
>> with GCC5 in RELEASE mode, but the failure mode suggests that this may
>> not be relevant.
>
> Yeah, we shouldn't dump core even if the guest binary is doing
> weird stuff...
>

Indeed. What I failed to mention is that this is an LTO build, which
means the individual functions are much larger. Not sure how this
should be relevant, but still worth mentioning, I suppose.

>> Running an aarch64-softmmu QEMU built from today's master, I get
>>
>> $ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd
>
>> Bad ram pointer 0x54
>> Aborted (core dumped)
>>
>> UEFI build is here
>> http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz
>
> Thanks for the bug report -- I have reproduced it and will have a look.
>
> This bug is also present in QEMU 2.6, so this isn't a recent regression
> and likely not a blocker for 2.7 release (unless the bug turns out to
> have a simple fix and be of the "how did this ever work" flavour ;-))
>

Thanks. Let me know if you need any more info.

-- 
Ard.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-16 12:08 [Qemu-devel] QEMU TCG issue when executing UEFI Ard Biesheuvel
  2016-08-18 10:40 ` Peter Maydell
@ 2016-08-18 14:10 ` Peter Maydell
  2016-08-18 14:15   ` Ard Biesheuvel
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 14:10 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: QEMU Developers, Paolo Bonzini

On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Bad ram pointer 0x54
> Aborted (core dumped)

So the reason this happens is that get_page_addr_code() doesn't
correctly handle the case of the memory region being a
ROM that's not in ROMD mode. That is, the flash memory can
be either in "reads map directly to guest memory" (ROMD)
mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
can't execute from devices, so the best case here would
be that we print the "Sorry, we can't execute from a device"
message and stop execution.

Treating the flash device's "return the current status"
bytes as code probably wasn't what you wanted to do anyway :-)
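
The two read modes can be sketched with a toy model. This is an
illustration only, not QEMU code: the class name, the status value and
the rule that any write leaves ROMD mode are assumptions made to keep
the example small.

```python
# Toy model of a ROM device with two read modes.
# All names and values here are hypothetical, not QEMU's API.
class ToyFlash:
    STATUS_READY = 0x80  # assumed status byte, for illustration only

    def __init__(self, contents: bytes):
        self.mem = bytearray(contents)
        self.romd_mode = True  # reads map straight to backing memory

    def read(self, offset: int) -> int:
        if self.romd_mode:
            return self.mem[offset]   # behaves like ordinary ROM
        return self.STATUS_READY      # device callback: status, not data

    def write(self, offset: int, value: int) -> None:
        # Simplification: any write is taken as a command and
        # switches the device out of ROMD mode.
        self.romd_mode = False


flash = ToyFlash(bytes([0x94, 0x02, 0x00, 0xCB]))
before = flash.read(0)   # data byte while in ROMD mode
flash.write(0, 0x40)     # stray write from the guest
after = flash.read(0)    # status byte once out of ROMD mode
print(hex(before), hex(after))
```

Once the device has left ROMD mode, a fetch from the same offset no
longer sees the stored bytes, which is why executing from it makes no
sense.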

In more detail: when we call get_page_addr_code() for this
address, we notice that there is no TLB entry for it, and
so we call cpu_ldub_code() which is supposed to fill the TLB.
This ends up calling tlb_set_page_with_attrs(), which for a
not-RAM-not-ROMD MR will set the addend to 0 and then OR
TLB_MMIO into the address field (rather than setting the
addend to the right offset to get between the guest
address and the host RAM address). get_page_addr_code()
unfortunately then uses a different condition when it
distinguishes "is this an IO address we can't handle"
from "is this RAM", which means it takes the path for
"treat the addend as the offset between guest and host",
resulting in a completely bogus host address.
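
A rough model of the mismatch, as a sketch only: the flag value, the
function names and the "looks_unmapped" test are stand-ins invented
for this example, and the real QEMU data structures are more involved.

```python
TLB_MMIO = 1 << 5  # illustrative flag bit; the real value is an assumption


def fill_entry(guest_page, host_base, is_ram_or_romd):
    """Model of the fill step: returns (addr_field, addend)."""
    if is_ram_or_romd:
        # addend converts a guest address into a host RAM address
        return guest_page, host_base - guest_page
    # device-backed page: no host mapping, mark the entry as MMIO
    return guest_page | TLB_MMIO, 0


def code_addr_mismatched(entry, guest_addr, looks_unmapped):
    """Lookup that tests a different condition than the fill step used."""
    _addr_field, addend = entry
    if looks_unmapped:
        raise RuntimeError("can't execute from a device")
    return guest_addr + addend  # wrong when the entry is flagged MMIO


def code_addr_consistent(entry, guest_addr):
    """Lookup that tests the same flag the fill step set."""
    addr_field, addend = entry
    if addr_field & TLB_MMIO:
        raise RuntimeError("can't execute from a device")
    return guest_addr + addend


# Flash page that is neither RAM nor in ROMD mode.
entry = fill_entry(0x9000, host_base=0x7F0000000000, is_ram_or_romd=False)
bogus = code_addr_mismatched(entry, 0x96C0, looks_unmapped=False)
print(hex(bogus))
```

With an addend of 0 the "host address" is just the guest address
again, which is not a valid pointer into host RAM.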

thanks
-- PMM

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:10 ` Peter Maydell
@ 2016-08-18 14:15   ` Ard Biesheuvel
  2016-08-18 14:36     ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 14:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Paolo Bonzini

On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> Bad ram pointer 0x54
>> Aborted (core dumped)
>
> So the reason this happens is that get_page_addr_code() doesn't
> correctly handle the case of the memory region being a
> ROM that's not in ROMD mode. That is, the flash memory can
> be either in "reads map directly to guest memory" (ROMD)
> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
> can't execute from devices, so the best case here would
> be that we print the "Sorry, we can't execute from a device"
> message and stop execution.
>

So is there a spurious write somewhere that causes the ROM to switch
out of ROMD mode? Because it executes happily from ROM (until it
doesn't, of course).

> Treating the flash device's "return the current status"
> bytes as code probably wasn't what you wanted to do anyway :-)
>
> In more detail: when we call get_page_addr_code() for this
> address, we notice that there is no TLB entry for it, and
> so we call cpu_ldub_code() which is supposed to fill the TLB.
> This ends up calling tlb_set_page_with_attrs(), which for a
> not-RAM-not-ROMD MR will set the addend to 0 and then OR
> TLB_MMIO into the address field (rather than setting the
> addend to the right offset to get between the guest
> address and the host RAM address). get_page_addr_code()
> unfortunately then uses a different condition when it
> distinguishes "is this an IO address we can't handle"
> from "is this RAM", which means it takes the path for
> "treat the addend as the offset between guest and host",
> resulting in a completely bogus host address.
>

OK, so that sounds like something that should be fixable. But I still
don't understand if the guest code is doing anything wrong.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:15   ` Ard Biesheuvel
@ 2016-08-18 14:36     ` Peter Maydell
  2016-08-18 16:17       ` Ard Biesheuvel
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 14:36 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: QEMU Developers, Paolo Bonzini

On 18 August 2016 at 15:15, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> Bad ram pointer 0x54
>>> Aborted (core dumped)
>>
>> So the reason this happens is that get_page_addr_code() doesn't
>> correctly handle the case of the memory region being a
>> ROM that's not in ROMD mode. That is, the flash memory can
>> be either in "reads map directly to guest memory" (ROMD)
>> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
>> can't execute from devices, so the best case here would
>> be that we print the "Sorry, we can't execute from a device"
>> message and stop execution.
>>
>
> So is there a spurious write somewhere that causes the ROM to switch
> out of ROMD mode? Because it executes happily from ROM (until it
> doesn't, of course).

The write that causes us to go into not-ROMD mode is in this block:

0x00000000000096ac:  cb000294      sub x20, x20, x0
0x00000000000096b0:  f9000a74      str x20, [x19, #16]
0x00000000000096b4:  9100627c      add x28, x19, #0x18 (24)
0x00000000000096b8:  b9400780      ldr w0, [x28, #4]
0x00000000000096bc:  35002cc0      cbnz w0, #+0x598 (addr 0x9c54)

which is executed with

PC=00000000000096ac  SP=000000004007f590
X00=0000000000000160 X01=0000000000000095 X02=000000003031424e X03=0000000000001b40
X04=0000000000010b64 X05=0000000000000160 X06=0000000000000188 X07=000000004007c268
X08=00000000000149a0 X09=000000004007fe58 X10=000000004007f793 X11=0000000000000002
X12=00000000707fe07a X13=0000000000000002 X14=0000000000000000 X15=0000000000000000
X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=00000000000149a0
X20=00000000000149a0 X21=00000000000149a0 X22=0000000000000001 X23=0000000000000160
X24=000000004007fa24 X25=000000004007fa38 X26=0000000000000000 X27=0000000000014840
X28=00000000000149a0 X29=0000000000000000 X30=0000000000009364
PSTATE=200003c5 --C- EL1h

so you write 0x14840 to address 0x149b0, which is in the flash.
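
The arithmetic can be checked by replaying the block on the register
values from the dump. This is only a worked example; the variable
names are chosen for the sketch.

```python
MASK64 = (1 << 64) - 1

# Register values taken from the dump at PC=0x96ac.
x0, x19, x20 = 0x160, 0x149A0, 0x149A0

x20 = (x20 - x0) & MASK64          # sub x20, x20, x0
store_addr = (x19 + 16) & MASK64   # str x20, [x19, #16]
x28 = (x19 + 0x18) & MASK64        # add x28, x19, #0x18
load_addr = (x28 + 4) & MASK64     # ldr w0, [x28, #4]
branch_target = 0x96BC + 0x598     # cbnz w0, #+0x598

print(hex(x20), hex(store_addr), hex(load_addr), hex(branch_target))
```

The stored value is 0x14840 and the store address is 0x149b0, matching
the figures above. The following load at 0x149bc also targets the
flash.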

(This is the last TB we execute, because trying to find the
next one hits the problem of the flash not being in ROMD mode.
So it's the very last thing in the log if you run QEMU with
-d in_asm,out_asm,exec,cpu,int -D /tmp/q.log)

thanks
-- PMM

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:36     ` Peter Maydell
@ 2016-08-18 16:17       ` Ard Biesheuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 16:17 UTC (permalink / raw)
  To: Peter Maydell, Leif Lindholm; +Cc: QEMU Developers, Paolo Bonzini

(+ Leif)

Exec summary: a QEMU bug (an abort rather than a clean error) is
triggered by RELEASE_GCC5 code that issues a spurious write to the NOR
flash at runtime. That write is itself a bug, in Tianocore.

On 18 August 2016 at 16:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 18 August 2016 at 15:15, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>> Bad ram pointer 0x54
>>>> Aborted (core dumped)
>>>
>>> So the reason this happens is that get_page_addr_code() doesn't
>>> correctly handle the case of the memory region being a
>>> ROM that's not in ROMD mode. That is, the flash memory can
>>> be either in "reads map directly to guest memory" (ROMD)
>>> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
>>> can't execute from devices, so the best case here would
>>> be that we print the "Sorry, we can't execute from a device"
>>> message and stop execution.
>>>
>>
>> So is there a spurious write somewhere that causes the ROM to switch
>> out of ROMD mode? Because it executes happily from ROM (until it
>> doesn't, of course).
>
> The write that causes us to go into not-ROMD mode is in this block:
>
> 0x00000000000096ac:  cb000294      sub x20, x20, x0
> 0x00000000000096b0:  f9000a74      str x20, [x19, #16]
> 0x00000000000096b4:  9100627c      add x28, x19, #0x18 (24)
> 0x00000000000096b8:  b9400780      ldr w0, [x28, #4]
> 0x00000000000096bc:  35002cc0      cbnz w0, #+0x598 (addr 0x9c54)
>
> which is executed with
>
> PC=00000000000096ac  SP=000000004007f590
> X00=0000000000000160 X01=0000000000000095 X02=000000003031424e X03=0000000000001b40
> X04=0000000000010b64 X05=0000000000000160 X06=0000000000000188 X07=000000004007c268
> X08=00000000000149a0 X09=000000004007fe58 X10=000000004007f793 X11=0000000000000002
> X12=00000000707fe07a X13=0000000000000002 X14=0000000000000000 X15=0000000000000000
> X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=00000000000149a0
> X20=00000000000149a0 X21=00000000000149a0 X22=0000000000000001 X23=0000000000000160
> X24=000000004007fa24 X25=000000004007fa38 X26=0000000000000000 X27=0000000000014840
> X28=00000000000149a0 X29=0000000000000000 X30=0000000000009364
> PSTATE=200003c5 --C- EL1h
>
> so you write 0x14840 to address 0x149b0, which is in the flash.
>
> (This is the last TB we execute, because trying to find the
> next one hits the problem of the flash not being in ROMD mode.
> So it's the very last thing in the log if you run QEMU with
> -d in_asm,out_asm,exec,cpu,int -D /tmp/q.log)
>

OK, this rabbit hole goes pretty deep :-)

Normally, the uncompressed PE/COFF images in the NOR flash (the ones
that set up the MMU etc) are relocated at build time, so that they can
execute from the offset they end up at in the NOR image. The
relocation code sets the base address in the image header, and applies
all fixups in the .reloc PE/COFF section.

As it turns out, the LTO is so effective that it optimizes away all
absolute symbol references, leaving us with no .reloc section at all.
(i.e., the module turns out completely position independent, but
purely by accident). The runtime loader does not cope well with this,
and ends up writing to the NOR flash, triggering the issue above.
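
For reference, applying base relocations amounts to roughly the
following. This is a minimal sketch with made-up names and a made-up
16-byte image, not the Tianocore code: it only shows that an image
with no fixups needs no writes to its body at all.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def relocate(image: bytearray, old_base: int, new_base: int, fixups) -> int:
    """Add (new_base - old_base) to each 64-bit little-endian slot.

    'fixups' is a list of byte offsets into the image; returns the
    number of slots patched.
    """
    delta = new_base - old_base
    for off in fixups:
        (value,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, (value + delta) & MASK64)
    return len(fixups)


# Hypothetical image with one absolute pointer stored at offset 8.
image = bytearray(16)
struct.pack_into("<Q", image, 8, 0x1010)
patched = relocate(image, old_base=0x1000, new_base=0x7980, fixups=[8])
(pointer,) = struct.unpack_from("<Q", image, 8)

# Position-independent image: empty fixup list, nothing is written.
pic_image = bytearray(b"\xAA" * 16)
untouched = relocate(pic_image, old_base=0x1000, new_base=0x7980, fixups=[])
print(hex(pointer), untouched)
```

In the second case the loader has nothing to patch, so a loader that
copes with a missing .reloc section would have no reason to write to
the image.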

Thanks,
Ard.
