* [Qemu-devel] QEMU TCG issue when executing UEFI
@ 2016-08-16 12:08 Ard Biesheuvel
  2016-08-18 10:40 ` Peter Maydell
  2016-08-18 14:10 ` Peter Maydell
  0 siblings, 2 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-16 12:08 UTC
  To: QEMU Developers; +Cc: Peter Maydell

Hello all,

I am hitting this strange issue when executing the UEFI firmware for
QEMU mach-virt/AArch64. This only occurs when building the firmware
with GCC5 in RELEASE mode, but the failure mode suggests that this may
not be relevant.

Running an aarch64-softmmu QEMU built from today's master, I get

$ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd
add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore/DEBUG/ArmPlatformPrePeiCore.dll 0x1800
add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x7980
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000001000, size is 0x001FF000, handle is 0x1000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
add-symbol-file /home/ard/build/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll 0x16B80
Loading PEIM at 0x00000016AA0 EntryPoint=0x0000001789C PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Bad ram pointer 0x54
Aborted (core dumped)

UEFI build is here
http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz

Thanks,
Ard.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-16 12:08 [Qemu-devel] QEMU TCG issue when executing UEFI Ard Biesheuvel
@ 2016-08-18 10:40 ` Peter Maydell
  2016-08-18 10:43   ` Ard Biesheuvel
  2016-08-18 14:10 ` Peter Maydell
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 10:40 UTC
  To: Ard Biesheuvel; +Cc: QEMU Developers

On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> I am hitting this strange issue when executing the UEFI firmware for
> QEMU mach-virt/AArch64. This only occurs when building the firmware
> with GCC5 in RELEASE mode, but the failure mode suggests that this may
> not be relevant.

Yeah, we shouldn't dump core even if the guest binary is doing
weird stuff...

> Running an aarch64-softmmu QEMU built from today's master, I get
>
> $ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd

> Bad ram pointer 0x54
> Aborted (core dumped)
>
> UEFI build is here
> http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz

Thanks for the bug report -- I have reproduced it and will have a look.

This bug is also present in QEMU 2.6, so this isn't a recent regression
and likely not a blocker for the 2.7 release (unless the bug turns out to
have a simple fix and be of the "how did this ever work" flavour ;-))

thanks
-- PMM

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 10:40 ` Peter Maydell
@ 2016-08-18 10:43   ` Ard Biesheuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 10:43 UTC
  To: Peter Maydell; +Cc: QEMU Developers

On 18 August 2016 at 12:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> I am hitting this strange issue when executing the UEFI firmware for
>> QEMU mach-virt/AArch64. This only occurs when building the firmware
>> with GCC5 in RELEASE mode, but the failure mode suggests that this may
>> not be relevant.
>
> Yeah, we shouldn't dump core even if the guest binary is doing
> weird stuff...
>

Indeed. What I failed to mention is that this is an LTO build, which
means the individual functions are much larger. Not sure how this
should be relevant, but still worth mentioning, I suppose.

>> Running an aarch64-softmmu QEMU built from today's master, I get
>>
>> $ qemu-system-aarch64 -M virt -nographic -cpu cortex-a53 -bios QEMU_EFI.fd
>
>> Bad ram pointer 0x54
>> Aborted (core dumped)
>>
>> UEFI build is here
>> http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.xz
>
> Thanks for the bug report -- I have reproduced it and will have a look.
>
> This bug is also present in QEMU 2.6, so this isn't a recent regression
> and likely not a blocker for 2.7 release (unless the bug turns out to
> have a simple fix and be of the "how did this ever work" flavour ;-))
>

Thanks. Let me know if you need any more info.

--
Ard.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-16 12:08 [Qemu-devel] QEMU TCG issue when executing UEFI Ard Biesheuvel
  2016-08-18 10:40 ` Peter Maydell
@ 2016-08-18 14:10 ` Peter Maydell
  2016-08-18 14:15   ` Ard Biesheuvel
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 14:10 UTC
  To: Ard Biesheuvel; +Cc: QEMU Developers, Paolo Bonzini

On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Bad ram pointer 0x54
> Aborted (core dumped)

So the reason this happens is that get_page_addr_code() doesn't
correctly handle the case of the memory region being a
ROM that's not in ROMD mode. That is, the flash memory can
be either in "reads map directly to guest memory" (ROMD)
mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
can't execute from devices, so the best case here would
be that we print the "Sorry, we can't execute from a device"
message and stop execution.

Treating the flash device's "return the current status"
bytes as code probably wasn't what you wanted to do anyway :-)

In more detail: when we call get_page_addr_code() for this
address, we notice that there is no TLB entry for it, and
so we call cpu_ldub_code() which is supposed to fill the TLB.
This ends up calling tlb_set_page_with_attrs(), which for a
not-RAM-not-ROMD MR will set the addend to 0 and then OR
TLB_MMIO into the address field (rather than setting the
addend to the right offset to get between the guest
address and the host RAM address). get_page_addr_code()
unfortunately then uses a different condition when it
distinguishes "is this an IO address we can't handle"
from "is this RAM", which means it takes the path for
"treat the addend as the offset between guest and host",
resulting in a completely bogus host address.

thanks
-- PMM
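
The mismatch described in the message above can be sketched as a toy model in C. This is not QEMU source: every type and function name below (ToyRegion, toy_tlb_fill and so on) is hypothetical, and the thread only says that get_page_addr_code() uses "a different condition", not which one. The buggy variant here simply ignores the MMIO flag, as an assumption, to show how an addend of 0 turns a guest address into a bogus host pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only, NOT QEMU source: all names here are hypothetical. */
#define TOY_TLB_MMIO 0x1u   /* flag ORed into the TLB address field */

typedef struct {
    bool is_ram;            /* plain RAM */
    bool is_romd;           /* ROM device whose reads map directly to memory */
    uintptr_t host_offset;  /* guest-to-host offset when directly mapped */
} ToyRegion;

typedef struct {
    uint64_t addr_code;     /* page address plus flag bits */
    uintptr_t addend;       /* guest-to-host offset, 0 for MMIO */
} ToyTlbEntry;

/* Fill step: a not-RAM-not-ROMD region gets addend 0 and the MMIO flag. */
static ToyTlbEntry toy_tlb_fill(const ToyRegion *mr, uint64_t page)
{
    ToyTlbEntry e = { .addr_code = page, .addend = mr->host_offset };
    if (!mr->is_ram && !mr->is_romd) {
        e.addr_code |= TOY_TLB_MMIO;
        e.addend = 0;
    }
    return e;
}

/* Consistent lookup: tests the same flag that the fill step set. */
static bool toy_code_host_addr(const ToyTlbEntry *e, uint64_t guest,
                               uintptr_t *host)
{
    if (e->addr_code & TOY_TLB_MMIO) {
        return false;       /* cannot execute from a device */
    }
    *host = (uintptr_t)guest + e->addend;
    return true;
}

/* Buggy lookup (assumed for illustration): trusts the addend blindly. */
static uintptr_t toy_code_host_addr_buggy(const ToyTlbEntry *e, uint64_t guest)
{
    return (uintptr_t)guest + e->addend;
}
```

With a flash region that is neither RAM nor in ROMD mode, the consistent lookup refuses to produce a host address, while the buggy one returns the guest address unchanged, which is meaningless as a host pointer.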

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:10 ` Peter Maydell
@ 2016-08-18 14:15   ` Ard Biesheuvel
  2016-08-18 14:36     ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 14:15 UTC
  To: Peter Maydell; +Cc: QEMU Developers, Paolo Bonzini

On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> Bad ram pointer 0x54
>> Aborted (core dumped)
>
> So the reason this happens is that get_page_addr_code() doesn't
> correctly handle the case of the memory region being a
> ROM that's not in ROMD mode. That is, the flash memory can
> be either in "reads map directly to guest memory" (ROMD)
> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
> can't execute from devices, so the best case here would
> be that we print the "Sorry, we can't execute from a device"
> message and stop execution.
>

So is there a spurious write somewhere that causes the ROM to switch
into ROMD mode? Because it executes happily from ROM (until it
doesn't, of course)

> Treating the flash device's "return the current status"
> bytes as code probably wasn't what you wanted to do anyway :-)
>
> In more detail: when we call get_page_addr_code() for this
> address, we notice that there is no TLB entry for it, and
> so we call cpu_ldub_code() which is supposed to fill the TLB.
> This ends up calling tlb_set_page_with_attrs(), which for a
> not-RAM-not-ROMD MR will set the addend to 0 and then OR
> TLB_MMIO into the address field (rather than setting the
> addend to the right offset to get between the guest
> address and the host RAM address). get_page_addr_code()
> unfortunately then uses a different condition when it
> distinguishes "is this an IO address we can't handle"
> from "is this RAM", which means it takes the path for
> "treat the addend as the offset between guest and host",
> resulting in a completely bogus host address.
>

OK, so that sounds like something that should be fixable. But I still
don't understand if the guest code is doing anything wrong.

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:15   ` Ard Biesheuvel
@ 2016-08-18 14:36     ` Peter Maydell
  2016-08-18 16:17       ` Ard Biesheuvel
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2016-08-18 14:36 UTC
  To: Ard Biesheuvel; +Cc: QEMU Developers, Paolo Bonzini

On 18 August 2016 at 15:15, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> Bad ram pointer 0x54
>>> Aborted (core dumped)
>>
>> So the reason this happens is that get_page_addr_code() doesn't
>> correctly handle the case of the memory region being a
>> ROM that's not in ROMD mode. That is, the flash memory can
>> be either in "reads map directly to guest memory" (ROMD)
>> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
>> can't execute from devices, so the best case here would
>> be that we print the "Sorry, we can't execute from a device"
>> message and stop execution.
>>
>
> So is there a spurious write somewhere that causes the ROM to switch
> into ROMD mode? Because it executes happily from ROM (until it
> doesn't, of course)

The write that causes us to go into not-ROMD mode is in this block:

0x00000000000096ac:  cb000294      sub x20, x20, x0
0x00000000000096b0:  f9000a74      str x20, [x19, #16]
0x00000000000096b4:  9100627c      add x28, x19, #0x18 (24)
0x00000000000096b8:  b9400780      ldr w0, [x28, #4]
0x00000000000096bc:  35002cc0      cbnz w0, #+0x598 (addr 0x9c54)

which is executed with

PC=00000000000096ac  SP=000000004007f590
X00=0000000000000160 X01=0000000000000095 X02=000000003031424e X03=0000000000001b40
X04=0000000000010b64 X05=0000000000000160 X06=0000000000000188 X07=000000004007c268
X08=00000000000149a0 X09=000000004007fe58 X10=000000004007f793 X11=0000000000000002
X12=00000000707fe07a X13=0000000000000002 X14=0000000000000000 X15=0000000000000000
X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=00000000000149a0
X20=00000000000149a0 X21=00000000000149a0 X22=0000000000000001 X23=0000000000000160
X24=000000004007fa24 X25=000000004007fa38 X26=0000000000000000 X27=0000000000014840
X28=00000000000149a0 X29=0000000000000000 X30=0000000000009364
PSTATE=200003c5 --C- EL1h

so you write 0x14840 to address 0x149b0, which is in the flash.

(This is the last TB we execute, because trying to find the
next one hits the problem of the flash not being in ROMD mode.
So it's the very last thing in the log if you run QEMU with
-d in_asm,out_asm,exec,cpu,int -D /tmp/q.log)

thanks
-- PMM
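
The value and address of that store follow directly from the register dump, and the arithmetic can be checked with a few lines of C. The register values and the firmware volume range are copied from the messages above; the function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the CPU state dump in the message above. */
static const uint64_t X0 = 0x160;
static const uint64_t X19 = 0x149a0;
static const uint64_t X20 = 0x149a0;

/* Firmware volume range copied from the boot log in the first message. */
static const uint64_t FV_START = 0x1000;
static const uint64_t FV_SIZE = 0x1FF000;

/* sub x20, x20, x0 : the value that ends up being stored. */
static uint64_t stored_value(void)
{
    return X20 - X0;
}

/* str x20, [x19, #16] : the address that is written. */
static uint64_t store_address(void)
{
    return X19 + 16;
}

/* True if addr lies inside the firmware volume reported at boot. */
static bool in_firmware_volume(uint64_t addr)
{
    return addr >= FV_START && addr - FV_START < FV_SIZE;
}
```

stored_value() gives 0x14840 and store_address() gives 0x149b0, matching the figures quoted in the message, and the address falls inside the firmware volume that the boot log reports at 0x1000 with size 0x1FF000.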

* Re: [Qemu-devel] QEMU TCG issue when executing UEFI
  2016-08-18 14:36     ` Peter Maydell
@ 2016-08-18 16:17       ` Ard Biesheuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 16:17 UTC
  To: Peter Maydell, Leif Lindholm; +Cc: QEMU Developers, Paolo Bonzini

(+ Leif)

Exec summary: strange QEMU bug triggered by RELEASE_GCC5 code, which
is caused by a spurious write to the NOR flash at runtime. The latter
is also a bug, in Tianocore.

On 18 August 2016 at 16:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 18 August 2016 at 15:15, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 18 August 2016 at 16:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>> Bad ram pointer 0x54
>>>> Aborted (core dumped)
>>>
>>> So the reason this happens is that get_page_addr_code() doesn't
>>> correctly handle the case of the memory region being a
>>> ROM that's not in ROMD mode. That is, the flash memory can
>>> be either in "reads map directly to guest memory" (ROMD)
>>> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
>>> can't execute from devices, so the best case here would
>>> be that we print the "Sorry, we can't execute from a device"
>>> message and stop execution.
>>>
>>
>> So is there a spurious write somewhere that causes the ROM to switch
>> into ROMD mode? Because it executes happily from ROM (until it
>> doesn't, of course)
>
> The write that causes us to go into not-ROMD mode is in this block:
>
> 0x00000000000096ac:  cb000294      sub x20, x20, x0
> 0x00000000000096b0:  f9000a74      str x20, [x19, #16]
> 0x00000000000096b4:  9100627c      add x28, x19, #0x18 (24)
> 0x00000000000096b8:  b9400780      ldr w0, [x28, #4]
> 0x00000000000096bc:  35002cc0      cbnz w0, #+0x598 (addr 0x9c54)
>
> which is executed with
>
> PC=00000000000096ac  SP=000000004007f590
> X00=0000000000000160 X01=0000000000000095 X02=000000003031424e X03=0000000000001b40
> X04=0000000000010b64 X05=0000000000000160 X06=0000000000000188 X07=000000004007c268
> X08=00000000000149a0 X09=000000004007fe58 X10=000000004007f793 X11=0000000000000002
> X12=00000000707fe07a X13=0000000000000002 X14=0000000000000000 X15=0000000000000000
> X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=00000000000149a0
> X20=00000000000149a0 X21=00000000000149a0 X22=0000000000000001 X23=0000000000000160
> X24=000000004007fa24 X25=000000004007fa38 X26=0000000000000000 X27=0000000000014840
> X28=00000000000149a0 X29=0000000000000000 X30=0000000000009364
> PSTATE=200003c5 --C- EL1h
>
> so you write 0x14840 to address 0x149b0, which is in the flash.
>
> (This is the last TB we execute, because trying to find the
> next one hits the problem of the flash not being in ROMD mode.
> So it's the very last thing in the log if you run QEMU with
> -d in_asm,out_asm,exec,cpu,int -D /tmp/q.log)
>

OK, this rabbit hole goes pretty deep :-)

Normally, the uncompressed PE/COFF images in the NOR flash (the ones
that set up the MMU etc) are relocated at build time, so that they can
execute from the offset they end up at in the NOR image. The
relocation code sets the base address in the image header, and applies
all fixups in the .reloc PE/COFF section.

As it turns out, the LTO is so effective that it optimizes away all
absolute symbol references, leaving us with no .reloc section at all
(i.e., the module turns out to be completely position independent, but
purely by accident). The runtime loader does not cope well with this,
and ends up writing to the NOR flash, triggering the issue above.

Thanks,
Ard.
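
The relocation step described in the last message can be illustrated with a toy model in C. This is only a sketch under stated assumptions: ToyImage and toy_relocate are hypothetical and are not edk2 code, and the thread does not say exactly how the runtime loader misbehaves. The model just shows why an image that already sits at its recorded base needs no stores at all, and that an image with zero fixups can still cause a header store if a loader decides it has to be rebased.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: these types and names are hypothetical, not edk2 code. */
typedef struct {
    uint64_t image_base;     /* base address recorded in the image header */
    const uint32_t *fixups;  /* byte offsets of 64-bit absolute addresses */
    size_t fixup_count;      /* 0 models an image with no .reloc section */
} ToyImage;

/*
 * Rebase an image to load_addr and return how many stores were made to the
 * image (the header update counts as one). An image that already sits at
 * its recorded base needs no stores, so it can stay in read-only flash.
 */
static size_t toy_relocate(ToyImage *img, uint8_t *mem, uint64_t load_addr)
{
    uint64_t delta = load_addr - img->image_base;
    size_t stores = 0;

    if (delta == 0) {
        return 0;            /* relocated at build time: leave it alone */
    }
    for (size_t i = 0; i < img->fixup_count; i++) {
        uint64_t value;
        memcpy(&value, mem + img->fixups[i], sizeof value);
        value += delta;      /* rebase one absolute address */
        memcpy(mem + img->fixups[i], &value, sizeof value);
        stores++;
    }
    img->image_base = load_addr;  /* header now records the new base */
    stores++;
    return stores;
}
```

In this model, checking whether the image is already at its recorded base before touching it is what keeps an execute-in-place image free of writes, with or without fixup entries.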