On 04/10/15 13:04, Laszlo Ersek wrote:
> On 04/10/15 12:06, Laszlo Ersek wrote:
>> On 04/10/15 10:14, Gerd Hoffmann wrote:
>>> Hi,
>>>
>>>> In summary, please ask Gerd to rebuild the ipxe binaries that are
>>>> bundled with upstream qemu such that they include those two iPXE patches
>>>> of ours (see the last reference).
>>>
>>> https://www.kraxel.org/cgit/qemu/log/?h=rebase/roms-next
>>>
>>> Can you give this a try?
>>
>> Thank you for this update, I tested it.
>>
>> (1) I reproduced the issue, so that I could be sure that the fix wasn't
>> meaningless. Indeed the bug reproduces with the iPXE binaries bundled
>> with upstream qemu.
>>
>> I then checked out, built and installed your branch, and tried again,
>> with virtio-net and then e1000.
>>
>> (2) Virtio-net results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>>
>> However, while grubx64.efi loads initrd.img via the network, qemu
>> crashes the guest, with the following message:
>>
>> qemu-system-x86_64: Guest moved used index from 46499 to 65534
>>
>> This is a virtio protocol bug in the guest (efi-virtio.rom), *or* in
>> QEMU. I don't know.
>>
>> * e1000 results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>> - grubx64.efi loads initrd.img via network
>> - guest kernel boots
>>
>> So, I think the update is fine in general; but maybe there's a new
>> virtio-related bug in either "efi-virtio.rom" or in QEMU.
>>
>> (When I originally wrote the (earlier versions of the) patches, I tested
>> them with virtio-net using RHEL-7 qemu, so I guess this could be an
>> upstream QEMU regression. The machine type I used for testing was
>> pc-i440fx-2.3.)
>>
>> (3) ... Confirmed, this is a qemu regression.
>> Namely, I checked your new efi-virtio.rom with RHEL-7 qemu, and it
>> works fine. CC'ing qemu-devel.
>
> Small update, before I start bisecting it: the bug does not reproduce
> with "-netdev bridge".
>
> It seems to be specific to "-netdev tap". Further, "vhost=on" seems to
> play no role, "-netdev tap" reproduces the error both with and without
> vhost=on.

This is creepy.

It was not easy to bisect, because machine type "pc-i440fx-2.3" is
obviously not available in e.g. v2.2.0. Ultimately I realized that
machine type pc-i440fx-2.0 does not reproduce the error, even with
current master.

So I picked machine type pc-i440fx-2.1, and bisected the interval
between the introduction of "pc-i440fx-2.1" (commit 3458b2b0) and
current master (commit 6a460ed1). Log attached.

The result makes me question my sanity, or at least that I issued the
correct "git bisect bad" and "git bisect good" commands. This is the
culprit:

commit 18045fb9f457a0f0cba2bd113c748a2dcb4ed39e
Author: Paolo Bonzini
Date:   Mon Jul 28 17:34:16 2014 +0200

    pc: future-proof migration-compatibility of ACPI tables

    This patch avoids that similar changes break QEMU again in the future.
    QEMU will now hard-code 64k as the maximum ACPI table size, which
    (despite being an order of magnitude smaller than 640k) should be enough
    for everyone.

    Reviewed-by: Laszlo Ersek
    Tested-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

How?!
Anyway, then I patched qemu, on top of current master, still sticking
with machine type "pc-i440fx-2.1", as follows:

-----------
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1fe7bfb..6cb00a2 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -344,6 +344,7 @@ static void pc_compat_2_1(MachineState *machine)
     x86_cpu_compat_set_features("core2duo", FEAT_1_ECX, CPUID_EXT_VMX, 0);
     x86_cpu_compat_kvm_no_autodisable(FEAT_8000_0001_ECX, CPUID_EXT3_SVM);
     pcms->enforce_aligned_dimm = false;
+    legacy_acpi_table_size = 6652;
 }
 
 static void pc_compat_2_0(MachineState *machine)
-----------

Incredibly, this made the crash go away.

Without this patch (i.e. when it crashes), the fw_cfg file called
"etc/acpi/tables" has size 0x20000. With the patch (which happens to
suppress the crash for some reason), the same fw_cfg file has size
0x2000 (1/16th).

This is consistent with the branches in acpi_build(). (Note that the
warning block visible there, in the second branch, is never printed.)

It seems very unlikely that qemu is doing anything wrong. The difference
in the fw_cfg file size causes a differently sized memory allocation in
OVMF, which displaces further allocations by 1 page (4KB). For example,
"1af41000.efi" (the iPXE virtio-net driver) is also loaded 4KB higher
than before.

But that doesn't directly explain why grub places garbage in the
virtio-net ring while it downloads "initrd.img".

Anyway I think we can rule out any qemu regression at this point. It's a
bug in some other component that the different memory map (due to the
larger, 0x20000 allocation) exposes.

Thanks,
Laszlo
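
For reference, the kind of sanity check that can produce a message like "Guest moved used index from 46499 to 65534" might look roughly like the sketch below. This is not QEMU's code: the function name ring_index_plausible() and the queue size of 256 used in the note afterwards are assumptions made for illustration. The idea is only that ring indices are free-running 16-bit counters, so a device can reject an index that has jumped further ahead than the ring has entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from QEMU: returns true when the index
 * advertised by the guest is plausible, i.e. it is at most queue_size
 * entries ahead of the index the device saw last. Both indices are
 * free-running 16-bit counters, so the distance is taken modulo 2^16.
 */
bool ring_index_plausible(uint16_t last_seen, uint16_t advertised,
                          uint16_t queue_size)
{
    /* Assumption: the queue size is a non-zero power of two. */
    assert(queue_size != 0 && (queue_size & (queue_size - 1)) == 0);

    /* Unsigned 16-bit subtraction handles counter wrap-around. */
    uint16_t distance = (uint16_t)(advertised - last_seen);

    return distance <= queue_size;
}
```

With the values from the log above, the distance is 65534 - 46499 = 19035 entries, which exceeds the assumed queue size of 256 by a wide margin, so a check of this shape would reject it. That fits the theory that something in the guest wrote garbage into the ring.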