From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53782)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Yga2h-0002HG-Jk
 for qemu-devel@nongnu.org; Fri, 10 Apr 2015 10:36:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from )
 id 1Yga2e-0005Wq-6C
 for qemu-devel@nongnu.org; Fri, 10 Apr 2015 10:36:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:41692)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Yga2d-0005Wm-T0
 for qemu-devel@nongnu.org; Fri, 10 Apr 2015 10:36:20 -0400
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
 (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])
 by mx1.redhat.com (Postfix) with ESMTPS id 7AFDEA10B7
 for ; Fri, 10 Apr 2015 14:36:19 +0000 (UTC)
Message-ID: <5527DFDF.6090007@redhat.com>
Date: Fri, 10 Apr 2015 16:36:15 +0200
From: Laszlo Ersek
MIME-Version: 1.0
References: <5523E12E.8010103@redhat.com>
 <1428653687.11559.5.camel@nilsson.home.kraxel.org>
 <5527A093.30904@redhat.com> <5527AE47.8080909@redhat.com>
In-Reply-To: <5527AE47.8080909@redhat.com>
Content-Type: multipart/mixed;
 boundary="------------070702040203060506030106"
Subject: Re: [Qemu-devel] virtio-net regression [was: syslinux vs. OVMF]
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu devel list
Cc: Gerd Hoffmann

This is a multi-part message in MIME format.
--------------070702040203060506030106
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 04/10/15 13:04, Laszlo Ersek wrote:
> On 04/10/15 12:06, Laszlo Ersek wrote:
>> On 04/10/15 10:14, Gerd Hoffmann wrote:
>>> Hi,
>>>
>>>> In summary, please ask Gerd to rebuild the ipxe binaries that are
>>>> bundled with upstream qemu such that they include those two iPXE patches
>>>> of ours (see the last reference).
>>>
>>> https://www.kraxel.org/cgit/qemu/log/?h=rebase/roms-next
>>>
>>> Can you give this a try?
>>
>> Thank you for this update, I tested it.
>>
>> (1) I reproduced the issue, so that I could be sure that the fix wasn't
>> meaningless. Indeed the bug reproduces with the iPXE binaries bundled
>> with upstream qemu.
>>
>> I then checked out, built and installed your branch, and tried again,
>> with virtio-net and then e1000.
>>
>> (2) Virtio-net results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>>
>> However, while grubx64.efi loads initrd.img via the network, qemu
>> crashes the guest, with the following message:
>>
>>   qemu-system-x86_64: Guest moved used index from 46499 to 65534
>>
>> This is a virtio protocol bug in the guest (efi-virtio.rom), *or* in
>> QEMU. I don't know.
>>
>> * e1000 results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>> - grubx64.efi loads initrd.img via network
>> - guest kernel boots
>>
>> So, I think the update is fine in general; but maybe there's a new
>> virtio-related bug in either "efi-virtio.rom" or in QEMU.
>>
>> (When I originally wrote the (earlier versions of the) patches, I tested
>> them with virtio-net using RHEL-7 qemu, so I guess this could be an
>> upstream QEMU regression. The machine type I used for testing was
>> pc-i440fx-2.3.)
>>
>> (3) ... Confirmed, this is a qemu regression. Namely, I checked your new
>> efi-virtio.rom with RHEL-7 qemu, and it works fine. CC'ing qemu-devel.
>
> Small update, before I start bisecting it: the bug does not reproduce
> with "-netdev bridge".
>
> It seems to be specific to "-netdev tap". Further, "vhost=on" seems to
> play no role, "-netdev tap" reproduces the error both with and without
> vhost=on.

This is creepy.

It was not easy to bisect, because machine type "pc-i440fx-2.3" is
obviously not available in e.g. v2.2.0.
Ultimately I realized that machine type pc-i440fx-2.0 does not reproduce
the error, even with current master.

So I picked machine type pc-i440fx-2.1, and bisected the interval
between the introduction of "pc-i440fx-2.1" (commit 3458b2b0) and
current master (commit 6a460ed1). Log attached.

The result makes me question my sanity, or at least whether I issued the
correct "git bisect bad" and "git bisect good" commands. This is the
culprit:

commit 18045fb9f457a0f0cba2bd113c748a2dcb4ed39e
Author: Paolo Bonzini
Date:   Mon Jul 28 17:34:16 2014 +0200

    pc: future-proof migration-compatibility of ACPI tables

    This patch avoids that similar changes break QEMU again in the future.
    QEMU will now hard-code 64k as the maximum ACPI table size, which
    (despite being an order of magnitude smaller than 640k) should be enough
    for everyone.

    Reviewed-by: Laszlo Ersek
    Tested-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

How?!

Anyway, then I patched qemu, on top of current master, still sticking
with machine type "pc-i440fx-2.1", as follows:

-----------
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1fe7bfb..6cb00a2 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -344,6 +344,7 @@ static void pc_compat_2_1(MachineState *machine)
     x86_cpu_compat_set_features("core2duo", FEAT_1_ECX, CPUID_EXT_VMX, 0);
     x86_cpu_compat_kvm_no_autodisable(FEAT_8000_0001_ECX, CPUID_EXT3_SVM);
     pcms->enforce_aligned_dimm = false;
+    legacy_acpi_table_size = 6652;
 }
 
 static void pc_compat_2_0(MachineState *machine)
-----------

Incredibly, this made the crash go away. Without this patch (i.e. when it
crashes), the fw_cfg file called "etc/acpi/tables" has size 0x20000.
With the patch (which happens to suppress the crash for some reason),
the same fw_cfg file has size 0x2000 (1/16th).

This is consistent with the branches in acpi_build(). (Note that the
warning block visible there, in the second branch, is never printed.)
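
For reference, the two fw_cfg sizes above can be cross-checked with a few
lines of C. This is only a sketch of the arithmetic; the macro and helper
names (FW_CFG_SIZE_CRASHING, FW_CFG_SIZE_PATCHED, PAGE_SIZE_4K,
size_to_pages, check_reported_sizes) are made up for illustration and are
not taken from QEMU or OVMF.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes of the "etc/acpi/tables" fw_cfg file, as reported in this mail. */
#define FW_CFG_SIZE_CRASHING 0x20000u /* without the one-line patch */
#define FW_CFG_SIZE_PATCHED  0x2000u  /* with the one-line patch */

/* 4KB page granularity (assumed here only to express sizes in pages). */
#define PAGE_SIZE_4K 0x1000u

/* Hypothetical helper: number of 4KB pages needed to hold "size" bytes. */
static uint32_t size_to_pages(uint32_t size)
{
    return (size + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/* Hypothetical helper: cross-check the figures quoted in the text. */
static void check_reported_sizes(void)
{
    /* 0x2000 is 1/16th of 0x20000. */
    assert(FW_CFG_SIZE_CRASHING / FW_CFG_SIZE_PATCHED == 16);
    /* 0x20000 bytes (128KB) is 32 pages; 0x2000 bytes (8KB) is 2 pages. */
    assert(size_to_pages(FW_CFG_SIZE_CRASHING) == 32);
    assert(size_to_pages(FW_CFG_SIZE_PATCHED) == 2);
}
```

Calling check_reported_sizes() from any test program passes silently, which
only confirms the 1/16th ratio and the page counts; it says nothing about
why the larger file exposes the crash.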

It seems very unlikely that qemu is doing anything wrong. The difference
in the fw_cfg file size causes a differently sized memory allocation in
OVMF, which displaces further allocations by 1 page (4KB). For example,
"1af41000.efi" (the iPXE virtio-net driver) is also loaded 4KB higher
than before. But that doesn't directly explain why grub places garbage
in the virtio-net ring while it downloads "initrd.img".

Anyway I think we can rule out any qemu regression at this point. It's a
bug in some other component that the different memory map (due to the
larger, 0x20000 allocation) exposes.

Thanks,
Laszlo

--------------070702040203060506030106
Content-Type: text/x-log;
 name="bisect.log"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="bisect.log"

git bisect start
# bad: [6a460ed18a3fda0eb2d9c96b8b01817b4dcbded4] configure: disable Archipelago by default and warn about libxseg GPLv3 license
git bisect bad 6a460ed18a3fda0eb2d9c96b8b01817b4dcbded4
# good: [3458b2b075f92f163ccb9a1f24733eb5705947f0] pc: add 2.1 machine type
git bisect good 3458b2b075f92f163ccb9a1f24733eb5705947f0
# bad: [ed173cb704f01a62143a3ef0dcf8b493bc795c23] .travis.yml: remove "make check" from main matrix
git bisect bad ed173cb704f01a62143a3ef0dcf8b493bc795c23
# good: [089a39486f2c47994c6c0d34ac7abf34baf40d9d] Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
git bisect good 089a39486f2c47994c6c0d34ac7abf34baf40d9d
# bad: [39ba3bf69c4ef4d8a8b683ee7282efd25b3f01ff] qcow2: fix new_blocks double-free in alloc_refcount_block()
git bisect bad 39ba3bf69c4ef4d8a8b683ee7282efd25b3f01ff
# good: [4bce526ec4b88362a684fd858e0e14c83ddf0db4] target-ppc: KVMPPC_H_CAS fix cpu-version endianess
git bisect good 4bce526ec4b88362a684fd858e0e14c83ddf0db4
# bad: [a9047ec3f6ab56295cba5b07e0d46cded9e2a7ff] hw/arm/boot: Set PC correctly when loading AArch64 ELF files
git bisect bad a9047ec3f6ab56295cba5b07e0d46cded9e2a7ff
# good: [82172b751929314a81337aa91deea82e8297af1f] tests/Makefile: Only run vhost-user-test on Linux
git bisect good 82172b751929314a81337aa91deea82e8297af1f
# good: [3a18d449836d21dee60439b154056cca9a3b6aee] Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging
git bisect good 3a18d449836d21dee60439b154056cca9a3b6aee
# bad: [18045fb9f457a0f0cba2bd113c748a2dcb4ed39e] pc: future-proof migration-compatibility of ACPI tables
git bisect bad 18045fb9f457a0f0cba2bd113c748a2dcb4ed39e
# good: [3b257486639cf6c25e1f3a744d1f19e6b4efdc7a] Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
git bisect good 3b257486639cf6c25e1f3a744d1f19e6b4efdc7a
# good: [c60a57ff497667780132a3fcdc1500c83af5d5c0] Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
git bisect good c60a57ff497667780132a3fcdc1500c83af5d5c0
# good: [cb348985abd3673b40c8af069c3e3b84f547b6f7] bios-tables-test: fix ASL normalization false positive
git bisect good cb348985abd3673b40c8af069c3e3b84f547b6f7
# good: [093a35e5fc0c60508e8c754ae81572090365723d] acpi-build: minor code cleanup
git bisect good 093a35e5fc0c60508e8c754ae81572090365723d
# first bad commit: [18045fb9f457a0f0cba2bd113c748a2dcb4ed39e] pc: future-proof migration-compatibility of ACPI tables

--------------070702040203060506030106--