From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51297) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z2cuK-0004cJ-DJ for qemu-devel@nongnu.org; Wed, 10 Jun 2015 06:06:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z2cuG-0000LB-7W for qemu-devel@nongnu.org; Wed, 10 Jun 2015 06:06:52 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59105) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z2cuF-0000Ku-NM for qemu-devel@nongnu.org; Wed, 10 Jun 2015 06:06:47 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 587388E779 for ; Wed, 10 Jun 2015 10:06:47 +0000 (UTC) Message-ID: <55780C34.2000403@redhat.com> Date: Wed, 10 Jun 2015 13:06:44 +0300 From: Marcel Apfelbaum MIME-Version: 1.0 References: <5572347E.1030105@redhat.com> <1433547989-7238-1-git-send-email-lersek@redhat.com> <1433547989-7238-5-git-send-email-lersek@redhat.com> <20150607111941-mutt-send-email-mst@redhat.com> <55774DD8.9050606@redhat.com> In-Reply-To: <55774DD8.9050606@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laszlo Ersek , "Michael S. Tsirkin" Cc: Paolo Bonzini , qemu-devel@nongnu.org On 06/09/2015 11:34 PM, Laszlo Ersek wrote: > On 06/07/15 11:23, Michael S. Tsirkin wrote: >> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote: >>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI >>> Bus driver globally signals the firmware that PCI enumeration and resource >>> allocation have completed. At this point QEMU regenerates the ACPI payload >>> in an fw_cfg read callback, and this is when the PXB's _CRS gets >>> populated. >>> >>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in >>> the root bus's command register, *unlike* under SeaBIOS. The consequences >>> unfold as follows: >>> >>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one, >>> because pci_update_mappings() --> pci_bar_address() calculated it as >>> PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear. >>> >>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to >>> the _CRS, *despite* having been programmed in PCI config space. >>> >>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main >>> root bus's DWordMemory descriptor. >>> >>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR >>> within the PXB's config space, and notice that it conflicts with the >>> main root bus's memory resource descriptors. Linux reports >>> >>> pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100) >>> pci 0000:04:00.0: BAR 0: trying firmware assignment [mem >>> 0x88200000-0x882000ff 64bit] >>> pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts >>> with PCI Bus 0000:00 [mem >>> 0x88200000-0xfebfffff] >>> >>> While Windows Server 2012 R2 reports >>> >>> https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx >>> >>> This device cannot find enough free resources that it can use. If you >>> want to use this device, you will need to disable one of the other >>> devices on this system. (Code 12) >>> >>> (This issue was apparently encountered earlier, see: >>> >>> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html >>> >>> and the current hole-punching logic in build_crs() and build_ssdt() is >>> probably supposed to remedy exactly that problem -- however, for OVMF they >>> don't work, because at the end of the PCI enumeration and resource >>> allocation, which cues the ACPI linker/loader client, the command register >>> is clear.) >>> >>> The solution is to fetch the BAR addresses from PCI config space directly, >>> for the purposes of build_crs(), regardless of the PCI command register >>> and any MemoryRegion state. >>> >>> Example MMIO maps for a 2GB guest: >>> >>> SeaBIOS: >>> >>> main: 0x80000000..0xFC000000 / 0x7C000000 >>> pxb: 0xFC000000..0xFC200000 / 0x00200000 >>> main: 0xFC200000..0xFC213000 / 0x00013000 >>> pxb: 0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR >>> main: 0xFC213100..0xFEA00000 / 0x027ECF00 >>> pxb: 0xFEA00000..0xFEC00000 / 0x00200000 >>> >>> OVMF, without the fix: >>> >>> main: 0x80000000..0x88100000 / 0x08100000 >>> pxb: 0x88100000..0x88200000 / 0x00100000 >>> main: 0x88200000..0xFEC00000 / 0x76A00000 >>> >>> OVMF, with the fix: >>> >>> main: 0x80000000..0x88100000 / 0x08100000 >>> pxb: 0x88100000..0x88200000 / 0x00100000 >>> pxb: 0x88200000..0x88200100 / 0x00000100 <- SHPC BAR >>> main: 0x88200100..0xFEC00000 / 0x769FFF00 >>> >>> (Note that under OVMF the PCI enumerator includes requests for >>> prefetchable memory in the nonprefetchable memory pool -- separate windows >>> for nonprefetchable and prefetchable memory are not supported (yet) -- >>> that's why there's one fewer pxb range in the corrected OVMF case than in >>> the SeaBIOS case.) >>> >>> Cc: Marcel Apfelbaum >>> Cc: Michael S. Tsirkin >>> Signed-off-by: Laszlo Ersek >> >> This is problematic - disabled BAR values have no meaning according to >> the PCI spec. >> >> It might be best to add a property to just disable shpc in the bridge so >> no devices reside directly behind the pxb? > > I started looking into this. > > The property itself doesn't seem terribly hard (there's already an "msi" > property which I can take as an example). Making stuff conditional on > this new "shpc" property looked feasible in the beginning, however I > qucikly ran into two issues: > > - Migration. > > One idea would be to keep the SHPC setup around at all times, and > just not expose the PCI bar to the guest (same as in Marcel's hack > [1]). I didn't like this, although it would keep the migration stream > intact. > > The other idea is to omit even the shpc_init() call when SHPC is > disabled. I like this, but it would require breaking migration > compatibility. Both "minimum_version_id" and "version_id" would have > to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE() > field should be replaced with a subsection (dependent on the new > "shpc" flag). > > Seems sweaty but doable. > > - Hotplug handling. > > This is the deal breaker. The new "shpc" flag can affect *instances* > of the bridge, but SHPC is a class-level trait. > pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to > SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself > as TYPE_HOTPLUG_HANDLER. > > This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class > into a base class and a derived class. Only the derived class would > support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be > diverted to the new base class (without SHPC), in pxb_dev_initfn(), > from "pci-bridge". (The derived class would preserve the name > "pci-bridge".) > > Consequences for migration are unclear to me. Maybe the new derived > class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV) > would be migration-compatible with the current class. > > If not, I could create the "basic" bridge class as a standalone one, > without interrupt / MSI / SHPC / hotplug support, and PXB would use > that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so > this would be quite easy; it woduln't duplicate a lot of code, and > would not affect preexistent migration at all. On the other hand, > people might not like that the base class functionality were > duplicated, instead of inherited. Hi Laszlo, Can you please check that the above hack [1] solves the problem? If it works and there is no much code duplication, the latest idea seems to me the right way to go: A specific PCI-2-PCI bridge. Also we can always reduce duplication by moving common code in utility methods :) I did (almost) the same with the PCIBus, creating a PXB version of it. Now I am back from my PTO and I can help. Let me know if you want me to handle this issue or any other way I can assist. Thanks, Marcel > > I've managed such a base/descendant class split once before > (splitting fw_cfg into the IO and MMIO mapped variants) -- with help > of course -- so perhaps I could try it again, if that's the > preference. > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html > > Thoughts? > > Thanks, > Laszlo > > >>> --- >>> hw/i386/acpi-build.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c >>> index b71e942..60d4f75 100644 >>> --- a/hw/i386/acpi-build.c >>> +++ b/hw/i386/acpi-build.c >>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host, >>> for (i = 0; i < PCI_NUM_REGIONS; i++) { >>> PCIIORegion *r = &dev->io_regions[i]; >>> >>> - range_base = r->addr; >>> - range_limit = r->addr + r->size - 1; >>> + range_base = pci_bar_address(dev, i, r->type, r->size, false); >>> + range_limit = range_base + r->size - 1; >>> >>> /* >>> * Work-around for old bioses >>> -- >>> 1.8.3.1 >