qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF
@ 2015-06-05 23:45 Laszlo Ersek
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
  2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
  0 siblings, 2 replies; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:45 UTC (permalink / raw)
  To: qemu devel list, edk2-devel list, Marcel Apfelbaum,
	Michael Tsirkin, Gabriel L. Somlo (CMU)

Following up on this cross-posted message, I will send two patch sets,
one for QEMU (to qemu-devel) and another for OVMF (to edk2-devel). With
both in place, OVMF supports multiple PCI root buses.

Below I'm writing up the way I tested the feature, plus a few random
notes.

(1) Interrupt line assignments. I patched SeaBIOS (temporarily) to print
    interrupt line assignments, and I also patched OVMF (permanently) to
    print the same. (See both patches in the OVMF patch

      OvmfPkg: PlatformBdsLib: debug log interrupt line assignments

    in one of the followup series; the commit message contains the
    ad-hoc SeaBIOS patch.) The QEMU command line was:

    qemu-system-x86_64 \
      -m 2048 \
      -M pc \
      -enable-kvm \
      -device qxl-vga \
      \
      $FIRMWARE_OPTIONS \
      \
      -drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
      -device virtio-scsi-pci,id=scsi0 \
      -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
      \
      -debugcon file:debug.log \
      -global isa-debugcon.iobase=0x402 \
      \
      -monitor stdio \
      \
      -device pxb,id=bridge1,bus_nr=4 \
      \
      -netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
      -device e1000,netdev=netdev0,bus=bridge1,addr=2,romfile= \
      \
      -netdev user,id=netdev1,hostfwd=tcp:127.0.0.1:2223-:22 \
      -device e1000,netdev=netdev1,bus=bridge1,addr=3,romfile= \
      \
      -netdev user,id=netdev2,hostfwd=tcp:127.0.0.1:2224-:22 \
      -device e1000,netdev=netdev2,bus=bridge1,addr=4,romfile= \
      \
      -netdev user,id=netdev3,hostfwd=tcp:127.0.0.1:2225-:22 \
      -device e1000,netdev=netdev3,bus=bridge1,addr=5,romfile=

    The ISO variable pointed to a Fedora 20 LiveCD (although it is not
    relevant for this test). FIRMWARE_OPTIONS were

      -bios .../out/bios.bin

    for the (debug-patched, see above) SeaBIOS test case, and

      -drive if=pflash,readonly,format=raw,file=.../OVMF_CODE.fd \
      -drive if=pflash,format=raw,file=.../COPY_OF_OVMF_VARS.fd

    for the OVMF test case.

    As you can see the QEMU command line places four e1000 NICs on one
    PXB (I used e1000 because that's what Marcel recommended for all
    testing). SeaBIOS then printed

      PCI: init bdf=00:01.3 id=8086:7113
      assigned irq line 10

      PCI: init bdf=00:02.0 id=1b36:0100
      assigned irq line 10

      PCI: init bdf=00:03.0 id=1af4:1004
      assigned irq line 11

      PCI: init bdf=04:00.0 id=1b36:0001
      assigned irq line 11

      PCI: init bdf=05:02.0 id=8086:100e
      assigned irq line 10

      PCI: init bdf=05:03.0 id=8086:100e
      assigned irq line 11

      PCI: init bdf=05:04.0 id=8086:100e
      assigned irq line 11

      PCI: init bdf=05:05.0 id=8086:100e
      assigned irq line 10

    whereas OVMF printed

      SetPciIntLine: PciRoot(0x0)/Pci(0x1,0x3) -> 0x0A
      SetPciIntLine: PciRoot(0x0)/Pci(0x2,0x0) -> 0x0A
      SetPciIntLine: PciRoot(0x0)/Pci(0x3,0x0) -> 0x0B
      SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0) -> 0x0B
      SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x2,0x0) -> 0x0A
      SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x3,0x0) -> 0x0B
      SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x4,0x0) -> 0x0B
      SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x5,0x0) -> 0x0A

    (Note that the bus number assigned to the devices on the pxb is 5 in
    the OVMF case too. This is not visible from the device paths logged
    above, but it is clear from edk2's PCI bus driver's log, and the
    output of the "PCI" UEFI shell command.)

    Therefore I concluded that the IRQ line assignment logic in OVMF
    that Gabriel had contributed earlier (and that I updated very
    slightly in the OVMF patchset here) continued to work correctly.

(2) Marcel, no new fw_cfg file is necessary for exposing the bus numbers
    of the extra root buses to OVMF. OVMF probes all buses and all
    devices (function 0) to locate the extra root buses. This procedure
    is shortened by adhering to the "etc/extra-pci-roots" fw_cfg file,
    which is already there.

    This recommendation came from both Michael & Marcel, and it was
    "surprisingly easy" to implement with edk2's PciLib.

(3) We discussed earlier the idea to flip the PXB's _HID and _UID
    objects to numeric values in QEMU; I implemented and tested that
    (see more test cases below).

(4) Ping test was successful from the OVMF-booted Fedora 20 LivecD
    environment, using one e1000 NIC on a PXB.

(5) Ping test was successful from the UEFI shell, using Intel's
    proprietary UEFI driver for the e1000 NIC (called PROEFI). The NIC
    was located on the same PXB as in the previous point.

(6) Successfully loaded public web pages with Firefox in an OVMF-booted
    Windows Server 2012 R2 guest. The NIC was located on the same PXB as
    in the previous point.

(7) At least one issue remains to be solved (designed) in QEMU, for both
    SeaBIOS's and OVMF's sake: booting off devices that are located on
    the PXB. The problem is with the "bootorder" fw_cfg file. Consider
    the following example:

      /pci@i0cf8/scsi@3/channel@0/disk@0,0
      /pci/pci-bridge@0/ethernet@2/ethernet-phy@0

    which is generated for the options

      -device virtio-scsi-pci,id=scsi0 \
      -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
      \
      -device pxb,id=bridge1,bus_nr=4 \
      -device e1000,netdev=netdev0,bus=bridge1,addr=2,romfile=,bootindex=1

    While the first entry is recognized by both SeaBIOS and OVMF, the
    second entry (generated for the NIC hanging off the PXB, see above)
    is recognized by neither. (I tested OVMF, and investigated the
    SeaBIOS source, for this claim.)

    For the SeaBIOS explanation, grep the source code for FW_PCI_DOMAIN.

    The OVMF explanation is that OVMF simply rejects the initial
    OpenFirmware device path node "/pci" with a controlled parse error
    (as opposed to the "/pci@i0cf8" node, which it recognizes and
    translates to UEFI in combination with the rest of that OFW device
    path).

    The "/pci" node comes from QEMU's sysbus_get_fw_dev_path() function,
    file "hw/core/sysbus.c", where *neither* of the (s->num_mmio) and
    (s->num_pio) branches apply. (The (s->num_pio) branch applies for
    the first entry, ie. "/pci@i0cf8".)

    Something has to be invented here to clue in both firmwares as to
    the root bus number (here bus_nr=4), in a format that is compliant
    with the "OpenFirmware unit address" concept. (Note that
    "/pci-bridge@0" only gives away the slot number *on* the extra root
    bus, not the number of the root bus itself.) For example:

      /pci@rootbus4/pci-bridge@0/ethernet@2/ethernet-phy@0

    would be acceptable. However, I don't know how to implement this in
    sysbus_get_fw_dev_path().

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes
  2015-06-05 23:45 [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Laszlo Ersek
@ 2015-06-05 23:46 ` Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
                     ` (3 more replies)
  2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
  1 sibling, 4 replies; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:46 UTC (permalink / raw)
  To: qemu-devel, lersek; +Cc: Marcel Apfelbaum, Michael S. Tsirkin

I've come across these issues while working on the OVMF patch series.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>

Laszlo Ersek (4):
  i386/acpi-build: more traditional _UID and _HID for PXB root buses
  i386/acpi-build: fix PXB workarounds for unsupported BIOSes
  hw/pci: allow the caller of pci_bar_address() to ignore command
    register
  i386/acpi-build: build_crs(): fetch BAR from PCI config space directly

 include/hw/pci/pci.h |  2 ++
 hw/i386/acpi-build.c | 17 ++++++++---------
 hw/pci/pci.c         | 18 +++++++++++-------
 3 files changed, 21 insertions(+), 16 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
@ 2015-06-05 23:46   ` Laszlo Ersek
  2015-06-10  9:16     ` Marcel Apfelbaum
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:46 UTC (permalink / raw)
  To: qemu-devel, lersek; +Cc: Marcel Apfelbaum, Michael S. Tsirkin

The ACPI specification permits the _HID and _UID objects to evaluate to
strings. (See "6.1.5 _HID (Hardware ID)" and "6.1.12 _UID (Unique ID)" in
the ACPI v6.0 spec.)

With regard to related standards, the UEFI specification can also express
a device address composed from string _HID and _UID identifiers, inside
the Expanded ACPI Device Path Node. (See "9.3.3 ACPI Device Path", Table
49, in the UEFI v2.5 spec.)

However, numeric (integer) contents for both _HID and _UID are more
traditional. They are recommended by the UEFI spec for size reasons:

  [...] the ACPI Device Path node is smaller and should be used if
  possible to reduce the size of device paths that may potentially be
  stored in nonvolatile storage [...]

External tools support them better (for example the --acpi_hid and
--acpi_uid options of "efibootmgr" only take numeric identifiers).
Finally, numeric _HID and _UID contents are existing practice in the QEMU
source.

This patch was tested with a Fedora 20 LiveCD and a preexistent Windows
Server 2012 R2 guest. Using "acpidump" and "iasl" in the Fedora guest, we
get, in the SSDT:

> Scope (\_SB)
> {
>   Device (PC04)
>   {
>     Name (_UID, 0x04)  // _UID: Unique ID
>     Name (_HID, EisaId ("PNP0A03") /* PCI Bus */)  // _HID: Hardware ID

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/i386/acpi-build.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5593e41..52c2591 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -945,9 +945,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 
             scope = aml_scope("\\_SB");
             dev = aml_device("PC%.02X", bus_num);
-            aml_append(dev,
-                       aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
-            aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A03")));
+            aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+            aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
             aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 
             if (numa_node != NUMA_NODE_UNASSIGNED) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
@ 2015-06-05 23:46   ` Laszlo Ersek
  2015-06-10  9:17     ` Marcel Apfelbaum
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
  3 siblings, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:46 UTC (permalink / raw)
  To: qemu-devel, lersek; +Cc: Marcel Apfelbaum, Michael S. Tsirkin

The patch

  apci: fix PXB behaviour if used with unsupported BIOS

uses the following condition to see if a "PXB mem/IO chunk" has *not* been
configured by the BIOS:

  (!range_base || range_base > range_limit)

When this condition evaluates to true, said patch *omits* the
corresponding entry from the _CRS.

Later on the patch checks for the opposite condition (with the intent of
*adding* entries to the _CRS if the "PXB mem/IO chunks" *have* been
configured). Unfortunately, the condition was negated incorrectly: only
the first ! operator was removed, which led to the nonsensical expression

  (range_base || range_base > range_limit)

leading to bogus entries in the _CRS, and causing BSOD in Windows Server
2012 R2 when it runs on OVMF.

The correct negative of the condition seen at the top is

  (range_base && range_base <= range_limit)

Fix the expressions.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/i386/acpi-build.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 52c2591..b71e942 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -833,7 +833,7 @@ static Aml *build_crs(PCIHostState *host,
              * Work-around for old bioses
              * that do not support multiple root buses
              */
-            if (range_base || range_base > range_limit) {
+            if (range_base && range_base <= range_limit) {
                 aml_append(crs,
                            aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
                                        AML_POS_DECODE, AML_ENTIRE_RANGE,
@@ -854,7 +854,7 @@ static Aml *build_crs(PCIHostState *host,
              * Work-around for old bioses
              * that do not support multiple root buses
              */
-            if (range_base || range_base > range_limit) {
+            if (range_base && range_base <= range_limit) {
                 aml_append(crs,
                            aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                             AML_MAX_FIXED, AML_NON_CACHEABLE,
@@ -865,7 +865,7 @@ static Aml *build_crs(PCIHostState *host,
                                             0,
                                             range_limit - range_base + 1));
                 crs_range_insert(mem_ranges, range_base, range_limit);
-          }
+            }
 
             range_base =
                 pci_bridge_get_base(dev, PCI_BASE_ADDRESS_MEM_PREFETCH);
@@ -876,7 +876,7 @@ static Aml *build_crs(PCIHostState *host,
              * Work-around for old bioses
              * that do not support multiple root buses
              */
-            if (range_base || range_base > range_limit) {
+            if (range_base && range_base <= range_limit) {
                 aml_append(crs,
                            aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                             AML_MAX_FIXED, AML_NON_CACHEABLE,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
@ 2015-06-05 23:46   ` Laszlo Ersek
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
  3 siblings, 0 replies; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:46 UTC (permalink / raw)
  To: qemu-devel, lersek; +Cc: Marcel Apfelbaum, Michael S. Tsirkin

The pci_bar_address() function calculates the start address of a given PCI
BAR. Its main purpose is supporting MemoryRegion mapping / unmapping in
pci_update_mappings(), which is guest-observable.

For that reason it contains heavy logic to enforce the validity of the
BAR, and this logic includes checks for the PCI command register:
- If the PCI_COMMAND_IO bit is clear in the command register, then all IO
  space BARs will be reported unmapped (all bits one),
- If the PCI_COMMAND_MEMORY bit is clear in the command register, then all
  MMIO space BARs will be reported unmapped (all bits one).

For an upcoming use case, we'd like pci_bar_address() to calculate the BAR
address from the config space *regardless* of the command register. Add a
new parameter that allows the caller to cut out the command register
verification.

All current callers preserve their behaviors.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 include/hw/pci/pci.h |  2 ++
 hw/pci/pci.c         | 18 +++++++++++-------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d44bc84..16b467b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -314,6 +314,8 @@ void pci_register_vga(PCIDevice *pci_dev, MemoryRegion *mem,
                       MemoryRegion *io_lo, MemoryRegion *io_hi);
 void pci_unregister_vga(PCIDevice *pci_dev);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
+pcibus_t pci_bar_address(PCIDevice *d, int reg, uint8_t type, pcibus_t size,
+                         bool respect_command);
 
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 750f3da..61a70de 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1054,15 +1054,19 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
     return pci_dev->io_regions[region_num].addr;
 }
 
-static pcibus_t pci_bar_address(PCIDevice *d,
-				int reg, uint8_t type, pcibus_t size)
+pcibus_t pci_bar_address(PCIDevice *d, int reg, uint8_t type, pcibus_t size,
+                         bool respect_command)
 {
     pcibus_t new_addr, last_addr;
     int bar = pci_bar(d, reg);
-    uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
+    uint16_t cmd;
+
+    if (respect_command) {
+        cmd = pci_get_word(d->config + PCI_COMMAND);
+    }
 
     if (type & PCI_BASE_ADDRESS_SPACE_IO) {
-        if (!(cmd & PCI_COMMAND_IO)) {
+        if (respect_command && !(cmd & PCI_COMMAND_IO)) {
             return PCI_BAR_UNMAPPED;
         }
         new_addr = pci_get_long(d->config + bar) & ~(size - 1);
@@ -1076,7 +1080,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
         return new_addr;
     }
 
-    if (!(cmd & PCI_COMMAND_MEMORY)) {
+    if (respect_command && !(cmd & PCI_COMMAND_MEMORY)) {
         return PCI_BAR_UNMAPPED;
     }
     if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
@@ -1134,7 +1138,7 @@ static void pci_update_mappings(PCIDevice *d)
         if (!r->size)
             continue;
 
-        new_addr = pci_bar_address(d, i, r->type, r->size);
+        new_addr = pci_bar_address(d, i, r->type, r->size, true);
 
         /* This bar isn't changed */
         if (new_addr == r->addr)
@@ -2427,7 +2431,7 @@ static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
             !(r->type & PCI_BASE_ADDRESS_MEM_TYPE_64)) {
             continue;
         }
-        region_range.begin = pci_bar_address(dev, i, r->type, r->size);
+        region_range.begin = pci_bar_address(dev, i, r->type, r->size, true);
         region_range.end = region_range.begin + r->size;
 
         if (region_range.begin == PCI_BAR_UNMAPPED) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
                     ` (2 preceding siblings ...)
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register Laszlo Ersek
@ 2015-06-05 23:46   ` Laszlo Ersek
  2015-06-07  9:23     ` Michael S. Tsirkin
  3 siblings, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-05 23:46 UTC (permalink / raw)
  To: qemu-devel, lersek; +Cc: Marcel Apfelbaum, Michael S. Tsirkin

OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
Bus driver globally signals the firmware that PCI enumeration and resource
allocation have completed. At this point QEMU regenerates the ACPI payload
in an fw_cfg read callback, and this is when the PXB's _CRS gets
populated.

Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
the root bus's command register, *unlike* under SeaBIOS. The consequences
unfold as follows:

- When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
  because pci_update_mappings() --> pci_bar_address() calculated it as
  PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.

- Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
  the _CRS, *despite* having been programmed in PCI config space.

- Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
  root bus's DWordMemory descriptor.

- Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
  within the PXB's config space, and notice that it conflicts with the
  main root bus's memory resource descriptors. Linux reports

  pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
  pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
                           0x88200000-0x882000ff 64bit]
  pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
                           with PCI Bus 0000:00 [mem
                           0x88200000-0xfebfffff]

  While Windows Server 2012 R2 reports

    https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx

    This device cannot find enough free resources that it can use. If you
    want to use this device, you will need to disable one of the other
    devices on this system. (Code 12)

(This issue was apparently encountered earlier, see:

  https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html

and the current hole-punching logic in build_crs() and build_ssdt() is
probably supposed to remedy exactly that problem -- however, for OVMF they
don't work, because at the end of the PCI enumeration and resource
allocation, which cues the ACPI linker/loader client, the command register
is clear.)

The solution is to fetch the BAR addresses from PCI config space directly,
for the purposes of build_crs(), regardless of the PCI command register
and any MemoryRegion state.

Example MMIO maps for a 2GB guest:

SeaBIOS:

  main: 0x80000000..0xFC000000 / 0x7C000000
  pxb:  0xFC000000..0xFC200000 / 0x00200000
  main: 0xFC200000..0xFC213000 / 0x00013000
  pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
  main: 0xFC213100..0xFEA00000 / 0x027ECF00
  pxb:  0xFEA00000..0xFEC00000 / 0x00200000

OVMF, without the fix:

  main: 0x80000000..0x88100000 / 0x08100000
  pxb:  0x88100000..0x88200000 / 0x00100000
  main: 0x88200000..0xFEC00000 / 0x76A00000

OVMF, with the fix:

  main: 0x80000000..0x88100000 / 0x08100000
  pxb:  0x88100000..0x88200000 / 0x00100000
  pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
  main: 0x88200100..0xFEC00000 / 0x769FFF00

(Note that under OVMF the PCI enumerator includes requests for
prefetchable memory in the nonprefetchable memory pool -- separate windows
for nonprefetchable and prefetchable memory are not supported (yet) --
that's why there's one fewer pxb range in the corrected OVMF case than in
the SeaBIOS case.)

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/i386/acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b71e942..60d4f75 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
         for (i = 0; i < PCI_NUM_REGIONS; i++) {
             PCIIORegion *r = &dev->io_regions[i];
 
-            range_base = r->addr;
-            range_limit = r->addr + r->size - 1;
+            range_base = pci_bar_address(dev, i, r->type, r->size, false);
+            range_limit = range_base + r->size - 1;
 
             /*
              * Work-around for old bioses
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
@ 2015-06-07  9:23     ` Michael S. Tsirkin
  2015-06-08  7:56       ` Laszlo Ersek
  2015-06-09 20:34       ` Laszlo Ersek
  0 siblings, 2 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2015-06-07  9:23 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Marcel Apfelbaum, qemu-devel

On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
> Bus driver globally signals the firmware that PCI enumeration and resource
> allocation have completed. At this point QEMU regenerates the ACPI payload
> in an fw_cfg read callback, and this is when the PXB's _CRS gets
> populated.
> 
> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
> the root bus's command register, *unlike* under SeaBIOS. The consequences
> unfold as follows:
> 
> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>   because pci_update_mappings() --> pci_bar_address() calculated it as
>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
> 
> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>   the _CRS, *despite* having been programmed in PCI config space.
> 
> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>   root bus's DWordMemory descriptor.
> 
> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>   within the PXB's config space, and notice that it conflicts with the
>   main root bus's memory resource descriptors. Linux reports
> 
>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>                            0x88200000-0x882000ff 64bit]
>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>                            with PCI Bus 0000:00 [mem
>                            0x88200000-0xfebfffff]
> 
>   While Windows Server 2012 R2 reports
> 
>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
> 
>     This device cannot find enough free resources that it can use. If you
>     want to use this device, you will need to disable one of the other
>     devices on this system. (Code 12)
> 
> (This issue was apparently encountered earlier, see:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> 
> and the current hole-punching logic in build_crs() and build_ssdt() is
> probably supposed to remedy exactly that problem -- however, for OVMF they
> don't work, because at the end of the PCI enumeration and resource
> allocation, which cues the ACPI linker/loader client, the command register
> is clear.)
> 
> The solution is to fetch the BAR addresses from PCI config space directly,
> for the purposes of build_crs(), regardless of the PCI command register
> and any MemoryRegion state.
> 
> Example MMIO maps for a 2GB guest:
> 
> SeaBIOS:
> 
>   main: 0x80000000..0xFC000000 / 0x7C000000
>   pxb:  0xFC000000..0xFC200000 / 0x00200000
>   main: 0xFC200000..0xFC213000 / 0x00013000
>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
> 
> OVMF, without the fix:
> 
>   main: 0x80000000..0x88100000 / 0x08100000
>   pxb:  0x88100000..0x88200000 / 0x00100000
>   main: 0x88200000..0xFEC00000 / 0x76A00000
> 
> OVMF, with the fix:
> 
>   main: 0x80000000..0x88100000 / 0x08100000
>   pxb:  0x88100000..0x88200000 / 0x00100000
>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>   main: 0x88200100..0xFEC00000 / 0x769FFF00
> 
> (Note that under OVMF the PCI enumerator includes requests for
> prefetchable memory in the nonprefetchable memory pool -- separate windows
> for nonprefetchable and prefetchable memory are not supported (yet) --
> that's why there's one fewer pxb range in the corrected OVMF case than in
> the SeaBIOS case.)
> 
> Cc: Marcel Apfelbaum <marcel@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>

This is problematic - disabled BAR values have no meaning according to
the PCI spec.

It might be best to add a property to just disable shpc in the bridge so
no devices reside directly behind the pxb?

> ---
>  hw/i386/acpi-build.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index b71e942..60d4f75 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
>              PCIIORegion *r = &dev->io_regions[i];
>  
> -            range_base = r->addr;
> -            range_limit = r->addr + r->size - 1;
> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
> +            range_limit = range_base + r->size - 1;
>  
>              /*
>               * Work-around for old bioses
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-07  9:23     ` Michael S. Tsirkin
@ 2015-06-08  7:56       ` Laszlo Ersek
  2015-06-08  9:40         ` Michael S. Tsirkin
  2015-06-09 20:34       ` Laszlo Ersek
  1 sibling, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-08  7:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcel Apfelbaum, qemu-devel

On 06/07/15 11:23, Michael S. Tsirkin wrote:
> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
>> Bus driver globally signals the firmware that PCI enumeration and resource
>> allocation have completed. At this point QEMU regenerates the ACPI payload
>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>> populated.
>>
>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
>> the root bus's command register, *unlike* under SeaBIOS. The consequences
>> unfold as follows:
>>
>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>   because pci_update_mappings() --> pci_bar_address() calculated it as
>>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>
>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>   the _CRS, *despite* having been programmed in PCI config space.
>>
>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>   root bus's DWordMemory descriptor.
>>
>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>>   within the PXB's config space, and notice that it conflicts with the
>>   main root bus's memory resource descriptors. Linux reports
>>
>>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>                            0x88200000-0x882000ff 64bit]
>>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>                            with PCI Bus 0000:00 [mem
>>                            0x88200000-0xfebfffff]
>>
>>   While Windows Server 2012 R2 reports
>>
>>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>
>>     This device cannot find enough free resources that it can use. If you
>>     want to use this device, you will need to disable one of the other
>>     devices on this system. (Code 12)
>>
>> (This issue was apparently encountered earlier, see:
>>
>>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>
>> and the current hole-punching logic in build_crs() and build_ssdt() is
>> probably supposed to remedy exactly that problem -- however, for OVMF they
>> don't work, because at the end of the PCI enumeration and resource
>> allocation, which cues the ACPI linker/loader client, the command register
>> is clear.)
>>
>> The solution is to fetch the BAR addresses from PCI config space directly,
>> for the purposes of build_crs(), regardless of the PCI command register
>> and any MemoryRegion state.
>>
>> Example MMIO maps for a 2GB guest:
>>
>> SeaBIOS:
>>
>>   main: 0x80000000..0xFC000000 / 0x7C000000
>>   pxb:  0xFC000000..0xFC200000 / 0x00200000
>>   main: 0xFC200000..0xFC213000 / 0x00013000
>>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>
>> OVMF, without the fix:
>>
>>   main: 0x80000000..0x88100000 / 0x08100000
>>   pxb:  0x88100000..0x88200000 / 0x00100000
>>   main: 0x88200000..0xFEC00000 / 0x76A00000
>>
>> OVMF, with the fix:
>>
>>   main: 0x80000000..0x88100000 / 0x08100000
>>   pxb:  0x88100000..0x88200000 / 0x00100000
>>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>   main: 0x88200100..0xFEC00000 / 0x769FFF00
>>
>> (Note that under OVMF the PCI enumerator includes requests for
>> prefetchable memory in the nonprefetchable memory pool -- separate windows
>> for nonprefetchable and prefetchable memory are not supported (yet) --
>> that's why there's one fewer pxb range in the corrected OVMF case than in
>> the SeaBIOS case.)
>>
>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> 
> This is problematic - disabled BAR values have no meaning according to
> the PCI spec.
> 
> It might be best to add a property to just disable shpc in the bridge so
> no devices reside directly behind the pxb?

If that solves this problem (and I do think it should), then I don't
have anything against the idea. I guess I could look at other property
code for examples. I think I'd have the main problem with implementing
the "policy" part: when exactly do you flip the new property to false,
and where. (Also, I'm not really sure what the SHPC bar is good for, and
if it won't be missed by someone who wanted to use PXB otherwise.)

Thanks
Laszlo

>> ---
>>  hw/i386/acpi-build.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index b71e942..60d4f75 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>              PCIIORegion *r = &dev->io_regions[i];
>>  
>> -            range_base = r->addr;
>> -            range_limit = r->addr + r->size - 1;
>> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
>> +            range_limit = range_base + r->size - 1;
>>  
>>              /*
>>               * Work-around for old bioses
>> -- 
>> 1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-08  7:56       ` Laszlo Ersek
@ 2015-06-08  9:40         ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2015-06-08  9:40 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Marcel Apfelbaum, qemu-devel

On Mon, Jun 08, 2015 at 09:56:15AM +0200, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
> > On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
> >> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
> >> Bus driver globally signals the firmware that PCI enumeration and resource
> >> allocation have completed. At this point QEMU regenerates the ACPI payload
> >> in an fw_cfg read callback, and this is when the PXB's _CRS gets
> >> populated.
> >>
> >> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
> >> the root bus's command register, *unlike* under SeaBIOS. The consequences
> >> unfold as follows:
> >>
> >> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
> >>   because pci_update_mappings() --> pci_bar_address() calculated it as
> >>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
> >>
> >> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
> >>   the _CRS, *despite* having been programmed in PCI config space.
> >>
> >> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
> >>   root bus's DWordMemory descriptor.
> >>
> >> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
> >>   within the PXB's config space, and notice that it conflicts with the
> >>   main root bus's memory resource descriptors. Linux reports
> >>
> >>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
> >>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
> >>                            0x88200000-0x882000ff 64bit]
> >>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
> >>                            with PCI Bus 0000:00 [mem
> >>                            0x88200000-0xfebfffff]
> >>
> >>   While Windows Server 2012 R2 reports
> >>
> >>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
> >>
> >>     This device cannot find enough free resources that it can use. If you
> >>     want to use this device, you will need to disable one of the other
> >>     devices on this system. (Code 12)
> >>
> >> (This issue was apparently encountered earlier, see:
> >>
> >>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> >>
> >> and the current hole-punching logic in build_crs() and build_ssdt() is
> >> probably supposed to remedy exactly that problem -- however, for OVMF they
> >> don't work, because at the end of the PCI enumeration and resource
> >> allocation, which cues the ACPI linker/loader client, the command register
> >> is clear.)
> >>
> >> The solution is to fetch the BAR addresses from PCI config space directly,
> >> for the purposes of build_crs(), regardless of the PCI command register
> >> and any MemoryRegion state.
> >>
> >> Example MMIO maps for a 2GB guest:
> >>
> >> SeaBIOS:
> >>
> >>   main: 0x80000000..0xFC000000 / 0x7C000000
> >>   pxb:  0xFC000000..0xFC200000 / 0x00200000
> >>   main: 0xFC200000..0xFC213000 / 0x00013000
> >>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
> >>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
> >>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
> >>
> >> OVMF, without the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   main: 0x88200000..0xFEC00000 / 0x76A00000
> >>
> >> OVMF, with the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
> >>   main: 0x88200100..0xFEC00000 / 0x769FFF00
> >>
> >> (Note that under OVMF the PCI enumerator includes requests for
> >> prefetchable memory in the nonprefetchable memory pool -- separate windows
> >> for nonprefetchable and prefetchable memory are not supported (yet) --
> >> that's why there's one fewer pxb range in the corrected OVMF case than in
> >> the SeaBIOS case.)
> >>
> >> Cc: Marcel Apfelbaum <marcel@redhat.com>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > 
> > This is problematic - disabled BAR values have no meaning according to
> > the PCI spec.
> > 
> > It might be best to add a property to just disable shpc in the bridge so
> > no devices reside directly behind the pxb?
> 
> If that solves this problem (and I do think it should), then I don't
> have anything against the idea. I guess I could look at other property
> code for examples. I think I'd have the main problem with implementing
> the "policy" part: when exactly do you flip the new property to false,
> and where.

Perhaps when the bridge is created. But if not, perhaps just ask
management to do this if bridge is connected to the PXB.

> (Also, I'm not really sure what the SHPC bar is good for, and
> if it won't be missed by someone who wanted to use PXB otherwise.)
> 
> Thanks
> Laszlo

At the moment it's only used on non-PC systems that lack ACPI.


> >> ---
> >>  hw/i386/acpi-build.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index b71e942..60d4f75 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
> >>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> >>              PCIIORegion *r = &dev->io_regions[i];
> >>  
> >> -            range_base = r->addr;
> >> -            range_limit = r->addr + r->size - 1;
> >> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
> >> +            range_limit = range_base + r->size - 1;
> >>  
> >>              /*
> >>               * Work-around for old bioses
> >> -- 
> >> 1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-07  9:23     ` Michael S. Tsirkin
  2015-06-08  7:56       ` Laszlo Ersek
@ 2015-06-09 20:34       ` Laszlo Ersek
  2015-06-10 10:06         ` Marcel Apfelbaum
  2015-06-10 16:19         ` Michael S. Tsirkin
  1 sibling, 2 replies; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-09 20:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcel Apfelbaum, Paolo Bonzini, qemu-devel

On 06/07/15 11:23, Michael S. Tsirkin wrote:
> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
>> Bus driver globally signals the firmware that PCI enumeration and resource
>> allocation have completed. At this point QEMU regenerates the ACPI payload
>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>> populated.
>>
>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
>> the root bus's command register, *unlike* under SeaBIOS. The consequences
>> unfold as follows:
>>
>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>   because pci_update_mappings() --> pci_bar_address() calculated it as
>>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>
>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>   the _CRS, *despite* having been programmed in PCI config space.
>>
>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>   root bus's DWordMemory descriptor.
>>
>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>>   within the PXB's config space, and notice that it conflicts with the
>>   main root bus's memory resource descriptors. Linux reports
>>
>>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>                            0x88200000-0x882000ff 64bit]
>>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>                            with PCI Bus 0000:00 [mem
>>                            0x88200000-0xfebfffff]
>>
>>   While Windows Server 2012 R2 reports
>>
>>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>
>>     This device cannot find enough free resources that it can use. If you
>>     want to use this device, you will need to disable one of the other
>>     devices on this system. (Code 12)
>>
>> (This issue was apparently encountered earlier, see:
>>
>>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>
>> and the current hole-punching logic in build_crs() and build_ssdt() is
>> probably supposed to remedy exactly that problem -- however, for OVMF they
>> don't work, because at the end of the PCI enumeration and resource
>> allocation, which cues the ACPI linker/loader client, the command register
>> is clear.)
>>
>> The solution is to fetch the BAR addresses from PCI config space directly,
>> for the purposes of build_crs(), regardless of the PCI command register
>> and any MemoryRegion state.
>>
>> Example MMIO maps for a 2GB guest:
>>
>> SeaBIOS:
>>
>>   main: 0x80000000..0xFC000000 / 0x7C000000
>>   pxb:  0xFC000000..0xFC200000 / 0x00200000
>>   main: 0xFC200000..0xFC213000 / 0x00013000
>>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>
>> OVMF, without the fix:
>>
>>   main: 0x80000000..0x88100000 / 0x08100000
>>   pxb:  0x88100000..0x88200000 / 0x00100000
>>   main: 0x88200000..0xFEC00000 / 0x76A00000
>>
>> OVMF, with the fix:
>>
>>   main: 0x80000000..0x88100000 / 0x08100000
>>   pxb:  0x88100000..0x88200000 / 0x00100000
>>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>   main: 0x88200100..0xFEC00000 / 0x769FFF00
>>
>> (Note that under OVMF the PCI enumerator includes requests for
>> prefetchable memory in the nonprefetchable memory pool -- separate windows
>> for nonprefetchable and prefetchable memory are not supported (yet) --
>> that's why there's one fewer pxb range in the corrected OVMF case than in
>> the SeaBIOS case.)
>>
>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> 
> This is problematic - disabled BAR values have no meaning according to
> the PCI spec.
> 
> It might be best to add a property to just disable shpc in the bridge so
> no devices reside directly behind the pxb?

I started looking into this.

The property itself doesn't seem terribly hard (there's already an "msi"
property which I can take as an example). Making stuff conditional on
this new "shpc" property looked feasible in the beginning, however I
qucikly ran into two issues:

- Migration.

  One idea would be to keep the SHPC setup around at all times, and
  just not expose the PCI bar to the guest (same as in Marcel's hack
  [1]). I didn't like this, although it would keep the migration stream
  intact.

  The other idea is to omit even the shpc_init() call when SHPC is
  disabled. I like this, but it would require breaking migration
  compatibility. Both "minimum_version_id" and "version_id" would have
  to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
  field should be replaced with a subsection (dependent on the new
  "shpc" flag).

  Seems sweaty but doable.

- Hotplug handling.

  This is the deal breaker. The new "shpc" flag can affect *instances*
  of the bridge, but SHPC is a class-level trait.
  pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
  SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
  as TYPE_HOTPLUG_HANDLER.

  This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
  into a base class and a derived class. Only the derived class would
  support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
  diverted to the new base class (without SHPC), in pxb_dev_initfn(),
  from "pci-bridge". (The derived class would preserve the name
  "pci-bridge".)

  Consequences for migration are unclear to me. Maybe the new derived
  class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
  would be migration-compatible with the current class.

  If not, I could create the "basic" bridge class as a standalone one,
  without interrupt / MSI / SHPC / hotplug support, and PXB would use
  that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
  this would be quite easy; it woduln't duplicate a lot of code, and
  would not affect preexistent migration at all. On the other hand,
  people might not like that the base class functionality were
  duplicated, instead of inherited.

  I've managed such a base/descendant class split once before
  (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
  of course -- so perhaps I could try it again, if that's the
  preference.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html

Thoughts?

Thanks,
Laszlo


>> ---
>>  hw/i386/acpi-build.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index b71e942..60d4f75 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>              PCIIORegion *r = &dev->io_regions[i];
>>  
>> -            range_base = r->addr;
>> -            range_limit = r->addr + r->size - 1;
>> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
>> +            range_limit = range_base + r->size - 1;
>>  
>>              /*
>>               * Work-around for old bioses
>> -- 
>> 1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF
  2015-06-05 23:45 [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Laszlo Ersek
  2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
@ 2015-06-10  9:09 ` Marcel Apfelbaum
  2015-06-10 11:04   ` Laszlo Ersek
  1 sibling, 1 reply; 20+ messages in thread
From: Marcel Apfelbaum @ 2015-06-10  9:09 UTC (permalink / raw)
  To: Laszlo Ersek, qemu devel list, edk2-devel list, Marcel Apfelbaum,
	Michael Tsirkin, Gabriel L. Somlo (CMU)

On 06/06/2015 02:45 AM, Laszlo Ersek wrote:
> Following up on this cross-posted message, I will send two patch sets,
> one for QEMU (to qemu-devel) and another for OVMF (to edk2-devel). With
> both in place, OVMF supports multiple PCI root buses.
Hi Laszlo,
Thank you for the patches.

>
> Below I'm writing up the way I tested the feature, plus a few random
> notes.
>
> (1) Interrupt line assignments. I patched SeaBIOS (temporarily) to print
>      interrupt line assignments, and I also patched OVMF (permanently) to
>      print the same. (See both patches in the OVMF patch
>
>        OvmfPkg: PlatformBdsLib: debug log interrupt line assignments
>
>      in one of the followup series; the commit message contains the
>      ad-hoc SeaBIOS patch.) The QEMU command line was:
>
>      qemu-system-x86_64 \
>        -m 2048 \
>        -M pc \
>        -enable-kvm \
>        -device qxl-vga \
>        \
>        $FIRMWARE_OPTIONS \
>        \
>        -drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
>        -device virtio-scsi-pci,id=scsi0 \
>        -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>        \
>        -debugcon file:debug.log \
>        -global isa-debugcon.iobase=0x402 \
>        \
>        -monitor stdio \
>        \
>        -device pxb,id=bridge1,bus_nr=4 \
>        \
>        -netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
>        -device e1000,netdev=netdev0,bus=bridge1,addr=2,romfile= \
>        \
>        -netdev user,id=netdev1,hostfwd=tcp:127.0.0.1:2223-:22 \
>        -device e1000,netdev=netdev1,bus=bridge1,addr=3,romfile= \
>        \
>        -netdev user,id=netdev2,hostfwd=tcp:127.0.0.1:2224-:22 \
>        -device e1000,netdev=netdev2,bus=bridge1,addr=4,romfile= \
>        \
>        -netdev user,id=netdev3,hostfwd=tcp:127.0.0.1:2225-:22 \
>        -device e1000,netdev=netdev3,bus=bridge1,addr=5,romfile=
>
>      The ISO variable pointed to a Fedora 20 LiveCD (although it is not
>      relevant for this test). FIRMWARE_OPTIONS were
>
>        -bios .../out/bios.bin
>
>      for the (debug-patched, see above) SeaBIOS test case, and
>
>        -drive if=pflash,readonly,format=raw,file=.../OVMF_CODE.fd \
>        -drive if=pflash,format=raw,file=.../COPY_OF_OVMF_VARS.fd
>
>      for the OVMF test case.
>
>      As you can see the QEMU command line places four e1000 NICs on one
>      PXB (I used e1000 because that's what Marcel recommended for all
>      testing). SeaBIOS then printed
>
>        PCI: init bdf=00:01.3 id=8086:7113
>        assigned irq line 10
>
>        PCI: init bdf=00:02.0 id=1b36:0100
>        assigned irq line 10
>
>        PCI: init bdf=00:03.0 id=1af4:1004
>        assigned irq line 11
>
>        PCI: init bdf=04:00.0 id=1b36:0001
>        assigned irq line 11
>
>        PCI: init bdf=05:02.0 id=8086:100e
>        assigned irq line 10
>
>        PCI: init bdf=05:03.0 id=8086:100e
>        assigned irq line 11
>
>        PCI: init bdf=05:04.0 id=8086:100e
>        assigned irq line 11
>
>        PCI: init bdf=05:05.0 id=8086:100e
>        assigned irq line 10
>
>      whereas OVMF printed
>
>        SetPciIntLine: PciRoot(0x0)/Pci(0x1,0x3) -> 0x0A
>        SetPciIntLine: PciRoot(0x0)/Pci(0x2,0x0) -> 0x0A
>        SetPciIntLine: PciRoot(0x0)/Pci(0x3,0x0) -> 0x0B
>        SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0) -> 0x0B
>        SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x2,0x0) -> 0x0A
>        SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x3,0x0) -> 0x0B
>        SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x4,0x0) -> 0x0B
>        SetPciIntLine: PciRoot(0x4)/Pci(0x0,0x0)/Pci(0x5,0x0) -> 0x0A
>
>      (Note that the bus number assigned to the devices on the pxb is 5 in
>      the OVMF case too. This is not visible from the device paths logged
>      above, but it is clear from edk2's PCI bus driver's log, and the
>      output of the "PCI" UEFI shell command.)
>
>      Therefore I concluded that the IRQ line assignment logic in OVMF
>      that Gabriel had contributed earlier (and that I updated very
>      slightly in the OVMF patchset here) continued to work correctly.
>
> (2) Marcel, no new fw_cfg file is necessary for exposing the bus numbers
>      of the extra root buses to OVMF. OVMF probes all buses and all
>      devices (function 0) to locate the extra root buses. This procedure
>      is shortened by adhering to the "etc/extra-pci-roots" fw_cfg file,
>      which is already there.
Sure. It would not be a problem to add it, but if there is no need for it...

>
>      This recommendation came from both Michael & Marcel, and it was
>      "surprisingly easy" to implement with edk2's PciLib.
>
> (3) We discussed earlier the idea to flip the PXB's _HID and _UID
>      objects to numeric values in QEMU; I implemented and tested that
>      (see more test cases below).
>
> (4) Ping test was successful from the OVMF-booted Fedora 20 LivecD
>      environment, using one e1000 NIC on a PXB.
>
> (5) Ping test was successful from the UEFI shell, using Intel's
>      proprietary UEFI driver for the e1000 NIC (called PROEFI). The NIC
>      was located on the same PXB as in the previous point.
>
> (6) Successfully loaded public web pages with Firefox in an OVMF-booted
>      Windows Server 2012 R2 guest. The NIC was located on the same PXB as
>      in the previous point.
>
> (7) At least one issue remains to be solved (designed) in QEMU, for both
>      SeaBIOS's and OVMF's sake: booting off devices that are located on
>      the PXB. The problem is with the "bootorder" fw_cfg file. Consider
>      the following example:
>
>        /pci@i0cf8/scsi@3/channel@0/disk@0,0
>        /pci/pci-bridge@0/ethernet@2/ethernet-phy@0
>
>      which is generated for the options
>
>        -device virtio-scsi-pci,id=scsi0 \
>        -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>        \
>        -device pxb,id=bridge1,bus_nr=4 \
>        -device e1000,netdev=netdev0,bus=bridge1,addr=2,romfile=,bootindex=1
>
>      While the first entry is recognized by both SeaBIOS and OVMF, the
>      second entry (generated for the NIC hanging off the PXB, see above)
>      is recognized by neither. (I tested OVMF, and investigated the
>      SeaBIOS source, for this claim.)
>
>      For the SeaBIOS explanation, grep the source code for FW_PCI_DOMAIN.
Thanks for bringing it to my attention.

>
>      The OVMF explanation is that OVMF simply rejects the initial
>      OpenFirmware device path node "/pci" with a controlled parse error
>      (as opposed to the "/pci@i0cf8" node, which it recognizes and
>      translates to UEFI in combination with the rest of that OFW device
>      path).
>
>      The "/pci" node comes from QEMU's sysbus_get_fw_dev_path() function,
>      file "hw/core/sysbus.c", where *neither* of the (s->num_mmio) and
>      (s->num_pio) branches apply. (The (s->num_pio) branch applies for
>      the first entry, ie. "/pci@i0cf8".)
>
>      Something has to be invented here to clue in both firmwares as to
>      the root bus number (here bus_nr=4), in a format that is compliant
>      with the "OpenFirmware unit address" concept. (Note that
>      "/pci-bridge@0" only gives away the slot number *on* the extra root
>      bus, not the number of the root bus itself.) For example:
>
>        /pci@rootbus4/pci-bridge@0/ethernet@2/ethernet-phy@0
>
>      would be acceptable. However, I don't know how to implement this in
>      sysbus_get_fw_dev_path().
I'll look into it. What is the OpenFirmware unit address" concept ? :)


Thanks,
Marcel

>
> Thanks
> Laszlo
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
@ 2015-06-10  9:16     ` Marcel Apfelbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Marcel Apfelbaum @ 2015-06-10  9:16 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel; +Cc: Michael S. Tsirkin

On 06/06/2015 02:46 AM, Laszlo Ersek wrote:
> The ACPI specification permits the _HID and _UID objects to evaluate to
> strings. (See "6.1.5 _HID (Hardware ID)" and "6.1.12 _UID (Unique ID)" in
> the ACPI v6.0 spec.)
>
> With regard to related standards, the UEFI specification can also express
> a device address composed from string _HID and _UID identifiers, inside
> the Expanded ACPI Device Path Node. (See "9.3.3 ACPI Device Path", Table
> 49, in the UEFI v2.5 spec.)
>
> However, numeric (integer) contents for both _HID and _UID are more
> traditional. They are recommended by the UEFI spec for size reasons:
>
>    [...] the ACPI Device Path node is smaller and should be used if
>    possible to reduce the size of device paths that may potentially be
>    stored in nonvolatile storage [...]
>
> External tools support them better (for example the --acpi_hid and
> --acpi_uid options of "efibootmgr" only take numeric identifiers).
> Finally, numeric _HID and _UID contents are existing practice in the QEMU
> source.
>
> This patch was tested with a Fedora 20 LiveCD and a preexistent Windows
> Server 2012 R2 guest. Using "acpidump" and "iasl" in the Fedora guest, we
> get, in the SSDT:
>
>> Scope (\_SB)
>> {
>>    Device (PC04)
>>    {
>>      Name (_UID, 0x04)  // _UID: Unique ID
>>      Name (_HID, EisaId ("PNP0A03") /* PCI Bus */)  // _HID: Hardware ID
>
> Cc: Marcel Apfelbaum <marcel@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>   hw/i386/acpi-build.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 5593e41..52c2591 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -945,9 +945,8 @@ build_ssdt(GArray *table_data, GArray *linker,
>
>               scope = aml_scope("\\_SB");
>               dev = aml_device("PC%.02X", bus_num);
> -            aml_append(dev,
> -                       aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
> -            aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A03")));
> +            aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
> +            aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>               aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
>
>               if (numa_node != NUMA_NODE_UNASSIGNED) {
>

Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>


Thanks,
Marcel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes
  2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
@ 2015-06-10  9:17     ` Marcel Apfelbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Marcel Apfelbaum @ 2015-06-10  9:17 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel; +Cc: Michael S. Tsirkin

On 06/06/2015 02:46 AM, Laszlo Ersek wrote:
> The patch
>
>    apci: fix PXB behaviour if used with unsupported BIOS
>
> uses the following condition to see if a "PXB mem/IO chunk" has *not* been
> configured by the BIOS:
>
>    (!range_base || range_base > range_limit)
>
> When this condition evaluates to true, said patch *omits* the
> corresponding entry from the _CRS.
>
> Later on the patch checks for the opposite condition (with the intent of
> *adding* entries to the _CRS if the "PXB mem/IO chunks" *have* been
> configured). Unfortunately, the condition was negated incorrectly: only
> the first ! operator was removed, which led to the nonsensical expression
>
>    (range_base || range_base > range_limit)
>
> leading to bogus entries in the _CRS, and causing BSOD in Windows Server
> 2012 R2 when it runs on OVMF.
Thanks for catching this!

>
> The correct negative of the condition seen at the top is
>
>    (range_base && range_base <= range_limit)
>
> Fix the expressions.
>
> Cc: Marcel Apfelbaum <marcel@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>   hw/i386/acpi-build.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 52c2591..b71e942 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -833,7 +833,7 @@ static Aml *build_crs(PCIHostState *host,
>                * Work-around for old bioses
>                * that do not support multiple root buses
>                */
> -            if (range_base || range_base > range_limit) {
> +            if (range_base && range_base <= range_limit) {
>                   aml_append(crs,
>                              aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
>                                          AML_POS_DECODE, AML_ENTIRE_RANGE,
> @@ -854,7 +854,7 @@ static Aml *build_crs(PCIHostState *host,
>                * Work-around for old bioses
>                * that do not support multiple root buses
>                */
> -            if (range_base || range_base > range_limit) {
> +            if (range_base && range_base <= range_limit) {
>                   aml_append(crs,
>                              aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>                                               AML_MAX_FIXED, AML_NON_CACHEABLE,
> @@ -865,7 +865,7 @@ static Aml *build_crs(PCIHostState *host,
>                                               0,
>                                               range_limit - range_base + 1));
>                   crs_range_insert(mem_ranges, range_base, range_limit);
> -          }
> +            }
>
>               range_base =
>                   pci_bridge_get_base(dev, PCI_BASE_ADDRESS_MEM_PREFETCH);
> @@ -876,7 +876,7 @@ static Aml *build_crs(PCIHostState *host,
>                * Work-around for old bioses
>                * that do not support multiple root buses
>                */
> -            if (range_base || range_base > range_limit) {
> +            if (range_base && range_base <= range_limit) {
>                   aml_append(crs,
>                              aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>                                               AML_MAX_FIXED, AML_NON_CACHEABLE,
>

Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>

Thanks,
Marcel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-09 20:34       ` Laszlo Ersek
@ 2015-06-10 10:06         ` Marcel Apfelbaum
  2015-06-10 11:07           ` Laszlo Ersek
  2015-06-10 16:19         ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Marcel Apfelbaum @ 2015-06-10 10:06 UTC (permalink / raw)
  To: Laszlo Ersek, Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

On 06/09/2015 11:34 PM, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
>>> Bus driver globally signals the firmware that PCI enumeration and resource
>>> allocation have completed. At this point QEMU regenerates the ACPI payload
>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>>> populated.
>>>
>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
>>> the root bus's command register, *unlike* under SeaBIOS. The consequences
>>> unfold as follows:
>>>
>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>>    because pci_update_mappings() --> pci_bar_address() calculated it as
>>>    PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>>
>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>>    the _CRS, *despite* having been programmed in PCI config space.
>>>
>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>>    root bus's DWordMemory descriptor.
>>>
>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>>>    within the PXB's config space, and notice that it conflicts with the
>>>    main root bus's memory resource descriptors. Linux reports
>>>
>>>    pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>>    pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>>                             0x88200000-0x882000ff 64bit]
>>>    pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>>                             with PCI Bus 0000:00 [mem
>>>                             0x88200000-0xfebfffff]
>>>
>>>    While Windows Server 2012 R2 reports
>>>
>>>      https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>>
>>>      This device cannot find enough free resources that it can use. If you
>>>      want to use this device, you will need to disable one of the other
>>>      devices on this system. (Code 12)
>>>
>>> (This issue was apparently encountered earlier, see:
>>>
>>>    https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>>
>>> and the current hole-punching logic in build_crs() and build_ssdt() is
>>> probably supposed to remedy exactly that problem -- however, for OVMF they
>>> don't work, because at the end of the PCI enumeration and resource
>>> allocation, which cues the ACPI linker/loader client, the command register
>>> is clear.)
>>>
>>> The solution is to fetch the BAR addresses from PCI config space directly,
>>> for the purposes of build_crs(), regardless of the PCI command register
>>> and any MemoryRegion state.
>>>
>>> Example MMIO maps for a 2GB guest:
>>>
>>> SeaBIOS:
>>>
>>>    main: 0x80000000..0xFC000000 / 0x7C000000
>>>    pxb:  0xFC000000..0xFC200000 / 0x00200000
>>>    main: 0xFC200000..0xFC213000 / 0x00013000
>>>    pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>>    main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>>    pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>>
>>> OVMF, without the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    main: 0x88200000..0xFEC00000 / 0x76A00000
>>>
>>> OVMF, with the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>>    main: 0x88200100..0xFEC00000 / 0x769FFF00
>>>
>>> (Note that under OVMF the PCI enumerator includes requests for
>>> prefetchable memory in the nonprefetchable memory pool -- separate windows
>>> for nonprefetchable and prefetchable memory are not supported (yet) --
>>> that's why there's one fewer pxb range in the corrected OVMF case than in
>>> the SeaBIOS case.)
>>>
>>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>
>> This is problematic - disabled BAR values have no meaning according to
>> the PCI spec.
>>
>> It might be best to add a property to just disable shpc in the bridge so
>> no devices reside directly behind the pxb?
>
> I started looking into this.
>
> The property itself doesn't seem terribly hard (there's already an "msi"
> property which I can take as an example). Making stuff conditional on
> this new "shpc" property looked feasible in the beginning, however I
> qucikly ran into two issues:
>
> - Migration.
>
>    One idea would be to keep the SHPC setup around at all times, and
>    just not expose the PCI bar to the guest (same as in Marcel's hack
>    [1]). I didn't like this, although it would keep the migration stream
>    intact.
>
>    The other idea is to omit even the shpc_init() call when SHPC is
>    disabled. I like this, but it would require breaking migration
>    compatibility. Both "minimum_version_id" and "version_id" would have
>    to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>    field should be replaced with a subsection (dependent on the new
>    "shpc" flag).
>
>    Seems sweaty but doable.
>
> - Hotplug handling.
>
>    This is the deal breaker. The new "shpc" flag can affect *instances*
>    of the bridge, but SHPC is a class-level trait.
>    pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>    SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>    as TYPE_HOTPLUG_HANDLER.
>
>    This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>    into a base class and a derived class. Only the derived class would
>    support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>    diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>    from "pci-bridge". (The derived class would preserve the name
>    "pci-bridge".)
>
>    Consequences for migration are unclear to me. Maybe the new derived
>    class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>    would be migration-compatible with the current class.
>
>    If not, I could create the "basic" bridge class as a standalone one,
>    without interrupt / MSI / SHPC / hotplug support, and PXB would use
>    that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>    this would be quite easy; it woduln't duplicate a lot of code, and
>    would not affect preexistent migration at all. On the other hand,
>    people might not like that the base class functionality were
>    duplicated, instead of inherited.
Hi Laszlo,

Can you please check that the above hack [1] solves the problem?
If it works and there is no much code duplication, the latest idea
seems to me the right way to go: A specific PCI-2-PCI bridge.
Also we can always reduce duplication by moving common code in utility methods :)
I did (almost) the same with the PCIBus, creating a PXB version of it.

Now I am back from my PTO and I can help. Let me know if you want me to handle
this issue or any other way I can assist.

Thanks,
Marcel

>
>    I've managed such a base/descendant class split once before
>    (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>    of course -- so perhaps I could try it again, if that's the
>    preference.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>
> Thoughts?

>
> Thanks,
> Laszlo
>
>
>>> ---
>>>   hw/i386/acpi-build.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index b71e942..60d4f75 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>>           for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>>               PCIIORegion *r = &dev->io_regions[i];
>>>
>>> -            range_base = r->addr;
>>> -            range_limit = r->addr + r->size - 1;
>>> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
>>> +            range_limit = range_base + r->size - 1;
>>>
>>>               /*
>>>                * Work-around for old bioses
>>> --
>>> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF
  2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
@ 2015-06-10 11:04   ` Laszlo Ersek
  2015-06-10 11:55     ` Marcel Apfelbaum
  0 siblings, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-10 11:04 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu devel list, edk2-devel list,
	Marcel Apfelbaum, Michael Tsirkin, Gabriel L. Somlo (CMU)

On 06/10/15 11:09, Marcel Apfelbaum wrote:
> On 06/06/2015 02:45 AM, Laszlo Ersek wrote:

>> (7) At least one issue remains to be solved (designed) in QEMU, for both
>>      SeaBIOS's and OVMF's sake: booting off devices that are located on
>>      the PXB. The problem is with the "bootorder" fw_cfg file. Consider
>>      the following example:
>>
>>        /pci@i0cf8/scsi@3/channel@0/disk@0,0
>>        /pci/pci-bridge@0/ethernet@2/ethernet-phy@0
>>
>>      which is generated for the options
>>
>>        -device virtio-scsi-pci,id=scsi0 \
>>        -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>>        \
>>        -device pxb,id=bridge1,bus_nr=4 \
>>        -device
>> e1000,netdev=netdev0,bus=bridge1,addr=2,romfile=,bootindex=1
>>
>>      While the first entry is recognized by both SeaBIOS and OVMF, the
>>      second entry (generated for the NIC hanging off the PXB, see above)
>>      is recognized by neither. (I tested OVMF, and investigated the
>>      SeaBIOS source, for this claim.)
>>
>>      For the SeaBIOS explanation, grep the source code for FW_PCI_DOMAIN.
> Thanks for bringing it to my attention.
> 
>>
>>      The OVMF explanation is that OVMF simply rejects the initial
>>      OpenFirmware device path node "/pci" with a controlled parse error
>>      (as opposed to the "/pci@i0cf8" node, which it recognizes and
>>      translates to UEFI in combination with the rest of that OFW device
>>      path).
>>
>>      The "/pci" node comes from QEMU's sysbus_get_fw_dev_path() function,
>>      file "hw/core/sysbus.c", where *neither* of the (s->num_mmio) and
>>      (s->num_pio) branches apply. (The (s->num_pio) branch applies for
>>      the first entry, ie. "/pci@i0cf8".)
>>
>>      Something has to be invented here to clue in both firmwares as to
>>      the root bus number (here bus_nr=4), in a format that is compliant
>>      with the "OpenFirmware unit address" concept. (Note that
>>      "/pci-bridge@0" only gives away the slot number *on* the extra root
>>      bus, not the number of the root bus itself.) For example:
>>
>>        /pci@rootbus4/pci-bridge@0/ethernet@2/ethernet-phy@0
>>
>>      would be acceptable. However, I don't know how to implement this in
>>      sysbus_get_fw_dev_path().
> I'll look into it. What is the OpenFirmware unit address" concept ? :)

Okay, I looked it up again (also rechecked the OVMF parser code) -- in
fact the example I gave would not be preferable.

* Background:

For the specification, please see "3.2.1.1 Node names" in

  IEEE Standard for Boot (Initialization Configuration)
  Firmware: Core Requirements and Practices

The notation is

  driver-name@unit-address:device-arguments

It says (excerpt):

  The /driver name/ field is a sequence of between one and 31 letters,
  digits, and punctuation characters from the set “, . _ + - ”.
  Uppercase and lowercase characters are distinct. [...]

  [...]

  The /unit address/ field is the text representation of the physical
  address of the device within the address space defined by its parent
  node. The form of the text representation is bus-dependent.

Please see the TranslatePciOfwNodes() function in OVMF, for the
"PCI-like" OFW device paths that OVMF currently recognizes -- just
scroll through the function to see the comments:

https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c#L582

* Therefore, the only kind of /unit address/ that OVMF has faced,
exposed by QEMU, is "comma separated list of hexadecimal integers". OVMF
uses the helper function ParseUnitAddressHexList() to parse them. (It is
defined in the same file linked above.)

It would be preferable to stick with this assumption. Therefore, let me
revise my earlier recommendation, and ask for:

  /pxb@4/pci-bridge@0/ethernet@2/ethernet-phy@0
       ^
       bus_nr (hex, without 0x prefix)

instead. Providing "pxb" as /driver name/ in the very first OFW node
would be sufficient for OVMF (and I guess for SeaBIOS too) to recognize
the extra root bus. The single hex integer in the /unit address/ of the
first node would identify bus_nr. The rest of the nodes in the OFW
devpath are already recognized by OVMF.

An alternative would be simply

  /pci@4

(quoting just the first node), because I can still tell apart the
numeric ("4") /unit address/ from the "i0cf8" one that identifies the
main root bus.

Summary: either of the following would be okay:

  /pxb@4
  /pci@4

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-10 10:06         ` Marcel Apfelbaum
@ 2015-06-10 11:07           ` Laszlo Ersek
  2015-06-10 16:21             ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-10 11:07 UTC (permalink / raw)
  To: Marcel Apfelbaum, Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

On 06/10/15 12:06, Marcel Apfelbaum wrote:
> On 06/09/2015 11:34 PM, Laszlo Ersek wrote:
>> On 06/07/15 11:23, Michael S. Tsirkin wrote:
>>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2
>>>> PCI
>>>> Bus driver globally signals the firmware that PCI enumeration and
>>>> resource
>>>> allocation have completed. At this point QEMU regenerates the ACPI
>>>> payload
>>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>>>> populated.
>>>>
>>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is
>>>> clear in
>>>> the root bus's command register, *unlike* under SeaBIOS. The
>>>> consequences
>>>> unfold as follows:
>>>>
>>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>>>    because pci_update_mappings() --> pci_bar_address() calculated it as
>>>>    PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>>>
>>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>>>    the _CRS, *despite* having been programmed in PCI config space.
>>>>
>>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>>>    root bus's DWordMemory descriptor.
>>>>
>>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed
>>>> SHPC BAR
>>>>    within the PXB's config space, and notice that it conflicts with the
>>>>    main root bus's memory resource descriptors. Linux reports
>>>>
>>>>    pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>>>    pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>>>                             0x88200000-0x882000ff 64bit]
>>>>    pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>>>                             with PCI Bus 0000:00 [mem
>>>>                             0x88200000-0xfebfffff]
>>>>
>>>>    While Windows Server 2012 R2 reports
>>>>
>>>>     
>>>> https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>>>
>>>>      This device cannot find enough free resources that it can use.
>>>> If you
>>>>      want to use this device, you will need to disable one of the other
>>>>      devices on this system. (Code 12)
>>>>
>>>> (This issue was apparently encountered earlier, see:
>>>>
>>>>   
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>>>
>>>> and the current hole-punching logic in build_crs() and build_ssdt() is
>>>> probably supposed to remedy exactly that problem -- however, for
>>>> OVMF they
>>>> don't work, because at the end of the PCI enumeration and resource
>>>> allocation, which cues the ACPI linker/loader client, the command
>>>> register
>>>> is clear.)
>>>>
>>>> The solution is to fetch the BAR addresses from PCI config space
>>>> directly,
>>>> for the purposes of build_crs(), regardless of the PCI command register
>>>> and any MemoryRegion state.
>>>>
>>>> Example MMIO maps for a 2GB guest:
>>>>
>>>> SeaBIOS:
>>>>
>>>>    main: 0x80000000..0xFC000000 / 0x7C000000
>>>>    pxb:  0xFC000000..0xFC200000 / 0x00200000
>>>>    main: 0xFC200000..0xFC213000 / 0x00013000
>>>>    pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>>>    main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>>>    pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>>>
>>>> OVMF, without the fix:
>>>>
>>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>>    main: 0x88200000..0xFEC00000 / 0x76A00000
>>>>
>>>> OVMF, with the fix:
>>>>
>>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>>    pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>>>    main: 0x88200100..0xFEC00000 / 0x769FFF00
>>>>
>>>> (Note that under OVMF the PCI enumerator includes requests for
>>>> prefetchable memory in the nonprefetchable memory pool -- separate
>>>> windows
>>>> for nonprefetchable and prefetchable memory are not supported (yet) --
>>>> that's why there's one fewer pxb range in the corrected OVMF case
>>>> than in
>>>> the SeaBIOS case.)
>>>>
>>>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>>
>>> This is problematic - disabled BAR values have no meaning according to
>>> the PCI spec.
>>>
>>> It might be best to add a property to just disable shpc in the bridge so
>>> no devices reside directly behind the pxb?
>>
>> I started looking into this.
>>
>> The property itself doesn't seem terribly hard (there's already an "msi"
>> property which I can take as an example). Making stuff conditional on
>> this new "shpc" property looked feasible in the beginning, however I
>> qucikly ran into two issues:
>>
>> - Migration.
>>
>>    One idea would be to keep the SHPC setup around at all times, and
>>    just not expose the PCI bar to the guest (same as in Marcel's hack
>>    [1]). I didn't like this, although it would keep the migration stream
>>    intact.
>>
>>    The other idea is to omit even the shpc_init() call when SHPC is
>>    disabled. I like this, but it would require breaking migration
>>    compatibility. Both "minimum_version_id" and "version_id" would have
>>    to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>>    field should be replaced with a subsection (dependent on the new
>>    "shpc" flag).
>>
>>    Seems sweaty but doable.
>>
>> - Hotplug handling.
>>
>>    This is the deal breaker. The new "shpc" flag can affect *instances*
>>    of the bridge, but SHPC is a class-level trait.
>>    pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>>    SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>>    as TYPE_HOTPLUG_HANDLER.
>>
>>    This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>>    into a base class and a derived class. Only the derived class would
>>    support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>>    diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>>    from "pci-bridge". (The derived class would preserve the name
>>    "pci-bridge".)
>>
>>    Consequences for migration are unclear to me. Maybe the new derived
>>    class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>>    would be migration-compatible with the current class.
>>
>>    If not, I could create the "basic" bridge class as a standalone one,
>>    without interrupt / MSI / SHPC / hotplug support, and PXB would use
>>    that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>>    this would be quite easy; it woduln't duplicate a lot of code, and
>>    would not affect preexistent migration at all. On the other hand,
>>    people might not like that the base class functionality were
>>    duplicated, instead of inherited.
> Hi Laszlo,
> 
> Can you please check that the above hack [1] solves the problem?
> If it works and there is no much code duplication, the latest idea
> seems to me the right way to go: A specific PCI-2-PCI bridge.
> Also we can always reduce duplication by moving common code in utility
> methods :)
> I did (almost) the same with the PCIBus, creating a PXB version of it.
> 
> Now I am back from my PTO and I can help. Let me know if you want me to
> handle
> this issue or any other way I can assist.

Thanks. I'll prototype a separate bridge class for this then. Hopefully
I can post something before the end of this week. :)

Cheers
Laszlo

> 
> Thanks,
> Marcel
> 
>>
>>    I've managed such a base/descendant class split once before
>>    (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>>    of course -- so perhaps I could try it again, if that's the
>>    preference.
>>
>> [1]
>> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>
>> Thoughts?
> 
>>
>> Thanks,
>> Laszlo
>>
>>
>>>> ---
>>>>   hw/i386/acpi-build.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>> index b71e942..60d4f75 100644
>>>> --- a/hw/i386/acpi-build.c
>>>> +++ b/hw/i386/acpi-build.c
>>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>>>           for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>>>               PCIIORegion *r = &dev->io_regions[i];
>>>>
>>>> -            range_base = r->addr;
>>>> -            range_limit = r->addr + r->size - 1;
>>>> +            range_base = pci_bar_address(dev, i, r->type, r->size,
>>>> false);
>>>> +            range_limit = range_base + r->size - 1;
>>>>
>>>>               /*
>>>>                * Work-around for old bioses
>>>> -- 
>>>> 1.8.3.1
>>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF
  2015-06-10 11:04   ` Laszlo Ersek
@ 2015-06-10 11:55     ` Marcel Apfelbaum
  2015-06-10 12:05       ` Laszlo Ersek
  0 siblings, 1 reply; 20+ messages in thread
From: Marcel Apfelbaum @ 2015-06-10 11:55 UTC (permalink / raw)
  To: Laszlo Ersek, qemu devel list, edk2-devel list, Marcel Apfelbaum,
	Michael Tsirkin, Gabriel L. Somlo (CMU)

On 06/10/2015 02:04 PM, Laszlo Ersek wrote:
> On 06/10/15 11:09, Marcel Apfelbaum wrote:
>> On 06/06/2015 02:45 AM, Laszlo Ersek wrote:
>
>>> (7) At least one issue remains to be solved (designed) in QEMU, for both
>>>       SeaBIOS's and OVMF's sake: booting off devices that are located on
>>>       the PXB. The problem is with the "bootorder" fw_cfg file. Consider
>>>       the following example:
>>>
>>>         /pci@i0cf8/scsi@3/channel@0/disk@0,0
>>>         /pci/pci-bridge@0/ethernet@2/ethernet-phy@0
>>>
>>>       which is generated for the options
>>>
>>>         -device virtio-scsi-pci,id=scsi0 \
>>>         -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>>>         \
>>>         -device pxb,id=bridge1,bus_nr=4 \
>>>         -device
>>> e1000,netdev=netdev0,bus=bridge1,addr=2,romfile=,bootindex=1
>>>
>>>       While the first entry is recognized by both SeaBIOS and OVMF, the
>>>       second entry (generated for the NIC hanging off the PXB, see above)
>>>       is recognized by neither. (I tested OVMF, and investigated the
>>>       SeaBIOS source, for this claim.)
>>>
>>>       For the SeaBIOS explanation, grep the source code for FW_PCI_DOMAIN.
>> Thanks for bringing it to my attention.
>>
>>>
>>>       The OVMF explanation is that OVMF simply rejects the initial
>>>       OpenFirmware device path node "/pci" with a controlled parse error
>>>       (as opposed to the "/pci@i0cf8" node, which it recognizes and
>>>       translates to UEFI in combination with the rest of that OFW device
>>>       path).
>>>
>>>       The "/pci" node comes from QEMU's sysbus_get_fw_dev_path() function,
>>>       file "hw/core/sysbus.c", where *neither* of the (s->num_mmio) and
>>>       (s->num_pio) branches apply. (The (s->num_pio) branch applies for
>>>       the first entry, ie. "/pci@i0cf8".)
>>>
>>>       Something has to be invented here to clue in both firmwares as to
>>>       the root bus number (here bus_nr=4), in a format that is compliant
>>>       with the "OpenFirmware unit address" concept. (Note that
>>>       "/pci-bridge@0" only gives away the slot number *on* the extra root
>>>       bus, not the number of the root bus itself.) For example:
>>>
>>>         /pci@rootbus4/pci-bridge@0/ethernet@2/ethernet-phy@0
>>>
>>>       would be acceptable. However, I don't know how to implement this in
>>>       sysbus_get_fw_dev_path().
>> I'll look into it. What is the OpenFirmware unit address" concept ? :)
>
> Okay, I looked it up again (also rechecked the OVMF parser code) -- in
> fact the example I gave would not be preferable.
>
> * Background:
>
> For the specification, please see "3.2.1.1 Node names" in
>
>    IEEE Standard for Boot (Initialization Configuration)
>    Firmware: Core Requirements and Practices
>
> The notation is
>
>    driver-name@unit-address:device-arguments
>
> It says (excerpt):
>
>    The /driver name/ field is a sequence of between one and 31 letters,
>    digits, and punctuation characters from the set “, . _ + - ”.
>    Uppercase and lowercase characters are distinct. [...]
>
>    [...]
>
>    The /unit address/ field is the text representation of the physical
>    address of the device within the address space defined by its parent
>    node. The form of the text representation is bus-dependent.
>
> Please see the TranslatePciOfwNodes() function in OVMF, for the
> "PCI-like" OFW device paths that OVMF currently recognizes -- just
> scroll through the function to see the comments:
>
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c#L582
>
> * Therefore, the only kind of /unit address/ that OVMF has faced,
> exposed by QEMU, is "comma separated list of hexadecimal integers". OVMF
> uses the helper function ParseUnitAddressHexList() to parse them. (It is
> defined in the same file linked above.)
>
> It would be preferable to stick with this assumption. Therefore, let me
> revise my earlier recommendation, and ask for:
>
>    /pxb@4/pci-bridge@0/ethernet@2/ethernet-phy@0
>         ^
>         bus_nr (hex, without 0x prefix)
>
> instead. Providing "pxb" as /driver name/ in the very first OFW node
> would be sufficient for OVMF (and I guess for SeaBIOS too) to recognize
> the extra root bus. The single hex integer in the /unit address/ of the
> first node would identify bus_nr. The rest of the nodes in the OFW
> devpath are already recognized by OVMF.
>
> An alternative would be simply
>
>    /pci@4
>
> (quoting just the first node), because I can still tell apart the
> numeric ("4") /unit address/ from the "i0cf8" one that identifies the
> main root bus.
>
> Summary: either of the following would be okay:
>
>    /pxb@4
>    /pci@4
Thanks a lot for the pointer. I prefer the latest.
I'll get to it.

Thanks,
Marcel

>
> Thanks
> Laszlo
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF
  2015-06-10 11:55     ` Marcel Apfelbaum
@ 2015-06-10 12:05       ` Laszlo Ersek
  0 siblings, 0 replies; 20+ messages in thread
From: Laszlo Ersek @ 2015-06-10 12:05 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu devel list, edk2-devel list,
	Marcel Apfelbaum, Michael Tsirkin, Gabriel L. Somlo (CMU)

On 06/10/15 13:55, Marcel Apfelbaum wrote:
> On 06/10/2015 02:04 PM, Laszlo Ersek wrote:
>> On 06/10/15 11:09, Marcel Apfelbaum wrote:
>>> On 06/06/2015 02:45 AM, Laszlo Ersek wrote:
>>
>>>> (7) At least one issue remains to be solved (designed) in QEMU, for
>>>> both
>>>>       SeaBIOS's and OVMF's sake: booting off devices that are
>>>> located on
>>>>       the PXB. The problem is with the "bootorder" fw_cfg file.
>>>> Consider
>>>>       the following example:
>>>>
>>>>         /pci@i0cf8/scsi@3/channel@0/disk@0,0
>>>>         /pci/pci-bridge@0/ethernet@2/ethernet-phy@0
>>>>
>>>>       which is generated for the options
>>>>
>>>>         -device virtio-scsi-pci,id=scsi0 \
>>>>         -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>>>>         \
>>>>         -device pxb,id=bridge1,bus_nr=4 \
>>>>         -device
>>>> e1000,netdev=netdev0,bus=bridge1,addr=2,romfile=,bootindex=1
>>>>
>>>>       While the first entry is recognized by both SeaBIOS and OVMF, the
>>>>       second entry (generated for the NIC hanging off the PXB, see
>>>> above)
>>>>       is recognized by neither. (I tested OVMF, and investigated the
>>>>       SeaBIOS source, for this claim.)
>>>>
>>>>       For the SeaBIOS explanation, grep the source code for
>>>> FW_PCI_DOMAIN.
>>> Thanks for bringing it to my attention.
>>>
>>>>
>>>>       The OVMF explanation is that OVMF simply rejects the initial
>>>>       OpenFirmware device path node "/pci" with a controlled parse
>>>> error
>>>>       (as opposed to the "/pci@i0cf8" node, which it recognizes and
>>>>       translates to UEFI in combination with the rest of that OFW
>>>> device
>>>>       path).
>>>>
>>>>       The "/pci" node comes from QEMU's sysbus_get_fw_dev_path()
>>>> function,
>>>>       file "hw/core/sysbus.c", where *neither* of the (s->num_mmio) and
>>>>       (s->num_pio) branches apply. (The (s->num_pio) branch applies for
>>>>       the first entry, ie. "/pci@i0cf8".)
>>>>
>>>>       Something has to be invented here to clue in both firmwares as to
>>>>       the root bus number (here bus_nr=4), in a format that is
>>>> compliant
>>>>       with the "OpenFirmware unit address" concept. (Note that
>>>>       "/pci-bridge@0" only gives away the slot number *on* the extra
>>>> root
>>>>       bus, not the number of the root bus itself.) For example:
>>>>
>>>>         /pci@rootbus4/pci-bridge@0/ethernet@2/ethernet-phy@0
>>>>
>>>>       would be acceptable. However, I don't know how to implement
>>>> this in
>>>>       sysbus_get_fw_dev_path().
>>> I'll look into it. What is the OpenFirmware unit address" concept ? :)
>>
>> Okay, I looked it up again (also rechecked the OVMF parser code) -- in
>> fact the example I gave would not be preferable.
>>
>> * Background:
>>
>> For the specification, please see "3.2.1.1 Node names" in
>>
>>    IEEE Standard for Boot (Initialization Configuration)
>>    Firmware: Core Requirements and Practices
>>
>> The notation is
>>
>>    driver-name@unit-address:device-arguments
>>
>> It says (excerpt):
>>
>>    The /driver name/ field is a sequence of between one and 31 letters,
>>    digits, and punctuation characters from the set “, . _ + - ”.
>>    Uppercase and lowercase characters are distinct. [...]
>>
>>    [...]
>>
>>    The /unit address/ field is the text representation of the physical
>>    address of the device within the address space defined by its parent
>>    node. The form of the text representation is bus-dependent.
>>
>> Please see the TranslatePciOfwNodes() function in OVMF, for the
>> "PCI-like" OFW device paths that OVMF currently recognizes -- just
>> scroll through the function to see the comments:
>>
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c#L582
>>
>>
>> * Therefore, the only kind of /unit address/ that OVMF has faced,
>> exposed by QEMU, is "comma separated list of hexadecimal integers". OVMF
>> uses the helper function ParseUnitAddressHexList() to parse them. (It is
>> defined in the same file linked above.)
>>
>> It would be preferable to stick with this assumption. Therefore, let me
>> revise my earlier recommendation, and ask for:
>>
>>    /pxb@4/pci-bridge@0/ethernet@2/ethernet-phy@0
>>         ^
>>         bus_nr (hex, without 0x prefix)
>>
>> instead. Providing "pxb" as /driver name/ in the very first OFW node
>> would be sufficient for OVMF (and I guess for SeaBIOS too) to recognize
>> the extra root bus. The single hex integer in the /unit address/ of the
>> first node would identify bus_nr. The rest of the nodes in the OFW
>> devpath are already recognized by OVMF.
>>
>> An alternative would be simply
>>
>>    /pci@4
>>
>> (quoting just the first node), because I can still tell apart the
>> numeric ("4") /unit address/ from the "i0cf8" one that identifies the
>> main root bus.
>>
>> Summary: either of the following would be okay:
>>
>>    /pxb@4
>>    /pci@4
> Thanks a lot for the pointer. I prefer the latest.

Indeed, the current initial node that is produced for the nonzero root
buses is "/pci", so the "only" part you need to append to them is
"@bus_nr". :)

Thanks!
Laszlo

> I'll get to it.
> 
> Thanks,
> Marcel
> 
>>
>> Thanks
>> Laszlo
>>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-09 20:34       ` Laszlo Ersek
  2015-06-10 10:06         ` Marcel Apfelbaum
@ 2015-06-10 16:19         ` Michael S. Tsirkin
  1 sibling, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2015-06-10 16:19 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Marcel Apfelbaum, Paolo Bonzini, qemu-devel

On Tue, Jun 09, 2015 at 10:34:32PM +0200, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
> > On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
> >> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
> >> Bus driver globally signals the firmware that PCI enumeration and resource
> >> allocation have completed. At this point QEMU regenerates the ACPI payload
> >> in an fw_cfg read callback, and this is when the PXB's _CRS gets
> >> populated.
> >>
> >> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
> >> the root bus's command register, *unlike* under SeaBIOS. The consequences
> >> unfold as follows:
> >>
> >> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
> >>   because pci_update_mappings() --> pci_bar_address() calculated it as
> >>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
> >>
> >> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
> >>   the _CRS, *despite* having been programmed in PCI config space.
> >>
> >> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
> >>   root bus's DWordMemory descriptor.
> >>
> >> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
> >>   within the PXB's config space, and notice that it conflicts with the
> >>   main root bus's memory resource descriptors. Linux reports
> >>
> >>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
> >>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
> >>                            0x88200000-0x882000ff 64bit]
> >>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
> >>                            with PCI Bus 0000:00 [mem
> >>                            0x88200000-0xfebfffff]
> >>
> >>   While Windows Server 2012 R2 reports
> >>
> >>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
> >>
> >>     This device cannot find enough free resources that it can use. If you
> >>     want to use this device, you will need to disable one of the other
> >>     devices on this system. (Code 12)
> >>
> >> (This issue was apparently encountered earlier, see:
> >>
> >>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> >>
> >> and the current hole-punching logic in build_crs() and build_ssdt() is
> >> probably supposed to remedy exactly that problem -- however, for OVMF they
> >> don't work, because at the end of the PCI enumeration and resource
> >> allocation, which cues the ACPI linker/loader client, the command register
> >> is clear.)
> >>
> >> The solution is to fetch the BAR addresses from PCI config space directly,
> >> for the purposes of build_crs(), regardless of the PCI command register
> >> and any MemoryRegion state.
> >>
> >> Example MMIO maps for a 2GB guest:
> >>
> >> SeaBIOS:
> >>
> >>   main: 0x80000000..0xFC000000 / 0x7C000000
> >>   pxb:  0xFC000000..0xFC200000 / 0x00200000
> >>   main: 0xFC200000..0xFC213000 / 0x00013000
> >>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
> >>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
> >>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
> >>
> >> OVMF, without the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   main: 0x88200000..0xFEC00000 / 0x76A00000
> >>
> >> OVMF, with the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
> >>   main: 0x88200100..0xFEC00000 / 0x769FFF00
> >>
> >> (Note that under OVMF the PCI enumerator includes requests for
> >> prefetchable memory in the nonprefetchable memory pool -- separate windows
> >> for nonprefetchable and prefetchable memory are not supported (yet) --
> >> that's why there's one fewer pxb range in the corrected OVMF case than in
> >> the SeaBIOS case.)
> >>
> >> Cc: Marcel Apfelbaum <marcel@redhat.com>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > 
> > This is problematic - disabled BAR values have no meaning according to
> > the PCI spec.
> > 
> > It might be best to add a property to just disable shpc in the bridge so
> > no devices reside directly behind the pxb?
> 
> I started looking into this.
> 
> The property itself doesn't seem terribly hard (there's already an "msi"
> property which I can take as an example). Making stuff conditional on
> this new "shpc" property looked feasible in the beginning, however I
> qucikly ran into two issues:
> 
> - Migration.
> 
>   One idea would be to keep the SHPC setup around at all times, and
>   just not expose the PCI bar to the guest (same as in Marcel's hack
>   [1]). I didn't like this, although it would keep the migration stream
>   intact.

I don't get it. You can't migrate device with SHPC to device without,
the difference is guest visible.
Why worry about migration stream being compatible?

>   The other idea is to omit even the shpc_init() call when SHPC is
>   disabled. I like this, but it would require breaking migration
>   compatibility. Both "minimum_version_id" and "version_id" would have
>   to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>   field should be replaced with a subsection (dependent on the new
>   "shpc" flag).
> 
>   Seems sweaty but doable.
> 
> - Hotplug handling.
> 
>   This is the deal breaker. The new "shpc" flag can affect *instances*
>   of the bridge, but SHPC is a class-level trait.
>   pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>   SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>   as TYPE_HOTPLUG_HANDLER.
> 
>   This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>   into a base class and a derived class. Only the derived class would
>   support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>   diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>   from "pci-bridge". (The derived class would preserve the name
>   "pci-bridge".)
> 
>   Consequences for migration are unclear to me. Maybe the new derived
>   class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>   would be migration-compatible with the current class.
> 
>   If not, I could create the "basic" bridge class as a standalone one,
>   without interrupt / MSI / SHPC / hotplug support, and PXB would use
>   that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>   this would be quite easy; it woduln't duplicate a lot of code, and
>   would not affect preexistent migration at all. On the other hand,
>   people might not like that the base class functionality were
>   duplicated, instead of inherited.
> 
>   I've managed such a base/descendant class split once before
>   (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>   of course -- so perhaps I could try it again, if that's the
>   preference.
> 
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> 
> Thoughts?
> 
> Thanks,
> Laszlo
> 
> 
> >> ---
> >>  hw/i386/acpi-build.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index b71e942..60d4f75 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
> >>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> >>              PCIIORegion *r = &dev->io_regions[i];
> >>  
> >> -            range_base = r->addr;
> >> -            range_limit = r->addr + r->size - 1;
> >> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
> >> +            range_limit = range_base + r->size - 1;
> >>  
> >>              /*
> >>               * Work-around for old bioses
> >> -- 
> >> 1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
  2015-06-10 11:07           ` Laszlo Ersek
@ 2015-06-10 16:21             ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2015-06-10 16:21 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Marcel Apfelbaum, Paolo Bonzini, qemu-devel

On Wed, Jun 10, 2015 at 01:07:17PM +0200, Laszlo Ersek wrote:
> On 06/10/15 12:06, Marcel Apfelbaum wrote:
> > On 06/09/2015 11:34 PM, Laszlo Ersek wrote:
> >> On 06/07/15 11:23, Michael S. Tsirkin wrote:
> >>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
> >>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2
> >>>> PCI
> >>>> Bus driver globally signals the firmware that PCI enumeration and
> >>>> resource
> >>>> allocation have completed. At this point QEMU regenerates the ACPI
> >>>> payload
> >>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
> >>>> populated.
> >>>>
> >>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is
> >>>> clear in
> >>>> the root bus's command register, *unlike* under SeaBIOS. The
> >>>> consequences
> >>>> unfold as follows:
> >>>>
> >>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
> >>>>    because pci_update_mappings() --> pci_bar_address() calculated it as
> >>>>    PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
> >>>>
> >>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
> >>>>    the _CRS, *despite* having been programmed in PCI config space.
> >>>>
> >>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
> >>>>    root bus's DWordMemory descriptor.
> >>>>
> >>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed
> >>>> SHPC BAR
> >>>>    within the PXB's config space, and notice that it conflicts with the
> >>>>    main root bus's memory resource descriptors. Linux reports
> >>>>
> >>>>    pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
> >>>>    pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
> >>>>                             0x88200000-0x882000ff 64bit]
> >>>>    pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
> >>>>                             with PCI Bus 0000:00 [mem
> >>>>                             0x88200000-0xfebfffff]
> >>>>
> >>>>    While Windows Server 2012 R2 reports
> >>>>
> >>>>     
> >>>> https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
> >>>>
> >>>>      This device cannot find enough free resources that it can use.
> >>>> If you
> >>>>      want to use this device, you will need to disable one of the other
> >>>>      devices on this system. (Code 12)
> >>>>
> >>>> (This issue was apparently encountered earlier, see:
> >>>>
> >>>>   
> >>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> >>>>
> >>>> and the current hole-punching logic in build_crs() and build_ssdt() is
> >>>> probably supposed to remedy exactly that problem -- however, for
> >>>> OVMF they
> >>>> don't work, because at the end of the PCI enumeration and resource
> >>>> allocation, which cues the ACPI linker/loader client, the command
> >>>> register
> >>>> is clear.)
> >>>>
> >>>> The solution is to fetch the BAR addresses from PCI config space
> >>>> directly,
> >>>> for the purposes of build_crs(), regardless of the PCI command register
> >>>> and any MemoryRegion state.
> >>>>
> >>>> Example MMIO maps for a 2GB guest:
> >>>>
> >>>> SeaBIOS:
> >>>>
> >>>>    main: 0x80000000..0xFC000000 / 0x7C000000
> >>>>    pxb:  0xFC000000..0xFC200000 / 0x00200000
> >>>>    main: 0xFC200000..0xFC213000 / 0x00013000
> >>>>    pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
> >>>>    main: 0xFC213100..0xFEA00000 / 0x027ECF00
> >>>>    pxb:  0xFEA00000..0xFEC00000 / 0x00200000
> >>>>
> >>>> OVMF, without the fix:
> >>>>
> >>>>    main: 0x80000000..0x88100000 / 0x08100000
> >>>>    pxb:  0x88100000..0x88200000 / 0x00100000
> >>>>    main: 0x88200000..0xFEC00000 / 0x76A00000
> >>>>
> >>>> OVMF, with the fix:
> >>>>
> >>>>    main: 0x80000000..0x88100000 / 0x08100000
> >>>>    pxb:  0x88100000..0x88200000 / 0x00100000
> >>>>    pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
> >>>>    main: 0x88200100..0xFEC00000 / 0x769FFF00
> >>>>
> >>>> (Note that under OVMF the PCI enumerator includes requests for
> >>>> prefetchable memory in the nonprefetchable memory pool -- separate
> >>>> windows
> >>>> for nonprefetchable and prefetchable memory are not supported (yet) --
> >>>> that's why there's one fewer pxb range in the corrected OVMF case
> >>>> than in
> >>>> the SeaBIOS case.)
> >>>>
> >>>> Cc: Marcel Apfelbaum <marcel@redhat.com>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> >>>
> >>> This is problematic - disabled BAR values have no meaning according to
> >>> the PCI spec.
> >>>
> >>> It might be best to add a property to just disable shpc in the bridge so
> >>> no devices reside directly behind the pxb?
> >>
> >> I started looking into this.
> >>
> >> The property itself doesn't seem terribly hard (there's already an "msi"
> >> property which I can take as an example). Making stuff conditional on
> >> this new "shpc" property looked feasible in the beginning, however I
> >> qucikly ran into two issues:
> >>
> >> - Migration.
> >>
> >>    One idea would be to keep the SHPC setup around at all times, and
> >>    just not expose the PCI bar to the guest (same as in Marcel's hack
> >>    [1]). I didn't like this, although it would keep the migration stream
> >>    intact.
> >>
> >>    The other idea is to omit even the shpc_init() call when SHPC is
> >>    disabled. I like this, but it would require breaking migration
> >>    compatibility. Both "minimum_version_id" and "version_id" would have
> >>    to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
> >>    field should be replaced with a subsection (dependent on the new
> >>    "shpc" flag).
> >>
> >>    Seems sweaty but doable.
> >>
> >> - Hotplug handling.
> >>
> >>    This is the deal breaker. The new "shpc" flag can affect *instances*
> >>    of the bridge, but SHPC is a class-level trait.
> >>    pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
> >>    SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
> >>    as TYPE_HOTPLUG_HANDLER.

Right, but you can simply add code to fail plug requests (and ignore
unplug requests) after checking this property.


> >>
> >>    This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
> >>    into a base class and a derived class. Only the derived class would
> >>    support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
> >>    diverted to the new base class (without SHPC), in pxb_dev_initfn(),
> >>    from "pci-bridge". (The derived class would preserve the name
> >>    "pci-bridge".)
> >>
> >>    Consequences for migration are unclear to me. Maybe the new derived
> >>    class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
> >>    would be migration-compatible with the current class.
> >>
> >>    If not, I could create the "basic" bridge class as a standalone one,
> >>    without interrupt / MSI / SHPC / hotplug support, and PXB would use
> >>    that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
> >>    this would be quite easy; it woduln't duplicate a lot of code, and
> >>    would not affect preexistent migration at all. On the other hand,
> >>    people might not like that the base class functionality were
> >>    duplicated, instead of inherited.
> > Hi Laszlo,
> > 
> > Can you please check that the above hack [1] solves the problem?
> > If it works and there is no much code duplication, the latest idea
> > seems to me the right way to go: A specific PCI-2-PCI bridge.
> > Also we can always reduce duplication by moving common code in utility
> > methods :)
> > I did (almost) the same with the PCIBus, creating a PXB version of it.
> > 
> > Now I am back from my PTO and I can help. Let me know if you want me to
> > handle
> > this issue or any other way I can assist.
> 
> Thanks. I'll prototype a separate bridge class for this then. Hopefully
> I can post something before the end of this week. :)
> 
> Cheers
> Laszlo
> 
> > 
> > Thanks,
> > Marcel
> > 
> >>
> >>    I've managed such a base/descendant class split once before
> >>    (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
> >>    of course -- so perhaps I could try it again, if that's the
> >>    preference.
> >>
> >> [1]
> >> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> >>
> >> Thoughts?
> > 
> >>
> >> Thanks,
> >> Laszlo
> >>
> >>
> >>>> ---
> >>>>   hw/i386/acpi-build.c | 4 ++--
> >>>>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>>> index b71e942..60d4f75 100644
> >>>> --- a/hw/i386/acpi-build.c
> >>>> +++ b/hw/i386/acpi-build.c
> >>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
> >>>>           for (i = 0; i < PCI_NUM_REGIONS; i++) {
> >>>>               PCIIORegion *r = &dev->io_regions[i];
> >>>>
> >>>> -            range_base = r->addr;
> >>>> -            range_limit = r->addr + r->size - 1;
> >>>> +            range_base = pci_bar_address(dev, i, r->type, r->size,
> >>>> false);
> >>>> +            range_limit = range_base + r->size - 1;
> >>>>
> >>>>               /*
> >>>>                * Work-around for old bioses
> >>>> -- 
> >>>> 1.8.3.1
> >>
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-06-10 16:21 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-05 23:45 [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Laszlo Ersek
2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
2015-06-10  9:16     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
2015-06-10  9:17     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
2015-06-07  9:23     ` Michael S. Tsirkin
2015-06-08  7:56       ` Laszlo Ersek
2015-06-08  9:40         ` Michael S. Tsirkin
2015-06-09 20:34       ` Laszlo Ersek
2015-06-10 10:06         ` Marcel Apfelbaum
2015-06-10 11:07           ` Laszlo Ersek
2015-06-10 16:21             ` Michael S. Tsirkin
2015-06-10 16:19         ` Michael S. Tsirkin
2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
2015-06-10 11:04   ` Laszlo Ersek
2015-06-10 11:55     ` Marcel Apfelbaum
2015-06-10 12:05       ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).