qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
Date: Wed, 10 Jun 2015 13:06:44 +0300	[thread overview]
Message-ID: <55780C34.2000403@redhat.com> (raw)
In-Reply-To: <55774DD8.9050606@redhat.com>

On 06/09/2015 11:34 PM, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
>>> Bus driver globally signals the firmware that PCI enumeration and resource
>>> allocation have completed. At this point QEMU regenerates the ACPI payload
>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>>> populated.
>>>
>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
>>> the root bus's command register, *unlike* under SeaBIOS. The consequences
>>> unfold as follows:
>>>
>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>>    because pci_update_mappings() --> pci_bar_address() calculated it as
>>>    PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>>
>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>>    the _CRS, *despite* having been programmed in PCI config space.
>>>
>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>>    root bus's DWordMemory descriptor.
>>>
>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>>>    within the PXB's config space, and notice that it conflicts with the
>>>    main root bus's memory resource descriptors. Linux reports
>>>
>>>    pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>>    pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>>                             0x88200000-0x882000ff 64bit]
>>>    pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>>                             with PCI Bus 0000:00 [mem
>>>                             0x88200000-0xfebfffff]
>>>
>>>    While Windows Server 2012 R2 reports
>>>
>>>      https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>>
>>>      This device cannot find enough free resources that it can use. If you
>>>      want to use this device, you will need to disable one of the other
>>>      devices on this system. (Code 12)
>>>
>>> (This issue was apparently encountered earlier, see:
>>>
>>>    https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>>
>>> and the current hole-punching logic in build_crs() and build_ssdt() is
>>> probably supposed to remedy exactly that problem -- however, for OVMF they
>>> don't work, because at the end of the PCI enumeration and resource
>>> allocation, which cues the ACPI linker/loader client, the command register
>>> is clear.)
>>>
>>> The solution is to fetch the BAR addresses from PCI config space directly,
>>> for the purposes of build_crs(), regardless of the PCI command register
>>> and any MemoryRegion state.
>>>
>>> Example MMIO maps for a 2GB guest:
>>>
>>> SeaBIOS:
>>>
>>>    main: 0x80000000..0xFC000000 / 0x7C000000
>>>    pxb:  0xFC000000..0xFC200000 / 0x00200000
>>>    main: 0xFC200000..0xFC213000 / 0x00013000
>>>    pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>>    main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>>    pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>>
>>> OVMF, without the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    main: 0x88200000..0xFEC00000 / 0x76A00000
>>>
>>> OVMF, with the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>>    main: 0x88200100..0xFEC00000 / 0x769FFF00
>>>
>>> (Note that under OVMF the PCI enumerator includes requests for
>>> prefetchable memory in the nonprefetchable memory pool -- separate windows
>>> for nonprefetchable and prefetchable memory are not supported (yet) --
>>> that's why there's one fewer pxb range in the corrected OVMF case than in
>>> the SeaBIOS case.)
>>>
>>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>
>> This is problematic - disabled BAR values have no meaning according to
>> the PCI spec.
>>
>> It might be best to add a property to just disable shpc in the bridge so
>> no devices reside directly behind the pxb?
>
> I started looking into this.
>
> The property itself doesn't seem terribly hard (there's already an "msi"
> property which I can take as an example). Making stuff conditional on
> this new "shpc" property looked feasible in the beginning, however I
> qucikly ran into two issues:
>
> - Migration.
>
>    One idea would be to keep the SHPC setup around at all times, and
>    just not expose the PCI bar to the guest (same as in Marcel's hack
>    [1]). I didn't like this, although it would keep the migration stream
>    intact.
>
>    The other idea is to omit even the shpc_init() call when SHPC is
>    disabled. I like this, but it would require breaking migration
>    compatibility. Both "minimum_version_id" and "version_id" would have
>    to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>    field should be replaced with a subsection (dependent on the new
>    "shpc" flag).
>
>    Seems sweaty but doable.
>
> - Hotplug handling.
>
>    This is the deal breaker. The new "shpc" flag can affect *instances*
>    of the bridge, but SHPC is a class-level trait.
>    pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>    SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>    as TYPE_HOTPLUG_HANDLER.
>
>    This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>    into a base class and a derived class. Only the derived class would
>    support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>    diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>    from "pci-bridge". (The derived class would preserve the name
>    "pci-bridge".)
>
>    Consequences for migration are unclear to me. Maybe the new derived
>    class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>    would be migration-compatible with the current class.
>
>    If not, I could create the "basic" bridge class as a standalone one,
>    without interrupt / MSI / SHPC / hotplug support, and PXB would use
>    that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>    this would be quite easy; it woduln't duplicate a lot of code, and
>    would not affect preexistent migration at all. On the other hand,
>    people might not like that the base class functionality were
>    duplicated, instead of inherited.
Hi Laszlo,

Can you please check that the above hack [1] solves the problem?
If it works and there is no much code duplication, the latest idea
seems to me the right way to go: A specific PCI-2-PCI bridge.
Also we can always reduce duplication by moving common code in utility methods :)
I did (almost) the same with the PCIBus, creating a PXB version of it.

Now I am back from my PTO and I can help. Let me know if you want me to handle
this issue or any other way I can assist.

Thanks,
Marcel

>
>    I've managed such a base/descendant class split once before
>    (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>    of course -- so perhaps I could try it again, if that's the
>    preference.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>
> Thoughts?

>
> Thanks,
> Laszlo
>
>
>>> ---
>>>   hw/i386/acpi-build.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index b71e942..60d4f75 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>>           for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>>               PCIIORegion *r = &dev->io_regions[i];
>>>
>>> -            range_base = r->addr;
>>> -            range_limit = r->addr + r->size - 1;
>>> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
>>> +            range_limit = range_base + r->size - 1;
>>>
>>>               /*
>>>                * Work-around for old bioses
>>> --
>>> 1.8.3.1
>

  reply	other threads:[~2015-06-10 10:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-05 23:45 [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Laszlo Ersek
2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
2015-06-10  9:16     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
2015-06-10  9:17     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
2015-06-07  9:23     ` Michael S. Tsirkin
2015-06-08  7:56       ` Laszlo Ersek
2015-06-08  9:40         ` Michael S. Tsirkin
2015-06-09 20:34       ` Laszlo Ersek
2015-06-10 10:06         ` Marcel Apfelbaum [this message]
2015-06-10 11:07           ` Laszlo Ersek
2015-06-10 16:21             ` Michael S. Tsirkin
2015-06-10 16:19         ` Michael S. Tsirkin
2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
2015-06-10 11:04   ` Laszlo Ersek
2015-06-10 11:55     ` Marcel Apfelbaum
2015-06-10 12:05       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55780C34.2000403@redhat.com \
    --to=marcel@redhat.com \
    --cc=lersek@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).