All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly
Date: Wed, 10 Jun 2015 13:06:44 +0300	[thread overview]
Message-ID: <55780C34.2000403@redhat.com> (raw)
In-Reply-To: <55774DD8.9050606@redhat.com>

On 06/09/2015 11:34 PM, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
>>> Bus driver globally signals the firmware that PCI enumeration and resource
>>> allocation have completed. At this point QEMU regenerates the ACPI payload
>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets
>>> populated.
>>>
>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
>>> the root bus's command register, *unlike* under SeaBIOS. The consequences
>>> unfold as follows:
>>>
>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
>>>    because pci_update_mappings() --> pci_bar_address() calculated it as
>>>    PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
>>>
>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
>>>    the _CRS, *despite* having been programmed in PCI config space.
>>>
>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
>>>    root bus's DWordMemory descriptor.
>>>
>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
>>>    within the PXB's config space, and notice that it conflicts with the
>>>    main root bus's memory resource descriptors. Linux reports
>>>
>>>    pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
>>>    pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
>>>                             0x88200000-0x882000ff 64bit]
>>>    pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
>>>                             with PCI Bus 0000:00 [mem
>>>                             0x88200000-0xfebfffff]
>>>
>>>    While Windows Server 2012 R2 reports
>>>
>>>      https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
>>>
>>>      This device cannot find enough free resources that it can use. If you
>>>      want to use this device, you will need to disable one of the other
>>>      devices on this system. (Code 12)
>>>
>>> (This issue was apparently encountered earlier, see:
>>>
>>>    https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>>>
>>> and the current hole-punching logic in build_crs() and build_ssdt() is
>>> probably supposed to remedy exactly that problem -- however, for OVMF they
>>> don't work, because at the end of the PCI enumeration and resource
>>> allocation, which cues the ACPI linker/loader client, the command register
>>> is clear.)
>>>
>>> The solution is to fetch the BAR addresses from PCI config space directly,
>>> for the purposes of build_crs(), regardless of the PCI command register
>>> and any MemoryRegion state.
>>>
>>> Example MMIO maps for a 2GB guest:
>>>
>>> SeaBIOS:
>>>
>>>    main: 0x80000000..0xFC000000 / 0x7C000000
>>>    pxb:  0xFC000000..0xFC200000 / 0x00200000
>>>    main: 0xFC200000..0xFC213000 / 0x00013000
>>>    pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
>>>    main: 0xFC213100..0xFEA00000 / 0x027ECF00
>>>    pxb:  0xFEA00000..0xFEC00000 / 0x00200000
>>>
>>> OVMF, without the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    main: 0x88200000..0xFEC00000 / 0x76A00000
>>>
>>> OVMF, with the fix:
>>>
>>>    main: 0x80000000..0x88100000 / 0x08100000
>>>    pxb:  0x88100000..0x88200000 / 0x00100000
>>>    pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
>>>    main: 0x88200100..0xFEC00000 / 0x769FFF00
>>>
>>> (Note that under OVMF the PCI enumerator includes requests for
>>> prefetchable memory in the nonprefetchable memory pool -- separate windows
>>> for nonprefetchable and prefetchable memory are not supported (yet) --
>>> that's why there's one fewer pxb range in the corrected OVMF case than in
>>> the SeaBIOS case.)
>>>
>>> Cc: Marcel Apfelbaum <marcel@redhat.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>
>> This is problematic - disabled BAR values have no meaning according to
>> the PCI spec.
>>
>> It might be best to add a property to just disable shpc in the bridge so
>> no devices reside directly behind the pxb?
>
> I started looking into this.
>
> The property itself doesn't seem terribly hard (there's already an "msi"
> property which I can take as an example). Making stuff conditional on
> this new "shpc" property looked feasible in the beginning, however I
> qucikly ran into two issues:
>
> - Migration.
>
>    One idea would be to keep the SHPC setup around at all times, and
>    just not expose the PCI bar to the guest (same as in Marcel's hack
>    [1]). I didn't like this, although it would keep the migration stream
>    intact.
>
>    The other idea is to omit even the shpc_init() call when SHPC is
>    disabled. I like this, but it would require breaking migration
>    compatibility. Both "minimum_version_id" and "version_id" would have
>    to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>    field should be replaced with a subsection (dependent on the new
>    "shpc" flag).
>
>    Seems sweaty but doable.
>
> - Hotplug handling.
>
>    This is the deal breaker. The new "shpc" flag can affect *instances*
>    of the bridge, but SHPC is a class-level trait.
>    pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>    SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>    as TYPE_HOTPLUG_HANDLER.
>
>    This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>    into a base class and a derived class. Only the derived class would
>    support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>    diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>    from "pci-bridge". (The derived class would preserve the name
>    "pci-bridge".)
>
>    Consequences for migration are unclear to me. Maybe the new derived
>    class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>    would be migration-compatible with the current class.
>
>    If not, I could create the "basic" bridge class as a standalone one,
>    without interrupt / MSI / SHPC / hotplug support, and PXB would use
>    that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>    this would be quite easy; it woduln't duplicate a lot of code, and
>    would not affect preexistent migration at all. On the other hand,
>    people might not like that the base class functionality were
>    duplicated, instead of inherited.
Hi Laszlo,

Can you please check that the above hack [1] solves the problem?
If it works and there is no much code duplication, the latest idea
seems to me the right way to go: A specific PCI-2-PCI bridge.
Also we can always reduce duplication by moving common code in utility methods :)
I did (almost) the same with the PCIBus, creating a PXB version of it.

Now I am back from my PTO and I can help. Let me know if you want me to handle
this issue or any other way I can assist.

Thanks,
Marcel

>
>    I've managed such a base/descendant class split once before
>    (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>    of course -- so perhaps I could try it again, if that's the
>    preference.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
>
> Thoughts?

>
> Thanks,
> Laszlo
>
>
>>> ---
>>>   hw/i386/acpi-build.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index b71e942..60d4f75 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
>>>           for (i = 0; i < PCI_NUM_REGIONS; i++) {
>>>               PCIIORegion *r = &dev->io_regions[i];
>>>
>>> -            range_base = r->addr;
>>> -            range_limit = r->addr + r->size - 1;
>>> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
>>> +            range_limit = range_base + r->size - 1;
>>>
>>>               /*
>>>                * Work-around for old bioses
>>> --
>>> 1.8.3.1
>

  reply	other threads:[~2015-06-10 10:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-05 23:45 [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Laszlo Ersek
2015-06-05 23:46 ` [Qemu-devel] [PATCH 0/4] PXB tweaks and fixes Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 1/4] i386/acpi-build: more traditional _UID and _HID for PXB root buses Laszlo Ersek
2015-06-10  9:16     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 2/4] i386/acpi-build: fix PXB workarounds for unsupported BIOSes Laszlo Ersek
2015-06-10  9:17     ` Marcel Apfelbaum
2015-06-05 23:46   ` [Qemu-devel] [PATCH 3/4] hw/pci: allow the caller of pci_bar_address() to ignore command register Laszlo Ersek
2015-06-05 23:46   ` [Qemu-devel] [PATCH 4/4] i386/acpi-build: build_crs(): fetch BAR from PCI config space directly Laszlo Ersek
2015-06-07  9:23     ` Michael S. Tsirkin
2015-06-08  7:56       ` Laszlo Ersek
2015-06-08  9:40         ` Michael S. Tsirkin
2015-06-09 20:34       ` Laszlo Ersek
2015-06-10 10:06         ` Marcel Apfelbaum [this message]
2015-06-10 11:07           ` Laszlo Ersek
2015-06-10 16:21             ` Michael S. Tsirkin
2015-06-10 16:19         ` Michael S. Tsirkin
2015-06-10  9:09 ` [Qemu-devel] PXB fixes for QEMU, and extra root buses for OVMF Marcel Apfelbaum
2015-06-10 11:04   ` Laszlo Ersek
2015-06-10 11:55     ` Marcel Apfelbaum
2015-06-10 12:05       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55780C34.2000403@redhat.com \
    --to=marcel@redhat.com \
    --cc=lersek@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.