From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: mst@redhat.com
Subject: Re: [Qemu-devel] [PATCH] hw/pci: do not update the PCI mappings while Decode (I/O or memory) bit is not set in the Command register
Date: Mon, 11 Jan 2016 20:57:51 +0200
Message-ID: <5693FB2F.2080403@redhat.com>
In-Reply-To: <5693F802.2010304@redhat.com>
On 01/11/2016 08:44 PM, Laszlo Ersek wrote:
> On 01/11/16 19:01, Marcel Apfelbaum wrote:
>> On 01/11/2016 07:15 PM, Laszlo Ersek wrote:
>>> On 01/11/16 17:34, Marcel Apfelbaum wrote:
>>>> On 01/11/2016 06:11 PM, Laszlo Ersek wrote:
>>>>> On 01/11/16 13:24, Marcel Apfelbaum wrote:
>>>>>> Two reasons:
>>>>>> - PCI Spec indicates that while the bit is not set
>>>>>> the memory sizing is not finished.
>>>>>> - pci_bar_address will return PCI_BAR_UNMAPPED
>>>>>> and a previous value can be accidentally overridden
>>>>>> if the command register is modified (and not the BAR).
>>>>>>
>>>>>> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
>>>>>> ---
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I found this when trying to use multiple root complexes with OVMF.
>>>>>>
>>>>>> When trying to attach a device to the pxb-pcie device as Integrated
>>>>>> Device it did not receive the IO/MEM resources.
>>>>>>
>>>>>> The reason is that OVMF is working like that:
>>>>>> 1. It disables the Decode (I/O or memory) bit in the Command
>>>>>> register
>>>>>> 2. It configures the device BARS
>>>>>> 3. Makes some tests on the Command register
>>>>>> 4. ...
>>>>>> 5. Enables the Decode (I/O or memory) at some point.
>>>>>>
>>>>>> On step 3 all the BARS are overridden to 0xffffffff by QEMU.
>>>>>>
>>>>>> Since QEMU uses the device BARs to compute the new host bridge
>>>>>> resources
>>>>>> it now gets garbage.
>>>>>>
>>>>>> Laszlo, this also solves the SHPC problem for the pci-2-pci bridge
>>>>>> inside the pxb.
>>>>>> Now we can enable the SHPC for it too.
>>>>>
>>>>> I encountered the exact same problem months ago. I posted patches for
>>>>> it; you were CC'd. :)
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342209
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342210
>>>>>
>>>>> As you can see under the second link above, I made the same analysis &
>>>>> observations as you do now. (It took me quite long to track down the
>>>>> "inexplicable" behavior of edk2's generic PCI bus driver / enumerator
>>>>> that is built into OVMF.)
>>>>
>>>> Wow, I just re-worked this issue again from 0! I wish I had remembered
>>>> those threads :(
>>>> This was another symptom of the exact same problem! And I remembered
>>>> something about SHPC, I should have looked at those mail threads again...
>>>>
>>>>>
>>>>> I proposed to change pci_bar_address() so that it could return, to
>>>>> distinguished callers, the BAR values "under programming", even if the
>>>>> command bits were clear. Then the ACPI generator would utilize this
>>>>> special exception.
>>>>>
>>>>> Michael disagreed; in
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342242
>>>>>
>>>>> he wrote "[t]his is problematic - disabled BAR values have no meaning
>>>>> according to the PCI spec".
>>>>>
>>>>
>>>> Yes... because it looked like a hook for our case only,
>>>> the good news is that this patch is based exactly on the fact that
>>>> the BARs have no meaning if the bit is not set.
>>>>
>>>>> The current solution to the problem (= we disable the SHPC) was
>>>>> recommended by Michael in that message: "It might be best to add a
>>>>> property to just disable shpc in the bridge so no devices reside
>>>>> directly behind the pxb?"
>>>>>
>>>>
>>>> I confess I don't exactly understand what the SHPC of the pci-2-pci
>>>> bridge
>>>> has to do with sibling devices on the pxb's root bus (SHPC is the
>>>> hot-plug controller
>>>> for the devices behind the pci-2-pci bridge).
>>>>
>>>> The second part I do understand, the pxb design was to not have devices
>>>> directly behind
>>>> the pxb, so maybe he meant that SHPC is the part of the pci-bridge that
>>>> behaves like
>>>> a device in the sense it requires IO/MEM resources.
>>>>
>>>> Bottom line, your solution for the PXB was just fine :)
>>>>
>>>>
>>>>> In comparison, your patch doesn't change pci_bar_address(). Instead, it
>>>>> modifies pci_update_mappings() *not to call* pci_bar_address(), if the
>>>>> respective command bits are clear.
>>>>>
>>>>> I guess that could have about the same effect.
>>>>>
>>>>> If, unlike my patch, yours actually improves QEMU's compliance with the
>>>>> PCI specs, then it's likely a good patch. (And apparently more general
>>>>> than the SHPC-specific solution we have now.)
>>>>
>>>>
>>>> Exactly! Why should a pci write to the command register *delete*
>>>> previously set resources? I am looking at it as a bug.
>>>>
>>>> Also, updating the mappings while the Decode bit is not enabled
>>>> is at the very least unnecessary.
>>>>
>>>>>
>>>>> I just don't know if it's a good idea to leave any old mappings active
>>>>> while the BARs are being reprogrammed (with the command bits clear).
>>>>>
>>>>
>>>> First, because the OS can't use the IO/MEM anyway, secondly the guest
>>>> OS/firmware
>>>> is the one that disabled the bit... (in order to program resources)
>>>
>>> I have something like the following in mind. Do you think it is a valid
>>> (although contrived) use case?
>>>
>>> - guest programs some BAR and uses it (as designed / intended)
>>> - guest disables command bit, modifies BAR location
>>> - guest accesses *old* BAR location
>>>
>>> What should a guest *expect* in such a case? Is this invalid guest
>>> behavior?
>>
>> Yes, this is indeed invalid behavior, from the device point of view
>> it is disabled. Best case scenario - the guest will see 0xffffffff,
>> worst case - garbage.
>>
>>>
>>> If it is not invalid, then will QEMU comply with the guest's
>>> expectations if your patch is applied? Pre-patch, the guest would likely
>>> access a "hole" in the host bridge MMIO aperture, whereas with your
>>> patch (I guess?) it still might access the device through the old (still
>>> active) BAR?
>>>
>>
>> Since the IO is disabled, pci_bar_address will return PCI_BAR_UNMAPPED
>> and *no updates will be made* pre or post this patch. It will behave the
>> same
>> from the guest point of view. It will still access the memory region
>> of the device.
>>
>>
>>> Or would QEMU prevent that just by virtue of the command bit being clear?
>>
>> Again, this patch only changes the behavior in a specific case:
>> when the device is disabled and the guest writes to the command register
>> without
>> enabling IO/MEM.
>
> Ah, right! That's exactly what the edk2 PCI bus driver / enumerator
> does. Massages the command register without touching those two bits, and...
>
>>
>> Pre-patch -> the old BAR addresses are overridden with 0xffffffff
>> (I really think this is a bug, nobody asked for this!!)
>
> the previously programmed BAR address gets lost. It's great that you
> have the PCI knowledge to state that this is actually a bug! It had
> looked fishy to me as well, but I couldn't make the same argument.
>
>> Post-Patch -> the old BAR values are returned to the prev state ->
>> correct behavior IMHO.
>
> I agree.
>
>> Please continue to ask questions until we get to the bottom of it. :)
>
> Okay, I think I can now try to review this patch. See below.
>
>>
>> Thanks,
>> Marcel
>>
>>>
>>> Thanks
>>> Laszlo
>>>
>>>>> In other words, what guarantees that this change will not regress
>>>>> anything? (I'm not doubting -- I'm asking; I honestly don't know.)
>>>>>
>>>>> So I guess I'll defer to Michael on this one.
>>>>
>>>> Michael, do you agree with the above?
>>>>
>>>>>
>>>>> In any case, I fully agree with your analysis of OVMF's behavior.
>>>>
>>>> Thanks! I looked for this bug in OVMF for some time now :)
>>>> Marcel
>>>>
>>>>>
>>>>> Thanks!
>>>>> Laszlo
>>>>>
>>>>>> Thanks,
>>>>>> Marcel
>>>>>>
>>>>>> hw/pci/pci.c | 17 +++++++++++++++++
>>>>>> 1 file changed, 17 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 168b9cc..f9127dc 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -1148,6 +1148,7 @@ static void pci_update_mappings(PCIDevice *d)
>>>>>>      PCIIORegion *r;
>>>>>>      int i;
>>>>>>      pcibus_t new_addr;
>>>>>> +    uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
>>>>>>
>>>>>>      for(i = 0; i < PCI_NUM_REGIONS; i++) {
>>>>>>          r = &d->io_regions[i];
>>>>>> @@ -1156,6 +1157,22 @@ static void pci_update_mappings(PCIDevice *d)
>>>>>>          if (!r->size)
>>>>>>              continue;
>>>>>>
>>>>>> +        /*
>>>>>> +         * Do not update the mappings until the command register's
>>>>>> +         * Decode (I/O or memory) bit is not set. Two reasons:
>
> I propose the following wording change (for noobs like myself :))
>
> Do not update this BAR's mapping if the command register's decode
> bit (I/O or memory, matching the BAR's type) is clear. Two
> reasons: ...
>
> Spelling out "this BAR's mapping" is clearer to me -- we're looping over
> the BARs. (The end result is the same of course.)
>
>>>>>> +         * - PCI Spec indicates that while the bit is not set
>>>>>> +         *   the memory sizing is not finished.
>
> I propose "BAR sizing".
>
>>>>>> +         * - pci_bar_address will return PCI_BAR_UNMAPPED
>
> I propose pci_bar_address() -- i.e., parens.
>
>>>>>> +         *   and a previous value can be accidentally overridden
>
> I recommend "may be unintentionally" over "can be accidentally".
>
>>>>>> +         *   if the command register is modified (and not the BAR).
>>>>>> +         * */
>
> The last line should simply terminate the comment block -- runaway
> asterisk I think.
>
>>>>>> +        if (((r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
>>>>>> +             !(cmd & PCI_COMMAND_IO)) ||
>>>>>> +            ((r->type != PCI_BASE_ADDRESS_SPACE_IO) &&
>>>>>> +             !(cmd & PCI_COMMAND_MEMORY))) {
>>>>>> +            continue;
>>>>>> +        }
>>>>>> +
>
> It might be equivalent, but in the second part, I'd feel better about
>
> !(r->type & PCI_BASE_ADDRESS_SPACE_IO)
>
> than
>
> (r->type != PCI_BASE_ADDRESS_SPACE_IO)
>
> Or even better:
>
>   uint16_t decode_bit = (r->type & PCI_BASE_ADDRESS_SPACE_IO) ?
>                         PCI_COMMAND_IO :
>                         PCI_COMMAND_MEMORY;
>   if (!(cmd & decode_bit)) {
>       continue;
>   }
>
>
> ... The placement of the "continue" statement looks good.
>
> Just some thoughts; I won't complain if the patch is committed as-is. :)
Thank you for all the points; I'll address all your comments in the next version.
Thanks again,
Marcel
>
> Thanks
> Laszlo
>
>>>>>>          new_addr = pci_bar_address(d, i, r->type, r->size);
>>>>>>
>>>>>>          /* This bar isn't changed */
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
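
For reference, here is a minimal standalone sketch of the two decode-bit
checks discussed in the review above: the condition as written in the patch
and the decode_bit form suggested as a replacement. It is not QEMU code. The
helper names skip_bar_original() and skip_bar_suggested() are hypothetical,
and the three constants are redefined locally with the values I believe
pci_regs.h uses, so that the fragment compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, redefined locally so the sketch is self-contained. */
#define PCI_COMMAND_IO             0x1  /* I/O space decode enable */
#define PCI_COMMAND_MEMORY         0x2  /* memory space decode enable */
#define PCI_BASE_ADDRESS_SPACE_IO  0x01 /* BAR type bit: I/O space */

/*
 * Condition as written in the patch (hypothetical helper name).
 * Returns true when the BAR's mapping update should be skipped.
 */
static bool skip_bar_original(uint8_t type, uint16_t cmd)
{
    return ((type & PCI_BASE_ADDRESS_SPACE_IO) &&
            !(cmd & PCI_COMMAND_IO)) ||
           ((type != PCI_BASE_ADDRESS_SPACE_IO) &&
            !(cmd & PCI_COMMAND_MEMORY));
}

/*
 * Condition as suggested in review (hypothetical helper name):
 * pick the decode bit matching the BAR's type, then test only that bit.
 */
static bool skip_bar_suggested(uint8_t type, uint16_t cmd)
{
    uint16_t decode_bit = (type & PCI_BASE_ADDRESS_SPACE_IO) ?
                          PCI_COMMAND_IO :
                          PCI_COMMAND_MEMORY;

    return !(cmd & decode_bit);
}
```

With these definitions the two forms agree whenever the type is exactly
PCI_BASE_ADDRESS_SPACE_IO or has the I/O bit clear (for example 0x00, 0x04,
0x08, 0x0c). They would differ only for a type value that has the I/O bit set
together with other bits, e.g. 0x03 with only PCI_COMMAND_IO enabled, which is
the case the suggested form handles more robustly.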