From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: mst@redhat.com
Subject: Re: [Qemu-devel] [PATCH] hw/pci: do not update the PCI mappings while Decode (I/O or memory) bit is not set in the Command register
Date: Mon, 11 Jan 2016 20:57:51 +0200 [thread overview]
Message-ID: <5693FB2F.2080403@redhat.com> (raw)
In-Reply-To: <5693F802.2010304@redhat.com>
On 01/11/2016 08:44 PM, Laszlo Ersek wrote:
> On 01/11/16 19:01, Marcel Apfelbaum wrote:
>> On 01/11/2016 07:15 PM, Laszlo Ersek wrote:
>>> On 01/11/16 17:34, Marcel Apfelbaum wrote:
>>>> On 01/11/2016 06:11 PM, Laszlo Ersek wrote:
>>>>> On 01/11/16 13:24, Marcel Apfelbaum wrote:
>>>>>> Two reasons:
>>>>>> - PCI Spec indicates that while the bit is not set
>>>>>> the memory sizing is not finished.
>>>>>> - pci_bar_address will return PCI_BAR_UNMAPPED
>>>>>> and a previous value can be accidentally overridden
>>>>>> if the command register is modified (and not the BAR).
>>>>>>
>>>>>> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
>>>>>> ---
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I found this when trying to use multiple root complexes with OVMF.
>>>>>>
>>>>>> When trying to attach a device to the pxb-pcie device as Integrated
>>>>>> Device it did not receive the IO/MEM resources.
>>>>>>
>>>>>> The reason is that OVMF is working like that:
>>>>>> 1. It disables the Decode (I/O or memory) bit in the Command
>>>>>> register
>>>>>> 2. It configures the device BARS
>>>>>> 3. Makes some tests on the Command register
>>>>>> 4. ...
>>>>>> 5. Enables the Decode (I/O or memory) at some point.
>>>>>>
>>>>>> On step 3 all the BARS are overridden to 0xffffffff by QEMU.
>>>>>>
>>>>>> Since QEMU uses the device BARs to compute the new host bridge
>>>>>> resources
>>>>>> it now gets garbage.
>>>>>>
>>>>>> Laszlo, this also solves the SHPC problem for the pci-2-pci bridge
>>>>>> inside the pxb.
>>>>>> Now we can enable the SHPC for it too.
>>>>>
>>>>> I encountered the exact same problem months ago. I posted patches for
>>>>> it; you were CC'd. :)
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342209
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342210
>>>>>
>>>>> As you can see under the second link above, I made the same analysis &
>>>>> observations as you do now. (It took me quite long to track down the
>>>>> "inexplicable" behavior of edk2's generic PCI bus driver / enumerator
>>>>> that is built into OVMF.)
>>>>
>>>> Wow, I just re-worked this issue again from 0! I wish I have remembered
>>>> those threads :(
>>>> This was another symptom of the exact problem! And I remembered
>>>> something about
>>>> SHPC, I should have looked at those mail threads again...
>>>>
>>>>>
>>>>> I proposed to change pci_bar_address() so that it could return, to
>>>>> distinguished callers, the BAR values "under programming", even if the
>>>>> command bits were clear. Then the ACPI generator would utilize this
>>>>> special exception.
>>>>>
>>>>> Michael disagreed; in
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342242
>>>>>
>>>>> he wrote "[t]his is problematic - disabled BAR values have no meaning
>>>>> according to the PCI spec".
>>>>>
>>>>
>>>> Yes... because it looked like a hook for our case only,
>>>> the good news is that this patch is based exactly on the fact that
>>>> the BARs have no meaning if the bit is not set.
>>>>
>>>>> The current solution to the problem (= we disable the SHPC) was
>>>>> recommended by Michael in that message: "It might be best to add a
>>>>> property to just disable shpc in the bridge so no devices reside
>>>>> directly behind the pxb?"
>>>>>
>>>>
>>>> I confess I don't exactly understand what the SHPC of the pci-2-pci
>>>> bridge
>>>> has to do with sibling devices on the pxb's root bus (SHPC is the
>>>> hot-plug controller
>>>> for the devices behind the pci-2-pci bridge).
>>>>
>>>> The second part I do understand, the pxb design was to not have devices
>>>> directly behind
>>>> the pxb, so maybe he meant that SHPC is the part of the pci-bridge that
>>>> behaves like
>>>> a device in the sense it requires IO/MEM resources.
>>>>
>>>> Bottom line, your solution for the PXB was just fine :)
>>>>
>>>>
>>>>> In comparison, your patch doesn't change pci_bar_address(). Instead, it
>>>>> modifies pci_update_mappings() *not to call* pci_bar_address(), if the
>>>>> respective command bits are clear.
>>>>>
>>>>> I guess that could have about the same effect.
>>>>>
>>>>> If, unlike my patch, yours actually improves QEMU's compliance with the
>>>>> PCI specs, then it's likely a good patch. (And apparently more general
>>>>> than the SHPC-specific solution we have now.)
>>>>
>>>>
>>>> Exactly! Why should a pci write to the command register *delete*
>>>> previously set resources? I am looking at it as a bug.
>>>>
>>>> And also updating the mappings while the Decoding bit is not enables
>>>> is at least not necessary.
>>>>
>>>>>
>>>>> I just don't know if it's a good idea to leave any old mappings active
>>>>> while the BARs are being reprogrammed (with the command bits clear).
>>>>>
>>>>
>>>> First, because the OS can't use the IO/MEM anyway, secondly the guest
>>>> OS/firmware
>>>> is the one that disabled the bit... (in order to program resources)
>>>
>>> I have something like the following in mind. Do you think it is a valid
>>> (although contrived) use case?
>>>
>>> - guest programs some BAR and uses it (as designed / intended)
>>> - guest disables command bit, modifies BAR location
>>> - guest accesses *old* BAR location
>>>
>>> What should a guest *expect* in such a case? Is this invalid guest
>>> behavior?
>>
>> Yes, this is indeed invalid behavior, from the device point of view
>> it is disabled. Best case scenario - the guest will see 0xffffffff,
>> worst case - garbage.
>>
>>>
>>> If it is not invalid, then will QEMU comply with the guest's
>>> expectations if your patch is applied? Pre-patch, the guest would likely
>>> access a "hole" in the host bridge MMIO aperture, whereas with your
>>> patch (I guess?) it still might access the device through the old (still
>>> active) BAR?
>>>
>>
>> Since the IO is disabled, pci_bar_address will return PCI_BAR_UNMAPPED
>> and *no updates will be made* pre or post this patch. It will behave the
>> same
>> from the guest point of view. It will still access the memory region
>> of the device.
>>
>>
>>> Or would QEMU prevent that just by virtue of the command bit being clear?
>>
>> Again, this patch only changes the behavior in a specific case:
>> when the device is disabled and the guest writes to the command register
>> without
>> enabling IO/MEM.
>
> Ah, right! That's exactly what the edk2 PCI bus driver / enumerator
> does. Massages the command register without touching those two bits, and...
>
>>
>> Pre-patch -> the old BAR addresses are overridden with 0xffffffff
>> (I really think this is a bug, nobody asked for this!!)
>
> the previously programmed BAR address gets lost. It's great that you
> have the PCI knowledge to state that this is actually a bug! It had
> looked fishy to me as well, but I couldn't make the same argument.
>
>> Post-Patch -> the old BAR values are returned to the prev state ->
>> correct behavior IMHO.
>
> I agree.
>
>> Please continue to ask questions until we get to the bottom of it. :)
>
> Okay, I think I can now try to review this patch. See below.
>
>>
>> Thanks,
>> Marcel
>>
>>>
>>> Thanks
>>> Laszlo
>>>
>>>>> In other words, what guarantees that this change will not regress
>>>>> anything? (I'm not doubting -- I'm asking; I honestly don't know.)
>>>>>
>>>>> So I guess I'll defer to Michael on this one.
>>>>
>>>> Michael, do you agree with the above?
>>>>
>>>>>
>>>>> In any case, I fully agree with your analysis of OVMF's behavior.
>>>>
>>>> Thanks! I looked for this bug in OVMF for some time now :)
>>>> Marcel
>>>>
>>>>>
>>>>> Thanks!
>>>>> Laszlo
>>>>>
>>>>>> Thanks,
>>>>>> Marcel
>>>>>>
>>>>>> hw/pci/pci.c | 17 +++++++++++++++++
>>>>>> 1 file changed, 17 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 168b9cc..f9127dc 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -1148,6 +1148,7 @@ static void pci_update_mappings(PCIDevice *d)
>>>>>> PCIIORegion *r;
>>>>>> int i;
>>>>>> pcibus_t new_addr;
>>>>>> + uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
>>>>>>
>>>>>> for(i = 0; i < PCI_NUM_REGIONS; i++) {
>>>>>> r = &d->io_regions[i];
>>>>>> @@ -1156,6 +1157,22 @@ static void pci_update_mappings(PCIDevice *d)
>>>>>> if (!r->size)
>>>>>> continue;
>>>>>>
>>>>>> + /*
>>>>>> + * Do not update the mappings until the command register's
>>>>>> + * Decode (I/O or memory) bit is not set. Two reasons:
>
> I propose the following wording change (for noob's like myself :))
>
> Do not update this BAR's mapping if the command register's decode
> bit (I/O or memory, matching the BAR's type) is clear. Two
> reasons: ...
>
> Spelling out "this BAR's mapping" is clearer to me -- we're looping over
> the BAR's. (The end result is the same of course.)
>
>>>>>> + * - PCI Spec indicates that while the bit is not set
>>>>>> + * the memory sizing is not finished.
>
> I propose "BAR sizing".
>
>>>>>> + * - pci_bar_address will return PCI_BAR_UNMAPPED
>
> I propose pci_bar_address() -- i.e., parens.
>
>>>>>> + * and a previous value can be accidentally overridden
>
> I recommend "may be unintentionally" over "can be accidentally".
>
>>>>>> + * if the command register is modified (and not the BAR).
>>>>>> + * */
>
> The last line should simply terminate the comment block -- runaway
> asterisk I think.
>
>>>>>> + if (((r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
>>>>>> + !(cmd & PCI_COMMAND_IO)) ||
>>>>>> + ((r->type != PCI_BASE_ADDRESS_SPACE_IO) &&
>>>>>> + !(cmd & PCI_COMMAND_MEMORY))) {
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>
> It might be equivalent, but in the second part, I'd feel better about
>
> !(r->type & PCI_BASE_ADDRESS_SPACE_IO)
>
> than
>
> (r->type != PCI_BASE_ADDRESS_SPACE_IO)
>
> Or even better:
>
> uint16_t decode_bit = (r->type & PCI_BASE_ADDRESS_SPACE_IO) ?
> PCI_COMMAND_IO :
> PCI_COMMAND_MEMORY;
> if (!(cmd & decode_bit)) {
> continue;
> }
>
>
> ... The placement of the "continue" statement looks good.
>
> Just some thoughts; I won't complain if the patch is committed as-is. :)
Thank you for all the points, I'll address all your comments on the next version.
Thanks again,
Marcel
>
> Thanks
> Laszlo
>
>>>>>> new_addr = pci_bar_address(d, i, r->type, r->size);
>>>>>>
>>>>>> /* This bar isn't changed */
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
next prev parent reply other threads:[~2016-01-11 18:58 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-11 12:24 [Qemu-devel] [PATCH] hw/pci: do not update the PCI mappings while Decode (I/O or memory) bit is not set in the Command register Marcel Apfelbaum
2016-01-11 14:07 ` Igor Mammedov
2016-01-11 15:10 ` Marcel Apfelbaum
2016-01-11 16:11 ` Laszlo Ersek
2016-01-11 16:34 ` Marcel Apfelbaum
2016-01-11 17:15 ` Laszlo Ersek
2016-01-11 18:01 ` Marcel Apfelbaum
2016-01-11 18:44 ` Laszlo Ersek
2016-01-11 18:57 ` Marcel Apfelbaum [this message]
2016-01-14 12:24 ` Marcel Apfelbaum
2016-01-14 14:30 ` Laszlo Ersek
2016-01-14 14:49 ` Michael S. Tsirkin
2016-01-14 15:23 ` Marcel Apfelbaum
2016-01-14 15:37 ` Michael S. Tsirkin
2016-01-14 17:20 ` Marcel Apfelbaum
2016-01-14 17:28 ` Michael S. Tsirkin
2016-01-14 18:25 ` Marcel Apfelbaum
2016-01-14 15:14 ` Marcel Apfelbaum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5693FB2F.2080403@redhat.com \
--to=marcel@redhat.com \
--cc=lersek@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).