From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56002) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIhfH-0002Mk-AF for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:58:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aIhfD-0003ZO-TQ for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:58:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54963) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIhfD-0003ZH-JT for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:57:59 -0500 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id 1F7958E370 for ; Mon, 11 Jan 2016 18:57:59 +0000 (UTC) References: <1452515063-5615-1-git-send-email-marcel@redhat.com> <5693D447.8070000@redhat.com> <5693D999.2030504@gmail.com> <5693E349.4090201@redhat.com> <5693EE0B.7030002@redhat.com> <5693F802.2010304@redhat.com> From: Marcel Apfelbaum Message-ID: <5693FB2F.2080403@redhat.com> Date: Mon, 11 Jan 2016 20:57:51 +0200 MIME-Version: 1.0 In-Reply-To: <5693F802.2010304@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] hw/pci: do not update the PCI mappings while Decode (I/O or memory) bit is not set in the Command register List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laszlo Ersek , qemu-devel@nongnu.org Cc: mst@redhat.com On 01/11/2016 08:44 PM, Laszlo Ersek wrote: > On 01/11/16 19:01, Marcel Apfelbaum wrote: >> On 01/11/2016 07:15 PM, Laszlo Ersek wrote: >>> On 01/11/16 17:34, Marcel Apfelbaum wrote: >>>> On 01/11/2016 06:11 PM, Laszlo Ersek wrote: >>>>> On 01/11/16 13:24, Marcel Apfelbaum wrote: >>>>>> Two reasons: >>>>>> - PCI Spec indicates that while the bit is not set >>>>>> the memory sizing is not finished. >>>>>> - pci_bar_address will return PCI_BAR_UNMAPPED >>>>>> and a previous value can be accidentally overridden >>>>>> if the command register is modified (and not the BAR). >>>>>> >>>>>> Signed-off-by: Marcel Apfelbaum >>>>>> --- >>>>>> >>>>>> Hi, >>>>>> >>>>>> I found this when trying to use multiple root complexes with OVMF. >>>>>> >>>>>> When trying to attach a device to the pxb-pcie device as Integrated >>>>>> Device it did not receive the IO/MEM resources. >>>>>> >>>>>> The reason is that OVMF is working like that: >>>>>> 1. It disables the Decode (I/O or memory) bit in the Command >>>>>> register >>>>>> 2. It configures the device BARS >>>>>> 3. Makes some tests on the Command register >>>>>> 4. ... >>>>>> 5. Enables the Decode (I/O or memory) at some point. >>>>>> >>>>>> On step 3 all the BARS are overridden to 0xffffffff by QEMU. >>>>>> >>>>>> Since QEMU uses the device BARs to compute the new host bridge >>>>>> resources >>>>>> it now gets garbage. >>>>>> >>>>>> Laszlo, this also solves the SHPC problem for the pci-2-pci bridge >>>>>> inside the pxb. >>>>>> Now we can enable the SHPC for it too. >>>>> >>>>> I encountered the exact same problem months ago. I posted patches for >>>>> it; you were CC'd. :) >>>>> >>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342209 >>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342210 >>>>> >>>>> As you can see under the second link above, I made the same analysis & >>>>> observations as you do now. (It took me quite long to track down the >>>>> "inexplicable" behavior of edk2's generic PCI bus driver / enumerator >>>>> that is built into OVMF.) >>>> >>>> Wow, I just re-worked this issue again from 0! I wish I have remembered >>>> those threads :( >>>> This was another symptom of the exact problem! And I remembered >>>> something about >>>> SHPC, I should have looked at those mail threads again... >>>> >>>>> >>>>> I proposed to change pci_bar_address() so that it could return, to >>>>> distinguished callers, the BAR values "under programming", even if the >>>>> command bits were clear. Then the ACPI generator would utilize this >>>>> special exception. >>>>> >>>>> Michael disagreed; in >>>>> >>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342242 >>>>> >>>>> he wrote "[t]his is problematic - disabled BAR values have no meaning >>>>> according to the PCI spec". >>>>> >>>> >>>> Yes... because it looked like a hook for our case only, >>>> the good news is that this patch is based exactly on the fact that >>>> the BARs have no meaning if the bit is not set. >>>> >>>>> The current solution to the problem (= we disable the SHPC) was >>>>> recommended by Michael in that message: "It might be best to add a >>>>> property to just disable shpc in the bridge so no devices reside >>>>> directly behind the pxb?" >>>>> >>>> >>>> I confess I don't exactly understand what the SHPC of the pci-2-pci >>>> bridge >>>> has to do with sibling devices on the pxb's root bus (SHPC is the >>>> hot-plug controller >>>> for the devices behind the pci-2-pci bridge). >>>> >>>> The second part I do understand, the pxb design was to not have devices >>>> directly behind >>>> the pxb, so maybe he meant that SHPC is the part of the pci-bridge that >>>> behaves like >>>> a device in the sense it requires IO/MEM resources. >>>> >>>> Bottom line, your solution for the PXB was just fine :) >>>> >>>> >>>>> In comparison, your patch doesn't change pci_bar_address(). Instead, it >>>>> modifies pci_update_mappings() *not to call* pci_bar_address(), if the >>>>> respective command bits are clear. >>>>> >>>>> I guess that could have about the same effect. >>>>> >>>>> If, unlike my patch, yours actually improves QEMU's compliance with the >>>>> PCI specs, then it's likely a good patch. (And apparently more general >>>>> than the SHPC-specific solution we have now.) >>>> >>>> >>>> Exactly! Why should a pci write to the command register *delete* >>>> previously set resources? I am looking at it as a bug. >>>> >>>> And also updating the mappings while the Decoding bit is not enables >>>> is at least not necessary. >>>> >>>>> >>>>> I just don't know if it's a good idea to leave any old mappings active >>>>> while the BARs are being reprogrammed (with the command bits clear). >>>>> >>>> >>>> First, because the OS can't use the IO/MEM anyway, secondly the guest >>>> OS/firmware >>>> is the one that disabled the bit... (in order to program resources) >>> >>> I have something like the following in mind. Do you think it is a valid >>> (although contrived) use case? >>> >>> - guest programs some BAR and uses it (as designed / intended) >>> - guest disables command bit, modifies BAR location >>> - guest accesses *old* BAR location >>> >>> What should a guest *expect* in such a case? Is this invalid guest >>> behavior? >> >> Yes, this is indeed invalid behavior, from the device point of view >> it is disabled. Best case scenario - the guest will see 0xffffffff, >> worst case - garbage. >> >>> >>> If it is not invalid, then will QEMU comply with the guest's >>> expectations if your patch is applied? Pre-patch, the guest would likely >>> access a "hole" in the host bridge MMIO aperture, whereas with your >>> patch (I guess?) it still might access the device through the old (still >>> active) BAR? >>> >> >> Since the IO is disabled, pci_bar_address will return PCI_BAR_UNMAPPED >> and *no updates will be made* pre or post this patch. It will behave the >> same >> from the guest point of view. It will still access the memory region >> of the device. >> >> >>> Or would QEMU prevent that just by virtue of the command bit being clear? >> >> Again, this patch only changes the behavior in a specific case: >> when the device is disabled and the guest writes to the command register >> without >> enabling IO/MEM. > > Ah, right! That's exactly what the edk2 PCI bus driver / enumerator > does. Massages the command register without touching those two bits, and... > >> >> Pre-patch -> the old BAR addresses are overridden with 0xffffffff >> (I really think this is a bug, nobody asked for this!!) > > the previously programmed BAR address gets lost. It's great that you > have the PCI knowledge to state that this is actually a bug! It had > looked fishy to me as well, but I couldn't make the same argument. > >> Post-Patch -> the old BAR values are returned to the prev state -> >> correct behavior IMHO. > > I agree. > >> Please continue to ask questions until we get to the bottom of it. :) > > Okay, I think I can now try to review this patch. See below. > >> >> Thanks, >> Marcel >> >>> >>> Thanks >>> Laszlo >>> >>>>> In other words, what guarantees that this change will not regress >>>>> anything? (I'm not doubting -- I'm asking; I honestly don't know.) >>>>> >>>>> So I guess I'll defer to Michael on this one. >>>> >>>> Michael, do you agree with the above? >>>> >>>>> >>>>> In any case, I fully agree with your analysis of OVMF's behavior. >>>> >>>> Thanks! I looked for this bug in OVMF for some time now :) >>>> Marcel >>>> >>>>> >>>>> Thanks! >>>>> Laszlo >>>>> >>>>>> Thanks, >>>>>> Marcel >>>>>> >>>>>> hw/pci/pci.c | 17 +++++++++++++++++ >>>>>> 1 file changed, 17 insertions(+) >>>>>> >>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c >>>>>> index 168b9cc..f9127dc 100644 >>>>>> --- a/hw/pci/pci.c >>>>>> +++ b/hw/pci/pci.c >>>>>> @@ -1148,6 +1148,7 @@ static void pci_update_mappings(PCIDevice *d) >>>>>> PCIIORegion *r; >>>>>> int i; >>>>>> pcibus_t new_addr; >>>>>> + uint16_t cmd = pci_get_word(d->config + PCI_COMMAND); >>>>>> >>>>>> for(i = 0; i < PCI_NUM_REGIONS; i++) { >>>>>> r = &d->io_regions[i]; >>>>>> @@ -1156,6 +1157,22 @@ static void pci_update_mappings(PCIDevice *d) >>>>>> if (!r->size) >>>>>> continue; >>>>>> >>>>>> + /* >>>>>> + * Do not update the mappings until the command register's >>>>>> + * Decode (I/O or memory) bit is not set. Two reasons: > > I propose the following wording change (for noob's like myself :)) > > Do not update this BAR's mapping if the command register's decode > bit (I/O or memory, matching the BAR's type) is clear. Two > reasons: ... > > Spelling out "this BAR's mapping" is clearer to me -- we're looping over > the BAR's. (The end result is the same of course.) > >>>>>> + * - PCI Spec indicates that while the bit is not set >>>>>> + * the memory sizing is not finished. > > I propose "BAR sizing". > >>>>>> + * - pci_bar_address will return PCI_BAR_UNMAPPED > > I propose pci_bar_address() -- i.e., parens. > >>>>>> + * and a previous value can be accidentally overridden > > I recommend "may be unintentionally" over "can be accidentally". > >>>>>> + * if the command register is modified (and not the BAR). >>>>>> + * */ > > The last line should simply terminate the comment block -- runaway > asterisk I think. > >>>>>> + if (((r->type & PCI_BASE_ADDRESS_SPACE_IO) && >>>>>> + !(cmd & PCI_COMMAND_IO)) || >>>>>> + ((r->type != PCI_BASE_ADDRESS_SPACE_IO) && >>>>>> + !(cmd & PCI_COMMAND_MEMORY))) { >>>>>> + continue; >>>>>> + } >>>>>> + > > It might be equivalent, but in the second part, I'd feel better about > > !(r->type & PCI_BASE_ADDRESS_SPACE_IO) > > than > > (r->type != PCI_BASE_ADDRESS_SPACE_IO) > > Or even better: > > uint16_t decode_bit = (r->type & PCI_BASE_ADDRESS_SPACE_IO) ? > PCI_COMMAND_IO : > PCI_COMMAND_MEMORY; > if (!(cmd & decode_bit)) { > continue; > } > > > ... The placement of the "continue" statement looks good. > > Just some thoughts; I won't complain if the patch is committed as-is. :) Thank you for all the points, I'll address all your comments on the next version. Thanks again, Marcel > > Thanks > Laszlo > >>>>>> new_addr = pci_bar_address(d, i, r->type, r->size); >>>>>> >>>>>> /* This bar isn't changed */ >>>>>> >>>>> >>>>> >>>> >>> >> >