From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40169) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIgmx-0001EX-TN for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:01:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aIgmu-0005t6-Ld for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:01:55 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45813) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIgmu-0005sz-D9 for qemu-devel@nongnu.org; Mon, 11 Jan 2016 13:01:52 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id 971283F3B9 for ; Mon, 11 Jan 2016 18:01:51 +0000 (UTC) References: <1452515063-5615-1-git-send-email-marcel@redhat.com> <5693D447.8070000@redhat.com> <5693D999.2030504@gmail.com> <5693E349.4090201@redhat.com> From: Marcel Apfelbaum Message-ID: <5693EE0B.7030002@redhat.com> Date: Mon, 11 Jan 2016 20:01:47 +0200 MIME-Version: 1.0 In-Reply-To: <5693E349.4090201@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] hw/pci: do not update the PCI mappings while Decode (I/O or memory) bit is not set in the Command register List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laszlo Ersek , qemu-devel@nongnu.org Cc: mst@redhat.com On 01/11/2016 07:15 PM, Laszlo Ersek wrote: > On 01/11/16 17:34, Marcel Apfelbaum wrote: >> On 01/11/2016 06:11 PM, Laszlo Ersek wrote: >>> On 01/11/16 13:24, Marcel Apfelbaum wrote: >>>> Two reasons: >>>> - PCI Spec indicates that while the bit is not set >>>> the memory sizing is not finished. >>>> - pci_bar_address will return PCI_BAR_UNMAPPED >>>> and a previous value can be accidentally overridden >>>> if the command register is modified (and not the BAR). >>>> >>>> Signed-off-by: Marcel Apfelbaum >>>> --- >>>> >>>> Hi, >>>> >>>> I found this when trying to use multiple root complexes with OVMF. >>>> >>>> When trying to attach a device to the pxb-pcie device as Integrated >>>> Device it did not receive the IO/MEM resources. >>>> >>>> The reason is that OVMF is working like that: >>>> 1. It disables the Decode (I/O or memory) bit in the Command register >>>> 2. It configures the device BARS >>>> 3. Makes some tests on the Command register >>>> 4. ... >>>> 5. Enables the Decode (I/O or memory) at some point. >>>> >>>> On step 3 all the BARS are overridden to 0xffffffff by QEMU. >>>> >>>> Since QEMU uses the device BARs to compute the new host bridge resources >>>> it now gets garbage. >>>> >>>> Laszlo, this also solves the SHPC problem for the pci-2-pci bridge >>>> inside the pxb. >>>> Now we can enable the SHPC for it too. >>> >>> I encountered the exact same problem months ago. I posted patches for >>> it; you were CC'd. :) >>> >>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342209 >>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342210 >>> >>> As you can see under the second link above, I made the same analysis & >>> observations as you do now. (It took me quite long to track down the >>> "inexplicable" behavior of edk2's generic PCI bus driver / enumerator >>> that is built into OVMF.) >> >> Wow, I just re-worked this issue again from 0! I wish I have remembered >> those threads :( >> This was another symptom of the exact problem! And I remembered >> something about >> SHPC, I should have looked at those mail threads again... >> >>> >>> I proposed to change pci_bar_address() so that it could return, to >>> distinguished callers, the BAR values "under programming", even if the >>> command bits were clear. Then the ACPI generator would utilize this >>> special exception. >>> >>> Michael disagreed; in >>> >>> http://thread.gmane.org/gmane.comp.emulators.qemu/342206/focus=342242 >>> >>> he wrote "[t]his is problematic - disabled BAR values have no meaning >>> according to the PCI spec". >>> >> >> Yes... because it looked like a hook for our case only, >> the good news is that this patch is based exactly on the fact that >> the BARs have no meaning if the bit is not set. >> >>> The current solution to the problem (= we disable the SHPC) was >>> recommended by Michael in that message: "It might be best to add a >>> property to just disable shpc in the bridge so no devices reside >>> directly behind the pxb?" >>> >> >> I confess I don't exactly understand what the SHPC of the pci-2-pci bridge >> has to do with sibling devices on the pxb's root bus (SHPC is the >> hot-plug controller >> for the devices behind the pci-2-pci bridge). >> >> The second part I do understand, the pxb design was to not have devices >> directly behind >> the pxb, so maybe he meant that SHPC is the part of the pci-bridge that >> behaves like >> a device in the sense it requires IO/MEM resources. >> >> Bottom line, your solution for the PXB was just fine :) >> >> >>> In comparison, your patch doesn't change pci_bar_address(). Instead, it >>> modifies pci_update_mappings() *not to call* pci_bar_address(), if the >>> respective command bits are clear. >>> >>> I guess that could have about the same effect. >>> >>> If, unlike my patch, yours actually improves QEMU's compliance with the >>> PCI specs, then it's likely a good patch. (And apparently more general >>> than the SHPC-specific solution we have now.) >> >> >> Exactly! Why should a pci write to the command register *delete* >> previously set resources? I am looking at it as a bug. >> >> And also updating the mappings while the Decoding bit is not enables >> is at least not necessary. >> >>> >>> I just don't know if it's a good idea to leave any old mappings active >>> while the BARs are being reprogrammed (with the command bits clear). >>> >> >> First, because the OS can't use the IO/MEM anyway, secondly the guest >> OS/firmware >> is the one that disabled the bit... (in order to program resources) > > I have something like the following in mind. Do you think it is a valid > (although contrived) use case? > > - guest programs some BAR and uses it (as designed / intended) > - guest disables command bit, modifies BAR location > - guest accesses *old* BAR location > > What should a guest *expect* in such a case? Is this invalid guest behavior? Yes, this is indeed invalid behavior, from the device point of view it is disabled. Best case scenario - the guest will see 0xffffffff, worst case - garbage. > > If it is not invalid, then will QEMU comply with the guest's > expectations if your patch is applied? Pre-patch, the guest would likely > access a "hole" in the host bridge MMIO aperture, whereas with your > patch (I guess?) it still might access the device through the old (still > active) BAR? > Since the IO is disabled, pci_bar_address will return PCI_BAR_UNMAPPED and *no updates will be made* pre or post this patch. It will behave the same from the guest point of view. It will still access the memory region of the device. > Or would QEMU prevent that just by virtue of the command bit being clear? Again, this patch only changes the behavior in a specific case: when the device is disabled and the guest writes to the command register without enabling IO/MEM. Pre-patch -> the old BAR addresses are overridden with 0xffffffff (I really think this is a bug, nobody asked for this!!) Post-Patch -> the old BAR values are returned to the prev state -> correct behavior IMHO. Please continue to ask questions until we get to the bottom of it. :) Thanks, Marcel > > Thanks > Laszlo > >>> In other words, what guarantees that this change will not regress >>> anything? (I'm not doubting -- I'm asking; I honestly don't know.) >>> >>> So I guess I'll defer to Michael on this one. >> >> Michael, do you agree with the above? >> >>> >>> In any case, I fully agree with your analysis of OVMF's behavior. >> >> Thanks! I looked for this bug in OVMF for some time now :) >> Marcel >> >>> >>> Thanks! >>> Laszlo >>> >>>> Thanks, >>>> Marcel >>>> >>>> hw/pci/pci.c | 17 +++++++++++++++++ >>>> 1 file changed, 17 insertions(+) >>>> >>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c >>>> index 168b9cc..f9127dc 100644 >>>> --- a/hw/pci/pci.c >>>> +++ b/hw/pci/pci.c >>>> @@ -1148,6 +1148,7 @@ static void pci_update_mappings(PCIDevice *d) >>>> PCIIORegion *r; >>>> int i; >>>> pcibus_t new_addr; >>>> + uint16_t cmd = pci_get_word(d->config + PCI_COMMAND); >>>> >>>> for(i = 0; i < PCI_NUM_REGIONS; i++) { >>>> r = &d->io_regions[i]; >>>> @@ -1156,6 +1157,22 @@ static void pci_update_mappings(PCIDevice *d) >>>> if (!r->size) >>>> continue; >>>> >>>> + /* >>>> + * Do not update the mappings until the command register's >>>> + * Decode (I/O or memory) bit is not set. Two reasons: >>>> + * - PCI Spec indicates that while the bit is not set >>>> + * the memory sizing is not finished. >>>> + * - pci_bar_address will return PCI_BAR_UNMAPPED >>>> + * and a previous value can be accidentally overridden >>>> + * if the command register is modified (and not the BAR). >>>> + * */ >>>> + if (((r->type & PCI_BASE_ADDRESS_SPACE_IO) && >>>> + !(cmd & PCI_COMMAND_IO)) || >>>> + ((r->type != PCI_BASE_ADDRESS_SPACE_IO) && >>>> + !(cmd & PCI_COMMAND_MEMORY))) { >>>> + continue; >>>> + } >>>> + >>>> new_addr = pci_bar_address(d, i, r->type, r->size); >>>> >>>> /* This bar isn't changed */ >>>> >>> >>> >> >