From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46245) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ceC73-0001tQ-B4 for qemu-devel@nongnu.org; Wed, 15 Feb 2017 21:48:06 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ceC6y-00044X-Co for qemu-devel@nongnu.org; Wed, 15 Feb 2017 21:48:05 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54258) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ceC6y-00044P-3y for qemu-devel@nongnu.org; Wed, 15 Feb 2017 21:48:00 -0500 Date: Wed, 15 Feb 2017 19:47:58 -0700 From: Alex Williamson Message-ID: <20170215194758.4622d73c@t450s.home> In-Reply-To: <20170216022839.GC15985@pxdev.xzpeter.org> References: <20170208031216.GA5151@pxdev.xzpeter.org> <20170209035250.GB22153@pxdev.xzpeter.org> <20170214073551.GA9055@pxdev.xzpeter.org> <20170215025243.GA3988@pxdev.xzpeter.org> <20170215033452.GB3988@pxdev.xzpeter.org> <20170215111552.5405418a@t450s.home> <20170216022839.GC15985@pxdev.xzpeter.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] iommu emulation List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Xu Cc: Jintack Lim , mst@redhat.com, QEMU Devel Mailing List On Thu, 16 Feb 2017 10:28:39 +0800 Peter Xu wrote: > On Wed, Feb 15, 2017 at 11:15:52AM -0700, Alex Williamson wrote: > > [...] > > > > Alex, do you like something like below to fix above issue that Jintack > > > has encountered? > > > > > > (note: this code is not for compile, only trying show what I mean...) > > > > > > ------8<------- > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > > index 332f41d..4dca631 100644 > > > --- a/hw/vfio/pci.c > > > +++ b/hw/vfio/pci.c > > > @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > */ > > > config = g_memdup(pdev->config, vdev->config_size); > > > > > > - /* > > > - * Extended capabilities are chained with each pointing to the next, so we > > > - * can drop anything other than the head of the chain simply by modifying > > > - * the previous next pointer. For the head of the chain, we can modify the > > > - * capability ID to something that cannot match a valid capability. ID > > > - * 0 is reserved for this since absence of capabilities is indicated by > > > - * 0 for the ID, version, AND next pointer. However, pcie_add_capability() > > > - * uses ID 0 as reserved for list management and will incorrectly match and > > > - * assert if we attempt to pre-load the head of the chain with this ID. > > > - * Use ID 0xFFFF temporarily since it is also seems to be reserved in > > > - * part for identifying absence of capabilities in a root complex register > > > - * block. If the ID still exists after adding capabilities, switch back to > > > - * zero. We'll mark this entire first dword as emulated for this purpose. > > > - */ > > > - pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE, > > > - PCI_EXT_CAP(0xFFFF, 0, 0)); > > > - pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0); > > > - pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0); > > > - > > > for (next = PCI_CONFIG_SPACE_SIZE; next; > > > next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) { > > > header = pci_get_long(config + next); > > > @@ -1917,6 +1898,8 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > switch (cap_id) { > > > case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */ > > > case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */ > > > + /* keep this ecap header (4 bytes), but mask cap_id to 0xffff */ > > > + ... > > > trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next); > > > break; > > > default: > > > @@ -1925,11 +1908,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > > > > > > } > > > > > > - /* Cleanup chain head ID if necessary */ > > > - if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) { > > > - pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0); > > > - } > > > - > > > g_free(config); > > > return; > > > } > > > ----->8----- > > > > > > Since after all we need the assumption that 0xffff is reserved for > > > cap_id. Then, we can just remove the "first 0xffff then 0x0" hack, > > > which is imho error-prone and hacky. > > > > This doesn't fix the bug, which is that pcie_add_capability() uses a > > valid capability ID for it's own internal tracking. It's only doing > > this to find the end of the capability chain, which we could do in a > > spec complaint way by looking for a zero next pointer. Fix that and > > then vfio doesn't need to do this set to 0xffff then back to zero > > nonsense at all. Capability ID zero is valid. Thanks, > > Yeah I see Michael's fix on the capability list stuff. However, imho > these are two different issues? Or say, even if with that patch, we > should still need this hack (first 0x0, then 0xffff) right? Since > looks like that patch didn't solve the problem if the first pcie ecap > is masked at 0x100. I thought the problem was that QEMU in the host exposes a device with a capability ID of 0 to the L1 guest. QEMU in the L1 guest balks at a capability ID of 0 because that's how it finds the end of the chain. Therefore if we make QEMU not use capability ID 0 for internal purposes, things work. vfio using 0xffff and swapping back to 0x0 becomes unnecessary, but doesn't hurt anything. Thanks, Alex