qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
Date: Wed, 22 Feb 2017 11:08:51 +0800	[thread overview]
Message-ID: <20170222030851.GA17314@pxdev.xzpeter.org> (raw)
In-Reply-To: <20170221214228.12515.79751.stgit@gimli.home>

[cc Jintack]

On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> reserved") removes the internal use of extended capability ID 0, the
> comment here becomes invalid.  However, peeling back the onion, the
> code is still correct and we still can't seed the capability chain
> with ID 0, unless we want to muck with using the version number to
> force the header to be non-zero, which is much uglier to deal with.
> The comment also now covers some of the subtleties of using cap ID 0,
> such as transparently indicating absence of capabilities if none are
> added.  This doesn't detract from the correctness of the referenced
> commit as vfio in the kernel also uses capability ID zero to mask
> capabilties.  In fact, we should skip zero capabilities precisely
> because the kernel might also expose such a capability at the head
> position and re-introduce the problem.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/vfio/pci.c |   31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index f2ba9b6cfafc..03a3d0154976 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>      /*
>       * Extended capabilities are chained with each pointing to the next, so we
>       * can drop anything other than the head of the chain simply by modifying
> -     * the previous next pointer.  For the head of the chain, we can modify the
> -     * capability ID to something that cannot match a valid capability.  ID
> -     * 0 is reserved for this since absence of capabilities is indicated by
> -     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
> -     * uses ID 0 as reserved for list management and will incorrectly match and
> -     * assert if we attempt to pre-load the head of the chain with this ID.
> -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> -     * part for identifying absence of capabilities in a root complex register
> -     * block.  If the ID still exists after adding capabilities, switch back to
> -     * zero.  We'll mark this entire first dword as emulated for this purpose.
> +     * the previous next pointer.  Seed the head of the chain here such that
> +     * we can simply skip any capabilities we want to drop below, regardless
> +     * of their position in the chain.  If this stub capability still exists
> +     * after we add the capabilities we want to expose, update the capability
> +     * ID to zero.  Note that we cannot seed with the capability header being
> +     * zero as this conflicts with definition of an absent capability chain
> +     * and prevents capabilities beyond the head of the list from being added.
> +     * By replacing the dummy capability ID with zero after walking the device
> +     * chain, we also transparently mark extended capabilities as absent if
> +     * no capabilities were added.  Note that the PCIe spec defines an absence
> +     * of extended capabilities to be determined by a value of zero for the
> +     * capability ID, version, AND next pointer.  A non-zero next pointer
> +     * should be sufficient to indicate additional capabilities are present,
> +     * which will occur if we call pcie_add_capability() below.  The entire
> +     * first dword is emulated to support this.
> +     *
> +     * NB. The kernel side does similar masking, so be prepared that our
> +     * view of the device may also contain a capability ID zero in the head
> +     * of the chain.  Skip it for the same reason that we cannot seed the
> +     * chain with a zero capability.
>       */
>      pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
>                   PCI_EXT_CAP(0xFFFF, 0, 0));
> @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>                                     PCI_EXT_CAP_NEXT_MASK);
>  
>          switch (cap_id) {
> +        case 0: /* kernel masked capability */
>          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
>              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
> 

Reviewed-by: Peter Xu <peterx@redhat.com>

Since this bug is originally reported by Jintack, maybe we can also
add:

Reported-by: Jintack Lim <jintack@cs.columbia.edu>

Jintack, if you want to test it and provide your tested-by, it would
be nice as well. ;)

Actually I just found that the bug still exist after Michael's fix (I
thought it was fixed). So we definitely need this patch or equivalent.
However, I would still slightly prefer removing the wrapping hack
since after all we need to touch it (and I do feel like that's error
prone...). So, Alex, do you like below one instead?

--------8<----------
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..6942c1d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
      */
     config = g_memdup(pdev->config, vdev->config_size);

-    /*
-     * Extended capabilities are chained with each pointing to the next, so we
-     * can drop anything other than the head of the chain simply by modifying
-     * the previous next pointer.  For the head of the chain, we can modify the
-     * capability ID to something that cannot match a valid capability.  ID
-     * 0 is reserved for this since absence of capabilities is indicated by
-     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
-     * uses ID 0 as reserved for list management and will incorrectly match and
-     * assert if we attempt to pre-load the head of the chain with this ID.
-     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
-     * part for identifying absence of capabilities in a root complex register
-     * block.  If the ID still exists after adding capabilities, switch back to
-     * zero.  We'll mark this entire first dword as emulated for this purpose.
-     */
-    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
-                 PCI_EXT_CAP(0xFFFF, 0, 0));
-    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
-    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
-
     for (next = PCI_CONFIG_SPACE_SIZE; next;
          next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
         header = pci_get_long(config + next);
@@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
          */
         size = vfio_ext_cap_max_size(config, next);

-        /* Use emulated next pointer to allow dropping extended caps */
-        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
-                                   PCI_EXT_CAP_NEXT_MASK);
+        /* Use emulated header to allow dropping extended caps */
+        pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);

         switch (cap_id) {
         case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
         case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
+        case PCI_EXT_CAP_ID_VC:
+            /*
+             * For dropped capabilities, we keep their slot but
+             * replace them with a header containing cap_id=0 &&
+             * cap_ver=1. We do this reservation mostly to make sure
+             * the head ecap (at offset 0x100) will always be there.
+             * Anyway it won't hurt if we keep the rest of the dropped
+             * ones as well.
+             *
+             * Here we use non-zero cap_ver because we want to "mark"
+             * this ecap as "available" - from PCIe spec (chap 7.9.1),
+             * it marked out that cap_id=cap_ver=next=0 means empty
+             * ecap, and we really don't want it to be an "empty" slot
+             * especially for the head ecap (we need a head, always!).
+             */
+            pcie_add_capability(pdev, 0, 1, next, size);
             trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
             break;
         default:
             pcie_add_capability(pdev, cap_id, cap_ver, next, size);
         }
-
-    }
-
-    /* Cleanup chain head ID if necessary */
-    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
-        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
     }

     g_free(config);
------->8--------

The new patch will keep all the dropped ecaps (so we may see more than
one cap_id=0x0000 field), which I don't know whether would be a
drawback. OTOH, it provides benefits like:

- we can remove the wrapping hack, so the code is much readable and
  less error prone imho

- we can avoid using the assumption that 0xffff cap_id is reserved

I can live with both patches though. :-)

Thanks!

-- peterx

  reply	other threads:[~2017-02-22  3:09 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-21 21:43 [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps Alex Williamson
2017-02-22  3:08 ` Peter Xu [this message]
2017-02-22  3:54   ` Alex Williamson
2017-02-22  4:10     ` Peter Xu
2017-02-22 11:48   ` Jintack Lim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170222030851.GA17314@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=jintack@cs.columbia.edu \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).