Re: [Qemu-devel] [PATCH] vfio/pci: Hide SR-IOV capability

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Laszlo Ersek <lersek@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>, qemu-devel@nongnu.org
Cc: chen.fan.fnst@cn.fujitsu.com, zhoujie2011@cn.fujitsu.com
Subject: Re: [Qemu-devel] [PATCH] vfio/pci: Hide SR-IOV capability
Date: Tue, 21 Jun 2016 02:15:23 +0200	[thread overview]
Message-ID: <6d957d2e-1ddb-8e49-55b1-9efab22c397a@redhat.com> (raw)
In-Reply-To: <20160620215824.13390.52262.stgit@gimli.home>

On 06/21/16 00:04, Alex Williamson wrote:
> The kernel currently exposes the SR-IOV capability as read-only
> through vfio-pci.  This is sufficient to protect the host kernel, but
> has the potential to confuse guests without further virtualization.
> In particular, OVMF tries to size the VF BARs and comes up with absurd
> results, ending with an assert.  There's not much point in adding
> virtualization to a read-only capability, so we simply hide it for
> now.  If the kernel ever enables SR-IOV virtualization, we should
> easily be able to test it through VF BAR sizing or explicit flags.
> 
> Testing whether we should parse extended capabilities is also pulled
> into the function to keep these assumptions in one place.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This depends on Chen Fan's patch "vfio: add pcie extended capability
> support", which I'll pull from Zhou Jie's latest series unless there
> are comments to the contrary.  Otherwise based on Stefan's tracing
> pull request so as not to conflict.
> 
>  hw/vfio/pci.c        |   49 +++++++++++++++++++++++++++++++++++++++----------
>  hw/vfio/trace-events |    1 +
>  2 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a171056b..36d5e00 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1772,6 +1772,12 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>      uint8_t cap_ver;
>      uint8_t *config;
>  
> +    /* Only add extended caps if we have them and the guest can see them */
> +    if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
> +        !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
> +        return 0;
> +    }
> +
>      /*
>       * pcie_add_capability always inserts the new capability at the tail
>       * of the chain.  Therefore to end up with a chain that matches the
> @@ -1780,6 +1786,25 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>       */
>      config = g_memdup(pdev->config, vdev->config_size);
>  
> +    /*
> +     * Extended capabilities are chained with each pointing to the next, so we
> +     * can drop anything other than the head of the chain simply by modifying
> +     * the previous next pointer.  For the head of the chain, we can modify the
> +     * capability ID to something that cannot match a valid capability.  ID
> +     * 0 is reserved for this since absence of capabilities is indicated by
> +     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
> +     * uses ID 0 as reserved for list management and will incorrectly match and
> +     * assert if we attempt to pre-load the head of the chain with with this
> +     * ID.  Use ID 0xFFFF temporarily since it is also seems to be reserved in
> +     * part for identifying abscense of capabilities in a root complex register
> +     * block.  If the ID still exists after adding capabilities, switch back to
> +     * zero.  We'll mark this entire first dword as emulated for this purpose.
> +     */
> +    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> +                 PCI_EXT_CAP(0xFFFF, 0, 0));
> +    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> +    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
> +
>      for (next = PCI_CONFIG_SPACE_SIZE; next;
>           next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
>          header = pci_get_long(config + next);
> @@ -1794,12 +1819,23 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>           */
>          size = vfio_ext_cap_max_size(config, next);
>  
> -        pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> -        pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
> -
>          /* Use emulated next pointer to allow dropping extended caps */
>          pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
>                                     PCI_EXT_CAP_NEXT_MASK);
> +
> +        switch (cap_id) {
> +        case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuses OVMF */

I think s/confuses/confuse/.

Other than that, this is mostly black magic to me, so I can't even ACK
it with a straight face :)

I would like to test it, and report back, but then again, I don't have a
NIC with virtual functions. :/

Thank you!
Laszlo

> +            trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
> +            break;
> +        default:
> +            pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> +        }
> +
> +    }
> +
> +    /* Cleanup chain head ID if necessary */
> +    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> +        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
>      }
>  
>      g_free(config);
> @@ -1821,13 +1857,6 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev)
>          return ret;
>      }
>  
> -    /* on PCI bus, it doesn't make sense to expose extended capabilities. */
> -    if (!pci_is_express(pdev) ||
> -        !pci_bus_is_express(pdev->bus) ||
> -        !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
> -        return 0;
> -    }
> -
>      return vfio_add_ext_cap(vdev);
>  }
>  
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 9da0ff9..a768fb5 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -37,6 +37,7 @@ vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %
>  vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
>  vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
>  vfio_initfn(const char *name, int group_id) " (%s) group %d"
> +vfio_add_ext_cap_dropped(const char *name, uint16_t cap, uint16_t offset) "%s %x@%x"
>  vfio_pci_reset(const char *name) " (%s)"
>  vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
>  vfio_pci_reset_pm(const char *name) "%s PCI PM Reset"
>

next prev parent reply	other threads:[~2016-06-21  0:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-20 22:04 [Qemu-devel] [PATCH] vfio/pci: Hide SR-IOV capability Alex Williamson
2016-06-20 22:23 ` Eric Blake
2016-06-20 22:31   ` Alex Williamson
2016-06-21  0:15 ` Laszlo Ersek [this message]
2016-06-21  3:54   ` Alex Williamson
2016-06-28 13:10 ` Laszlo Ersek
2016-06-28 20:05 ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d957d2e-1ddb-8e49-55b1-9efab22c397a@redhat.com \
    --to=lersek@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=chen.fan.fnst@cn.fujitsu.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zhoujie2011@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).