kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: <pbonzini@redhat.com>, <kraxel@redhat.com>, <cjia@nvidia.com>,
	<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>,
	<kevin.tian@intel.com>, <jike.song@intel.com>,
	<bjsdjshi@linux.vnet.ibm.com>
Subject: Re: [PATCH v6 2/4] vfio: VFIO driver for mediated PCI device
Date: Wed, 10 Aug 2016 17:00:44 -0600	[thread overview]
Message-ID: <20160810170044.077a61a7@t450s.home> (raw)
In-Reply-To: <b8e43146-a707-e0a7-528f-b9dbcc6f0b29@nvidia.com>

On Thu, 11 Aug 2016 02:53:10 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > On Thu, 4 Aug 2016 00:33:52 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> 
> ...
> 
> >> +
> >> +		switch (info.index) {
> >> +		case VFIO_PCI_CONFIG_REGION_INDEX:
> >> +		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
> >> +			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);  
> > 
> > No, vmdev->vfio_region_info[info.index].offset
> >  
> 
> Ok.
> 
> >> +			info.size = vmdev->vfio_region_info[info.index].size;
> >> +			if (!info.size) {
> >> +				info.flags = 0;
> >> +				break;
> >> +			}
> >> +
> >> +			info.flags = vmdev->vfio_region_info[info.index].flags;
> >> +			break;
> >> +		case VFIO_PCI_VGA_REGION_INDEX:
> >> +		case VFIO_PCI_ROM_REGION_INDEX:  
> > 
> > Why?  Let the vendor driver decide.
> >   
> 
> Ok.
> 
> >> +		switch (info.index) {
> >> +		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSI_IRQ_INDEX:
> >> +		case VFIO_PCI_REQ_IRQ_INDEX:
> >> +			break;
> >> +			/* pass thru to return error */
> >> +		case VFIO_PCI_MSIX_IRQ_INDEX:  
> > 
> > ???  
> 
> Sorry, I missed to update this. Updating it.
> 
> >> +	case VFIO_DEVICE_SET_IRQS:
> >> +	{  
> ...
> >> +
> >> +		if (parent->ops->set_irqs)
> >> +			ret = parent->ops->set_irqs(mdev, hdr.flags, hdr.index,
> >> +						    hdr.start, hdr.count, data);
> >> +
> >> +		kfree(ptr);
> >> +		return ret;  
> > 
> > Return success if no set_irqs callback?
> >  
> 
> Ideally, vendor driver should provide this function. If vendor driver
> doesn't provide it, do we really need to fail here?

Wouldn't you as a user expect to get an error if you try to call an
ioctl that has no backing rather than assume success and never receive
and interrupt?
 
> >> +static ssize_t vfio_mpci_read(void *device_data, char __user *buf,
> >> +			      size_t count, loff_t *ppos)
> >> +{
> >> +	struct vfio_mdev *vmdev = device_data;
> >> +	struct mdev_device *mdev = vmdev->mdev;
> >> +	struct parent_device *parent = mdev->parent;
> >> +	int ret = 0;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (parent->ops->read) {
> >> +		char *ret_data, *ptr;
> >> +
> >> +		ptr = ret_data = kzalloc(count, GFP_KERNEL);  
> > 
> > Do we really need to support arbitrary lengths in one shot?  Seems like
> > we could just use a 4 or 8 byte variable on the stack and iterate until
> > done.
> >   
> 
> We just want to pass the arguments to vendor driver as is here. Vendor
> driver could take care of that.

But I think this is exploitable, it lets the user make the kernel
allocate an arbitrarily sized buffer.
 
> >> +
> >> +static ssize_t vfio_mpci_write(void *device_data, const char __user *buf,
> >> +			       size_t count, loff_t *ppos)
> >> +{
> >> +	struct vfio_mdev *vmdev = device_data;
> >> +	struct mdev_device *mdev = vmdev->mdev;
> >> +	struct parent_device *parent = mdev->parent;
> >> +	int ret = 0;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (parent->ops->write) {
> >> +		char *usr_data, *ptr;
> >> +
> >> +		ptr = usr_data = memdup_user(buf, count);  
> > 
> > Same here, how much do we care to let the user write in one pass and is
> > there any advantage to it?  When QEMU is our userspace we're only
> > likely to see 4-byte accesses anyway.  
> 
> Same as above.
> 
> >> +
> >> +static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >> +{  
> ...
> >> +	} else {
> >> +		struct pci_dev *pdev;
> >> +
> >> +		virtaddr = vma->vm_start;
> >> +		req_size = vma->vm_end - vma->vm_start;
> >> +
> >> +		pdev = to_pci_dev(parent->dev);
> >> +		index = VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);  
> > 
> > Iterate through region_info[*].offset/size provided by vendor driver.
> >   
> 
> Yes, makes sense.
> 
> >> +
> >> +int vfio_mpci_match(struct device *dev)
> >> +{
> >> +	if (dev_is_pci(dev->parent))  
> > 
> > This is the wrong test, there's really no requirement that a pci mdev
> > device is hosted by a real pci device.    
> 
> Ideally this module is for the mediated device whose parent is PCI
> device. And we are relying on kernel functions like
> pci_resource_start(), to_pci_dev() in this module, so better to check it
> while loading.

IMO, we don't want to care what the parent device is, it's not ideal,
it's actually a limitation to impose that it is a PCI device.  I want to
be able to make purely virtual mediated devices.  I only see that you
use these functions in the mmio fault handling.  Is it useful to assume
that on mmio fault we map to the parent device PCI BAR regions?  Just
require that the vendor driver provides a fault mapping function or
SIGBUS if we get a fault and it doesn't.

> > Can't we check that the device
> > is on an mdev_pci_bus_type?
> >   
> 
> I didn't get this part.
> 
> Each mediated device is of mdev_bus_type. But VFIO module could be
> different based on parent device type and loaded at the same time. For
> example, there should be different modules for channel IO or any other
> type of devices and could be loaded at the same time. Then when mdev
> device is created based on check in match() function of each module, and
> proper driver would be linked for that mdev device.
> 
> If this check is not based on parent device type, do you expect to set
> parent device type by vendor driver and accordingly load corresponding
> VFIO driver?

mdev_pci_bus_type was an off the cuff response since the driver.bus
controls which devices a probe function will see.  If we have a unique
bus for a driver and create devices appropriately, we really don't
even need a match function.  That would still work, but what if you
made a get_device_info callback to the vendor driver rather than
creating that info in the mediated bus driver layer.  Then the probe
function here could simply check the flags to see if the device is
VFIO_DEVICE_FLAGS_PCI?

> >> @@ -18,6 +18,7 @@
> >>  #include <linux/uaccess.h>
> >>  #include <linux/io.h>
> >>  #include <linux/vgaarb.h>
> >> +#include <linux/vfio.h>
> >>  
> >>  #include "vfio_pci_private.h"
> >>  
> >> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> >> index 0ecae0b1cd34..431b824b0d3e 100644
> >> --- a/include/linux/vfio.h
> >> +++ b/include/linux/vfio.h
> >> @@ -18,6 +18,13 @@
> >>  #include <linux/poll.h>
> >>  #include <uapi/linux/vfio.h>
> >>  
> >> +#define VFIO_PCI_OFFSET_SHIFT   40
> >> +
> >> +#define VFIO_PCI_OFFSET_TO_INDEX(off)   (off >> VFIO_PCI_OFFSET_SHIFT)
> >> +#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
> >> +#define VFIO_PCI_OFFSET_MASK    (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
> >> +
> >> +  
> > 
> > Nak this, I'm not interested in making this any sort of ABI.
> >   
> 
> These macros are used by drivers/vfio/pci/vfio_pci.c and
> drivers/vfio/mdev/vfio_mpci.c and to use those in both these modules,
> they should be moved to common place as you suggested in earlier
> reviews. I think this is better common place. Are there any other
> suggestion?

They're only used in ways that I objected to above and you've agreed
to.  These define implementation details that must not become part of
the mediated vendor driver ABI.  A vendor driver is free to redefine
this the same if they want, but as we can see with how easily they slip
into code where they don't belong, the only way to make sure they don't
become ABI is to keep them in private headers.
 
> >> +static u8 mpci_find_pci_capability(struct mdev_device *mdev, u8 capability)
> >> +{
> >> +	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);  
> > 
> > This creates a fixed ABI between vfio-mdev-pci and vendor drivers that
> > a given region starts at a pre-defined offset.  We have the offset
> > stored in vfio_mdev.region_info[VFIO_PCI_CONFIG_REGION_INDEX].offset,
> > use it.  It's just as unacceptable to impose this fixed relationship
> > with a vendor driver here as if a userspace driver were to do the same.
> >   
> 
> In the v5 version, where config space was cached in this module,
> suggestion was to don't care about data or caching it at read/write,
> just pass it through. Now since VFIO_PCI_* macros are also available
> here, vendor driver can use it to decode pos to find region index and
> offset of access. Then vendor driver itself add
> vmdev->vfio_region_info[info.index].offset, which is known to him.
> Either we do this in VFIO module or vendor driver?

As I say above, a vendor driver is absolutely free to use the same
index/offset scheme, but it absolutely must not be part of the ABI
between vendor drivers and the mediated driver core.  It's up to the
vendor driver to define that relation and moving these to a common
header is clearly too dangerous.  I'm sorry if I've said otherwise in
the past, but I've only recently discovered a userspace driver (DPDK)
copying these defines and ignoring the index offsets reported through
the REGION_INFO API.  So I'm now bitterly aware how an internal
implementation detail can be abused and if we don't catch them, it's
going to lock us into an implementation that was designed to be
flexible.  Thanks,

Alex

  reply	other threads:[~2016-08-10 23:00 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-03 19:03 [PATCH v6 0/4] Add Mediated device support Kirti Wankhede
2016-08-03 19:03 ` [PATCH v6 1/4] vfio: Mediated device Core driver Kirti Wankhede
2016-08-04  7:21   ` Tian, Kevin
2016-08-05  6:13     ` Kirti Wankhede
2016-08-15  9:15       ` Tian, Kevin
2016-08-09 19:00   ` Alex Williamson
2016-08-12 18:44     ` Kirti Wankhede
2016-08-12 21:16       ` Alex Williamson
2016-08-13  0:37         ` Kirti Wankhede
2016-08-15  9:38           ` Tian, Kevin
2016-08-15 15:59             ` Alex Williamson
2016-08-15 22:09               ` Neo Jia
2016-08-15 22:52                 ` Alex Williamson
2016-08-15 23:23                   ` Neo Jia
2016-08-16  0:49                     ` Tian, Kevin
2016-08-15 19:59             ` Neo Jia
2016-08-15 22:47               ` [Qemu-devel] " Alex Williamson
2016-08-15 23:54                 ` Neo Jia
2016-08-16  0:18                 ` Tian, Kevin
2016-08-16 20:30                 ` [Qemu-devel] " Neo Jia
2016-08-16 20:51                   ` Alex Williamson
2016-08-16 21:17                     ` Neo Jia
2016-08-16  0:30               ` Tian, Kevin
2016-08-16  3:45                 ` Neo Jia
2016-08-16  3:50                   ` Tian, Kevin
2016-08-16  4:16                     ` Neo Jia
2016-08-16  4:52                       ` Tian, Kevin
2016-08-16  5:43                         ` Neo Jia
2016-08-16  5:58                           ` Tian, Kevin
2016-08-16  6:13                             ` Neo Jia
2016-08-16 21:03                               ` Alex Williamson
2016-08-16 12:49                         ` [Qemu-devel] " Alex Williamson
2016-08-03 19:03 ` [PATCH v6 2/4] vfio: VFIO driver for mediated PCI device Kirti Wankhede
2016-08-03 21:03   ` kbuild test robot
2016-08-04  0:19   ` kbuild test robot
2016-08-09 19:00   ` Alex Williamson
2016-08-10 21:23     ` Kirti Wankhede
2016-08-10 23:00       ` Alex Williamson [this message]
2016-08-11 15:59         ` Kirti Wankhede
2016-08-11 16:24           ` Alex Williamson
2016-08-11 17:46             ` Kirti Wankhede
2016-08-11 18:43               ` Alex Williamson
2016-08-12 17:57                 ` Kirti Wankhede
2016-08-12 21:25                   ` Alex Williamson
2016-08-13  0:42                     ` Kirti Wankhede
2016-08-03 19:03 ` [PATCH v6 3/4] vfio iommu: Add support for mediated devices Kirti Wankhede
2016-08-09 19:00   ` Alex Williamson
2016-08-11 14:22     ` Kirti Wankhede
2016-08-11 16:28       ` Alex Williamson
2016-08-03 19:03 ` [PATCH v6 4/4] docs: Add Documentation for Mediated devices Kirti Wankhede
2016-08-04  7:31   ` Tian, Kevin
2016-08-05  7:45     ` Kirti Wankhede
2016-08-24 22:36   ` [Qemu-devel] " Daniel P. Berrange

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160810170044.077a61a7@t450s.home \
    --to=alex.williamson@redhat.com \
    --cc=bjsdjshi@linux.vnet.ibm.com \
    --cc=cjia@nvidia.com \
    --cc=jike.song@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).