Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alex Williamson <alex.williamson@redhat.com>
To: "Christian Benvenuti (benve)" <benve@cisco.com>
Cc: "Aaron Fabbri (aafabbri)" <aafabbri@cisco.com>,
	aik@au1.ibm.com, kvm@vger.kernel.org, pmac@au1.ibm.com,
	qemu-devel@nongnu.org, joerg.roedel@amd.com,
	konrad.wilk@oracle.com, agraf@suse.de, dwg@au1.ibm.com,
	chrisw@sous-sol.org, B08248@freescale.com,
	iommu@lists.linux-foundation.org, avi@redhat.com,
	linux-pci@vger.kernel.org, B07421@freescale.com
Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
Date: Mon, 14 Nov 2011 15:59:00 -0700	[thread overview]
Message-ID: <1321311540.17169.122.camel@bling.home> (raw)
In-Reply-To: <184D23435BECB444AB6B9D4630C8EC8302D603A9@XMB-RCD-303.cisco.com>

On Fri, 2011-11-11 at 16:22 -0600, Christian Benvenuti (benve) wrote:
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, November 11, 2011 10:04 AM
> > To: Christian Benvenuti (benve)
> > Cc: chrisw@sous-sol.org; aik@au1.ibm.com; pmac@au1.ibm.com;
> > dwg@au1.ibm.com; joerg.roedel@amd.com; agraf@suse.de; Aaron Fabbri
> > (aafabbri); B08248@freescale.com; B07421@freescale.com; avi@redhat.com;
> > konrad.wilk@oracle.com; kvm@vger.kernel.org; qemu-devel@nongnu.org;
> > iommu@lists.linux-foundation.org; linux-pci@vger.kernel.org
> > Subject: RE: [RFC PATCH] vfio: VFIO Driver core framework
> > 
> > On Wed, 2011-11-09 at 18:57 -0600, Christian Benvenuti (benve) wrote:
> > > Here are few minor comments on vfio_iommu.c ...
> > 
> > Sorry, I've been poking sticks at trying to figure out a clean way to
> > solve the force vfio driver attach problem.
> 
> Attach o detach?

Attach.  For the case when a new device appears that belongs to a group
that already in use.  I'll probably add a claim() operation to the
vfio_device_ops that tells the driver to grab it.  I was hoping for pci
this would just add it to the dynamic ids, but that hits device lock
problems.

> > > > diff --git a/drivers/vfio/vfio_iommu.c b/drivers/vfio/vfio_iommu.c
> > > > new file mode 100644
> > > > index 0000000..029dae3
> > > > --- /dev/null
> > > > +++ b/drivers/vfio/vfio_iommu.c
> > <snip>
> > > > +
> > > > +#include "vfio_private.h"
> > >
> > > Doesn't the 'dma_'  prefix belong to the generic DMA code?
> > 
> > Sure, we could these more vfio-centric.
> 
> Like vfio_dma_map_page?

Something like that, though _page doesn't seem appropriate as it tracks
a region.

> > 
> > > > +struct dma_map_page {
> > > > +	struct list_head	list;
> > > > +	dma_addr_t		daddr;
> > > > +	unsigned long		vaddr;
> > > > +	int			npage;
> > > > +	int			rdwr;
> > > > +};
> > > > +
> > > > +/*
> > > > + * This code handles mapping and unmapping of user data buffers
> > > > + * into DMA'ble space using the IOMMU
> > > > + */
> > > > +
> > > > +#define NPAGE_TO_SIZE(npage)	((size_t)(npage) << PAGE_SHIFT)
> > > > +
> > > > +struct vwork {
> > > > +	struct mm_struct	*mm;
> > > > +	int			npage;
> > > > +	struct work_struct	work;
> > > > +};
> > > > +
> > > > +/* delayed decrement for locked_vm */
> > > > +static void vfio_lock_acct_bg(struct work_struct *work)
> > > > +{
> > > > +	struct vwork *vwork = container_of(work, struct vwork, work);
> > > > +	struct mm_struct *mm;
> > > > +
> > > > +	mm = vwork->mm;
> > > > +	down_write(&mm->mmap_sem);
> > > > +	mm->locked_vm += vwork->npage;
> > > > +	up_write(&mm->mmap_sem);
> > > > +	mmput(mm);		/* unref mm */
> > > > +	kfree(vwork);
> > > > +}
> > > > +
> > > > +static void vfio_lock_acct(int npage)
> > > > +{
> > > > +	struct vwork *vwork;
> > > > +	struct mm_struct *mm;
> > > > +
> > > > +	if (!current->mm) {
> > > > +		/* process exited */
> > > > +		return;
> > > > +	}
> > > > +	if (down_write_trylock(&current->mm->mmap_sem)) {
> > > > +		current->mm->locked_vm += npage;
> > > > +		up_write(&current->mm->mmap_sem);
> > > > +		return;
> > > > +	}
> > > > +	/*
> > > > +	 * Couldn't get mmap_sem lock, so must setup to decrement
> > >                                                       ^^^^^^^^^
> > >
> > > Increment?
> > 
> > Yep

Actually, side note, this is increment/decrement depending on the sign
of the parameter.  So "update" may be more appropriate.  I think Tom
originally used increment in one place and decrement in another to show
it's dual use.

> > <snip>
> > > > +int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t
> > > > start,
> > > > +			    size_t size, struct dma_map_page *mlp)
> > > > +{
> > > > +	struct dma_map_page *split;
> > > > +	int npage_lo, npage_hi;
> > > > +
> > > > +	/* Existing dma region is completely covered, unmap all */
> > >
> > > This works. However, given how vfio_dma_map_dm implements the merging
> > > logic, I think it is impossible to have
> > >
> > >     (start < mlp->daddr &&
> > >      start + size > mlp->daddr + NPAGE_TO_SIZE(mlp->npage))
> > 
> > It's quite possible.  This allows userspace to create a sparse mapping,
> > then blow it all away with a single unmap from 0 to ~0.
> 
> I would prefer the user to use exact ranges in the unmap operations
> because it would make it easier to detect bugs/leaks in the map/unmap
> logic used by the callers.
> My assumptions are that:
> 
> - the user always keeps track of the mappings

My qemu code plays a little on the loose side here, acting as a
passthrough for the internal memory client.  But even there, worst case
would probably be trying to unmap a non-existent entry, not unmapping a
sparse range.

> - the user either unmaps one specific mapping or 'all of them'.
>   The 'all of them' case would also take care of those cases where
>   the user does _not_ keep track of mappings and simply uses
>   the "unmap from 0 to ~0" each time.
> 
> Because of this you could still provide an exact map/unmap logic
> and allow such "unmap from 0 to ~0" by making the latter a special
> case.
> However, if we want to allow any arbitrary/inexact unmap request, then OK.

I can't think of any good reasons we shouldn't be more strict.  I think
it was primarily just convenient to hit all the corner cases since we
merge all the requests together for tracking and need to be able to
split them back apart.  It does feel a little awkward to have a 0/~0
special case though, but I don't think it's worth adding another ioctl
to handle it.

<snip>
> > > > +        if (cmd == VFIO_IOMMU_GET_FLAGS) {
> > > > +                u64 flags = VFIO_IOMMU_FLAGS_MAP_ANY;
> > > > +
> > > > +                ret = put_user(flags, (u64 __user *)arg);
> > > > +
> > > > +        } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> > > > +		struct vfio_dma_map dm;
> > > > +
> > > > +		if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > > > +			return -EFAULT;
> > >
> > > What does the "_dm" suffix stand for?
> > 
> > Inherited from Tom, but I figure _dma_map_dm = action(dma map),
> > object(dm), which is a vfio_Dma_Map.
> 
> OK. The reason why I asked is that '_dm' does not add anything to 'vfio_dma_map'.

Yep.  Thanks,

Alex

next prev parent reply	other threads:[~2011-11-14 22:59 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-03 20:12 [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework Alex Williamson
2011-11-09  4:17 ` Aaron Fabbri
2011-11-09  4:41   ` Alex Williamson
2011-11-09  8:11 ` Christian Benvenuti (benve)
2011-11-09 18:02   ` Alex Williamson
2011-11-09 21:08     ` Christian Benvenuti (benve)
2011-11-09 23:40       ` Alex Williamson
2011-11-10  0:57 ` Christian Benvenuti (benve)
2011-11-11 18:04   ` Alex Williamson
2011-11-11 22:22     ` Christian Benvenuti (benve)
2011-11-14 22:59       ` Alex Williamson [this message]
2011-11-15  0:05         ` David Gibson
2011-11-15  0:49           ` Benjamin Herrenschmidt
2011-11-11 17:51 ` Konrad Rzeszutek Wilk
2011-11-11 22:10   ` Alex Williamson
2011-11-15  0:00     ` David Gibson
2011-11-16 16:52     ` Konrad Rzeszutek Wilk
2011-11-17 20:22       ` Alex Williamson
2011-11-17 20:56         ` Scott Wood
2011-11-16 17:47     ` Scott Wood
2011-11-17 20:52       ` Alex Williamson
2011-11-12  0:14 ` Scott Wood
2011-11-14 20:54   ` Alex Williamson
2011-11-14 21:46     ` Alex Williamson
2011-11-14 22:26     ` Scott Wood
2011-11-14 22:48       ` Alexander Graf
2011-11-15  2:29     ` Alex Williamson
2011-11-15  6:34 ` David Gibson
2011-11-15 18:01   ` Alex Williamson
2011-11-17  0:02     ` David Gibson
2011-11-18 20:32       ` Alex Williamson
2011-11-18 21:09         ` Scott Wood
2011-11-22 19:16           ` Alex Williamson
2011-11-22 20:00             ` Scott Wood
2011-11-22 21:28               ` Alex Williamson
2011-11-21  2:47         ` David Gibson
2011-11-22 18:22           ` Alex Williamson
2011-11-15 20:10   ` Scott Wood
2011-11-15 21:40     ` Aaron Fabbri
2011-11-15 22:29       ` Scott Wood
2011-11-16 23:34         ` Alex Williamson
2011-11-29  1:52 ` Alexey Kardashevskiy
2011-11-29  2:01   ` Alexey Kardashevskiy
2011-11-29  2:11     ` Alexey Kardashevskiy
2011-11-29  3:54     ` Alex Williamson
2011-11-29 19:26       ` Alex Williamson
2011-11-29 23:20         ` Stuart Yoder
2011-11-29 23:44           ` Alex Williamson
2011-11-30 15:41             ` Stuart Yoder
2011-11-30 16:58               ` Alex Williamson
2011-12-01 20:58                 ` Stuart Yoder
2011-12-01 21:25                   ` Alex Williamson
2011-12-02 14:40                     ` Stuart Yoder
2011-12-02 18:11                       ` Bhushan Bharat-R65777
2011-12-02 18:27                         ` Scott Wood
2011-12-02 18:35                           ` Bhushan Bharat-R65777
2011-12-02 18:45                           ` Bhushan Bharat-R65777
2011-12-02 18:52                             ` Scott Wood
2011-12-02 18:21                       ` Scott Wood
2011-11-29  3:46   ` Alex Williamson
2011-11-29  4:34     ` Alexey Kardashevskiy
2011-11-29  5:48       ` Alex Williamson
2011-12-02  5:06         ` Alexey Kardashevskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1321311540.17169.122.camel@bling.home \
    --to=alex.williamson@redhat.com \
    --cc=B07421@freescale.com \
    --cc=B08248@freescale.com \
    --cc=aafabbri@cisco.com \
    --cc=agraf@suse.de \
    --cc=aik@au1.ibm.com \
    --cc=avi@redhat.com \
    --cc=benve@cisco.com \
    --cc=chrisw@sous-sol.org \
    --cc=dwg@au1.ibm.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joerg.roedel@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=pmac@au1.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).