qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: David Gibson <dwg@au1.ibm.com>
Cc: Benjamin Herrenschmidt <benh@au1.ibm.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Stuart Yoder <b08248@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Alexander Graf <agraf@suse.de>, "avi@redhat.com" <avi@redhat.com>,
	Scott Wood <scottwood@freescale.com>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Date: Fri, 30 Sep 2011 15:59:29 -0600	[thread overview]
Message-ID: <1317419970.2552.53.camel@bling.home> (raw)
In-Reply-To: 20110930084647.GF4512@yookeroo.fritz.box

On Fri, 2011-09-30 at 10:37 -0600, Alex Williamson wrote:
> On Fri, 2011-09-30 at 18:46 +1000, David Gibson wrote:
> > On Mon, Sep 26, 2011 at 12:34:52PM -0600, Alex Williamson wrote:
> > > On Mon, 2011-09-26 at 12:04 +0200, Alexander Graf wrote:
> > > > Am 26.09.2011 um 09:51 schrieb David Gibson <david@gibson.dropbear.id.au>:
> > [snip]
> > > > Also, if you can come up with an interface that does not have variable
> > > > length descriptors but is still able to export all the required
> > > > generic information, please send a proposal to the list :)
> > > > 
> > > 
> > > Hi,
> > > 
> > > The other obvious possibility is a pure ioctl interface.  To match what
> > > this proposal is trying to describe, plus the runtime interfaces, we'd
> > > need something like:
> > 
> > Right, this also seems a reasonable possibility to me, depending on
> > the details.
> > 
> > > /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
> > > #define VFIO_DEVICE_GET_FLAGS			_IOR(, , u64)
> > > 
> > > 
> > > /* Return number of mmio/iop/config regions.
> > >  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
> > > #define VFIO_DEVICE_GET_NUM_REGIONS		_IOR(, , int)
> > > 
> > > /* Return length for region index (may be zero) */
> > > #define VFIO_DEVICE_GET_REGION_LEN		_IOWR(, , u64)
> > > 
> > > /* Return flags for region index
> > >  * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
> > > #define VFIO_DEVICE_GET_REGION_FLAGS		_IOR(, , u64)
> > > 
> > > /* Return file offset for region index */
> > > #define VFIO_DEVICE_GET_REGION_OFFSET		_IOWR(, , u64)
> > 
> > The above 3 can be be folded into one "getregioninfo" call.
> 
> Yep, and the phys addr one below.  We can use a flags bit to indicate
> whether it's valid.
> 
> > > /* Return physical address for region index - not implemented for PCI */
> > > #define VFIO_DEVICE_GET_REGION_PHYS_ADDR	_IOWR(, , u64)
> > > 
> > > 
> > > 
> > > /* Return number of IRQs (Not including MSI/MSI-X for PCI) */
> > > #define VFIO_DEVICE_GET_NUM_IRQ			_IOR(, , int)
> > > 
> > > /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
> > > #define VFIO_DEVICE_SET_IRQ_EVENTFD		_IOW(, , int)
> > > 
> > > /* Unmask IRQ index */
> > > #define VFIO_DEVICE_UNMASK_IRQ			_IOW(, , int)
> > > 
> > > /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
> > > #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD	_IOW(, , int)
> > > 
> > > 
> > > /* Return the device tree path for type/index into the user
> > >  * allocated buffer */
> > > struct dtpath {
> > > 	u32	type; (0 = region, 1 = IRQ)
> > > 	u32	index;
> > > 	u32	buf_len;
> > > 	char	*buf;
> > > };
> > > #define VFIO_DEVICE_GET_DTPATH			_IOWR(, , struct dtpath)
> > > 
> > > /* Return the device tree index for type/index */
> > > struct dtindex {
> > > 	u32	type; (0 = region, 1 = IRQ)
> > > 	u32	index;
> > > 	u32	prop_type;
> > > 	u32	prop_index;
> > > };
> > > #define VFIO_DEVICE_GET_DTINDEX			_IOWR(, , struct dtindex)
> > 
> > I think those need some work, but that doesn't impinge on the core
> > semantics.
> > 
> > > /* Reset the device */
> > > #define VFIO_DEVICE_RESET			_IO(, ,)
> > > 
> > > 
> > > /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
> > > #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS	_IOW(, , int)
> > > #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS	_IOW(, , int)
> > 
> > Why does this need seperate controls, rather than just treating MSIs
> > as interrupts beyond the first for PCI devices?
> 
> Well, we could say that PCI will always report 3 for
> VFIO_DEVICE_GET_NUM_IRQ where 0 = legacy, 1 = MSI, 2 = MSI-X.  ioctls on
> unimplemented IRQs will fail, UNMASK* ioctls on non-level triggered
> interrupts will fail, and the parameter to SET_IRQ_EVENTFD becomes
> arg[0] = index, arg[1] = count, arg[2-n] = fd.  Maybe we'd then have a
> GET_IRQ_INFO that takes something like:
> 
> struct vfio_irq_info {
> 	int index;
> 	unsigned int count;
> 	u64 flags;
> #define VFIO_IRQ_INFO_FLAGS_LEVEL	(1 << 0)
> };
> 
> count would be 0 on PCI if the type of interrupt isn't supported.
> Better?  Thanks,

FYI for all, I've pushed a branch out to github with the current state
of the re-write.  You can find it here

https://awilliam@github.com/awilliam/linux-vfio.git
git://github.com/awilliam/linux-vfio.git

The vfio-ng branch is the latest.  The framework is quite a bit more
solid now, so I figure it's time to move into the device and iommu
implementation.  vfio-pci is now it's own module that depends on vfio, I
expect vfio-dt to be implemented the same.  The PCI ioctl is in place
and supports the interface described above.  I'll continue to port
pieces of the old vfio code into this new infrastructure.  Comments and
patches welcome.  Thanks,

Alex

  parent reply	other threads:[~2011-09-30 21:59 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-09 13:11 [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files Stuart Yoder
2011-09-09 13:16 ` Stuart Yoder
2011-09-19 15:16 ` Alex Williamson
2011-09-19 19:37   ` Scott Wood
2011-09-19 21:07     ` Alex Williamson
2011-09-19 21:15       ` Scott Wood
2011-09-26  7:51 ` David Gibson
2011-09-26 10:04   ` Alexander Graf
2011-09-26 18:34     ` Alex Williamson
2011-09-26 20:03       ` Stuart Yoder
2011-09-26 20:42         ` Alex Williamson
2011-09-26 23:59       ` Scott Wood
2011-09-27  0:45         ` Alex Williamson
2011-09-27 21:28           ` Scott Wood
2011-09-28  2:40             ` Alex Williamson
2011-09-28  8:58               ` Alexander Graf
2011-09-30  8:55                 ` David Gibson
2011-09-30  8:50         ` David Gibson
2011-09-30  8:46       ` David Gibson
2011-09-30 16:37         ` Alex Williamson
2011-09-30 21:59         ` Alex Williamson [this message]
2011-09-30  8:40     ` David Gibson
2011-09-26 19:57   ` Stuart Yoder
2011-09-27  0:25     ` Scott Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1317419970.2552.53.camel@bling.home \
    --to=alex.williamson@redhat.com \
    --cc=agraf@suse.de \
    --cc=avi@redhat.com \
    --cc=b08248@gmail.com \
    --cc=benh@au1.ibm.com \
    --cc=dwg@au1.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    --cc=scottwood@freescale.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).