Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

From: Scott Wood <scottwood@freescale.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Stuart Yoder <b08248@gmail.com>,
	Benjamin Herrenschmidt <benh@au.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Alexander Graf <agraf@suse.de>, "avi@redhat.com" <avi@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Date: Tue, 27 Sep 2011 16:28:27 -0500
Message-ID: <4E823FFB.1030508@freescale.com>
In-Reply-To: <1317084333.25092.138.camel@x201.home>

On 09/26/2011 07:45 PM, Alex Williamson wrote:
> On Mon, 2011-09-26 at 18:59 -0500, Scott Wood wrote:
>> On 09/26/2011 01:34 PM, Alex Williamson wrote:
>>> /* Reset the device */
>>> #define VFIO_DEVICE_RESET			_IO(, ,)
>>
>> What generic way do we have to do this?  We should probably have a way
>> to determine whether it's possible, without actually asking to do it.
> 
> It's not generic, it could be a VFIO_DEVICE_PCI_RESET or we could add a
> bit to the device flags to indicate if it's available or we could add a
> "probe" arg to the ioctl to either check for existence or do it.

Even with PCI, isn't this only possible if function-level reset is
supported?  I think we need a flag.
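To make the flag idea concrete, here is a rough sketch of how userspace could test for reset support without attempting a reset. The flag name, its bit position, and the info struct are all made up for illustration; nothing in the proposal defines them yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: name and value are illustrative only. */
#define EX_VFIO_DEVICE_FLAG_CAN_RESET (1u << 0)

/* Hypothetical per-device info as userspace might receive it. */
struct ex_vfio_dev_info {
    uint32_t flags;
};

/* True if the kernel advertises that it can reset this device. */
static bool ex_vfio_dev_can_reset(const struct ex_vfio_dev_info *info)
{
    return (info->flags & EX_VFIO_DEVICE_FLAG_CAN_RESET) != 0;
}
```

With something like this, a caller only issues the reset ioctl when the bit is set, and otherwise falls back to other means of isolating the device.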

For devices that can't be reset by the kernel, we'll want the ability to
stop/start DMA access through the IOMMU (or other bus-specific means),
separate from whether the fd is open.  If a device is assigned to a
partition and that partition gets reset, we'll want to disable DMA
before we re-use the memory, and enable it after the partition has reset
or quiesced the device (which requires the fd to be open).
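The ordering I have in mind can be modelled roughly as below. This is only a toy state model, not kernel code: the struct and every function name are hypothetical stand-ins for whatever IOMMU or bus-specific hooks would really be used.

```c
#include <stdbool.h>

/* Toy model of an assigned device; all fields are illustrative. */
struct ex_assigned_dev {
    bool fd_open;      /* device fd currently held open */
    bool dma_enabled;  /* DMA allowed through the IOMMU */
    bool quiesced;     /* device has been reset or quiesced */
};

/* Step 1: on partition reset, block DMA before memory is reused. */
static void ex_partition_reset_begin(struct ex_assigned_dev *d)
{
    d->dma_enabled = false;
    d->quiesced = false;
}

/* Step 2: quiescing the device needs the fd to be open. */
static int ex_dev_quiesce(struct ex_assigned_dev *d)
{
    if (!d->fd_open)
        return -1;
    d->quiesced = true;
    return 0;
}

/* Step 3: DMA may be re-enabled only once the device is quiesced. */
static int ex_dev_dma_enable(struct ex_assigned_dev *d)
{
    if (!d->fd_open || !d->quiesced)
        return -1;
    d->dma_enabled = true;
    return 0;
}
```

The point of the sketch is that the DMA on/off state is tracked separately from whether the fd is open, so DMA can stay blocked while the fd is held open to quiesce the device.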

>>> /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
>>> #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS	_IOW(, , int)
>>> #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS	_IOW(, , int)
>>>
>>> Hope that covers it.
>>
>> It could be done this way, but I predict that the code (both kernel and
>> user side) will be larger.  Maybe not much more complex, but more
>> boilerplate.
>>
>> How will you manage extensions to the interface?
> 
> I would assume we'd do something similar to the kvm capabilities checks.

This information is already built into the data-structure approach.
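By "built in" I mean something like the sketch below, assuming the table is a sequence of records that each start with a type and a total length. That header layout and the type codes are assumptions for illustration, not the actual proposed format. A reader that does not recognize a type just skips ahead by the length, so new record types can be added without a separate capability check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record header: type code plus total length including header. */
struct ex_rec_hdr {
    uint32_t type;
    uint32_t len;
};

/* Illustrative type codes only. */
#define EX_REC_REGION 1u
#define EX_REC_IRQ    2u

/*
 * Count records of type 'want' in the table. Records of other types,
 * including ones this code has never heard of, are skipped via 'len'.
 * Returns -1 if the table is malformed.
 */
static int ex_count_records(const uint8_t *buf, size_t size, uint32_t want)
{
    size_t off = 0;
    int n = 0;

    while (size - off >= sizeof(struct ex_rec_hdr)) {
        struct ex_rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > size - off)
            return -1;          /* length would underrun or overrun */
        if (h.type == want)
            n++;
        off += h.len;
    }
    return off == size ? n : -1; /* trailing partial header is an error */
}
```

An old userspace built against only the region and IRQ types would still parse a table that contains newer record types, which is the extension mechanism I was referring to.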

>> The table should not be particularly large, and you'll need to keep the
>> information around in some form regardless.  Maybe in the PCI case you
>> could produce it dynamically (though I probably wouldn't), but it really
>> wouldn't make sense in the device tree case.
> 
> It would be entirely dynamic for PCI, there's no advantage to caching
> it.  Even for device tree, if you can't fetch it dynamically, you'd have
> to duplicate it between an internal data structure and a buffer reading
> the table.

I don't think we'd need to keep the device tree path/index info around
for anything but the table -- but really, this is a minor consideration.

>> You also lose the ability to easily have a human look at the hexdump for
>> debugging; you'll need a special "lsvfio" tool.  You might want one
>> anyway to pretty-print the info, but with ioctls it's mandatory.
> 
> I don't think this alone justifies duplicating the data and making it
> difficult to parse on both ends.  Chances are we won't need such a tool
> for the ioctl interface because it's easier to get it right the first
> time ;)

It's not just useful for getting the code right, but for e.g. sanity
checking that the devices were bound properly.  I think such a tool
would be generally useful, no matter what the kernel interface ends up
being.  I don't just use lspci to debug the PCI subsystem. :-)

> Note that I'm not stuck on this interface, I was just thinking about how
> to generate the table last week, it seemed like a pain so I thought I'd
> spend a few minutes outlining an ioctl interface... turns out it's not
> so bad.  Thanks,

Yeah, it can work either way, as long as the information's there and
there's a way to add new bits of information, or new bus types, down the
road.  Mainly a matter of aesthetics between the two.

-Scott
