From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:56865)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1R9ZPn-0005ek-OI for qemu-devel@nongnu.org;
 Fri, 30 Sep 2011 05:30:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1R9ZPj-0002ow-4W for qemu-devel@nongnu.org;
 Fri, 30 Sep 2011 05:29:55 -0400
Received: from e23smtp06.au.ibm.com ([202.81.31.148]:51680)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1R9ZPi-0002oS-F8 for qemu-devel@nongnu.org;
 Fri, 30 Sep 2011 05:29:51 -0400
Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246])
 by e23smtp06.au.ibm.com (8.14.4/8.13.1) with ESMTP id p8U9STdu008596
 for ; Fri, 30 Sep 2011 19:28:29 +1000
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])
 by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP
 id p8U9RVLU1843286 for ; Fri, 30 Sep 2011 19:27:31 +1000
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])
 by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
 id p8U9TaL9017708 for ; Fri, 30 Sep 2011 19:29:37 +1000
Date: Fri, 30 Sep 2011 18:50:49 +1000
From: David Gibson
Message-ID: <20110930085049.GG4512@yookeroo.fritz.box>
References: <20110926075144.GT12286@yookeroo.fritz.box>
 <3D54B89C-A0A3-4461-A7A1-3F1E4AB79296@suse.de>
 <1317062095.25515.75.camel@bling.home>
 <4E8111E5.4030209@freescale.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E8111E5.4030209@freescale.com>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of
 device fd files
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Scott Wood
Cc: Benjamin Herrenschmidt, "kvm@vger.kernel.org", Stuart Yoder,
 "qemu-devel@nongnu.org", Alexander Graf, Alex Williamson,
 "avi@redhat.com"

On Mon, Sep 26, 2011 at 06:59:33PM -0500, Scott Wood wrote:
> On 09/26/2011 01:34 PM, Alex Williamson
wrote:
> > The other obvious possibility is a pure ioctl interface. To match what
> > this proposal is trying to describe, plus the runtime interfaces, we'd
> > need something like:
> >
> > /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
> > #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64)
> >
> >
> > /* Return number of mmio/iop/config regions.
> >  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
> > #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int)
>
> How do you handle BARs that a particular device doesn't use? Zero-length?
>
> > /* Return the device tree path for type/index into the user
> >  * allocated buffer */
> > struct dtpath {
> >     u32 type; (0 = region, 1 = IRQ)
> >     u32 index;
> >     u32 buf_len;
> >     char *buf;
> > };
> > #define VFIO_DEVICE_GET_DTPATH _IOWR(, , struct dtpath)
>
> So now the user needs to guess a buffer length in advance... and what
> happens if it's too small?
>
> > /* Reset the device */
> > #define VFIO_DEVICE_RESET _IO(, ,)
>
> What generic way do we have to do this? We should probably have a way
> to determine whether it's possible, without actually asking to do it.

That's a good point. PCI devices have a standardized reset, but
embedded devices often won't. Mind you, we could just fail the call in
that case.

> > /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
> > #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int)
> > #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int)
> >
> > Hope that covers it.
>
> It could be done this way, but I predict that the code (both kernel and
> user side) will be larger. Maybe not much more complex, but more
> boilerplate.
>
> How will you manage extensions to the interface? With the table it's
> simple, you see a new (sub)record type and you either understand it or
> you skip it. With ioctls you need to call every information-gathering
> ioctl you know and care about (or are told is present via some feature
> advertisement), and see if there's anything there.

No..
quite the opposite. With ioctl()s you call the ones your userspace
program cares about / can implement. When an extended interface is
added, the existing ioctl()s keep working as is. Newer userspace which
uses the new features will call the new ioctl()s if it cares about them.

> > Something I prefer about this interface is that
> > everything can easily be generated on the fly, whereas reading out a
> > table from the device means we really need to have that table somewhere
> > in kernel memory to easily support reading random offsets. Thoughts?
>
> The table should not be particularly large, and you'll need to keep the
> information around in some form regardless. Maybe in the PCI case you
> could produce it dynamically (though I probably wouldn't), but it really
> wouldn't make sense in the device tree case.
>
> You also lose the ability to easily have a human look at the hexdump for
> debugging; you'll need a special "lsvfio" tool. You might want one
> anyway to pretty-print the info, but with ioctls it's mandatory.
>
> -Scott
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
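
As an illustration of the extension argument in the reply above, here is a
minimal sketch in C of userspace that calls only the ioctl()s it knows about
and treats a newer one as optional. Everything in it is hypothetical: the
thread does not give real command numbers, so DEMO_GET_FLAGS,
DEMO_GET_NUM_REGIONS and DEMO_GET_NUM_IRQS are made-up values, demo_ioctl()
is an in-process stand-in for the kernel (it does not touch the real VFIO
interface), and kernel_version exists only to model an older and a newer
kernel. The one borrowed convention is ENOTTY, the usual errno for an ioctl
the kernel does not recognize.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers; the thread leaves the real encodings out. */
enum {
    DEMO_GET_FLAGS       = 1,
    DEMO_GET_NUM_REGIONS = 2,
    DEMO_GET_NUM_IRQS    = 3   /* imagined later extension */
};

/* Stand-in for the kernel side (an assumption, not the VFIO API).
 * "Version 1" knows two commands; "version 2" adds DEMO_GET_NUM_IRQS.
 * Unknown commands fail with ENOTTY. */
static int demo_ioctl(int kernel_version, int cmd, void *arg)
{
    switch (cmd) {
    case DEMO_GET_FLAGS:
        *(uint64_t *)arg = 1;      /* bit 0 set: PCI device */
        return 0;
    case DEMO_GET_NUM_REGIONS:
        *(int *)arg = 8;           /* BAR0-5 + ROM + config, per the proposal */
        return 0;
    case DEMO_GET_NUM_IRQS:
        if (kernel_version >= 2) {
            *(int *)arg = 1;
            return 0;
        }
        break;
    default:
        break;
    }
    errno = ENOTTY;
    return -1;
}

struct demo_info {
    uint64_t flags;
    int num_regions;
    int has_num_irqs;   /* 1 if the optional query was supported */
    int num_irqs;
};

/* Userspace side: the two baseline queries are required, the newer one is
 * optional. An ENOTTY from the optional query is not an error; it only
 * means the kernel predates that extension. */
static int demo_probe(int kernel_version, struct demo_info *info)
{
    assert(info != NULL);

    if (demo_ioctl(kernel_version, DEMO_GET_FLAGS, &info->flags) < 0)
        return -1;
    if (demo_ioctl(kernel_version, DEMO_GET_NUM_REGIONS,
                   &info->num_regions) < 0)
        return -1;

    info->has_num_irqs = 0;
    info->num_irqs = 0;
    if (demo_ioctl(kernel_version, DEMO_GET_NUM_IRQS, &info->num_irqs) == 0)
        info->has_num_irqs = 1;
    else if (errno != ENOTTY)
        return -1;

    return 0;
}
```

Calling demo_probe() with kernel_version 1 or 2 succeeds in both cases; only
the second sets has_num_irqs. A program that never learned about
DEMO_GET_NUM_IRQS would simply not issue it and would behave identically on
both.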