Message-ID: <4E823FFB.1030508@freescale.com>
Date: Tue, 27 Sep 2011 16:28:27 -0500
From: Scott Wood
References: <20110926075144.GT12286@yookeroo.fritz.box> <3D54B89C-A0A3-4461-A7A1-3F1E4AB79296@suse.de> <1317062095.25515.75.camel@bling.home> <4E8111E5.4030209@freescale.com> <1317084333
In-Reply-To: <1317084333.25092.138.camel@x201.home>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
To: Alex Williamson
Cc: "kvm@vger.kernel.org", Benjamin Herrenschmidt, Stuart Yoder, Alexander Graf, "qemu-devel@nongnu.org", "avi@redhat.com", David Gibson

On 09/26/2011 07:45 PM, Alex Williamson wrote:
> On Mon, 2011-09-26 at 18:59 -0500, Scott Wood wrote:
>> On 09/26/2011 01:34 PM, Alex Williamson wrote:
>>> /* Reset the device */
>>> #define VFIO_DEVICE_RESET _IO(, ,)
>>
>> What generic way do we have to do this? We should probably have a way
>> to determine whether it's possible, without actually asking to do it.
>
> It's not generic, it could be a VFIO_DEVICE_PCI_RESET or we could add a
> bit to the device flags to indicate if it's available or we could add a
> "probe" arg to the ioctl to either check for existence or do it.

Even with PCI, isn't this only possible if function-level reset is
supported? I think we need a flag.

For devices that can't be reset by the kernel, we'll want the ability to
stop/start DMA access through the IOMMU (or other bus-specific means),
separate from whether the fd is open. If a device is assigned to a
partition and that partition gets reset, we'll want to disable DMA
before we re-use the memory, and enable it after the partition has
reset or quiesced the device (which requires the fd to be open).

>>> /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
>>> #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int)
>>> #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int)
>>>
>>> Hope that covers it.
>>
>> It could be done this way, but I predict that the code (both kernel and
>> user side) will be larger. Maybe not much more complex, but more
>> boilerplate.
>>
>> How will you manage extensions to the interface?
>
> I would assume we'd do something similar to the kvm capabilities checks.

This information is already built into the data-structure approach.

>> The table should not be particularly large, and you'll need to keep the
>> information around in some form regardless. Maybe in the PCI case you
>> could produce it dynamically (though I probably wouldn't), but it really
>> wouldn't make sense in the device tree case.
>
> It would be entirely dynamic for PCI, there's no advantage to caching
> it. Even for device tree, if you can't fetch it dynamically, you'd have
> to duplicate it between an internal data structure and a buffer reading
> the table.

I don't think we'd need to keep the device tree path/index info around
for anything but the table -- but really, this is a minor consideration.
>> You also lose the ability to easily have a human look at the hexdump for
>> debugging; you'll need a special "lsvfio" tool. You might want one
>> anyway to pretty-print the info, but with ioctls it's mandatory.
>
> I don't think this alone justifies duplicating the data and making it
> difficult to parse on both ends. Chances are we won't need such a tool
> for the ioctl interface because it's easier to get it right the first
> time ;)

It's not just useful for getting the code right, but for e.g. sanity
checking that the devices were bound properly. I think such a tool
would be generally useful, no matter what the kernel interface ends up
being. I don't just use lspci to debug the PCI subsystem. :-)

> Note that I'm not stuck on this interface, I was just thinking about how
> to generate the table last week, it seemed like a pain so I thought I'd
> spend a few minutes outlining an ioctl interface... turns out it's not
> so bad. Thanks,

Yeah, it can work either way, as long as the information's there and
there's a way to add new bits of information, or new bus types, down
the road. Mainly a matter of aesthetics between the two.

-Scott