From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1M2FLV-0001wu-SA for qemu-devel@nongnu.org; Thu, 07 May 2009 21:57:53 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1M2FLQ-0001p6-8J for qemu-devel@nongnu.org; Thu, 07 May 2009 21:57:52 -0400 Received: from [199.232.76.173] (port=60674 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1M2FLQ-0001p3-2O for qemu-devel@nongnu.org; Thu, 07 May 2009 21:57:48 -0400 Received: from mx2.redhat.com ([66.187.237.31]:38135) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1M2FLP-0006wW-DH for qemu-devel@nongnu.org; Thu, 07 May 2009 21:57:47 -0400 Message-ID: <4A0390DC.2010908@redhat.com> Date: Thu, 07 May 2009 15:54:36 -1000 From: Zachary Amsden MIME-Version: 1.0 Subject: Re: [Qemu-devel] [RFC] New device API References: <200905051231.09759.paul@codesourcery.com> In-Reply-To: <200905051231.09759.paul@codesourcery.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paul Brook Cc: qemu-devel@nongnu.org Paul Brook wrote: > The attached patch is my attempt at a new internal API for device creation in > qemu. > > The long term goal here is fully dynamic machine construction, i.e. a user > supplied configuration file/tree rather than the current hardcoded board > configs. > > As a first step towards this, I've implemented an API that allows devices to > be created and configured without requiring device specific knowledge. > > There are two sides to this API. The device side allows a device to > register/configure functionality (e.g. MMIO regions, IRQs, character devices, > etc). Most of the infrastructure for this is already present, we just need to > configure it consistently, rather than on an ad-hoc basis. I expect this API > to remain largely unchanged[1]. > > The board side allows creation on configuration of devices. In the medium term > I expect this to go away, and be replaced by data driven configuration. > > I also expect that this device abstraction will let itself to a system bus > model to solve many of the IOMMU/DMA/address mapping issues that qemu > currently can't handle. > > There are still a few rough edges. Busses aren't handled in a particularly > consistent way, PCI isn't particularly well integrated, and the > implementation of things like qdev_get_chardev is an ugly hack mapping onto > current commandline options. However I believe the device side API is fairly > solid, and is a necessary prerequisite for fixing the bigger configuration > problem. I think the general direction is good, and this is sorely needed, but I think having a fixed / static device struct isn't flexible enough to handle the complexities of multiple device / bus types - for example, MSI-X could add tons of IRQ vectors to a device, or complex devices could have tons of MMIO regions. I think a more flexible scheme would be to have a common device header, and fixed substructures which are added as needed to a device. For example, showing exploded embedded types / initialized properties. This isn't intended to be a realistic device, or even real C code. struct RocketController { struct Device { struct DeviceType *base_class; const char *instance_name; DeviceProperty *configuration_data; struct DeviceExtension *next; } dev; struct PCIExtension { struct DeviceExtension header { int type = DEV_EXT_PCI; struct DeviceExtension *next; } PCIRegs regs; PCIFunction *registered_functions; struct PCIExtension *pci_next; } pci; struct MMIOExtension { struct DeviceExtension header { int type = DEV_EXT_MMIO; struct DeviceExtension *next; } target_phys_addr_t addr; target_phys_addr_t size; mmio_mapfunc cb; } mmio; struct IRQExtension { struct DeviceExtension header { int type = DEV_EXT_IRQ; struct DeviceExtension *next; } int irq; int type = DEV_INTR_LEVEL | DEV_INTR_MSI; } irqs [ROCKET_CONTROLLER_MSIX_IRQS+1]; struct MSIXExtension { struct DeviceExtension header { int type = DEV_EXT_MSIX; struct DeviceExtension *next; } struct MMIOExtension *table_bar; target_phys_addr_t table_offset; struct MMIOExtension *pba_bar; target_phys_addr_t pba_offset; } msix; struct RocketFuelControllerStatus { struct DeviceExtension header { int type = DEV_EXT_PRIVATE; struct DeviceExtension *next; suspend_callback_fn *foo; resume_callback_fn *goo; } ... } private; } Now it's possible to define cool macros to walk an arbitrary device structure (as the extensions are linked by the next field): #define dev_for_each_ext(dev,ext) \ for (ext = dev->next; ext; ext = ext->next) #define dev_for_each_irq(dev,ext) \ dev_for_each_ext(dev,ext) if (ext->type == DEV_EXT_IRQ) inline Bool dev_supports_pci(struct Device *dev) { struct DeviceExtension *ext; dev_for_each_ext(dev,ext) if (ext->type == DEV_EXT_PCI) return TRUE; return FALSE; } You could also use offsets instead of a next pointer (might allow more space efficient packing of a backpointer to the device struct proper). Device configuration would then be something like qdev_add_pci(dev, &rocket->pci); for (i = 0; i < NUM_ROCKET_IRQS; i++) qdev_add_intr(dev, &rocket->intr[i]); qdev_add_msix(dev, &rocket->msix); qdev_add_private_data(dev, &rocket->private, rocket_launch, rocket_land); And of course you can skip regions that aren't configured (MSI-X, might for example have a configuration variable to disable). The config issue is certainly a very important problem to address. I think it should be noted there are three types of configuration data, all of which are ontologically different. There are: 1) Static per-device configuration choices, such as feature enable / disable, framebuffer size, PCI vs. ISA bus representation. These are fixed choices over the lifetime on the VM. In the above, they would be represented as DeviceProperty configuration strings. 2) Dynamic per-device data. Things such as PCI bus numbers, IRQs or MMIO addresses. These are dynamic and change, but have fixed, defined layouts. They may have to be stored in an intermediate representation (a configuration file or wire protocol) for suspend / resume or migration. In the above, it would be possible to write handlers for suspend / resume that operate on a generic device representation, knowing to call specific handlers for PCI, etc. Would certainly mitigate a lot of bug potential in the dangerous intersection of ad-hoc device growth and complex migration / suspend code. 3) Implementation specific data. Pieces of data which are required for a particular realization of a device on the host, such as file descriptors, call back functions, authentication data, network routing. This data is not visible at this layer, nor do I think it should be. Since devices may move, and virtual machines may migrate, possibly across widely differing platforms, there could be an entirely different realization of a device (OSS vs. ALSA sound as a trivial example). You may even want to dynamically change these on a running VM (SDL to VNC is a good example). How to represent #3 is not entirely clear, nor is it clear how to parse the arguments and data required, as it is ad-hoc and necessarily implementation dependent. The dynamic changing could be done via monitor commands or something like it. It should be entirely possible to switch my SVGA backend from SDL local rendering to using VNC at run-time, for example, as long as I supply any additional configuration data needed for VNC. I think any attempt to broach the config problem has to realize these distinctions in whatever namespace it uses if it is to be successful. How to represent this of course will require more discussion. Zach