From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4A0437E0.7080800@codemonkey.ws>
Date: Fri, 08 May 2009 08:47:12 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] New device API
References: <200905051231.09759.paul@codesourcery.com> <4A0390DC.2010908@redhat.com>
In-Reply-To: <4A0390DC.2010908@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Zachary Amsden
Cc: Paul Brook, qemu-devel@nongnu.org

Zachary Amsden wrote:
> I think the general direction is good, and this is sorely needed, but I
> think having a fixed / static device struct isn't flexible enough to
> handle the complexities of multiple device / bus types - for example,
> MSI-X could add tons of IRQ vectors to a device, or complex devices
> could have tons of MMIO regions.
>
> I think a more flexible scheme would be to have a common device header,
> and fixed substructures which are added as needed to a device.
>
> For example, showing exploded embedded types / initialized properties.
> This isn't intended to be a realistic device, or even real C code.
>
> struct RocketController
> {
>     struct Device {
>         struct DeviceType *base_class;
>         const char *instance_name;
>         DeviceProperty *configuration_data;
>         struct DeviceExtension *next;
>     } dev;
>     struct PCIExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_PCI;
>             struct DeviceExtension *next;
>         }
>         PCIRegs regs;
>         PCIFunction *registered_functions;
>         struct PCIExtension *pci_next;
>     } pci;
>     struct MMIOExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_MMIO;
>             struct DeviceExtension *next;
>         }
>         target_phys_addr_t addr;
>         target_phys_addr_t size;
>         mmio_mapfunc cb;
>     } mmio;
>     struct IRQExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_IRQ;
>             struct DeviceExtension *next;
>         }
>         int irq;
>         int type = DEV_INTR_LEVEL | DEV_INTR_MSI;
>     } irqs[ROCKET_CONTROLLER_MSIX_IRQS + 1];

I think the problem with this is that it gives too much information
about CPU constructs (MMIO/IRQs) to a device that is never connected to
the actual CPU.

PCI devices do not have a concept of "MMIO". They have a concept of IO
regions. You can have different types of IO regions, and the word sizes
that are supported depend on the width of the PCI bus. Additionally,
the rules about endianness of the data depend entirely on the PCI
controller, not the CPU itself.

This is where the current API fails miserably. You cannot have a PCI
device calling the CPU MMIO registration functions directly because
there is no sane way to deal with endianness conversion. Instead, the
IO region registration has to go through the PCI bus.

Likewise, for IRQs, we should stick to the same principle. PCI exposes
MSI and LNK interrupts to devices. We should have an API for devices to
consume at the PCI level for that.
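To make that concrete, here's roughly the shape of the API I'd want a
device to see. This is only a sketch; every name in it is invented for
illustration and nothing here exists today:

#include <stdint.h>

typedef struct PCIDevice PCIDevice;

/* The device describes its IO regions in PCI terms only: a BAR number,
 * a size, and a type.  Where the region lands in guest physical memory
 * and how accesses are byte-swapped is decided by the PCI bus / host
 * controller, never by the device. */

typedef uint32_t (*PCIRegionReadFunc)(void *opaque, uint32_t addr,
                                      int size);
typedef void (*PCIRegionWriteFunc)(void *opaque, uint32_t addr,
                                   uint32_t val, int size);

enum PCIRegionType {
    PCI_REGION_IO,      /* IO space BAR */
    PCI_REGION_MEM,     /* memory space BAR */
};

void pci_bus_register_region(PCIDevice *dev, int bar, uint32_t size,
                             enum PCIRegionType type,
                             PCIRegionReadFunc read,
                             PCIRegionWriteFunc write,
                             void *opaque);

/* Interrupts are exposed in PCI terms too: the device raises a LNK pin
 * or sends an MSI message; it never sees which CPU IRQ that routes to. */
void pci_bus_set_lnk(PCIDevice *dev, int lnk_pin, int level);
void pci_bus_send_msi(PCIDevice *dev, int vector);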
As Paul said, I don't think it's worth trying to make the same devices
work on top of multiple busses. I think it's asking for trouble.
Instead, devices that can connect to multiple busses (like ne2k) can
have separate register_pci_device() and register_isa_device() calls and
then just internally abstract their chipset functions. Then it only
takes a small bit of ISA and PCI glue code to support both device
types.

> 1) Static per-device configuration choices, such as feature enable /
> disable, framebuffer size, PCI vs. ISA bus representation.  These are
> fixed choices over the lifetime of the VM.  In the above, they would
> be represented as DeviceProperty configuration strings.

Yes, I think this is important too. But when we introduce this, we need
to make sure the devices pre-register which strings they support and
provide human-consumable descriptions of what those knobs do.
Basically, we should be able to auto-extract a hardware documentation
file from the device that describes all of the supported knobs in
detail.

> 2) Dynamic per-device data.  Things such as PCI bus numbers, IRQs or
> MMIO addresses.  These are dynamic and change, but have fixed, defined
> layouts.  They may have to be stored in an intermediate representation
> (a configuration file or wire protocol) for suspend / resume or
> migration.  In the above, it would be possible to write handlers for
> suspend / resume that operate on a generic device representation,
> knowing to call specific handlers for PCI, etc.  Would certainly
> mitigate a lot of bug potential in the dangerous intersection of
> ad-hoc device growth and complex migration / suspend code.

For the most part, I think the device should be unaware of these
things. It never needs to see its devfn. It should preregister which
lnks it supports and whether it supports MSI, but it should never know
which IRQs those actually get routed to.

> 3) Implementation specific data.  Pieces of data which are required
> for a particular realization of a device on the host, such as file
> descriptors, call back functions, authentication data, network
> routing.  This data is not visible at this layer, nor do I think it
> should be.  Since devices may move, and virtual machines may migrate,
> possibly across widely differing platforms, there could be an entirely
> different realization of a device (OSS vs. ALSA sound as a trivial
> example).  You may even want to dynamically change these on a running
> VM (SDL to VNC is a good example).
>
> How to represent #3 is not entirely clear, nor is it clear how to parse

We really have three types of things, and it's not entirely clear to me
how to name them. What we're currently calling devices are emulated
hardware. Additionally, we have host drivers that provide backend
functionality; this would include things like SDL, VNC, and the tap
VLAN driver. We also need something like a host device: the front-end
functionality that connects to the host driver backend.

A device registers its creation function in its module_init(). This
creation function will then register the fact that it's a PCI device
and will register basic information about itself in that registration.
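Roughly, I'd expect that flow to look something like this. Again only a
sketch with invented names (the module_init() here is a stand-in macro,
and e1000 is just an example device):

#include <stdint.h>

typedef struct PCIDevice PCIDevice;

/* Invented registration API, for illustration only. */
typedef struct PCIDeviceInfo {
    const char *name;               /* "e1000" */
    uint16_t vendor_id;             /* example IDs only */
    uint16_t device_id;
    int (*init)(PCIDevice *dev);    /* runs when an instance is created */
} PCIDeviceInfo;

void pci_device_register(const PCIDeviceInfo *info);

/* Stand-in: assume this arranges for fn() to run at startup. */
#define module_init(fn) void qemu_module_init_##fn(void) { fn(); }

static int e1000_init(PCIDevice *dev)
{
    (void)dev;
    /* here the device would register BARs, lnk pins, MSI support,
     * etc. with the PCI bus, via a bus-level API like the earlier
     * sketch */
    return 0;
}

static PCIDeviceInfo e1000_info = {
    .name      = "e1000",
    .vendor_id = 0x8086,
    .device_id = 0x100e,
    .init      = e1000_init,
};

static void e1000_register(void)
{
    /* "I am a PCI device" plus basic information about myself */
    pci_device_register(&e1000_info);
}

module_init(e1000_register)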
A PCI device can be instantiated via a device configuration file, and
when that happens, the device will create a host device for whatever
functionality it supports. For a NIC, this would be a NetworkHostDevice
or something like that. This NetworkHostDevice would have some link to
the device itself (via an id, handle, whatever). A user can then create
a NetworkHostDriver and attach the NetworkHostDevice to that driver,
and you then have a functional emulated NIC.

Regards,

Anthony Liguori
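P.S. A quick sketch of that NIC wiring, with invented names throughout:

#include <stddef.h>
#include <stdint.h>

/* Front end: created by the emulated NIC when it is instantiated. */
typedef struct NetworkHostDevice {
    const char *id;                       /* link back to the NIC */
    void (*receive)(struct NetworkHostDevice *dev,
                    const uint8_t *buf, size_t len);
} NetworkHostDevice;

/* Back end: e.g. the tap VLAN driver; SDL or VNC would be the
 * analogous backends on the display side. */
typedef struct NetworkHostDriver {
    NetworkHostDevice *peer;
} NetworkHostDriver;

/* Attaching the host device to a host driver completes the path:
 * frames from the backend get pushed into peer->receive(), and the
 * NIC transmits by calling back into its driver. */
void network_host_driver_attach(NetworkHostDriver *drv,
                                NetworkHostDevice *dev)
{
    drv->peer = dev;
}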