* Re: A set of "standard" virtual devices?
  [not found] <4611652F.700@zytor.com>
From: Jeremy Fitzhardinge @ 2007-04-02 20:56 UTC
To: H. Peter Anvin
Cc: Linux Kernel Mailing List, mathiasen, Virtualization Mailing List

H. Peter Anvin wrote:
> On the subject of virtualization; there are a number of devices which
> keep being invented and reinvented by just about every virtualization
> vendor for no really good reason.
>
> I personally recently pointed out that a proper virtualization
> solution should handle entropy collection at the lowest level (where
> the physical hardware drivers are) and present a hw_rng interface to
> the guests. Unfortunately, none of the hardware-based hw_rng
> interfaces is sane enough to do that with, which calls for a virtual
> driver.
>
> It would be nice if there was one, and not a dozen, such drivers.
>
> I would therefore like to propose that the Linux Foundation register a
> PCI ID for use by LANANA ($3000/year), and we set up a LANANA registry
> for these device IDs, together with a description of the device
> interface each of them expect. Similarly, a Subsystem ID registry can
> be used (for virtualization vendors which don't have their own VID
> already) to distinguish different implementations.
>
> Obviously, anyone who adheres to the published interface can use one
> of these VID:DIDs -- as far as I'm concerned, even hardware vendors;
> we'll use the SID to distinguish between implementations.

How would that work in the case where virtualized guests don't have a
visible PCI bus, and the virtual environment doesn't pretend to emulate
a PCI bus?

    J

* Re: A set of "standard" virtual devices?
From: Andi Kleen @ 2007-04-02 21:12 UTC
To: virtualization
Cc: Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

> How would that work in the case where virtualized guests don't have a
> visible PCI bus, and the virtual environment doesn't pretend to emulate
> a PCI bus?

If they emulated one with the appropriate device
then distribution driver auto probing would just work transparently for them.

-Andi

* Re: A set of "standard" virtual devices?
From: Jeff Garzik @ 2007-04-02 21:33 UTC
To: Andi Kleen
Cc: Virtualization Mailing List, H. Peter Anvin, Linux Kernel Mailing List, mathiasen, virtualization

Andi Kleen wrote:
>> How would that work in the case where virtualized guests don't have a
>> visible PCI bus, and the virtual environment doesn't pretend to emulate
>> a PCI bus?
>
> If they emulated one with the appropriate device
> then distribution driver auto probing would just work transparently for them.

Yes, but, ideally with paravirtualization you should be able to avoid
the overhead of emulating many major classes of device (storage,
network, RNG, etc.) by developing a low-overhead passthrough interface
that does not involve PCI at all.

    Jeff

* Re: A set of "standard" virtual devices?
From: Andi Kleen @ 2007-04-02 21:36 UTC
To: Jeff Garzik
Cc: Virtualization Mailing List, H. Peter Anvin, Linux Kernel Mailing List, mathiasen, virtualization

On Monday 02 April 2007 23:33:01 Jeff Garzik wrote:
> Andi Kleen wrote:
> >> How would that work in the case where virtualized guests don't have a
> >> visible PCI bus, and the virtual environment doesn't pretend to emulate
> >> a PCI bus?
> >
> > If they emulated one with the appropriate device
> > then distribution driver auto probing would just work transparently for them.
>
> Yes, but, ideally with paravirtualization you should be able to avoid
> the overhead of emulating many major classes of device (storage,
> network, RNG, etc.) by developing a low-overhead passthrough interface
> that does not involve PCI at all.

The implementation wouldn't need to use PCI at all. There wouldn't
even need to be PCI like registers internally. Just a pci device
with an ID somewhere in sysfs. PCI with unique IDs
is just a convenient and well established key into the driver module
collection. Once you have the right driver it can do what it wants.

-Andi

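A minimal userspace sketch of the "ID as a key into the driver collection" idea from the message above. It is not kernel code: the `demo_*` names and the numeric IDs are hypothetical, chosen only to show that matching needs nothing more than a (vendor, device) pair and a table, with no PCI registers involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs, for illustration only; no registry assigns them. */
#define DEMO_VENDOR_VIRT  0xFFF0u
#define DEMO_DEVICE_RNG   0x0001u
#define DEMO_DEVICE_PIPE  0x0002u

/* The whole "key": two 16-bit numbers, as on PCI. */
struct demo_id {
    uint16_t vendor;
    uint16_t device;
};

struct demo_driver {
    const char *name;
    struct demo_id id;
};

/* Stand-in for the driver module collection. */
static const struct demo_driver demo_drivers[] = {
    { "virt_rng",  { DEMO_VENDOR_VIRT, DEMO_DEVICE_RNG  } },
    { "virt_pipe", { DEMO_VENDOR_VIRT, DEMO_DEVICE_PIPE } },
};

/* Return the driver whose ID pair matches, or NULL if none does. */
static const struct demo_driver *
demo_match(const struct demo_driver *table, size_t n, struct demo_id id)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].id.vendor == id.vendor &&
            table[i].id.device == id.device)
            return &table[i];
    }
    return NULL;
}
```

Once `demo_match` has picked a driver, how that driver talks to the device is left entirely open, which is the point being made above.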
* Re: A set of "standard" virtual devices?
From: Jeremy Fitzhardinge @ 2007-04-02 21:42 UTC
To: Andi Kleen
Cc: Virtualization Mailing List, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, mathiasen, virtualization

Andi Kleen wrote:
> The implementation wouldn't need to use PCI at all. There wouldn't
> even need to be PCI like registers internally. Just a pci device
> with an ID somewhere in sysfs. PCI with unique IDs
> is just a convenient and well established key into the driver module
> collection. Once you have the right driver it can do what it wants.

But I understood hpa's suggestion to mean that there would be a standard
PCI interface for a hardware RNG, and a single linux driver for that
device, which all hypervisors would be expected to implement. But
that's only reasonable if the virtualization environment has some notion
of PCI to expose to the Linux guest.

    J

* Re: A set of "standard" virtual devices?
From: Anthony Liguori @ 2007-04-02 21:53 UTC
To: Jeremy Fitzhardinge
Cc: Virtualization Mailing List, Jeff Garzik, H. Peter Anvin, virtualization, Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> The implementation wouldn't need to use PCI at all. There wouldn't
>> even need to be PCI like registers internally. Just a pci device
>> with an ID somewhere in sysfs. PCI with unique IDs
>> is just a convenient and well established key into the driver module
>> collection. Once you have the right driver it can do what it wants.
>
> But I understood hpa's suggestion to mean that there would be a standard
> PCI interface for a hardware RNG, and a single linux driver for that
> device, which all hypervisors would be expected to implement. But
> that's only reasonable if the virtualization environment has some notion
> of PCI to expose to the Linux guest.

The actual PCI bus could be paravirtualized. It's just a question of
whether one reinvents a device discovery mechanism (like XenBus) or
whether one piggybacks on existing mechanisms.

Furthermore, in the future, I strongly suspect that HVM will become much
more important for Xen than PV and since that already has a PCI bus it's
not really that big of a deal.

Regards,

Anthony Liguori

> J

* Re: A set of "standard" virtual devices?
From: Jeremy Fitzhardinge @ 2007-04-02 22:04 UTC
To: Anthony Liguori
Cc: Andi Kleen, Virtualization Mailing List, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, mathiasen, virtualization

Anthony Liguori wrote:
> The actual PCI bus could be paravirtualized. It's just a question of
> whether one reinvents a device discovery mechanism (like XenBus) or
> whether one piggybacks on existing mechanisms.
>
> Furthermore, in the future, I strongly suspect that HVM will become
> much more important for Xen than PV and since that already has a PCI
> bus it's not really that big of a deal.

Well, obviously it keeps things simple for me to not worry about PCI
support in Xen at this point. But I was thinking more of lguest; I
think PCI emulation would kill puppies.

    J

* Re: A set of "standard" virtual devices?
From: H. Peter Anvin @ 2007-04-02 22:10 UTC
To: Jeremy Fitzhardinge
Cc: Virtualization Mailing List, Jeff Garzik, virtualization, Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> The implementation wouldn't need to use PCI at all. There wouldn't
>> even need to be PCI like registers internally. Just a pci device
>> with an ID somewhere in sysfs. PCI with unique IDs
>> is just a convenient and well established key into the driver module
>> collection. Once you have the right driver it can do what it wants.
>
> But I understood hpa's suggestion to mean that there would be a standard
> PCI interface for a hardware RNG, and a single linux driver for that
> device, which all hypervisors would be expected to implement. But
> that's only reasonable if the virtualization environment has some notion
> of PCI to expose to the Linux guest.
>

That is, of course, true, although "some notion of" is very broad, and
one could also use this for detection and some hypervisor-specific
communication for the actual I/O.

However, one probably wants to think about what the heck one actually
means with "virtualization" in the absence of a lot of this stuff. PCI
is probably the closest thing we have to a lowest common denominator for
device detection.

    -hpa

* Re: A set of "standard" virtual devices?
From: Jeff Garzik @ 2007-04-02 22:25 UTC
To: H. Peter Anvin
Cc: Virtualization Mailing List, virtualization, mathiasen, Linux Kernel Mailing List

H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually
> means with "virtualization" in the absence of a lot of this stuff. PCI
> is probably the closest thing we have to a lowest common denominator for
> device detection.

Sure, but let's look beyond device detection. For instance, it does not
necessarily follow that emulating PCI DMA is the best way to go for
communication with a virtual device, once detected.

Outside of pci_device_id driver matching, is there much value here?

    Jeff

* Re: A set of "standard" virtual devices?
From: H. Peter Anvin @ 2007-04-02 22:30 UTC
To: Jeff Garzik
Cc: Virtualization Mailing List, virtualization, mathiasen, Linux Kernel Mailing List

Jeff Garzik wrote:
>
> Sure, but let's look beyond device detection. For instance, it does not
> necessarily follow that emulating PCI DMA is the best way to go for
> communication with a virtual device, once detected.
>

This is true, of course. However, there are going to be a set of
virtual devices which don't necessarily have to have super-high
performance. In the case of a hwrng device, even doing DMA is probably
overkill.

> Outside of pci_device_id driver matching, is there much value here?

If we can get a set of device drivers that at least a number of
hypervisors and/or emulators, if not all of them, can agree upon, I
think much is won.

    -hpa

* Re: A set of "standard" virtual devices?
From: Arnd Bergmann @ 2007-04-03 9:41 UTC
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually
> means with "virtualization" in the absence of a lot of this stuff. PCI
> is probably the closest thing we have to a lowest common denominator for
> device detection.

I think that's true outside of s390, but a standardized virtual device
interface should be able to work there as well. Interestingly, the
s390 channel I/O also uses two 16 bit numbers to identify a device
(type and model), just like PCI or USB, so in that light, we might
be able to use the same number space for something entirely different
depending on the virtual bus.

    Arnd <><

* Re: A set of "standard" virtual devices?
From: Cornelia Huck @ 2007-04-03 10:41 UTC
To: Arnd Bergmann
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 11:41:49 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 03 April 2007, H. Peter Anvin wrote:
> > However, one probably wants to think about what the heck one actually
> > means with "virtualization" in the absence of a lot of this stuff. PCI
> > is probably the closest thing we have to a lowest common denominator for
> > device detection.
>
> I think that's true outside of s390, but a standardized virtual device
> interface should be able to work there as well. Interestingly, the
> s390 channel I/O also uses two 16 bit numbers to identify a device
> (type and model), just like PCI or USB, so in that light, we might
> be able to use the same number space for something entirely different
> depending on the virtual bus.

Even if we used those ids for cu_type and dev_type, it would still be
ugly IMO. It would be much cleaner to just define a very simple, easy
to implement virtual bus without dragging implementation details for
other types of devices around.

* Re: A set of "standard" virtual devices?
From: Arnd Bergmann @ 2007-04-03 12:15 UTC
To: Cornelia Huck
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Cornelia Huck wrote:
> >
> > I think that's true outside of s390, but a standardized virtual device
> > interface should be able to work there as well. Interestingly, the
> > s390 channel I/O also uses two 16 bit numbers to identify a device
> > (type and model), just like PCI or USB, so in that light, we might
> > be able to use the same number space for something entirely different
> > depending on the virtual bus.
>
> Even if we used those ids for cu_type and dev_type, it would still be
> ugly IMO. It would be much cleaner to just define a very simple, easy
> to implement virtual bus without dragging implementation details for
> other types of devices around.

Right, but an interesting point is the question what to do when running
another operating system as a guest under Linux, e.g. with kvm.

Ideally, you'd want to use the same interface to announce the presence
of the device, which can be done far more easily with PCI than using
a new bus type that you'd need to implement for every OS, instead of
just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like
a good idea to me, if only for the reason that it is a widely used
approach. The alternative would be to use an ascii string, like we
have for open-firmware devices on powerpc or sparc.

I think in either way, we need to abstract the driver for the virtual
device from the underlying bus infrastructure, which is hypervisor
and/or platform dependent.

The abstraction could work roughly like this:

==========
virt_dev.h
==========
struct virt_dev;

struct virt_driver {
	/* platform independent */
	struct device_driver drv;
	struct pci_device_id *ids; /* not necessarily PCI */
};

struct virt_bus {
	/* platform dependent */
	long (*transfer)(struct virt_dev *dev, void *buffer,
			 unsigned long size, int type);
};

struct virt_dev {
	struct device dev;
	struct virt_driver *driver;
	struct virt_bus *bus;
	struct pci_device_id id;
	int irq;
};

==============
virt_example.c
==============
static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
			      size_t len, loff_t *off)
{
	struct virt_dev *dev = filp->private_data;
	ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

	/* only advance the offset when data was actually transferred */
	if (ret > 0)
		*off += ret;
	return ret;
}

static struct file_operations virt_pipe_fops = {
	.open = nonseekable_open,
	.read = virt_pipe_read,
};

static int virt_pipe_probe(struct device *dev)
{
	struct miscdevice *mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);

	if (!mdev)
		return -ENOMEM;
	mdev->minor = MISC_DYNAMIC_MINOR;
	mdev->name = "virt_pipe";
	mdev->fops = &virt_pipe_fops;
	mdev->parent = dev;
	return misc_register(mdev);
}

/* zero-terminated table, as MODULE_DEVICE_TABLE expects */
static struct pci_device_id virt_pipe_ids[] = {
	{ .vendor = PCI_VENDOR_LINUX, .device = 0x3456 },
	{ }
};
MODULE_DEVICE_TABLE(pci, virt_pipe_ids);

static struct virt_driver virt_pipe_driver = {
	.drv = {
		.name = "virt_pipe",
		.probe = virt_pipe_probe,
	},
	.ids = virt_pipe_ids,
};

static int virt_pipe_init(void)
{
	return virt_driver_register(&virt_pipe_driver);
}
module_init(virt_pipe_init);

==============
virt_devtree.c
==============
static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
				  unsigned long size, int type)
{
	long ret;

	switch (type) {
	case READ:
		ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
		break;
	case WRITE:
		ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
		break;
	default:
		BUG();
	}
	return ret;
}

static struct virt_bus virt_devtree_bus = {
	.transfer = virt_devtree_transfer,
};

static int virt_devtree_probe(struct of_device *ofdev,
			      struct of_device_id *match)
{
	struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_devtree_bus;
	vdev->dev.parent = &ofdev->dev;
	vdev->id.vendor = PCI_VENDOR_LINUX;
	/* schematic: real code would check that the property exists */
	vdev->id.device = *of_get_property(ofdev, "virt_dev_id");
	vdev->irq = of_irq_parse_and_map(ofdev, 0);
	return device_register(&vdev->dev);
}

struct of_device_id virt_devtree_ids = {
	.compatible = "virt-dev",
};

static struct of_platform_driver virt_devtree_driver = {
	.probe = virt_devtree_probe,
	.match_table = &virt_devtree_ids,
};

==============
virt_pci.c
==============
static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
			      unsigned long size, int type)
{
	struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

	switch (type) {
	case READ:
		mmio_insb(regs->read_port, buffer, size);
		break;
	case WRITE:
		mmio_outsb(regs->write_port, buffer, size);
		break;
	default:
		BUG();
	}
	return size;
}

static struct virt_bus virt_pci_bus = {
	.transfer = virt_pci_transfer,
};

static int virt_pci_probe(struct pci_dev *pdev,
			  const struct pci_device_id *match)
{
	struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_pci_bus;
	vdev->dev.parent = &pdev->dev;
	vdev->id = *match;
	vdev->irq = pdev->irq;
	return device_register(&vdev->dev);
}

/* match every device ID under the shared vendor ID */
static struct pci_device_id virt_pci_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_LINUX, PCI_ANY_ID) },
	{ }
};

static struct pci_driver virt_pci_driver = {
	.name = "virt_pci",
	.probe = virt_pci_probe,
	.id_table = virt_pci_ids,
};

    Arnd <><

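The split between a bus-independent driver and a bus-dependent `transfer` hook can be tried out in plain userspace C. The following is only a sketch under stated assumptions: every `demo_*` name is hypothetical, and a small memory buffer stands in for whatever a real backend (hypercall, MMIO port) would talk to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_dir { DEMO_READ, DEMO_WRITE };

struct demo_dev;

/* Platform-dependent part: how bytes actually move. */
struct demo_bus {
    long (*transfer)(struct demo_dev *dev, void *buf, size_t size,
                     enum demo_dir dir);
};

/* Platform-independent part: what a driver sees. */
struct demo_dev {
    const struct demo_bus *bus;
    unsigned char backing[16];  /* stands in for the hypervisor side */
    size_t used;                /* number of valid bytes in backing */
};

/* Backend that copies through memory instead of a hypercall or MMIO. */
static long demo_mem_transfer(struct demo_dev *dev, void *buf, size_t size,
                              enum demo_dir dir)
{
    assert(dev != NULL && buf != NULL);
    switch (dir) {
    case DEMO_WRITE:
        if (size > sizeof(dev->backing))
            size = sizeof(dev->backing);    /* short write */
        memcpy(dev->backing, buf, size);
        dev->used = size;
        return (long)size;
    case DEMO_READ:
        if (size > dev->used)
            size = dev->used;               /* short read */
        memcpy(buf, dev->backing, size);
        return (long)size;
    }
    return -1;
}

static const struct demo_bus demo_mem_bus = {
    .transfer = demo_mem_transfer,
};

/* Driver code: it only ever calls through the bus pointer. */
static long demo_pipe_write(struct demo_dev *dev, const void *buf, size_t size)
{
    return dev->bus->transfer(dev, (void *)buf, size, DEMO_WRITE);
}

static long demo_pipe_read(struct demo_dev *dev, void *buf, size_t size)
{
    return dev->bus->transfer(dev, buf, size, DEMO_READ);
}
```

Swapping `demo_mem_bus` for another `struct demo_bus` changes the transport without touching `demo_pipe_read` or `demo_pipe_write`, which mirrors the role `virt_devtree_bus` and `virt_pci_bus` play in the sketch above.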
* Re: A set of "standard" virtual devices?
From: Cornelia Huck @ 2007-04-03 13:39 UTC
To: Arnd Bergmann
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 14:15:37 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> Right, but an interesting point is the question what to do when running
> another operating system as a guest under Linux, e.g. with kvm.
>
> Ideally, you'd want to use the same interface to announce the presence
> of the device, which can be done far more easily with PCI than using
> a new bus type that you'd need to implement for every OS, instead of
> just implementing the virtual PCI driver.

That's OK for a virtualized architecture where the base architecture
already supports PCI. But a traditional s390 OS would be as unhappy
with a PCI device as with a device of a completely new type :)

There are several options for virtualized devices (and I don't know why
they shouldn't coexist):

1. Emulate a well-known device (like an e1000 network card on PCI or a
   model 3390 dasd on CCW). Existing operating systems can just use them,
   but it's a lot of work in the hypervisor.
2. Create a virtual PCI device (or a virtual CCW device) with a new id.
   Operating systems would need to write a new device driver, but they
   can use a familiar infrastructure. That seems to be what most people
   are talking about here.
3. Create a new bus which uses a new access method. This new method can
   be made very simple, but requires support from the guest operating
   system. That's what I was talking about :)

[Note: I'm not actually advocating an emulated ccw driver. There be
dragons.]

> Using a 16 bit number to identify a specific interface sounds like
> a good idea to me, if only for the reason that it is a widely used
> approach. The alternative would be to use an ascii string, like we
> have for open-firmware devices on powerpc or sparc.

OK, we could use common identifiers (and reserve them) for case 2 across
several busses. Like

#define PCI_VIRT_ID GENERIC_VIRT_ID
#define CCW_VIRT_DEVTYPE GENERIC_VIRT_ID

> I think in either way, we need to abstract the driver for the virtual
> device from the underlying bus infrastructure, which is hypervisor
> and/or platform dependent.

Yes, that sounds sane for case 3. We should just standardize the
interface.

> The abstraction could work roughly like this:
>
>
> ==========
> virt_dev.h
> ==========
> struct virt_driver { /* platform independent */
> 	struct device_driver drv;
> 	struct pci_device_id *ids; /* not necessarily PCI */
> };
> struct virt_bus {
> 	/* platform dependent */
> 	long (*transfer)(struct virt_dev *dev, void *buffer,
> 			 unsigned long size, int type);
> };

Should this embed a struct bus_type? Or reference a generic_virt_bus?

> struct virt_dev {
> 	struct device dev;
> 	struct virt_driver *driver;
> 	struct virt_bus *bus;
> 	struct pci_device_id id;
> 	int irq;
> };

And that's where I have problems :) The notion of "irq" is far too
platform specific. I can bend my mind round using PCI-like ids for
non-PCI virtualized devices, but an integer is far too small and too
specific for a way to access the device.

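A small userspace sketch of the shared number space suggested by the two `#define` lines above, assuming each bus keeps its own ID structure and only the 16-bit interface number is common. The value of `GENERIC_VIRT_ID` and all `demo_*` names are hypothetical; `cu_type` and `dev_type` are borrowed from the discussion of CCW IDs earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value; no such number has actually been reserved. */
#define GENERIC_VIRT_ID   0x3456u
#define PCI_VIRT_ID       GENERIC_VIRT_ID
#define CCW_VIRT_DEVTYPE  GENERIC_VIRT_ID

/* Per-bus ID layouts stay different ... */
struct demo_pci_id { uint16_t vendor;  uint16_t device;   };
struct demo_ccw_id { uint16_t cu_type; uint16_t dev_type; };

/* ... but each bus can extract the same 16-bit interface number. */
static uint16_t demo_pci_interface(struct demo_pci_id id)
{
    return id.device;
}

static uint16_t demo_ccw_interface(struct demo_ccw_id id)
{
    return id.dev_type;
}

/* Nonzero when both IDs name the same virtual device interface. */
static int demo_same_interface(struct demo_pci_id p, struct demo_ccw_id c)
{
    return demo_pci_interface(p) == demo_ccw_interface(c);
}
```

A bus-independent driver could then be keyed on the interface number alone, while each bus keeps its own probing code.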
* Re: A set of "standard" virtual devices?
From: Arnd Bergmann @ 2007-04-03 14:03 UTC
To: Cornelia Huck
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Cornelia Huck wrote:
> On Tue, 3 Apr 2007 14:15:37 +0200,
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> That's OK for a virtualized architecture where the base architecture
> already supports PCI. But a traditional s390 OS would be as unhappy
> with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

> There are several options for virtualized devices (and I don't know why
> they shouldn't coexist):
>
> 1. Emulate a well-known device (like an e1000 network card on PCI or a
> model 3390 dasd on CCW). Existing operating systems can just use them,
> but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. What we're
trying to achieve is to make sure not every hypervisor and simulator has
to introduce its own set of drivers.

> > struct virt_bus {
> > 	/* platform dependent */
> > 	long (*transfer)(struct virt_dev *dev, void *buffer,
> > 			 unsigned long size, int type);
> > };
>
> Should this embed a struct bus_type? Or reference a generic_virt_bus?

yes, that should embed the bus_type.

> > struct virt_dev {
> > 	struct device dev;
> > 	struct virt_driver *driver;
> > 	struct virt_bus *bus;
> > 	struct pci_device_id id;
> > 	int irq;
> > };
>
> And that's where I have problems :) The notion of "irq" is far too
> platform specific. I can bend my mind round using PCI-like ids for
> non-PCI virtualized devices, but an integer is far too small and too
> specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures.
IRQ numbers are evil indeed.
However, I'm pretty sure that we need _some_ abstraction of an
interrupt mechanism here. The easiest way is probably to have a
callback function like
int (*irq_handler)(struct virt_dev*, unsigned long message);
in the virt_dev.

    Arnd <><

* Re: A set of "standard" virtual devices?
From: Cornelia Huck @ 2007-04-03 16:07 UTC
To: Arnd Bergmann
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 16:03:14 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> > > struct virt_dev {
> > > 	struct device dev;
> > > 	struct virt_driver *driver;
> > > 	struct virt_bus *bus;
> > > 	struct pci_device_id id;
> > > 	int irq;
> > > };
> >
> > And that's where I have problems :) The notion of "irq" is far too
> > platform specific. I can bend my mind round using PCI-like ids for
> > non-PCI virtualized devices, but an integer is far too small and too
> > specific for a way to access the device.
>
> Sorry, I've been working too long on the lesser architectures.
> IRQ numbers are evil indeed.
> However, I'm pretty sure that we need _some_ abstraction of an
> interrupt mechanism here. The easiest way is probably to have a
> callback function like
> int (*irq_handler)(struct virt_dev*, unsigned long message);
> in the virt_dev.

Yes, something like
int (*handler) (struct virt_dev *, struct virt_interrupt_info *);
should cover the needed cases.

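The handler-callback idea agreed on above can be sketched in userspace C as follows. Nothing here is an existing kernel interface: the contents of the info structure are an assumption (the thread only names `struct virt_interrupt_info`, not its fields), and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical payload; the thread does not define its fields. */
struct demo_interrupt_info {
    unsigned long message;
};

typedef int (*demo_handler_t)(struct demo_dev *dev,
                              const struct demo_interrupt_info *info);

struct demo_dev {
    demo_handler_t handler;     /* set by the device driver */
    unsigned long last_message;
    unsigned int events;
};

/*
 * Called by platform-specific bus code when the device signals.
 * The driver never sees an IRQ number, only this callback.
 */
static int demo_deliver(struct demo_dev *dev,
                        const struct demo_interrupt_info *info)
{
    assert(dev != NULL && info != NULL);
    if (dev->handler == NULL)
        return -1;              /* no handler registered; event dropped */
    return dev->handler(dev, info);
}

/* Example driver handler: count events and remember the last message. */
static int demo_count_handler(struct demo_dev *dev,
                              const struct demo_interrupt_info *info)
{
    dev->events++;
    dev->last_message = info->message;
    return 0;
}
```

How `demo_deliver` gets invoked (a PCI interrupt, an s390 I/O interrupt, a hypervisor upcall) stays inside the bus code, which is what keeps the driver platform independent.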
* Re: A set of "standard" virtual devices?
From: Christian Borntraeger @ 2007-04-03 8:29 UTC
To: Andi Kleen
Cc: virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Monday 02 April 2007 23:12, Andi Kleen wrote:
>
> > How would that work in the case where virtualized guests don't have a
> > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > a PCI bus?
>
> If they emulated one with the appropriate device
> then distribution driver auto probing would just work transparently for
> them.

Still, that would only make sense for virtualized platforms that usually
have a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is
strange.

Christian

* Re: A set of "standard" virtual devices?
From: Andi Kleen @ 2007-04-03 8:30 UTC
To: Christian Borntraeger
Cc: virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> On Monday 02 April 2007 23:12, Andi Kleen wrote:
> >
> > > How would that work in the case where virtualized guests don't have a
> > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > a PCI bus?
> >
> > If they emulated one with the appropriate device
> > then distribution driver auto probing would just work transparently for
> > them.
>
> Still, that would only make sense for virtualized platforms that usually have
> a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

If it gets the job done surely you can tolerate a little strangeness?

-Andi

* Re: A set of "standard" virtual devices?
From: Cornelia Huck @ 2007-04-03 9:17 UTC
To: Andi Kleen
Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 10:30:36 +0200,
Andi Kleen <ak@suse.de> wrote:

> On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> > On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > >
> > > > How would that work in the case where virtualized guests don't have a
> > > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > > a PCI bus?
> > >
> > > If they emulated one with the appropriate device
> > > then distribution driver auto probing would just work transparently for
> > > them.
> >
> > Still, that would only make sense for virtualized platforms that usually have
> > a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.
>
> If it gets the job done surely you can tolerate a little strangeness?

On s390, it would be more than strangeness. There's no implementation
of PCI at all, someone would have to cook it up - and it wouldn't have
any use beyond those special devices. Since there isn't any bus type
that is available on *all* architectures, a generic "virtual" bus with
very simple probing seems much saner...

* Re: A set of "standard" virtual devices?
From: Andi Kleen @ 2007-04-03 9:26 UTC
To: Cornelia Huck
Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

>
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

You just have to change all the distribution installers then.
Ok I suppose on s390 that's not that big an issue because there are not
that many for s390. But for x86 there exist quite a lot. I suppose
it's easier to change it in the kernel.

-Andi

* Re: A set of "standard" virtual devices?
  2007-04-03  9:26 ` Andi Kleen
@ 2007-04-03 10:51 ` Cornelia Huck
  2007-04-03 15:00 ` Adrian Bunk
  1 sibling, 0 replies; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03 10:51 UTC
To: Andi Kleen
Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 11:26:52 +0200,
Andi Kleen <ak@suse.de> wrote:

> >
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
>
> You just have to change all the distribution installers then.
> Ok, I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there exist quite a lot. I suppose
> it's easier to change it in the kernel.

Huh? I don't follow you here. Why should this be easier for s390 vs.
x86? (And since there seems to be a trend to use HAL as a device
discovery tool recently: A new bus type is easy enough to add there.)

And I really think we should have a clean design in the kernel instead
of trying to wedge virtual devices into a known system. Exposing
virtual devices (which may be handled totally differently) as PCI
devices just seems hackish to me.
* Re: A set of "standard" virtual devices?
  2007-04-03  9:26 ` Andi Kleen
  2007-04-03 10:51 ` Cornelia Huck
@ 2007-04-03 15:00 ` Adrian Bunk
  1 sibling, 0 replies; 36+ messages in thread
From: Adrian Bunk @ 2007-04-03 15:00 UTC
To: Andi Kleen
Cc: Cornelia Huck, Christian Borntraeger, virtualization, Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tue, Apr 03, 2007 at 11:26:52AM +0200, Andi Kleen wrote:
> >
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
>
> You just have to change all the distribution installers then.
> Ok, I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there exist quite a lot. I suppose
> it's easier to change it in the kernel.

I don't get this point.

Compared to whatever will be done in the kernel, any change to a
distribution installer should be trivial.

And a new release of a distribution with a new kernel might anyway
usually require some updates to an installer.

> -Andi

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: A set of "standard" virtual devices?
  2007-04-03  9:17 ` Cornelia Huck
  2007-04-03  9:26 ` Andi Kleen
@ 2007-04-03 17:50 ` Arnd Bergmann
  2007-04-03 19:07 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 17:50 UTC
To: Cornelia Huck
Cc: Virtualization Mailing List, H. Peter Anvin, Linux Kernel Mailing List, mathiasen, virtualization

On Tuesday 03 April 2007, Cornelia Huck wrote:
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

I think we need to separate two problems here:

1. Probing:
That's really what triggered the discussion, PCI probing is well-understood
and implemented on _most_ platforms, so there is some value in reusing it.
When you talk about 'very simple probing', I'm not sure what the most simple
approach could be. Ideas that have been implemented before include:
a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
   and try to access each one of them in order to find out if it's there. We
   do that for PCI or CCW, for instance.
b) Have an iterator in the hypervisor (or firmware), to return a handle to
   the first, next or child of a device. We do that for Open Firmware.
c) ask the hypervisor for an unused device of a given class, which needs to
   be returned to the hypervisor when no longer used. This is how the PS3
   hypervisor works, but it does not play well with the Linux driver model.

2. Device access:
When talking to a virtual device, you want to have at least a way to give
commands to it and a way to get interrupts back. Again, multiple ideas
have been used in the past, and we should choose a subset:
a) PCI-like: mmio using memory and/or I/O space BAR setup, interrupt numbers
   and DMA to guest physical addresses.
b) Channel-like: use an hcall to give commands to the hypervisor, passing
   down a device handle, command code and data areas in guest physical space.
   Interrupts return the device handle or an OS-defined per-device value.
c) Minimalistic: Every device is mapped into the guest address space and
   can potentially be remapped into user space. The device memory can be
   shared between guests and/or with the host if that uses the same driver.
   The guest is able to signal the receiving end using an hcall and gets
   interrupts like in b).
d) UNIX-like: devices appear like file descriptors, the guest can do
   operations like read/write/sync/mmap, potentially ioctl on them to talk
   to the host.

	Arnd <><
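
To make the probing styles in the list above concrete, here is a minimal Python sketch contrasting (a), scanning a bounded ID space, with (b), walking a tree through an iterator-like interface. It is an illustration only: `ToyHypervisor`, `probe_id`, `root` and `children` are names invented here and do not correspond to any real hypervisor or firmware call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One entry in a hypothetical hypervisor device tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


class ToyHypervisor:
    """Stand-in for a hypervisor; every method is invented for illustration."""

    def __init__(self, by_id: Dict[int, str], tree: Node) -> None:
        self._by_id = by_id
        self._tree = tree

    def probe_id(self, dev_id: int) -> Optional[str]:
        """Style (a): poke one ID and learn whether anything answers."""
        return self._by_id.get(dev_id)

    def root(self) -> Node:
        """Style (b): hand out the first handle of the tree."""
        return self._tree

    def children(self, node: Node) -> List[Node]:
        """Style (b): hand out the child handles of a given handle."""
        return list(node.children)


def scan_ids(hv: ToyHypervisor, max_id: int) -> List[str]:
    """Try every ID from 0 to max_id, keeping the ones that respond."""
    found = []
    for dev_id in range(max_id + 1):
        name = hv.probe_id(dev_id)
        if name is not None:
            found.append(name)
    return found


def walk_tree(hv: ToyHypervisor) -> List[str]:
    """Depth-first walk that only ever touches devices that exist."""
    found = []
    stack = [hv.root()]
    while stack:
        node = stack.pop()
        found.append(node.name)
        stack.extend(reversed(hv.children(node)))
    return found
```

In this toy model, style (a) costs one access per possible ID whether or not a device is present, while style (b) visits only what exists, at the price of the hypervisor having to maintain the tree.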
* Re: A set of "standard" virtual devices?
  2007-04-03 17:50 ` Arnd Bergmann
@ 2007-04-03 19:07 ` Jeremy Fitzhardinge
  2007-04-03 19:42 ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 19:07 UTC
To: Arnd Bergmann
Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

Arnd Bergmann wrote:
> I think we need to separate two problems here:
>
> 1. Probing:
> That's really what triggered the discussion, PCI probing is well-understood
> and implemented on _most_ platforms, so there is some value in reusing it.
> When you talk about 'very simple probing', I'm not sure what the most simple
> approach could be.

Is probing an interesting problem to consider on its own? If there's
some hypervisor-agnostic device driver in Linux, then obviously it needs
some way to find the corresponding (virtual) hardware for it to talk
to. But that probing mechanism will depend on the actual interface
structure, and is just one of the many problems that need to be solved.
There's no point in overloading PCI to probe for the device unless
you're actually using PCI to talk to the device.

> Ideas that have been implemented before include:
> a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
>    and try to access each one of them in order to find out if it's there. We
>    do that for PCI or CCW, for instance.
> b) Have an iterator in the hypervisor (or firmware), to return a handle to
>    the first, next or child of a device. We do that for open firmware.
> c) ask the hypervisor for an unused device of a given class, which needs to
>    be returned to the hypervisor when no longer used. This is how the PS3
>    hypervisor works, but it does not play well with the Linux driver model.
>

Xen has xenbus, which is essentially a filesystem-like namespace which
can be walked to find the devices being exposed to a guest. It is
fairly similar to OFW's device tree.

> 2. Device access:
> When talking to a virtual device, you want to have at least a way to give
> commands to it and a way to get interrupts back. Again, multiple ideas
> have been used in the past, and we should choose a subset:
>

Let me say up front that I'm skeptical that we can come up with a single
bus-like abstraction which can be both a simple and an efficient interface
to all the virtual architectures. I think a more fruitful path is to
find what pieces of functionality can be made common, with the aim of
having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis. This
thread started with a discussion about entropy sources. In theory you
could implement it as simply as exposing a mmaped ringbuffer. There are
some extra complexities deriving from the security requirements though;
for example, all the entropy needs to be kept strictly private to the
domain that consumes it.

But beyond that, there are 3 other important classes of device:

    * console
    * disk
    * networking

(There are obviously more, but these are the must-have.)

Console already provides us with a model to work on, in the form of
hvc-console. The hvc-console code itself has the bulk of the common
console code, along with a set of very small hypervisor-specific
backends. The Xen console implementation shrunk considerably when we
switched to using it.

If we could do the same thing with disk and net, I would be very happy.

For example, if we wanted to change the Xen frontend/backend disk
interface, we could use SCSI as the basic protocol, and then convert
netfront into a relatively simple scsi driver. There would still be a
Xen-specific piece, but it should be fairly small and have a clean
interface. Though the existing interface is a pretty simple
shove-this-block-there affair.

I'm not sure what similar common code could be extracted for network
devices. I haven't looked into it all that closely.

    J
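
The split described above for hvc-console (shared console logic in front, a very small hypervisor-specific backend behind) might be pictured roughly as follows. This is a Python sketch of the pattern only, not the kernel code: `CommonConsole` and `ChunkyBackend` are hypothetical names, and `put_chars` is borrowed loosely as the name of the single hook a backend supplies.

```python
from typing import Protocol


class ConsoleBackend(Protocol):
    """What a hypervisor-specific backend has to provide in this sketch."""

    def put_chars(self, data: bytes) -> int:
        """Send some bytes; return how many were accepted."""
        ...


class ChunkyBackend:
    """Toy backend that accepts at most `limit` bytes per call."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.output = b""
        self.calls = 0

    def put_chars(self, data: bytes) -> int:
        chunk = data[: self.limit]
        self.output += chunk
        self.calls += 1
        return len(chunk)


class CommonConsole:
    """Shared code: newline translation and retrying short writes."""

    def __init__(self, backend: ConsoleBackend) -> None:
        self.backend = backend

    def write(self, text: str) -> None:
        data = text.replace("\n", "\r\n").encode("utf-8")
        while data:
            sent = self.backend.put_chars(data)
            if sent <= 0:
                # Avoid spinning forever on a backend that makes no progress.
                raise IOError("backend accepted no data")
            data = data[sent:]
```

Everything that is the same for every hypervisor lives in `CommonConsole`; a new backend only has to say how to push a few bytes out.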
* Re: A set of "standard" virtual devices?
  2007-04-03 19:07 ` Jeremy Fitzhardinge
@ 2007-04-03 19:42 ` Arnd Bergmann
  2007-04-03 19:55 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 19:42 UTC
To: Jeremy Fitzhardinge
Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> Arnd Bergmann wrote:
> > I think we need to separate two problems here:
> >
> > 1. Probing:
> > That's really what triggered the discussion, PCI probing is well-understood
> > and implemented on _most_ platforms, so there is some value in reusing it.
> > When you talk about 'very simple probing', I'm not sure what the most simple
> > approach could be.
>
> Is probing an interesting problem to consider on its own? If there's
> some hypervisor-agnostic device driver in Linux, then obviously it needs
> some way to find the corresponding (virtual) hardware for it to talk
> to. But that probing mechanism will depend on the actual interface
> structure, and is just one of the many problems that need to be solved.
> There's no point in overloading PCI to probe for the device unless
> you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. The EHCI USB is an example of a driver that can
be for instance PCI, OF or an on-chip device. Moreover, you can have an
abstracted device behind it that does not need to know about the transport,
like the SCSI disk driver does not care if it is talking to an ATA,
parallel SCSI or SAS chip, or even which controller that is.

> Let me say up front that I'm skeptical that we can come up with a single
> bus-like abstraction which can be both a simple and an efficient interface
> to all the virtual architectures. I think a more fruitful path is to
> find what pieces of functionality can be made common, with the aim of
> having small, simple and self-contained hypervisor-specific backends.
>
> I think this needs to be considered on a class by class basis. This
> thread started with a discussion about entropy sources. In theory you
> could implement it as simply as exposing a mmaped ringbuffer. There are
> some extra complexities deriving from the security requirements though;
> for example, all the entropy needs to be kept strictly private to the
> domain that consumes it.
>
> But beyond that, there are 3 other important classes of device:
>
>     * console
>     * disk
>     * networking
>
> (There are obviously more, but these are the must-have.)
>
> Console already provides us with a model to work on, in the form of
> hvc-console. The hvc-console code itself has the bulk of the common
> console code, along with a set of very small hypervisor-specific
> backends. The Xen console implementation shrunk considerably when we
> switched to using it.

Console is also the least problematic interface, you can do it over
practically anything.

> If we could do the same thing with disk and net, I would be very happy.
>
> For example, if we wanted to change the Xen frontend/backend disk
> interface, we could use SCSI as the basic protocol, and then convert
> netfront into a relatively simple scsi driver. There would still be a
> Xen-specific piece, but it should be fairly small and have a clean
> interface. Though the existing interface is a pretty simple
> shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.

The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
  reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

Minor optimizations could be

- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media (e.g. DVD writer)

> I'm not sure what similar common code could be extracted for network
> devices. I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.

	Arnd <><
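
The three main requirements listed above for a block interface (asynchronous submission, several requests in flight so the host may reorder them, and one interrupt covering many completions) can be shown with a small Python model. It is a sketch under stated assumptions: `ToyBlockQueue` and its methods are hypothetical, and sorting by sector number merely stands in for whatever reordering a smart host would really do.

```python
from collections import deque
from typing import Deque, Dict, List


class ToyBlockQueue:
    """Toy request queue shared between a guest and a host."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.pending: Dict[int, int] = {}   # tag -> sector, requests in flight
        self.completed: Deque[int] = deque()
        self.interrupts = 0
        self._next_tag = 0

    def submit(self, sector: int) -> int:
        """Guest side: queue a request and return at once with a tag."""
        if len(self.pending) >= self.depth:
            raise BufferError("queue full")
        tag = self._next_tag
        self._next_tag += 1
        self.pending[tag] = sector
        return tag

    def host_service(self) -> List[int]:
        """Host side: finish everything in flight, reordered by sector,
        then raise a single interrupt for the whole batch."""
        order = sorted(self.pending, key=self.pending.get)
        self.completed.extend(order)
        self.pending.clear()
        if order:
            self.interrupts += 1
        return order

    def guest_irq(self) -> List[int]:
        """Guest handler: drain every completion the interrupt covers."""
        done = list(self.completed)
        self.completed.clear()
        return done
```

The guest never sleeps in `submit`; it learns about finished requests only when `guest_irq` runs, and one run can report any number of tags.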
* Re: A set of "standard" virtual devices?
  2007-04-03 19:42 ` Arnd Bergmann
@ 2007-04-03 19:55 ` Jeremy Fitzhardinge
  2007-04-03 20:03 ` H. Peter Anvin
  2007-04-03 20:50 ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 19:55 UTC
To: Arnd Bergmann
Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, H. Peter Anvin, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

Arnd Bergmann wrote:
> We already have device drivers for physical devices that can be attached
> to different buses. The EHCI USB is an example of a driver that can
> be for instance PCI, OF or an on-chip device. Moreover, you can have an
> abstracted device behind it that does not need to know about the transport,
> like the SCSI disk driver does not care if it is talking to an ATA,
> parallel SCSI or SAS chip, or even which controller that is.
>

Yes, that kind of layering is useful when there's enough of an
abstraction gap to fit the layers into. USB is particularly simple in
that way, since it can be made to travel nicely over any number of
transports.

> console is also the least problematic interface, you can do it over
> practically anything.
>

Sure. But it's interesting that there are savings to be had.

> Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
>

OK, interesting. People had proposed using SCSI as the interface, but I
wasn't aware of any results from doing that. How is it not good?

> The interesting question about block devices is how to handle concurrency
> and interrupt mitigation. An efficient interface should
>
> - have asynchronous notification, not sleep until the transfer is complete
> - allow multiple blocks to be in flight simultaneously, so the host can
>   reorder the requests if it is smart enough
> - give only a single interrupt when multiple transfers have completed
>

Yes. The Xen block interface is already pretty efficient in these respects.

>> I'm not sure what similar common code could be extracted for network
>> devices. I haven't looked into it all that closely.
>>
>
> One way to do networking would be to simply provide a shared memory area
> that everyone can write to, then use a ring buffer and atomic operations
> to synchronize between the guests, and a method to send interrupts to the
> others for flow control.
>

Yes, and that's the core of the Xen netfront. But is there really much
code which can be shared between different hypervisors? When you get
down to it, all the real code is hypervisor-specific stuff for setting
up ringbuffers and dealing with interrupts. Like all the other network
drivers.

    J
* Re: A set of "standard" virtual devices?
  2007-04-03 19:55 ` Jeremy Fitzhardinge
@ 2007-04-03 20:03 ` H. Peter Anvin
  2007-04-03 21:00 ` Jeremy Fitzhardinge
  2007-04-03 20:50 ` Arnd Bergmann
  1 sibling, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 20:03 UTC
To: Jeremy Fitzhardinge
Cc: Virtualization Mailing List, Arnd Bergmann, Cornelia Huck, Linux Kernel Mailing List, mathiasen, virtualization

Jeremy Fitzhardinge wrote:
>
> Yes, and that's the core of the Xen netfront. But is there really much
> code which can be shared between different hypervisors? When you get
> down to it, all the real code is hypervisor-specific stuff for setting
> up ringbuffers and dealing with interrupts. Like all the other network
> drivers.
>

One thing, Jeremy, which I think is a bit misleading here: you're
focusing on big, performance-critical stuff. Those things are going to
be the ones which have the most to win from being implemented in
hypervisor-specific ways. Although we can offer models for some
hypervisors (and G-d knows there are enough implementations out there of
virtual disk which are almost identical), they're clearly not going to
be universal.

However, there are other things; console is one, or my original
example, which was random number generation. For those, the benefit of
unification is proportionally greater, simply because the win of
anything hypervisor-specific is much smaller.

	-hpa
* Re: A set of "standard" virtual devices?
  2007-04-03 20:03 ` H. Peter Anvin
@ 2007-04-03 21:00 ` Jeremy Fitzhardinge
  2007-04-03 21:45 ` H. Peter Anvin
  2007-04-03 21:51 ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 21:00 UTC
To: H. Peter Anvin
Cc: Virtualization Mailing List, Arnd Bergmann, Cornelia Huck, Linux Kernel Mailing List, mathiasen, virtualization

H. Peter Anvin wrote:
> However, there are other things; console is some, or my original
> example, which was random number generation. For those, the benefit
> of unification is proportionally greater, simply because the win of
> anything hypervisor-specific is much smaller.

So, what you're saying is:

   1. assuming there's going to be a vast number of miscellaneous devices
   2. it would be best if there were one per device rather than one per
      hypervisor per device
   3. so we'd have one Linux device driver

But this implies that the work is just pushed off into all the
hypervisors to support this new device over the generic interface;
there's no overall reduction of code or complexity, other than making
"wc" on the kernel source smaller.

That said, something like USB is probably the best bet for this kind of
low-performance device. I think. Not that I really know anything about
USB.

    J
* Re: A set of "standard" virtual devices?
  2007-04-03 21:00 ` Jeremy Fitzhardinge
@ 2007-04-03 21:45 ` H. Peter Anvin
  2007-04-03 21:51 ` Arnd Bergmann
  1 sibling, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 21:45 UTC
To: Jeremy Fitzhardinge
Cc: Arnd Bergmann, Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
>
> So, what you're saying is:
>
>    1. assuming there's going to be a vast number of miscellaneous devices
>    2. it would be best if there were one per device rather than one per
>       hypervisor per device
>    3. so we'd have one linux device driver
>
> But this implies that the work is just pushed off into all the
> hypervisors to support this new device over the generic interface;
> there's no overall reduction of code or complexity, other than making
> "wc" on the kernel source smaller.
>

Sure there is, assuming you deal with heterogeneous clients. I'm not
sure Xen does (although that is, as far as I understand, being remedied),
which might explain your different perspective.

Consider that this may not even be about Linux -- having these standard
devices would enable, say, 'doze device drivers to be written and shared.

> That said, something like USB is probably the best bet for this kind of
> low-performance device. I think. Not that I really know anything about
> USB.

USB is evil in the extreme for this kind of stuff. Although in theory
you can have any HCI you want, in practice the ones that are implemented
require a very complex framework for full compatibility.

	-hpa
* Re: A set of "standard" virtual devices?
  2007-04-03 21:00 ` Jeremy Fitzhardinge
  2007-04-03 21:45 ` H. Peter Anvin
@ 2007-04-03 21:51 ` Arnd Bergmann
  2007-04-03 22:10 ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 21:51 UTC
To: Jeremy Fitzhardinge
Cc: H. Peter Anvin, Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> That said, something like USB is probably the best bet for this kind of
> low-performance device. I think. Not that I really know anything about
> USB.

USB has the disadvantage that it is more complex than PCI and requires
significantly more code to simulate on the host side. On the plus side,
I think it should be possible to implement a virtual USB host on s390,
which is not possible with PCI, but that again takes a lot of work to
implement.

One interesting aspect of the PS3 hypervisor is that some of the
low-speed interfaces are implemented as a virtual UART, meaning
something that only has read and write operations and uses an
interrupt for flow control. The implementation in
drivers/ps3/vuart.c is probably more complex than what we want
as a generic transport mechanism, but simply having a bidirectional
data stream sounds like an ideal abstraction for the "simple"
case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with
  shared memory segment)

The transport can be hypervisor specific, e.g. there could be
a virtual PCI serial port on kvm, an hcall interface on the ps3
and a virtual CTC on s390 (kidding), while all of them can have
the same kind of hardware _behind_ the serial connection.

	Arnd <><
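
As a rough illustration of the "bidirectional data stream" idea above, the following Python sketch separates the transport (anything offering read and write on bytes) from the device logic behind it, using the entropy source from the start of the thread as the example device. All names here (`ByteStream`, `QueueStream`, `host_feed_entropy`, `guest_read_entropy`) are assumptions made for the example, and the in-memory queue merely stands in for a hypervisor-specific transport.

```python
import os
from typing import Protocol


class ByteStream(Protocol):
    """The whole transport contract in this sketch: read and write bytes."""

    def write(self, data: bytes) -> int: ...

    def read(self, n: int) -> bytes: ...


class QueueStream:
    """Toy in-memory transport; a real one would sit on hcalls, PCI, etc."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def write(self, data: bytes) -> int:
        self._buf += data
        return len(data)

    def read(self, n: int) -> bytes:
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out


def host_feed_entropy(stream: ByteStream, nbytes: int) -> int:
    """Host side of a hypothetical rng device: push random bytes."""
    return stream.write(os.urandom(nbytes))


def guest_read_entropy(stream: ByteStream, nbytes: int) -> bytes:
    """Guest side: pull whatever is available, up to nbytes."""
    return stream.read(nbytes)
```

Neither `host_feed_entropy` nor `guest_read_entropy` knows what carries the bytes, which is the property the message above is after: the same device code could sit behind different transports.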
* Re: A set of "standard" virtual devices?
  2007-04-03 21:51 ` Arnd Bergmann
@ 2007-04-03 22:10 ` H. Peter Anvin
  2007-04-03 22:49 ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 22:10 UTC
To: Arnd Bergmann
Cc: Virtualization Mailing List, Cornelia Huck, Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
>
> One interesting aspect of the PS3 hypervisor is that some of the
> low-speed interfaces are implemented as a virtual UART, meaning
> something that only has read and write operations and uses an
> interrupt for flow control. The implementation in
> drivers/ps3/vuart.c is probably more complex than what we want
> as a generic transport mechanism, but simply having a bidirectional
> data stream sounds like an ideal abstraction for the "simple"
> case. Some more or less obvious users of this include:
>
> - console
> - additional tty
> - random
> - slow network (using ppp)
> - printer
> - watchdog
> - hid (e.g. mouse)
> - system management (like ps3)
> - fast network (in combination with
>   shared memory segment)
>
> The transport can be hypervisor specific, e.g. there could be
> a virtual PCI serial port on kvm, an hcall interface on the ps3
> and a virtual CTC on s390 (kidding), while all of them can have
> the same kind of hardware _behind_ the serial connection.
>

Note that at least for PIO-based devices, there is nothing that says you
can't implement PCI over another transport, if you wish. It's really
just a very simple RPC protocol.

DMA is trickier, as it makes the data appear into the address space of
the guest in a way that is both device- and host-dependent (in the
presence of PCI domains, IOMMU etc.) That may be reason enough to avoid
DMA.

	-hpa
* Re: A set of "standard" virtual devices?
  2007-04-03 22:10 ` H. Peter Anvin
@ 2007-04-03 22:49 ` Arnd Bergmann
  2007-04-04  0:52 ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 22:49 UTC
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization, Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Note that at least for PIO-based devices, there is nothing that says you
> can't implement PCI over another transport, if you wish. It's really
> just a very simple RPC protocol.

The PIO aspect of PCI is simple, yes, except on architectures that don't
have the concept of PIO or even uncached memory, but even that can
be done by defining readl/writel/inl/outl/... as hcalls.

The tricky part about PCI is the device probing, everything about config
space accesses, interrupt swizzling, bus/device/function numbers and
base address registers becomes a pointless exercise when the other side
is just faking it.

> DMA is trickier, as it makes the data appear into the address space of
> the guest in a way that is both device- and host-dependent (in the
> presence of PCI domains, IOMMU etc.) There may be reason to avoid DMA
> for that reason.

Right, PCI DMA and virtualization don't mix. DMA in general is fine
though, as long as your devices (real or virtual) see the guest physical
addresses as a contiguous 64 bit range and have well-defined semantics
about what addresses are accessed in what way.

When you think of file read/write syscalls as DMA into user space, it's
a very clean concept. Async I/O somewhat less so, but still pretty good.

	Arnd <><
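
The remark above that readl/writel and friends could be defined as hcalls can be pictured with a toy model in which every register access traps to a host-side function instead of touching memory. This is only a sketch: `hcall`, `ToyDevice` and the operation strings are invented for illustration, and the accessors take an explicit device and offset rather than a kernel I/O address.

```python
from typing import Dict

MASK32 = 0xFFFFFFFF


class ToyDevice:
    """Host-side register file for one virtual device (hypothetical)."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}
        self.hcalls = 0


def hcall(dev: ToyDevice, op: str, offset: int, value: int = 0) -> int:
    """Stand-in for a trap into the hypervisor; counts every invocation."""
    dev.hcalls += 1
    if op == "write32":
        dev.regs[offset] = value & MASK32
        return 0
    if op == "read32":
        return dev.regs.get(offset, 0)
    raise ValueError(f"unknown hcall op: {op}")


def writel(value: int, dev: ToyDevice, offset: int) -> None:
    """32-bit register write, implemented as one hcall (value first)."""
    hcall(dev, "write32", offset, value)


def readl(dev: ToyDevice, offset: int) -> int:
    """32-bit register read, implemented as one hcall."""
    return hcall(dev, "read32", offset)
```

Each access costs a full trap in this model, which hints at why the approach suits slow, PIO-style devices better than anything performance-critical.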
* Re: A set of "standard" virtual devices?
  2007-04-03 22:49 ` Arnd Bergmann
@ 2007-04-04  0:52 ` H. Peter Anvin
  2007-04-04 13:11 ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-04 0:52 UTC
To: Arnd Bergmann
Cc: Virtualization Mailing List, Cornelia Huck, Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
> On Wednesday 04 April 2007, H. Peter Anvin wrote:
>> Note that at least for PIO-based devices, there is nothing that says you
>> can't implement PCI over another transport, if you wish. It's really
>> just a very simple RPC protocol.
>
> The PIO aspect of PCI is simple, yes, except on architectures that don't
> have the concept of PIO or even uncached memory, but even that can
> be done by defining readl/writel/inl/outl/... as hcalls.
>
> The tricky part about PCI is the device probing, everything about config
> space accesses, interrupt swizzling, bus/device/function numbers and
> base address registers becomes a pointless exercise when the other side
> is just faking it.

Configuration space access is platform-dependent. It's only defined to
work in a specific way on x86 platforms.

"Interrupt swizzling" is really totally independent of PCI. ALL PCI
really provides is up to four interrupts per device (not counting
MSI/MSI-X) and an 8-bit writable field which the platform can choose to
use to hold interrupt information. That's all. The rest is all
platform information.

PCI enumeration is hardly complex. Most of the stuff that doesn't apply
to you, you can generally ignore, as is done by other busses like
HyperTransport when they emulate PCI.

That being said, on platforms which are PCI-centric, such as x86, this
of course makes it a lot easier to produce virtual devices which work
across hypervisors, since the device model of *any* operating system is
set up to handle them.

	-hpa
* Re: A set of "standard" virtual devices?
  2007-04-04  0:52 ` H. Peter Anvin
@ 2007-04-04 13:11 ` Arnd Bergmann
  2007-04-04 15:50 ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-04 13:11 UTC
To: H. Peter Anvin
Cc: Virtualization Mailing List, Cornelia Huck, Linux Kernel Mailing List, mathiasen, virtualization

On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Configuration space access is platform-dependent. It's only defined to
> work in a specific way on x86 platforms.
>
> "Interrupt swizzling" is really totally independent of PCI. ALL PCI
> really provides is up to four interrupts per device (not counting
> MSI/MSI-X) and an 8-bit writable field which the platform can choose to
> use to hold interrupt information. That's all. The rest is all
> platform information.
>
> PCI enumeration is hardly complex. Most of the stuff that doesn't apply
> to you you can generally ignore, as is done by other busses like
> HyperTransport when they emulate PCI.

You still don't get my point: On a platform that doesn't have interrupt
numbers, and where most of the fields in the config space don't
correspond to anything that is already there, you really don't want
to invent a set of new hcalls that implement emulation, to get
something as simple as a pipe.

  wc drivers/pci/*.[ch] include/asm-i386/{pci,io}.h lib/iomap*.c \
	arch/i386/pci/*.c kernel/irq/*.c
  17015  59037 463967 total

Even if you only need half of that code in reality, reimplementing
all that in both the kernel and in the hypervisor is an enormous
effort. We've seen that before on the ps3, which initially faked
a virtual PCI bus just for the USB controller, but doing something
like that requires adding abstraction layers, to decide whether
to implement e.g. an inb as a hypercall or as a memory read.

> That being said, on platforms which are PCI-centric, such as x86, this
> of course makes it a lot easier to produce virtual devices which work
> across hypervisors, since the device model, of *any* operating system is
> set up to handle them.

Yes, as I said there are two separate problems. I really think that
a standardized virtual driver interface should be modeled after
kernel <-> user interfaces, not hardware <-> kernel interfaces.

Once we know what operations we want (e.g. read, write and SIGIO,
or some other set of primitives), it will be good to provide a
virtual PCI device that can be used as one transport mechanism
below it. Using PCI device IDs to tell what functionality is
provided by the device would provide a reasonable method for
autoprobing.

	Arnd <><
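
The layering proposed above (a small set of user-space-like primitives on top, with a PCI device ID used only to announce which function sits behind the transport) can be sketched as follows. It is a minimal illustration under stated assumptions: the vendor and device numbers are placeholders picked for the example and not registered IDs, and `VirtualChannel`, `autoprobe` and `on_readable` are hypothetical names, the last one playing the role that SIGIO plays for a process.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Placeholder numbers chosen for this example; they are not registered IDs.
EXAMPLE_VENDOR = 0xABCD
FUNCTION_BY_ID: Dict[Tuple[int, int], str] = {
    (EXAMPLE_VENDOR, 0x0001): "console",
    (EXAMPLE_VENDOR, 0x0002): "rng",
}


def autoprobe(vendor: int, device: int) -> Optional[str]:
    """Map a (vendor, device) pair to the function it announces, if any."""
    return FUNCTION_BY_ID.get((vendor, device))


class VirtualChannel:
    """read/write plus a notification hook, independent of the transport."""

    def __init__(self) -> None:
        self._inbox = bytearray()
        self.outbox = bytearray()
        self._listeners: List[Callable[[], None]] = []

    def on_readable(self, callback: Callable[[], None]) -> None:
        """Register a callback to run whenever new data arrives."""
        self._listeners.append(callback)

    def deliver(self, data: bytes) -> None:
        """Host side: queue data for the guest and notify listeners."""
        self._inbox += data
        for callback in self._listeners:
            callback()

    def read(self, n: int) -> bytes:
        """Guest side: take up to n bytes of what the host delivered."""
        out = bytes(self._inbox[:n])
        del self._inbox[:n]
        return out

    def write(self, data: bytes) -> int:
        """Guest side: hand bytes to the host."""
        self.outbox += data
        return len(data)
```

The ID table is the only PCI-flavoured part; the channel itself would look the same whether the bytes travel over a virtual PCI device or some other transport.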
* Re: A set of "standard" virtual devices?
  2007-04-04 13:11 ` Arnd Bergmann
@ 2007-04-04 15:50 ` H. Peter Anvin
  0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-04 15:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Virtualization Mailing List, Cornelia Huck,
      Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
>
>> That being said, on platforms which are PCI-centric, such as x86, this
>> of course makes it a lot easier to produce virtual devices which work
>> across hypervisors, since the device model, of *any* operating system is
>> set up to handle them.
>
> Yes, as I said there are two separate problems. I really think that
> a standardized virtual driver interface should be modeled after
> kernel <-> user interfaces, not hardware <-> kernel interfaces.
>
> Once we know what operations we want (e.g. read, write and SIGIO,
> or some other set of primitives), it will be good to provide a
> virtual PCI device that can be used as one transport mechanism
> below it. Using PCI device IDs to tell what functionality is
> provided by the device would provide a reasonable method for
> autoprobing.
>

That seems like a reasonable approach. I *do* care about
hardware-equivalent interfaces, because they, too, keep getting
reinvented, but it seems reasonable to approach it in a layered fashion
like you describe.

	-hpa

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: A set of "standard" virtual devices?
  2007-04-03 19:55 ` Jeremy Fitzhardinge
  2007-04-03 20:03 ` H. Peter Anvin
@ 2007-04-03 20:50 ` Arnd Bergmann
  1 sibling, 0 replies; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 20:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Virtualization Mailing List, H. Peter Anvin, Cornelia Huck,
      Linux Kernel Mailing List, mathiasen, virtualization

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> > Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
> >
>
> OK, interesting. People had proposed using SCSI as the interface, but I
> wasn't aware of any results from doing that. How is it not good?
>

SCSI is really overengineered for something as simple as a block
interface. A large part of the SCSI stack deals only with error
handling, which you don't want to burden the guests with at all,
since most error conditions can be handled fine by the host.

Another big aspect of SCSI is device enumeration and probing. Doing
it the SCSI way is particularly pointless. It's much simpler to
have one device with its own I/O interface at the hcall layer, and
one interrupt number for the block device, instead of faking the
full hca/bus/dev/lun hierarchy.

	Arnd <><

^ permalink raw reply [flat|nested] 36+ messages in thread
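
To make the contrast with the SCSI model concrete, here is a minimal sketch of the simpler shape argued for above: one block device, one submission call standing in for the hcall, and one completion callback standing in for the single interrupt, with the backing store kept on the host side. The names `BlockRequest`, `HostBlockBackend`, `GuestBlockDevice`, and the 512-byte `SECTOR_SIZE` are assumptions made for the example, not anything specified in the thread.

```python
from dataclasses import dataclass
from typing import Callable

SECTOR_SIZE = 512  # assumed sector size for the sketch


@dataclass
class BlockRequest:
    tag: int            # lets the guest match a completion to its request
    sector: int         # starting sector
    write: bool         # True for a write, False for a read
    data: bytes = b""   # payload for writes (whole sectors only)
    count: int = 1      # number of sectors for reads


class HostBlockBackend:
    """Host side: owns the storage; retries and error recovery would live here."""

    def __init__(self, sectors: int) -> None:
        self._store = bytearray(sectors * SECTOR_SIZE)

    def handle(self, req: BlockRequest) -> bytes:
        offset = req.sector * SECTOR_SIZE
        length = len(req.data) if req.write else req.count * SECTOR_SIZE
        if length % SECTOR_SIZE or offset + length > len(self._store):
            raise ValueError("request is not sector-aligned or is out of range")
        if req.write:
            self._store[offset:offset + length] = req.data
            return b""
        return bytes(self._store[offset:offset + length])


class GuestBlockDevice:
    """Guest side: no hca/bus/dev/lun hierarchy, only submit and complete."""

    def __init__(self, backend: HostBlockBackend,
                 on_complete: Callable[[int, bytes], None]) -> None:
        self._backend = backend
        self._on_complete = on_complete

    def submit(self, req: BlockRequest) -> None:
        result = self._backend.handle(req)   # stands in for the hcall
        self._on_complete(req.tag, result)   # stands in for the one interrupt
```

The sketch completes requests synchronously for brevity; a real interface would presumably queue them and signal completion later.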