Re: A set of "standard" virtual devices?

virtualization.lists.linux-foundation.org archive mirror

* Re: A set of "standard" virtual devices?
       [not found] <4611652F.700@zytor.com>
@ 2007-04-02 20:56 ` Jeremy Fitzhardinge
  2007-04-02 21:12   ` Andi Kleen
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-02 20:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linux Kernel Mailing List, mathiasen, Virtualization Mailing List

H. Peter Anvin wrote:
> On the subject of virtualization; there are a number of devices which
> keep being invented and reinvented by just about every virtualization
> vendor for no really good reason.
>
> I personally recently pointed out that a proper virtualization
> solution should handle entropy collection at the lowest level (where
> the physical hardware drivers are) and present a hw_rng interface to
> the guests. Unfortunately, none of the hardware-based hw_rng
> interfaces is sane enough to do that with, which calls for a virtual
> driver.
>
> It would be nice if there was one, and not a dozen, such drivers.
>
> I would therefore like to propose that the Linux Foundation register a
> PCI ID for use by LANANA ($3000/year), and we set up a LANANA registry
> for these device IDs, together with a description of the device
> interface each of them expects.  Similarly, a Subsystem ID registry can
> be used (for virtualization vendors which don't have their own VID
> already) to distinguish different implementations.
>
> Obviously, anyone who adheres to the published interface can use one
> of these VID:DIDs -- as far as I'm concerned, even hardware vendors;
> we'll use the SID to distinguish between implementations. 

How would that work in the case where virtualized guests don't have a
visible PCI bus, and the virtual environment doesn't pretend to emulate
a PCI bus?

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-02 20:56 ` A set of "standard" virtual devices? Jeremy Fitzhardinge
@ 2007-04-02 21:12   ` Andi Kleen
  2007-04-02 21:33     ` Jeff Garzik
  2007-04-03  8:29     ` Christian Borntraeger
  0 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2007-04-02 21:12 UTC (permalink / raw)
  To: virtualization
  Cc: Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen


> How would that work in the case where virtualized guests don't have a
> visible PCI bus, and the virtual environment doesn't pretend to emulate
> a PCI bus?

If they emulated one with the appropriate device
then distribution driver auto probing would just work transparently for them.

-Andi


* Re: A set of "standard" virtual devices?
  2007-04-02 21:12   ` Andi Kleen
@ 2007-04-02 21:33     ` Jeff Garzik
  2007-04-02 21:36       ` Andi Kleen
  2007-04-03  8:29     ` Christian Borntraeger
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Garzik @ 2007-04-02 21:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Virtualization Mailing List, H. Peter Anvin,
	Linux Kernel Mailing List, mathiasen, virtualization

Andi Kleen wrote:
>> How would that work in the case where virtualized guests don't have a
>> visible PCI bus, and the virtual environment doesn't pretend to emulate
>> a PCI bus?
> 
> If they emulated one with the appropriate device
> then distribution driver auto probing would just work transparently for them.

Yes, but, ideally with paravirtualization you should be able to avoid 
the overhead of emulating many major classes of device (storage, 
network, RNG, etc.) by developing a low-overhead passthrough interface 
that does not involve PCI at all.

	Jeff


* Re: A set of "standard" virtual devices?
  2007-04-02 21:33     ` Jeff Garzik
@ 2007-04-02 21:36       ` Andi Kleen
  2007-04-02 21:42         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2007-04-02 21:36 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Virtualization Mailing List, H. Peter Anvin,
	Linux Kernel Mailing List, mathiasen, virtualization

On Monday 02 April 2007 23:33:01 Jeff Garzik wrote:
> Andi Kleen wrote:
> >> How would that work in the case where virtualized guests don't have a
> >> visible PCI bus, and the virtual environment doesn't pretend to emulate
> >> a PCI bus?
> > 
> > If they emulated one with the appropriate device
> > then distribution driver auto probing would just work transparently for them.
> 
> Yes, but, ideally with paravirtualization you should be able to avoid 
> the overhead of emulating many major classes of device (storage, 
> network, RNG, etc.) by developing a low-overhead passthrough interface 
> that does not involve PCI at all.

The implementation wouldn't need to use PCI at all. There wouldn't 
even need to be PCI like registers internally. Just a pci device
with an ID somewhere in sysfs. PCI with unique IDs
is just a convenient and well established key into the driver module
collection. Once you have the right driver it can do what it wants.

-Andi 
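
The "ID as a key into the driver module collection" idea above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct demo_id`, `DEMO_VENDOR_LINUX`, `demo_table`, and `demo_match` are hypothetical names, not kernel interfaces, and the numeric IDs are placeholder values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ID table entry; not the kernel's struct. */
struct demo_id {
	uint16_t vendor;
	uint16_t device;
	const char *driver;	/* name of the module that claims this ID */
};

#define DEMO_VENDOR_LINUX 0x1234	/* placeholder value for illustration */

/* The table ends with an entry whose driver name is NULL. */
static const struct demo_id demo_table[] = {
	{ DEMO_VENDOR_LINUX, 0x0001, "virt_rng" },
	{ DEMO_VENDOR_LINUX, 0x0002, "virt_pipe" },
	{ 0, 0, NULL },
};

/* Return the driver name registered for (vendor, device), or NULL. */
static const char *demo_match(const struct demo_id *table,
			      uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (; table->driver != NULL; table++) {
		if (table->vendor == vendor && table->device == device)
			return table->driver;
	}
	return NULL;
}
```

Nothing in the lookup depends on how the device is later accessed, which is the point being made: the ID pair only selects a driver, and the driver is then free to talk to the hypervisor however it likes.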


* Re: A set of "standard" virtual devices?
  2007-04-02 21:36       ` Andi Kleen
@ 2007-04-02 21:42         ` Jeremy Fitzhardinge
  2007-04-02 21:53           ` Anthony Liguori
  2007-04-02 22:10           ` H. Peter Anvin
  0 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-02 21:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Virtualization Mailing List, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, mathiasen, virtualization

Andi Kleen wrote:
> The implementation wouldn't need to use PCI at all. There wouldn't 
> even need to be PCI like registers internally. Just a pci device
> with an ID somewhere in sysfs. PCI with unique IDs
> is just a convenient and well established key into the driver module
> collection. Once you have the right driver it can do what it wants.

But I understood hpa's suggestion to mean that there would be a standard
PCI interface for a hardware RNG, and a single linux driver for that
device, which all hypervisors would be expected to implement.  But
that's only reasonable if the virtualization environment has some notion
of PCI to expose to the Linux guest.

    J


* Re: A set of "standard" virtual devices?
  2007-04-02 21:42         ` Jeremy Fitzhardinge
@ 2007-04-02 21:53           ` Anthony Liguori
  2007-04-02 22:04             ` Jeremy Fitzhardinge
  2007-04-02 22:10           ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Anthony Liguori @ 2007-04-02 21:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Virtualization Mailing List, Jeff Garzik, H. Peter Anvin,
	virtualization, Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> The implementation wouldn't need to use PCI at all. There wouldn't 
>> even need to be PCI like registers internally. Just a pci device
>> with an ID somewhere in sysfs. PCI with unique IDs
>> is just a convenient and well established key into the driver module
>> collection. Once you have the right driver it can do what it wants.
> 
> But I understood hpa's suggestion to mean that there would be a standard
> PCI interface for a hardware RNG, and a single linux driver for that
> device, which all hypervisors would be expected to implement.  But
> that's only reasonable if the virtualization environment has some notion
> of PCI to expose to the Linux guest.

The actual PCI bus could be paravirtualized.  It's just a question of
whether one reinvents a device discovery mechanism (like XenBus) or
whether one piggybacks on existing mechanisms.

Furthermore, in the future, I strongly suspect that HVM will become much 
more important for Xen than PV and since that already has a PCI bus it's 
not really that big of a deal.

Regards,

Anthony Liguori

>     J


* Re: A set of "standard" virtual devices?
  2007-04-02 21:53           ` Anthony Liguori
@ 2007-04-02 22:04             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-02 22:04 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Andi Kleen, Virtualization Mailing List, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, mathiasen,
	virtualization

Anthony Liguori wrote:
> The actual PCI bus could be paravirtualized.  It's just a question of
> whether one reinvents a device discovery mechanism (like XenBus) or
> whether one piggybacks on existing mechanisms.
>
> Furthermore, in the future, I strongly suspect that HVM will become
> much more important for Xen than PV and since that already has a PCI
> bus it's not really that big of a deal. 

Well, obviously it keeps things simple for me to not worry about PCI
support in Xen at this point.  But I was thinking more of lguest; I
think PCI emulation would kill puppies.

    J


* Re: A set of "standard" virtual devices?
  2007-04-02 21:42         ` Jeremy Fitzhardinge
  2007-04-02 21:53           ` Anthony Liguori
@ 2007-04-02 22:10           ` H. Peter Anvin
  2007-04-02 22:25             ` Jeff Garzik
  2007-04-03  9:41             ` Arnd Bergmann
  1 sibling, 2 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-02 22:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Virtualization Mailing List, Jeff Garzik, virtualization,
	Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> The implementation wouldn't need to use PCI at all. There wouldn't 
>> even need to be PCI like registers internally. Just a pci device
>> with an ID somewhere in sysfs. PCI with unique IDs
>> is just a convenient and well established key into the driver module
>> collection. Once you have the right driver it can do what it wants.
> 
> But I understood hpa's suggestion to mean that there would be a standard
> PCI interface for a hardware RNG, and a single linux driver for that
> device, which all hypervisors would be expected to implement.  But
> that's only reasonable if the virtualization environment has some notion
> of PCI to expose to the Linux guest.
> 

That is, of course, true, although "some notion of" is very broad, and 
one could also use this for detection and some hypervisor-specific 
communication for the actual I/O.

However, one probably wants to think about what the heck one actually 
means with "virtualization" in the absence of a lot of this stuff.  PCI 
is probably the closest thing we have to a lowest common denominator for 
device detection.

	-hpa


* Re: A set of "standard" virtual devices?
  2007-04-02 22:10           ` H. Peter Anvin
@ 2007-04-02 22:25             ` Jeff Garzik
  2007-04-02 22:30               ` H. Peter Anvin
  2007-04-03  9:41             ` Arnd Bergmann
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Garzik @ 2007-04-02 22:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Virtualization Mailing List, virtualization, mathiasen,
	Linux Kernel Mailing List

H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually 
> means with "virtualization" in the absence of a lot of this stuff.  PCI 
> is probably the closest thing we have to a lowest common denominator for 
> device detection.


Sure, but let's look beyond device detection.  For instance, it does not 
necessarily follow that emulating PCI DMA is the best way to go for 
communication with a virtual device, once detected.

Outside of pci_device_id driver matching, is there much value here?

	Jeff


* Re: A set of "standard" virtual devices?
  2007-04-02 22:25             ` Jeff Garzik
@ 2007-04-02 22:30               ` H. Peter Anvin
  0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-02 22:30 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Virtualization Mailing List, virtualization, mathiasen,
	Linux Kernel Mailing List

Jeff Garzik wrote:
> 
> Sure, but let's look beyond device detection.  For instance, it does not 
> necessarily follow that emulating PCI DMA is the best way to go for 
> communication with a virtual device, once detected.
> 

This is true, of course.  However, there are going to be a set of 
virtual devices which don't necessarily have to have super-high 
performance.  In the case of a hwrng device, even doing DMA is probably 
overkill.

> Outside of pci_device_id driver matching, is there much value here?

If we can get a set of device drivers that at least a number of
hypervisors and/or emulators, if not all of them, can agree upon, I think
much is won.

	-hpa


* Re: A set of "standard" virtual devices?
  2007-04-02 22:10           ` H. Peter Anvin
  2007-04-02 22:25             ` Jeff Garzik
@ 2007-04-03  9:41             ` Arnd Bergmann
  2007-04-03 10:41               ` Cornelia Huck
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03  9:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik, virtualization,
	Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually 
> means with "virtualization" in the absence of a lot of this stuff.  PCI 
> is probably the closest thing we have to a lowest common denominator for 
> device detection.

I think that's true outside of s390, but a standardized virtual device
interface should be able to work there as well. Interestingly, the
s390 channel I/O also uses two 16 bit numbers to identify a device
(type and model), just like PCI or USB, so in that light, we might
be able to use the same number space for something entirely different
depending on the virtual bus.

	Arnd <><
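
A minimal sketch of the shared number space described above, assuming each bus can supply a pair of 16-bit values. The struct and function names are hypothetical, and the CCW fields shown (`cu_type`, `dev_type`) are only the two 16-bit fields named later in this thread, not a full description of s390 device identification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bus identifiers, each carrying two 16-bit numbers. */
struct demo_pci_ident {
	uint16_t vendor;
	uint16_t device;
};

struct demo_ccw_ident {
	uint16_t cu_type;
	uint16_t dev_type;
};

/* Pack two 16-bit numbers into one bus-neutral 32-bit key. */
static uint32_t demo_key(uint16_t hi, uint16_t lo)
{
	return ((uint32_t)hi << 16) | lo;
}

static uint32_t demo_key_from_pci(const struct demo_pci_ident *p)
{
	assert(p != NULL);
	return demo_key(p->vendor, p->device);
}

static uint32_t demo_key_from_ccw(const struct demo_ccw_ident *c)
{
	assert(c != NULL);
	return demo_key(c->cu_type, c->dev_type);
}
```

With this shape a driver would match on the 32-bit key alone, and only the bus glue would know which pair of fields the key came from.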


* Re: A set of "standard" virtual devices?
  2007-04-03  9:41             ` Arnd Bergmann
@ 2007-04-03 10:41               ` Cornelia Huck
  2007-04-03 12:15                 ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03 10:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 11:41:49 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 03 April 2007, H. Peter Anvin wrote:
> > However, one probably wants to think about what the heck one actually 
> > means with "virtualization" in the absence of a lot of this stuff.  PCI 
> > is probably the closest thing we have to a lowest common denominator for 
> > device detection.
> 
> I think that's true outside of s390, but a standardized virtual device
> interface should be able to work there as well. Interestingly, the
> s390 channel I/O also uses two 16 bit numbers to identify a device
> (type and model), just like PCI or USB, so in that light, we might
> be able to use the same number space for something entirely different
> depending on the virtual bus.

Even if we used those ids for cu_type and dev_type, it would still be
ugly IMO. It would be much cleaner to just define a very simple, easy
to implement virtual bus without dragging implementation details for
other types of devices around.


* Re: A set of "standard" virtual devices?
  2007-04-03 10:41               ` Cornelia Huck
@ 2007-04-03 12:15                 ` Arnd Bergmann
  2007-04-03 13:39                   ` Cornelia Huck
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 12:15 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Cornelia Huck wrote:
> > 
> > I think that's true outside of s390, but a standardized virtual device
> > interface should be able to work there as well. Interestingly, the
> > s390 channel I/O also uses two 16 bit numbers to identify a device
> > (type and model), just like PCI or USB, so in that light, we might
> > be able to use the same number space for something entirely different
> > depending on the virtual bus.
> 
> Even if we used those ids for cu_type and dev_type, it would still be
> ugly IMO. It would be much cleaner to just define a very simple, easy
> to implement virtual bus without dragging implementation details for
> other types of devices around.

Right, but an interesting point is the question of what to do when running
another operating system as a guest under Linux, e.g. with kvm.

Ideally, you'd want to use the same interface to announce the presence
of the device, which can be done far more easily with PCI than using
a new bus type that you'd need to implement for every OS, instead of
just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like
a good idea to me, if only for the reason that it is a widely used
approach. The alternative would be to use an ascii string, like we
have for open-firmware devices on powerpc or sparc.

I think in either way, we need to abstract the driver for the virtual
device from the underlying bus infrastructure, which is hypervisor
and/or platform dependent. The abstraction could work roughly like this:


==========
virt_dev.h
==========
struct virt_dev;	/* forward declaration, needed by virt_bus below */

struct virt_driver { /* platform independent */
	struct device_driver drv;
	struct pci_device_id *ids; /* not necessarily PCI */
};
struct virt_bus {
	/* platform dependent */
	long (*transfer)(struct virt_dev *dev, void *buffer,
		unsigned long size, int type);
};
struct virt_dev {
	struct device dev;
	struct virt_driver *driver;	/* bound driver, if any */
	struct virt_bus *bus;		/* backend that moves the data */
	struct pci_device_id id;
	int irq;
};
==============
virt_example.c
==============
static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
                         size_t len, loff_t *off)
{
	struct virt_dev *dev = filp->private_data;
	ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

	if (ret > 0)	/* do not move the offset on error */
		*off += ret;
	return ret;
}
static struct file_operations virt_pipe_fops = {
	.open = nonseekable_open,
	.read = virt_pipe_read,
};
static int virt_pipe_probe(struct device *dev)
{
	struct miscdevice *mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
	int ret;

	if (!mdev)
		return -ENOMEM;
	mdev->minor = MISC_DYNAMIC_MINOR;
	mdev->name = "virt_pipe";
	mdev->fops = &virt_pipe_fops;
	mdev->parent = dev;
	ret = misc_register(mdev);
	if (ret)
		kfree(mdev);	/* registration failed, nothing holds mdev */
	return ret;
}
/* ID tables need a zeroed terminator entry for MODULE_DEVICE_TABLE */
static struct pci_device_id virt_pipe_ids[] = {
	{ .vendor = PCI_VENDOR_LINUX, .device = 0x3456, },
	{ }
};
MODULE_DEVICE_TABLE(pci, virt_pipe_ids);
static struct virt_driver virt_pipe_driver = {
	.drv = {
		.name = "virt_pipe",
		.probe = virt_pipe_probe,
	},
	.ids = virt_pipe_ids,
};
static int __init virt_pipe_init(void)
{
	return virt_driver_register(&virt_pipe_driver);
}
module_init(virt_pipe_init);
==============
virt_devtree.c
==============
static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
		unsigned long size, int type)
{
	long ret;

	switch (type) {
	case READ:
		ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
		break;
	case WRITE:
		ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
		break;
	default:
		BUG();
	}
	return ret;
}
static struct virt_bus virt_devtree_bus = {
	.transfer = virt_devtree_transfer,
};
static int virt_devtree_probe(struct of_device *ofdev,
				const struct of_device_id *match)
{
	struct virt_dev *vdev;
	const u32 *id;

	/* the device ID comes from a property in the device tree node */
	id = of_get_property(ofdev->node, "virt_dev_id", NULL);
	if (!id)
		return -ENODEV;
	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_devtree_bus;
	vdev->dev.parent = &ofdev->dev;
	vdev->id.vendor = PCI_VENDOR_LINUX;
	vdev->id.device = *id;
	vdev->irq = irq_of_parse_and_map(ofdev->node, 0);
	return device_register(&vdev->dev);
}
static struct of_device_id virt_devtree_ids[] = {
	{ .compatible = "virt-dev", },
	{ }	/* terminator */
};
static struct of_platform_driver virt_devtree_driver = {
	.probe = virt_devtree_probe,
	.match_table = virt_devtree_ids,
};
==============
virt_pci.c
==============
static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
		unsigned long size, int type)
{
	struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

	switch (type) {
	case READ:
		mmio_insb(regs->read_port, buffer, size);
		break;
	case WRITE:
		mmio_outsb(regs->write_port, buffer, size);
		break;
	default:
		BUG();
	}
	return size;
}
static struct virt_bus virt_pci_bus = {
	.transfer = virt_pci_transfer,
};
static int virt_pci_probe(struct pci_dev *pdev,
				const struct pci_device_id *match)
{
	struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_pci_bus;
	vdev->dev.parent = &pdev->dev;
	vdev->id = *match;
	vdev->irq = pdev->irq;
	return device_register(&vdev->dev);
}
/* assumption: claim every device under the shared vendor ID */
static struct pci_device_id virt_pci_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_LINUX, PCI_ANY_ID) },
	{ }	/* terminator */
};
static struct pci_driver virt_pci_driver = {
	.name = "virt_pci",
	.probe = virt_pci_probe,
	.id_table = virt_pci_ids,
};

	Arnd <><
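
The split between a platform-independent driver and a platform-dependent `transfer` hook can be tried out in userspace. The following is a rough mock, not kernel code: all `demo_*` names are hypothetical, and a small in-memory buffer stands in for whatever the hypervisor would provide on the other side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_READ, DEMO_WRITE };

struct demo_dev;

/* Platform-dependent part: one function that moves the bytes. */
struct demo_bus {
	long (*transfer)(struct demo_dev *dev, void *buffer,
			 unsigned long size, int type);
};

struct demo_dev {
	const struct demo_bus *bus;
	unsigned char backing[16];	/* stands in for hypervisor storage */
};

/* A trivial backend that copies to and from the backing buffer. */
static long demo_mem_transfer(struct demo_dev *dev, void *buffer,
			      unsigned long size, int type)
{
	if (size > sizeof(dev->backing))
		size = sizeof(dev->backing);
	switch (type) {
	case DEMO_READ:
		memcpy(buffer, dev->backing, size);
		break;
	case DEMO_WRITE:
		memcpy(dev->backing, buffer, size);
		break;
	default:
		return -1;
	}
	return (long)size;
}

static const struct demo_bus demo_mem_bus = {
	.transfer = demo_mem_transfer,
};

/* Platform-independent part: only ever calls through dev->bus. */
static long demo_pipe_read(struct demo_dev *dev, void *buffer,
			   unsigned long size)
{
	assert(dev != NULL && dev->bus != NULL);
	return dev->bus->transfer(dev, buffer, size, DEMO_READ);
}

static long demo_pipe_write(struct demo_dev *dev, void *buffer,
			    unsigned long size)
{
	assert(dev != NULL && dev->bus != NULL);
	return dev->bus->transfer(dev, buffer, size, DEMO_WRITE);
}
```

Swapping `demo_mem_bus` for a different `struct demo_bus` would change how bytes move without touching `demo_pipe_read` or `demo_pipe_write`, which mirrors how the device-tree and PCI backends above share one driver.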


* Re: A set of "standard" virtual devices?
  2007-04-03 12:15                 ` Arnd Bergmann
@ 2007-04-03 13:39                   ` Cornelia Huck
  2007-04-03 14:03                     ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03 13:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 14:15:37 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> Right, but an interesting point is the question of what to do when running
> another operating system as a guest under Linux, e.g. with kvm.
> 
> Ideally, you'd want to use the same interface to announce the presence
> of the device, which can be done far more easily with PCI than using
> a new bus type that you'd need to implement for every OS, instead of
> just implementing the virtual PCI driver.

That's OK for a virtualized architecture where the base architecture
already supports PCI. But a traditional s390 OS would be as unhappy
with a PCI device as with a device of a completely new type :)

There are several options for virtualized devices (and I don't know why
they shouldn't coexist):

1. Emulate a well-known device (like an e1000 network card on PCI or a
model 3390 dasd on CCW). Existing operating systems can just use them,
but it's a lot of work in the hypervisor.

2. Create a virtual PCI device (or a virtual CCW device) with a new id.
Operating systems would need to write a new device driver, but they can
use a familiar infrastructure. That seems to be what most people are
talking about here.

3. Create a new bus which uses a new access method. This new method can
be made very simple, but requires support from the guest operating
system. That's what I was talking about :)

[Note: I'm not actually advocating an emulated ccw driver. There be
dragons.]

> Using a 16 bit number to identify a specific interface sounds like
> a good idea to me, if only for the reason that it is a widely used
> approach. The alternative would be to use an ascii string, like we
> have for open-firmware devices on powerpc or sparc.

OK, we could use common identifiers (and reserve it) for case 2 across
several busses. Like

#define PCI_VIRT_ID GENERIC_VIRT_ID
#define CCW_VIRT_DEVTYPE GENERIC_VIRT_ID

> I think in either way, we need to abstract the driver for the virtual
> device from the underlying bus infrastructure, which is hypervisor
> and/or platform dependent.

Yes, that sounds sane for case 3. We should just standardize the
interface.

> The abstraction could work roughly like this:
> 
> 
> ==========
> virt_dev.h
> ==========
> struct virt_driver { /* platform independent */
> 	struct device_driver drv;
> 	struct pci_device_id *ids; /* not necessarily PCI */
> };
> struct virt_bus {
> 	/* platform dependent */
> 	long (*transfer)(struct virt_dev *dev, void *buffer,
> 		unsigned long size, int type);
> };

Should this embed a struct bus_type? Or reference a generic_virt_bus?

> struct virt_dev {
> 	struct device dev;
> 	struct virt_driver *driver;
> 	struct virt_bus *bus;
> 	struct pci_device_id id;
> 	int irq;
> };

And that's where I have problems :) The notion of "irq" is far too
platform specific. I can bend my mind round using PCI-like ids for
non-PCI virtualized devices, but an integer is far too small and too
specific for a way to access the device.


* Re: A set of "standard" virtual devices?
  2007-04-03 13:39                   ` Cornelia Huck
@ 2007-04-03 14:03                     ` Arnd Bergmann
  2007-04-03 16:07                       ` Cornelia Huck
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 14:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Cornelia Huck wrote:
> On Tue, 3 Apr 2007 14:15:37 +0200, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> That's OK for a virtualized architecture where the base architecture
> already supports PCI. But a traditional s390 OS would be as unhappy
> with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

> There are several options for virtualized devices (and I don't know why
> they shouldn't coexist):
> 
> 1. Emulate a well-known device (like an e1000 network card on PCI or a
> model 3390 dasd on CCW). Existing operating systems can just use them,
> but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. 
What we're trying to achieve is to make sure not every hypervisor
and simulator has to introduce its own set of drivers.


> > struct virt_bus {
> > 	/* platform dependent */
> > 	long (*transfer)(struct virt_dev *dev, void *buffer,
> > 		unsigned long size, int type);
> > };
> 
> Should this embed a struct bus_type? Or reference a generic_virt_bus?

yes, that should embed the bus_type.

> > struct virt_dev {
> > 	struct device dev;
> > 	struct virt_driver *driver;
> > 	struct virt_bus *bus;
> > 	struct pci_device_id id;
> > 	int irq;
> > };
> 
> And that's where I have problems :) The notion of "irq" is far too
> platform specific. I can bend my mind round using PCI-like ids for
> non-PCI virtualized devices, but an integer is far too small and too
> specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures.
IRQ numbers are evil indeed.
However, I'm pretty sure that we need _some_ abstraction of an
interrupt mechanism here. The easiest way is probably to have a
callback function like
	int (*irq_handler)(struct virt_dev*, unsigned long message);
in the virt_dev.

	Arnd <><


* Re: A set of "standard" virtual devices?
  2007-04-03 14:03                     ` Arnd Bergmann
@ 2007-04-03 16:07                       ` Cornelia Huck
  0 siblings, 0 replies; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03 16:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Andi Kleen, Jeff Garzik,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 16:03:14 +0200,
Arnd Bergmann <arnd@arndb.de> wrote:

> > > struct virt_dev {
> > > 	struct device dev;
> > > 	struct virt_driver *driver;
> > > 	struct virt_bus *bus;
> > > 	struct pci_device_id id;
> > > 	int irq;
> > > };
> > 
> > And that's where I have problems :) The notion of "irq" is far too
> > platform specific. I can bend my mind round using PCI-like ids for
> > > non-PCI virtualized devices, but an integer is far too small and too
> > > specific for a way to access the device.
> 
> Sorry, I've been working too long on the lesser architectures.
> IRQ numbers are evil indeed.
> However, I'm pretty sure that we need _some_ abstraction of an
> interrupt mechanism here. The easiest way is probably to have a
> callback function like
> 	int (*irq_handler)(struct virt_dev*, unsigned long message);
> in the virt_dev.

Yes, something like
	int (*handler) (struct virt_dev *, struct virt_interrupt_info *);
should cover the needed cases.
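
A minimal userspace sketch of the callback approach, assuming the bus glue is the only code that knows how the platform signals an event. The thread names `struct virt_interrupt_info` without defining it, so `struct demo_interrupt_info` and its single `message` field are purely hypothetical, as are the other `demo_*` names.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical payload; the real contents were never specified. */
struct demo_interrupt_info {
	unsigned long message;
};

struct demo_dev {
	/* set by the driver, invoked by the bus glue */
	int (*handler)(struct demo_dev *dev,
		       const struct demo_interrupt_info *info);
	unsigned long last_message;
	unsigned int events;
};

/* Driver side: records what it was told, sees no IRQ number at all. */
static int demo_handler(struct demo_dev *dev,
			const struct demo_interrupt_info *info)
{
	dev->last_message = info->message;
	dev->events++;
	return 0;
}

/*
 * Bus side: called from platform code when the hypervisor signals the
 * device. Whether that was an IRQ line, an s390 I/O interrupt, or
 * something else stays hidden from the driver.
 */
static int demo_deliver(struct demo_dev *dev, unsigned long message)
{
	struct demo_interrupt_info info = { .message = message };

	assert(dev != NULL);
	if (dev->handler == NULL)
		return -1;	/* no driver bound yet */
	return dev->handler(dev, &info);
}
```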


* Re: A set of "standard" virtual devices?
  2007-04-02 21:12   ` Andi Kleen
  2007-04-02 21:33     ` Jeff Garzik
@ 2007-04-03  8:29     ` Christian Borntraeger
  2007-04-03  8:30       ` Andi Kleen
  1 sibling, 1 reply; 36+ messages in thread
From: Christian Borntraeger @ 2007-04-03  8:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: virtualization, Jeremy Fitzhardinge, H. Peter Anvin,
	Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Monday 02 April 2007 23:12, Andi Kleen wrote:
> 
> > How would that work in the case where virtualized guests don't have a
> > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > a PCI bus?
> 
> If they emulated one with the appropriate device
> then distribution driver auto probing would just work transparently for
> them. 

Still, that would only make sense for virtualized platforms that usually have 
a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

Christian


* Re: A set of "standard" virtual devices?
  2007-04-03  8:29     ` Christian Borntraeger
@ 2007-04-03  8:30       ` Andi Kleen
  2007-04-03  9:17         ` Cornelia Huck
  0 siblings, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2007-04-03  8:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: virtualization, Jeremy Fitzhardinge, H. Peter Anvin,
	Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > 
> > > How would that work in the case where virtualized guests don't have a
> > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > a PCI bus?
> > 
> > If they emulated one with the appropriate device
> > then distribution driver auto probing would just work transparently for
> > them. 
> 
> Still, that would only make sense for virtualized platforms that usually have 
> a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

If it gets the job done surely you can tolerate a little strangeness?

-Andi


* Re: A set of "standard" virtual devices?
  2007-04-03  8:30       ` Andi Kleen
@ 2007-04-03  9:17         ` Cornelia Huck
  2007-04-03  9:26           ` Andi Kleen
  2007-04-03 17:50           ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03  9:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 10:30:36 +0200,
Andi Kleen <ak@suse.de> wrote:

> On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> > On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > > 
> > > > How would that work in the case where virtualized guests don't have a
> > > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > > a PCI bus?
> > > 
> > > If they emulated one with the appropriate device
> > > then distribution driver auto probing would just work transparently for
> > > them. 
> > 
> > Still, that would only make sense for virtualized platforms that usually have 
> > a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.
> 
> If it gets the job done surely you can tolerate a little strangeness?

On s390, it would be more than strangeness. There's no implementation
of PCI at all, someone would have to cook it up - and it wouldn't have
any use beyond those special devices. Since there isn't any bus type
that is available on *all* architectures, a generic "virtual" bus with
very simple probing seems much saner...
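
A minimal sketch of what such a generic "virtual" bus with very simple
probing could look like, written in Python for brevity. It assumes the
hypervisor hands the guest a flat list of (type, id) pairs and that
drivers bind by type string; the class and method names are hypothetical
and not taken from any existing kernel code.

```python
# Toy model of a generic "virtual" bus: the hypervisor hands the guest a flat
# list of (device_type, device_id) pairs, and drivers bind by type string.
# Every name here is hypothetical.

class VirtualBus:
    def __init__(self, advertised):
        self.advertised = list(advertised)  # what the hypervisor exposes
        self.drivers = {}                   # device_type -> probe callback
        self.bound = {}                     # device_id -> probe result

    def register_driver(self, device_type, probe):
        self.drivers[device_type] = probe
        self.rescan()

    def rescan(self):
        # "Very simple probing": one pass over the advertised list.
        for device_type, device_id in self.advertised:
            probe = self.drivers.get(device_type)
            if probe is not None and device_id not in self.bound:
                self.bound[device_id] = probe(device_id)


bus = VirtualBus([("rng", 0), ("console", 1), ("rng", 2)])
bus.register_driver("rng", lambda dev: f"rng driver bound to device {dev}")
print(bus.bound)
```

Nothing in this model depends on the architecture, which is the property
being argued for; the console device simply stays unbound until a driver
for it registers.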

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03  9:17         ` Cornelia Huck
@ 2007-04-03  9:26           ` Andi Kleen
  2007-04-03 10:51             ` Cornelia Huck
  2007-04-03 15:00             ` Adrian Bunk
  2007-04-03 17:50           ` Arnd Bergmann
  1 sibling, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2007-04-03  9:26 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

> 
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

You just have to change all the distribution installers then. 
OK, I suppose on s390 that's not that big an issue because there are not
that many for s390. But for x86 there are quite a lot. I suppose
it's easier to change it in the kernel.

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03  9:26           ` Andi Kleen
@ 2007-04-03 10:51             ` Cornelia Huck
  2007-04-03 15:00             ` Adrian Bunk
  1 sibling, 0 replies; 36+ messages in thread
From: Cornelia Huck @ 2007-04-03 10:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christian Borntraeger, virtualization, Jeremy Fitzhardinge,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, 3 Apr 2007 11:26:52 +0200,
Andi Kleen <ak@suse.de> wrote:

> > 
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
> 
> You just have to change all the distribution installers then. 
> OK, I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there are quite a lot. I suppose
> it's easier to change it in the kernel.

Huh? I don't follow you here. Why should this be easier for s390 vs.
x86? (And since there seems to be a trend to use HAL as a device
discovery tool recently: A new bus type is easy enough to add there.)

And I really think we should have a clean design in the kernel instead
of trying to wedge virtual devices into a known system. Exposing
virtual devices (which may be handled totally differently) as PCI
devices just seems hackish to me.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03  9:26           ` Andi Kleen
  2007-04-03 10:51             ` Cornelia Huck
@ 2007-04-03 15:00             ` Adrian Bunk
  1 sibling, 0 replies; 36+ messages in thread
From: Adrian Bunk @ 2007-04-03 15:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Cornelia Huck, Christian Borntraeger, virtualization,
	Jeremy Fitzhardinge, H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tue, Apr 03, 2007 at 11:26:52AM +0200, Andi Kleen wrote:
> > 
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
> 
> You just have to change all the distribution installers then. 
> OK, I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there are quite a lot. I suppose
> it's easier to change it in the kernel.

I don't get this point.

Compared to whatever will be done in the kernel, any change to a 
distribution installer should be trivial.

And a new release of a distribution with a new kernel might anyway 
usually require some updates to an installer.

> -Andi

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03  9:17         ` Cornelia Huck
  2007-04-03  9:26           ` Andi Kleen
@ 2007-04-03 17:50           ` Arnd Bergmann
  2007-04-03 19:07             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 17:50 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Virtualization Mailing List, H. Peter Anvin,
	Linux Kernel Mailing List, mathiasen, virtualization

On Tuesday 03 April 2007, Cornelia Huck wrote:
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

I think we need to separate two problems here:

1. Probing:
That's really what triggered the discussion, PCI probing is well-understood
and implemented on _most_ platforms, so there is some value in reusing it.
When you talk about 'very simple probing', I'm not sure what the most simple
approach could be. Ideas that have been implemented before include:
a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
   and try to access each one of them in order to find out if it's there. We
   do that for PCI or CCW, for instance.
b) Have an iterator in the hypervisor (or firmware), to return a handle to
   the first, next or child of a device. We do that for open firmware.
c) ask the hypervisor for an unused device of a given class, which needs to
   be returned to the hypervisor when no longer used. This is how the PS3
   hypervisor works, but it does not play well with the Linux driver model.
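
The three probing styles above can be contrasted with a toy model, in
Python for brevity. The `ToyHypervisor` class and its methods are
hypothetical and do not correspond to any real hypervisor interface; the
point is only how the guest side differs between (a), (b) and (c).

```python
# Hypothetical hypervisor exposing the same two devices through the three
# probing styles (a), (b) and (c) described above.

class ToyHypervisor:
    def __init__(self):
        self.slots = {3: "disk", 7: "net"}   # device ID -> class
        self.leased = set()

    # (a) fixed ID space: the guest pokes every ID and sees what answers
    def present(self, dev_id):
        return dev_id in self.slots

    # (b) iterator: the hypervisor returns the handle after `prev`, or None
    def next_device(self, prev=None):
        later = sorted(i for i in self.slots if prev is None or i > prev)
        return later[0] if later else None

    # (c) lease: ask for an unused device of a class, hand it back later
    def acquire(self, dev_class):
        for dev_id, cls in sorted(self.slots.items()):
            if cls == dev_class and dev_id not in self.leased:
                self.leased.add(dev_id)
                return dev_id
        return None

    def release(self, dev_id):
        self.leased.discard(dev_id)


hv = ToyHypervisor()

scan = [i for i in range(16) if hv.present(i)]          # style (a)

walk, handle = [], hv.next_device()                      # style (b)
while handle is not None:
    walk.append(handle)
    handle = hv.next_device(handle)

first = hv.acquire("disk")                               # style (c)
second = hv.acquire("disk")   # None: the only disk is already leased
hv.release(first)

print(scan, walk, first, second)
```

Styles (a) and (b) give the guest a stable list of devices to bind
drivers to, while (c) only tells the guest about a device once it asks
for one, which is where the mismatch with the driver model comes from.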

2. Device access:
When talking to a virtual device, you want to have at least a way to give
commands to it and a way to get interrupts back. Again, multiple ideas
have been used in the past, and we should choose a subset:
a) PCI-like: mmio using memory and/or I/O space BAR setup, interrupt
   numbers and DMA to guest physical addresses.
b) Channel-like: use an hcall to give commands to the hypervisor, passing
   down a device handle, command code and data areas in guest physical space.
   Interrupts return the device handle or an OS-defined per-device value.
c) Minimalistic: Every device is mapped into the guest address space and
   can potentially be remapped into user space. The device memory can be
   shared between guests and/or with the host if that uses the same driver.
   The guest is able to signal the receiving end using an hcall and gets
   interrupts like in b)
d) UNIX-like: devices appear like file descriptors, the guest can do
   operations like read/write/sync/mmap, potentially ioctl on them to talk
   to the host.
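
As an illustration of the channel-like style (b), here is a minimal
Python sketch. The `hcall` function, the `CMD_READ` code and the flat
`guest_memory` buffer are all assumptions made for the example; a real
hypervisor interface would differ in every detail.

```python
# Toy model of the channel-like style (b): the guest issues an hcall naming a
# device handle, a command code and a buffer in guest-physical memory; the
# completion "interrupt" carries the handle back. All names are hypothetical.

CMD_READ = 1

guest_memory = bytearray(64)   # stands in for guest-physical address space
pending_irqs = []              # handles with completed commands


def hcall(handle, command, gpa, length):
    """Hypervisor side: execute the command, then raise an interrupt."""
    if command != CMD_READ:
        return -1                                  # unsupported command
    if gpa < 0 or gpa + length > len(guest_memory):
        return -1                                  # buffer outside the guest
    guest_memory[gpa:gpa + length] = bytes([handle]) * length
    pending_irqs.append(handle)
    return 0


def handle_interrupts(handlers):
    """Guest side: dispatch each interrupt by the handle it returns."""
    while pending_irqs:
        handle = pending_irqs.pop(0)
        handlers[handle](handle)


completed = []
status = hcall(handle=5, command=CMD_READ, gpa=16, length=4)
handle_interrupts({5: completed.append})
print(status, completed, bytes(guest_memory[16:20]))
```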

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 17:50           ` Arnd Bergmann
@ 2007-04-03 19:07             ` Jeremy Fitzhardinge
  2007-04-03 19:42               ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 19:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

Arnd Bergmann wrote:
> I think we need to separate two problems here:
>
> 1. Probing:
> That's really what triggered the discussion, PCI probing is well-understood
> and implemented on _most_ platforms, so there is some value in reusing it.
> When you talk about 'very simple probing', I'm not sure what the most simple
> approach could be. 

Is probing an interesting problem to consider on its own?  If there's
some hypervisor-agnostic device driver in Linux, then obviously it needs
some way to find the corresponding (virtual) hardware for it to talk
to.  But that probing mechanism will depend on the actual interface
structure, and is just one of the many problems that need to be solved. 
There's no point in overloading PCI to probe for the device unless
you're actually using PCI to talk to the device.

> Ideas that have been implemented before include:
> a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
>    and try to access each one of them in order to find out if it's there. We
>    do that for PCI or CCW, for instance.
> b) Have an iterator in the hypervisor (or firmware), to return a handle to
>    the first, next or child of a device. We do that for open firmware.
> c) ask the hypervisor for an unused device of a given class, which needs to
>    be returned to the hypervisor when no longer used. This is how the PS3
>    hypervisor works, but it does not play well with the Linux driver model.
>   

Xen has xenbus, which is essentially a filesystem-like namespace which
can be walked to find the devices being exposed to a guest.  It is
fairly similar to OFW's device tree.
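
A rough sketch of walking such a filesystem-like namespace, in Python.
The nested-dict layout and the `walk_devices` helper are invented for
illustration and are not the real xenbus schema or API.

```python
# A filesystem-like device namespace modelled as nested dicts. The layout is
# invented for illustration and is not the real xenbus schema.

namespace = {
    "device": {
        "vbd": {"768": {"state": "4"}},
        "vif": {"0": {"state": "4"}, "1": {"state": "1"}},
    },
    "console": {"ring-ref": "12"},
}


def walk_devices(tree, root="device"):
    """Yield (class, instance, properties) for every node under `root`."""
    for dev_class, instances in sorted(tree.get(root, {}).items()):
        for instance, props in sorted(instances.items()):
            yield dev_class, instance, props


found = [(cls, inst) for cls, inst, _ in walk_devices(namespace)]
print(found)
```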

> 2. Device access:
> When talking to a virtual device, you want to have at least a way to give
> commands to it and a way to get interrupts back. Again, multiple ideas
> have been used in the past, and we should choose a subset:
>   

Let me say up front that I'm skeptical that we can come up with a single
bus-like abstraction which can be both a simple and efficient interface
to all the virtual architectures.  I think a more fruitful path is to
find what pieces of functionality can be made common, with the aim of
having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis.  This
thread started with a discussion about entropy sources.  In theory you
could implement it as simply as exposing a mmaped ringbuffer.  There are
some extra complexities deriving from the security requirements though;
for example, all the entropy needs to be kept strictly private to the
domain that consumes it.
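
A minimal sketch of the ringbuffer idea, in Python. It assumes one ring
per guest, a single producer (the host) and a single consumer (the
guest), and it wipes each slot as it is consumed to model the privacy
requirement. The `EntropyRing` class is hypothetical, and a real shared
ring would also need memory barriers.

```python
# Single-producer, single-consumer byte ring for entropy. One ring per guest,
# and consumed bytes are wiped, to model the privacy requirement. Hypothetical
# layout; a real shared ring would also need memory barriers.

import os


class EntropyRing:
    def __init__(self, size=16):
        self.buf = bytearray(size)
        self.head = 0   # next slot the host writes
        self.tail = 0   # next slot the guest reads

    def used(self):
        return self.head - self.tail

    def produce(self, data):
        """Host side: copy in as much as fits, return the count written."""
        count = min(len(data), len(self.buf) - self.used())
        for byte in data[:count]:
            self.buf[self.head % len(self.buf)] = byte
            self.head += 1
        return count

    def consume(self, want):
        """Guest side: take up to `want` bytes and zero the slots behind us."""
        out = bytearray()
        while self.tail < self.head and len(out) < want:
            slot = self.tail % len(self.buf)
            out.append(self.buf[slot])
            self.buf[slot] = 0
            self.tail += 1
        return bytes(out)


ring = EntropyRing(size=16)
written = ring.produce(os.urandom(24))   # only 16 bytes fit
taken = ring.consume(10)
print(written, len(taken), ring.used())
```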

But beyond that, there are 3 other important classes of device:

    * console
    * disk
    * networking

(There are obviously more, but these are the must-have.)

Console already provides us with a model to work on, in the form of
hvc-console.  The hvc-console code itself has the bulk of the common
console code, along with a set of very small hypervisor-specific
backends. The Xen console implementation shrunk considerably when we
switched to using it.
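
The shape of that split can be sketched as follows, in Python. The
method names mirror the get_chars/put_chars style of operations that
hvc-console backends provide, but the classes themselves are
illustrative only and the loopback backend is a stand-in for a real
hypervisor channel.

```python
# Common console core with a tiny per-hypervisor backend, loosely modelled on
# the get_chars/put_chars split used by hvc-console. The Python classes are
# illustrative only.

class LoopbackBackend:
    """Stand-in backend: everything written can be read back."""

    def __init__(self):
        self.fifo = bytearray()

    def put_chars(self, data):
        self.fifo += data
        return len(data)

    def get_chars(self, count):
        chunk = bytes(self.fifo[:count])
        del self.fifo[:count]
        return chunk


class ConsoleCore:
    """Hypervisor-independent part: line handling and retry loops."""

    def __init__(self, backend):
        self.backend = backend

    def write_line(self, text):
        data = text.encode() + b"\r\n"
        sent = 0
        while sent < len(data):          # backends may accept partial writes
            sent += self.backend.put_chars(data[sent:])
        return sent

    def read_all(self, chunk=4):
        out = bytearray()
        while True:
            piece = self.backend.get_chars(chunk)
            if not piece:
                return bytes(out)
            out += piece


console = ConsoleCore(LoopbackBackend())
console.write_line("hello")
print(console.read_all())
```

The backend is two short methods; everything else lives in the shared
core, which is where the size reduction would come from.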

If we could do the same thing with disk and net, I would be very happy.

For example, if we wanted to change the Xen frontend/backend disk
interface, we could use SCSI as the basic protocol, and then convert
blkfront into a relatively simple SCSI driver.  There would still be a
Xen-specific piece, but it should be fairly small and have a clean
interface.  Though the existing interface is a pretty simple
shove-this-block-there affair.

I'm not sure what similar common code could be extracted for network
devices.  I haven't looked into it all that closely.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 19:07             ` Jeremy Fitzhardinge
@ 2007-04-03 19:42               ` Arnd Bergmann
  2007-04-03 19:55                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 19:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> Arnd Bergmann wrote:
> > I think we need to separate two problems here:
> >
> > 1. Probing:
> > That's really what triggered the discussion, PCI probing is well-understood
> > and implemented on _most_ platforms, so there is some value in reusing it.
> > When you talk about 'very simple probing', I'm not sure what the most simple
> > approach could be. 
> 
> Is probing an interesting problem to consider on its own?  If there's
> some hypervisor-agnostic device driver in Linux, then obviously it needs
> some way to find the corresponding (virtual) hardware for it to talk
> to.  But that probing mechanism will depend on the actual interface
> structure, and is just one of the many problems that need to be solved. 
> There's no point in overloading PCI to probe for the device unless
> you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. The EHCI USB is an example of a driver that can 
be for instance PCI, OF or an on-chip device. Moreover, you can have an
abstracted device behind it that does not need to know about the transport,
like the SCSI disk driver does not care if it is talking to an ATA, 
parallel SCSI or SAS chip, or even which controller that is.

> Let me say up front that I'm skeptical that we can come up with a single
> bus-like abstraction which can be both a simple and efficient interface
> to all the virtual architectures.  I think a more fruitful path is to
> find what pieces of functionality can be made common, with the aim of
> having small, simple and self-contained hypervisor-specific backends.
> 
> I think this needs to be considered on a class by class basis.  This
> thread started with a discussion about entropy sources.  In theory you
> could implement it as simply as exposing a mmaped ringbuffer.  There are
> some extra complexities deriving from the security requirements though;
> for example, all the entropy needs to be kept strictly private to the
> domain that consumes it.
> 
> But beyond that, there are 3 other important classes of device:
> 
>     * console
>     * disk
>     * networking
> 
> (There are obviously more, but these are the must-have.)
> 
> Console already provides us with a model to work on, in the form of
> hvc-console.  The hvc-console code itself has the bulk of the common
> console code, along with a set of very small hypervisor-specific
> backends. The Xen console implementation shrunk considerably when we
> switched to using it.

console is also the least problematic interface, you can do it over
practically anything.
 
> If we could do the same thing with disk and net, I would be very happy.
> 
> For example, if we wanted to change the Xen frontend/backend disk
> interface, we could use SCSI as the basic protocol, and then convert
> blkfront into a relatively simple SCSI driver.  There would still be a
> Xen-specific piece, but it should be fairly small and have a clean
> interface.  Though the existing interface is a pretty simple
> shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
  reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

minor optimizations could be
- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media
  (e.g. DVD writer)
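
The three core requirements can be sketched with a toy queue, in Python.
`ToyBlockQueue` is entirely hypothetical and is neither the Xen nor the
ibmvscsi protocol; it only shows non-blocking submission, host-side
reordering, and one interrupt covering several completions.

```python
# Toy block queue: requests are submitted without waiting, the host may
# complete them in any order, and one interrupt reports a whole batch.
# Entirely hypothetical; not the Xen or ibmvscsi protocol.

class ToyBlockQueue:
    def __init__(self):
        self.in_flight = {}     # tag -> sector
        self.completed = []     # tags finished since the last interrupt
        self.interrupts = 0
        self.next_tag = 0

    def submit(self, sector):
        """Guest: queue a request and return immediately with its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.in_flight[tag] = sector
        return tag

    def host_service(self):
        """Host: reorder by sector, complete everything, interrupt once."""
        by_sector = sorted(self.in_flight.items(), key=lambda kv: kv[1])
        for tag, _sector in by_sector:
            self.completed.append(tag)
        self.in_flight.clear()
        if self.completed:
            self.interrupts += 1

    def reap(self):
        """Guest interrupt handler: collect every finished tag in one go."""
        done, self.completed = self.completed, []
        return done


queue = ToyBlockQueue()
tags = [queue.submit(sector) for sector in (900, 10, 500)]
queue.host_service()
print(tags, queue.reap(), queue.interrupts)
```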

> I'm not sure what similar common code could be extracted for network
> devices.  I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.
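
A minimal sketch of such a ring, in Python. It assumes one sender and
one receiver, with interrupts raised only on the empty-to-non-empty and
full-to-not-full transitions. The `SharedRing` class is hypothetical,
and real shared memory would need atomic index updates and barriers
rather than plain attribute writes.

```python
# Shared packet ring between two guests. The sender interrupts the receiver
# only when the ring goes from empty to non-empty, and the receiver interrupts
# the sender only when a full ring gains space again. Hypothetical sketch; real
# shared memory would need atomic index updates and barriers.

class SharedRing:
    def __init__(self, slots=4):
        self.slots = [None] * slots
        self.prod = 0           # written only by the sender
        self.cons = 0           # written only by the receiver
        self.rx_irqs = 0        # "data available" interrupts
        self.tx_irqs = 0        # "space available" interrupts

    def send(self, packet):
        if self.prod - self.cons == len(self.slots):
            return False                        # full: sender must wait
        was_empty = self.prod == self.cons
        self.slots[self.prod % len(self.slots)] = packet
        self.prod += 1
        if was_empty:
            self.rx_irqs += 1
        return True

    def receive(self):
        if self.prod == self.cons:
            return None
        was_full = self.prod - self.cons == len(self.slots)
        packet = self.slots[self.cons % len(self.slots)]
        self.cons += 1
        if was_full:
            self.tx_irqs += 1
        return packet


ring = SharedRing(slots=4)
accepted = [ring.send(f"pkt{i}") for i in range(5)]   # fifth one is refused
first = ring.receive()                                # frees a slot
print(accepted, first, ring.rx_irqs, ring.tx_irqs)
```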

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 19:42               ` Arnd Bergmann
@ 2007-04-03 19:55                 ` Jeremy Fitzhardinge
  2007-04-03 20:03                   ` H. Peter Anvin
  2007-04-03 20:50                   ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 19:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Cornelia Huck, Andi Kleen, Christian Borntraeger, virtualization,
	H. Peter Anvin, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

Arnd Bergmann wrote:
> We already have device drivers for physical devices that can be attached
> to different buses. The EHCI USB is an example of a driver that can 
> be for instance PCI, OF or an on-chip device. Moreover, you can have an
> abstracted device behind it that does not need to know about the transport,
> like the SCSI disk driver does not care if it is talking to an ATA, 
> parallel SCSI or SAS chip, or even which controller that is.
>   

Yes, that kind of layering is useful when there's enough of an
abstraction gap to fit the layers into.  USB is particularly simple in
that way, since it can be made to travel nicely over any number of
transports.

> console is also the least problematic interface, you can do it over
> practically anything.
>   

Sure.  But it's interesting that there are savings to be had.

> Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
>   

OK, interesting.  People had proposed using SCSI as the interface, but I
wasn't aware of any results from doing that.  How is it not good?

> The interesting question about block devices is how to handle concurrency
> and interrupt mitigation. An efficient interface should
>
> - have asynchronous notification, not sleep until the transfer is complete
> - allow multiple blocks to be in flight simultaneously, so the host can
>   reorder the requests if it is smart enough
> - give only a single interrupt when multiple transfers have completed
>   

Yes.  The Xen block interface is already pretty efficient in these respects.

>> I'm not sure what similar common code could be extracted for network
>> devices.  I haven't looked into it all that closely.
>>     
>
> One way to do networking would be to simply provide a shared memory area
> that everyone can write to, then use a ring buffer and atomic operations
> to synchronize between the guests, and a method to send interrupts to the
> others for flow control.
>   

Yes, and that's the core of the Xen netfront.  But is there really much
code which can be shared between different hypervisors?  When you get
down to it, all the real code is hypervisor-specific stuff for setting
up ringbuffers and dealing with interrupts.  Like all the other network
drivers.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 19:55                 ` Jeremy Fitzhardinge
@ 2007-04-03 20:03                   ` H. Peter Anvin
  2007-04-03 21:00                     ` Jeremy Fitzhardinge
  2007-04-03 20:50                   ` Arnd Bergmann
  1 sibling, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 20:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Virtualization Mailing List, Arnd Bergmann, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

Jeremy Fitzhardinge wrote:
> 
> Yes, and that's the core of the Xen netfront.  But is there really much
> code which can be shared between different hypervisors?  When you get
> down to it, all the real code is hypervisor-specific stuff for setting
> up ringbuffers and dealing with interrupts.  Like all the other network
> drivers.
> 

One thing, Jeremy, which I think is a bit misleading here: you're
focusing on big, performance-critical stuff.  Those things are going to
be the ones which have the most to gain from being implemented in
hypervisor-specific ways.  Although we can offer models for some
hypervisors (and G-d knows there are enough implementations out there of
virtual disk which are almost identical), they're clearly not going to
be universal.

However, there are other things; console is one, or my original 
example, which was random number generation.  For those, the benefit of 
unification is proportionally greater, simply because the win of 
anything hypervisor-specific is much smaller.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 20:03                   ` H. Peter Anvin
@ 2007-04-03 21:00                     ` Jeremy Fitzhardinge
  2007-04-03 21:45                       ` H. Peter Anvin
  2007-04-03 21:51                       ` Arnd Bergmann
  0 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 21:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Virtualization Mailing List, Arnd Bergmann, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

H. Peter Anvin wrote:
> However, there are other things; console is one, or my original
> example, which was random number generation.  For those, the benefit
> of unification is proportionally greater, simply because the win of
> anything hypervisor-specific is much smaller. 

So, what you're saying is:

   1. assuming there's going to be a vast number of miscellaneous devices
   2. it would be best if there were one per device rather than one per
      hypervisor per device
   3. so we'd have one linux device driver

But this implies that the work is just pushed off into all the
hypervisors to support this new device over the generic interface;
there's no overall reduction of code or complexity, other than making
"wc" on the kernel source smaller.

That said, something like USB is probably the best bet for this kind of
low-performance device.  I think.  Not that I really know anything about
USB.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 21:00                     ` Jeremy Fitzhardinge
@ 2007-04-03 21:45                       ` H. Peter Anvin
  2007-04-03 21:51                       ` Arnd Bergmann
  1 sibling, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 21:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Arnd Bergmann, Cornelia Huck, Andi Kleen, Christian Borntraeger,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

Jeremy Fitzhardinge wrote:
> 
> So, what you're saying is:
> 
>    1. assuming there's going to be a vast number of miscellaneous devices
>    2. it would be best if there were one per device rather than one per
>       hypervisor per device
>    3. so we'd have one linux device driver
> 
> But this implies that the work is just pushed off into all the
> hypervisors to support this new device over the generic interface;
> there's no overall reduction of code or complexity, other than making
> "wc" on the kernel source smaller.
> 

Sure there is, assuming you deal with heterogeneous clients.  I'm not 
sure Xen is (although that is, as far as I understand, being remedied), 
which might explain your different perspective.

Consider that this may not even be about Linux -- having these standard 
devices would enable, say, 'doze device drivers to be written and shared.

> That said, something like USB is probably the best bet for this kind of
> low-performance device.  I think.  Not that I really know anything about
> USB.

USB is evil in the extreme for this kind of stuff.  Although in theory 
you can have any HCI you want, in practice the ones that are implemented 
require a very complex framework for full compatibility.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 21:00                     ` Jeremy Fitzhardinge
  2007-04-03 21:45                       ` H. Peter Anvin
@ 2007-04-03 21:51                       ` Arnd Bergmann
  2007-04-03 22:10                         ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 21:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Cornelia Huck, Andi Kleen, Christian Borntraeger,
	virtualization, Virtualization Mailing List,
	Linux Kernel Mailing List, mathiasen

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> That said, something like USB is probably the best bet for this kind of
> low-performance device.  I think.  Not that I really know anything about
> USB.

USB has the disadvantage that it is more complex than PCI and requires
significantly more code to simulate on the host side.

On the plus side, I think it should be possible to implement a virtual
USB host on s390, which is not possible with PCI, but that again takes
a lot of work to implement.

One interesting aspect of the PS3 hypervisor is that some of the
low-speed interfaces are implemented as a virtual UART, meaning
something that only has read and write operations and uses an
interrupt for flow control. The implementation in 
drivers/ps3/vuart.c is probably more complex than what we want
as a generic transport mechanism, but simply having a bidirectional
data stream sounds like an ideal abstraction for the "simple"
case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with
  shared memory segment)

The transport can be hypervisor specific, e.g. there could be
a virtual PCI serial port on kvm, an hcall interface on the ps3
and a virtual CTC on s390 (kidding), while all of them can have
the same kind of hardware _behind_ the serial connection.
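
The idea of one device protocol running over interchangeable byte-stream
transports can be sketched like this, in Python. The `QueueTransport`
class and the one-byte request format for the "random" device are made
up for the example; any transport that offers read and write could be
substituted.

```python
# A bidirectional byte stream as the only transport primitive, with the same
# "random" device protocol usable over any transport that offers read/write.
# The transport and the one-byte request format are made up for illustration.

import os
from collections import deque


class QueueTransport:
    """Stand-in for an hcall- or PCI-serial-backed stream."""

    def __init__(self):
        self.to_host = deque()
        self.to_guest = deque()

    def write(self, data):
        self.to_host.extend(data)

    def read(self, count):
        available = min(count, len(self.to_guest))
        return bytes(self.to_guest.popleft() for _ in range(available))


def host_rng_service(transport):
    """Host side: each request byte N is answered with N random bytes."""
    while transport.to_host:
        wanted = transport.to_host.popleft()
        transport.to_guest.extend(os.urandom(wanted))


def guest_get_random(transport, count):
    """Guest side: transport-agnostic driver for the random device."""
    transport.write(bytes([count]))   # count must fit in one byte
    host_rng_service(transport)       # in reality this runs in the host
    return transport.read(count)


data = guest_get_random(QueueTransport(), 8)
print(len(data))
```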

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 21:51                       ` Arnd Bergmann
@ 2007-04-03 22:10                         ` H. Peter Anvin
  2007-04-03 22:49                           ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-03 22:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Virtualization Mailing List, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
> 
> One interesting aspect of the PS3 hypervisor is that some of the
> low-speed interfaces are implemented as a virtual UART, meaning
> something that only has read and write operations and uses an
> interrupt for flow control. The implementation in 
> drivers/ps3/vuart.c is probably more complex than what we want
> as a generic transport mechanism, but simply having a bidirectional
> data stream sounds like an ideal abstraction for the "simple"
> case. Some more or less obvious users of this include:
> 
> - console
> - additional tty
> - random
> - slow network (using ppp)
> - printer
> - watchdog
> - hid (e.g. mouse)
> - system management (like ps3)
> - fast network (in combination with
>   shared memory segment)
> 
> The transport can be hypervisor specific, e.g. there could be
> a virtual PCI serial port on kvm, an hcall interface on the ps3
> and a virtual CTC on s390 (kidding), while all of them can have
> the same kind of hardware _behind_ the serial connection.
> 

Note that at least for PIO-based devices, there is nothing that says you 
can't implement PCI over another transport, if you wish.  It's really 
just a very simple RPC protocol.

DMA is trickier, as it makes the data appear into the address space of 
the guest in a way that is both device- and host-dependent (in the 
presence of PCI domains, IOMMU etc.)  There may be reason to avoid DMA 
for that reason.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 22:10                         ` H. Peter Anvin
@ 2007-04-03 22:49                           ` Arnd Bergmann
  2007-04-04  0:52                             ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 22:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Cornelia Huck, Andi Kleen,
	Christian Borntraeger, virtualization,
	Virtualization Mailing List, Linux Kernel Mailing List, mathiasen

On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Note that at least for PIO-based devices, there is nothing that says you 
> can't implement PCI over another transport, if you wish.  It's really 
> just a very simple RPC protocol.

The PIO aspect of PCI is simple, yes, except on architectures that don't
have the concept of PIO or even uncached memory, but even that can
be done by defining readl/writel/inl/outl/... as hcalls.
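
A small sketch of register accessors that forward to a hypervisor call,
in Python. The `hcall_mmio` function and the 16-byte register file are
hypothetical; only the accessor names and the value-then-address
argument order of `writel` follow the kernel convention.

```python
# Register accessors that forward to a hypervisor call instead of touching
# uncached memory. `hcall_mmio` and the register map are hypothetical.

import struct

device_registers = bytearray(16)      # host-side backing store for the device


def hcall_mmio(op, offset, value=0):
    """Host side: perform one 32-bit little-endian register access."""
    if offset % 4 or not 0 <= offset <= len(device_registers) - 4:
        raise ValueError("bad register offset")
    if op == "write":
        struct.pack_into("<I", device_registers, offset, value & 0xFFFFFFFF)
        return 0
    return struct.unpack_from("<I", device_registers, offset)[0]


def writel(value, offset):
    hcall_mmio("write", offset, value)


def readl(offset):
    return hcall_mmio("read", offset)


writel(0xDEADBEEF, 8)
print(hex(readl(8)), hex(readl(0)))
```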

The tricky part about PCI is the device probing: everything about config
space accesses, interrupt swizzling, bus/device/function numbers and
base address registers becomes a pointless exercise when the other side
is just faking it.

> DMA is trickier, as it makes the data appear into the address space of 
> the guest in a way that is both device- and host-dependent (in the 
> presence of PCI domains, IOMMU etc.)  There may be reason to avoid DMA 
> for that reason.

Right, PCI DMA and virtualization don't mix. DMA in general is fine though,
as long as your devices (real or virtual) see the guest physical addresses
as a contiguous 64 bit range and have well-defined semantics about what
addresses are accessed in what way.
When you think of file read/write syscalls as DMA into user space, it's
a very clean concept. Async I/O somewhat less so, but still pretty good.

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 22:49                           ` Arnd Bergmann
@ 2007-04-04  0:52                             ` H. Peter Anvin
  2007-04-04 13:11                               ` Arnd Bergmann
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-04  0:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Virtualization Mailing List, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
> On Wednesday 04 April 2007, H. Peter Anvin wrote:
>> Note that at least for PIO-based devices, there is nothing that says you 
>> can't implement PCI over another transport, if you wish.  It's really 
>> just a very simple RPC protocol.
> 
> The PIO aspect of PCI is simple, yes, except on architectures that don't
> have the concept of PIO or even uncached memory, but even that can
> be done by defining readl/writel/inl/outl/... as hcalls.
> 
> The tricky part about PCI is the device probing: everything about config
> space accesses, interrupt swizzling, bus/device/function numbers and
> base address registers becomes a pointless exercise when the other side
> is just faking it.

Configuration space access is platform-dependent.  It's only defined to 
work in a specific way on x86 platforms.

"Interrupt swizzling" is really totally independent of PCI.  ALL PCI 
really provides is up to four interrupts per device (not counting 
MSI/MSI-X) and an 8-bit writable field which the platform can choose to 
use to hold interrupt information.  That's all.  The rest is all 
platform information.
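
To make the "four interrupts plus an 8-bit field" point concrete, here is a
minimal sketch in C. It assumes the standard type 0 configuration header
layout, where the Interrupt Line byte sits at offset 0x3C and the Interrupt
Pin byte at 0x3D. The function names and the flat byte-array view of config
space are illustrative assumptions, not taken from any message in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets in the standard PCI configuration header. */
#define PCI_INTERRUPT_LINE 0x3C  /* 8-bit writable scratch field */
#define PCI_INTERRUPT_PIN  0x3D  /* 0 = none, 1..4 = INTA#..INTD# */

/* Hypothetical helper: return 'A'..'D' for the INTx pin a function
 * uses, or 0 if it uses no legacy interrupt pin. */
char pci_intx_pin_name(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL && len > PCI_INTERRUPT_PIN);
    uint8_t pin = cfg[PCI_INTERRUPT_PIN];
    return (pin >= 1 && pin <= 4) ? (char)('A' + pin - 1) : 0;
}

/* Hypothetical helper: return whatever the platform stored in the
 * Interrupt Line field. PCI itself attaches no meaning to the value. */
uint8_t pci_interrupt_line(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL && len > PCI_INTERRUPT_LINE);
    return cfg[PCI_INTERRUPT_LINE];
}
```

How a pin is routed to an actual interrupt controller input is decided by
the platform, which is why nothing in the sketch attempts to interpret the
Interrupt Line value.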

PCI enumeration is hardly complex.  Most of the stuff that doesn't apply 
to you you can generally ignore, as is done by other busses like 
HyperTransport when they emulate PCI.
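
As a rough illustration of how little enumeration needs, the sketch below
scans one bus through a caller-supplied config read function. It relies on
two facts from the PCI configuration header: a Vendor ID of 0xFFFF means
nothing responded, and bit 7 of the Header Type byte (offset 0x0E) marks a
multi-function device. The callback type, the toy backend and the vendor ID
0x1234 are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport hook: any mechanism (port I/O, MMIO, hcall)
 * that can return 16 bits of config space will do. */
typedef uint16_t (*cfg_read16_fn)(void *ctx, uint8_t bus, uint8_t dev,
                                  uint8_t fn, uint8_t reg);

/* Count the functions that respond on one bus. */
int pci_count_functions(cfg_read16_fn read16, void *ctx, uint8_t bus)
{
    int found = 0;

    assert(read16 != NULL);
    for (unsigned dev = 0; dev < 32; dev++) {
        for (unsigned fn = 0; fn < 8; fn++) {
            uint16_t vendor = read16(ctx, bus, (uint8_t)dev, (uint8_t)fn, 0x00);
            if (vendor == 0xFFFF) {
                if (fn == 0)
                    break;      /* no function 0: the slot is empty */
                continue;       /* gaps between functions are allowed */
            }
            found++;
            if (fn == 0) {
                /* Low byte of the 16-bit read at 0x0E is Header Type. */
                uint8_t hdr = (uint8_t)(read16(ctx, bus, (uint8_t)dev, 0, 0x0E) & 0xFF);
                if (!(hdr & 0x80))
                    break;      /* single-function device */
            }
        }
    }
    return found;
}

/* Toy backend (assumption): one single-function device on bus 0. */
struct toy_bus { uint8_t present_dev; uint16_t vendor; };

uint16_t toy_read16(void *ctx, uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    const struct toy_bus *t = ctx;
    if (bus != 0 || dev != t->present_dev || fn != 0)
        return 0xFFFF;          /* nobody home */
    return reg == 0x00 ? t->vendor : 0;
}
```

Everything else in the header (BARs, capabilities, bridges) can be ignored
by a scan like this, which is the sense in which an emulating bus can leave
out what does not apply to it.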

That being said, on platforms which are PCI-centric, such as x86, this
of course makes it a lot easier to produce virtual devices which work
across hypervisors, since the device model of *any* operating system is
set up to handle them.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-04  0:52                             ` H. Peter Anvin
@ 2007-04-04 13:11                               ` Arnd Bergmann
  2007-04-04 15:50                                 ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-04 13:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Virtualization Mailing List, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Configuration space access is platform-dependent.  It's only defined to 
> work in a specific way on x86 platforms.
> 
> "Interrupt swizzling" is really totally independent of PCI.  ALL PCI 
> really provides is up to four interrupts per device (not counting 
> MSI/MSI-X) and an 8-bit writable field which the platform can choose to 
> use to hold interrupt information.  That's all.  The rest is all 
> platform information.
> 
> PCI enumeration is hardly complex.  Most of the stuff that doesn't apply 
> to you you can generally ignore, as is done by other busses like 
> HyperTransport when they emulate PCI.

You still don't get my point: On a platform that doesn't have interrupt
numbers, and where most of the fields in the config space don't correspond
to anything that is already there, you really don't want to invent
a set of new hcalls that implement emulation, to get something as
simple as a pipe.

wc drivers/pci/*.[ch] include/asm-i386/{pci,io}.h lib/iomap*.c \
	arch/i386/pci/*.c kernel/irq/*.c
17015  59037 463967 total

Even if you only need half of that code in reality, reimplementing
all that in both the kernel and in the hypervisor is an enormous
effort. We've seen that before on the ps3, which initially faked
a virtual PCI bus just for the USB controller, but doing something
like that requires adding abstraction layers, to decide whether to
implement e.g. an inb as a hypercall or as a memory read.
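
The kind of abstraction layer meant here can be sketched roughly as follows.
This is an illustration only: the struct, the function names and the fake
hypercall are assumptions made up for the example, and a real hypercall
backend would trap into the hypervisor instead of touching a local struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indirection: every "inb" goes through a backend that
 * decides how the access is actually performed. */
struct io_backend {
    uint8_t (*read8)(void *ctx, uint32_t addr);
    void *ctx;
};

uint8_t io_inb(const struct io_backend *be, uint32_t port)
{
    assert(be != NULL && be->read8 != NULL);
    return be->read8(be->ctx, port);
}

/* Backend 1: the "port" is an offset into a memory-mapped window. */
uint8_t mmio_read8(void *ctx, uint32_t addr)
{
    const volatile uint8_t *base = ctx;
    return base[addr];
}

/* Backend 2: stand-in for a hypercall. It only counts calls and
 * returns a canned value. */
struct fake_hcall { unsigned calls; uint8_t value; };

uint8_t hcall_read8(void *ctx, uint32_t addr)
{
    struct fake_hcall *h = ctx;
    (void)addr;
    h->calls++;
    return h->value;
}
```

The extra pointer chase on every access, repeated for each of
readl/writel/inl/outl and friends, is the cost being objected to.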

> That being said, on platforms which are PCI-centric, such as x86, this
> of course makes it a lot easier to produce virtual devices which work
> across hypervisors, since the device model of *any* operating system is
> set up to handle them.

Yes, as I said there are two separate problems. I really think that
a standardized virtual driver interface should be modeled after
kernel <-> user interfaces, not hardware <-> kernel interfaces.

Once we know what operations we want (e.g. read, write and SIGIO,
or some other set of primitives), it will be good to provide a
virtual PCI device that can be used as one transport mechanism
below it. Using PCI device IDs to tell what functionality is
provided by the device would provide a reasonable method for
autoprobing.
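
A minimal sketch of what such a layering could look like, assuming a
read/write pair plus a SIGIO-like notification callback as the primitives.
All names here are hypothetical, and the loopback FIFO stands in for
whatever transport (virtual PCI device, hcalls, ...) sits underneath.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vdev;

/* Operations a driver sees; the transport below fills them in. */
struct vdev_ops {
    size_t (*read)(struct vdev *d, void *buf, size_t len);
    size_t (*write)(struct vdev *d, const void *buf, size_t len);
};

struct vdev {
    const struct vdev_ops *ops;
    void (*notify)(struct vdev *d);  /* SIGIO-like "data ready" hook */
    unsigned notified;               /* bumped by count_notify() */
    unsigned char fifo[64];          /* loopback transport state */
    size_t used;
};

/* Loopback transport: written bytes become readable, like a pipe. */
static size_t loop_write(struct vdev *d, const void *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    size_t room = sizeof(d->fifo) - d->used;
    if (len > room)
        len = room;                  /* short write when full */
    memcpy(d->fifo + d->used, buf, len);
    d->used += len;
    if (len && d->notify)
        d->notify(d);
    return len;
}

static size_t loop_read(struct vdev *d, void *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    if (len > d->used)
        len = d->used;               /* short read when nearly empty */
    memcpy(buf, d->fifo, len);
    memmove(d->fifo, d->fifo + len, d->used - len);
    d->used -= len;
    return len;
}

const struct vdev_ops loopback_ops = { loop_read, loop_write };

void count_notify(struct vdev *d) { d->notified++; }
```

A driver written against struct vdev_ops never learns which transport is in
use; a PCI device ID would only be consulted at probe time to pick the ops.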

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-04 13:11                               ` Arnd Bergmann
@ 2007-04-04 15:50                                 ` H. Peter Anvin
  0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2007-04-04 15:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Virtualization Mailing List, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

Arnd Bergmann wrote:
> 
>> That being said, on platforms which are PCI-centric, such as x86, this
>> of course makes it a lot easier to produce virtual devices which work
>> across hypervisors, since the device model of *any* operating system is
>> set up to handle them.
> 
> Yes, as I said there are two separate problems. I really think that
> a standardized virtual driver interface should be modeled after
> kernel <-> user interfaces, not hardware <-> kernel interfaces.
> 
> Once we know what operations we want (e.g. read, write and SIGIO,
> or some other set of primitives), it will be good to provide a
> virtual PCI device that can be used as one transport mechanism
> below it. Using PCI device IDs to tell what functionality is
> provided by the device would provide a reasonable method for
> autoprobing.
> 

That seems like a reasonable approach.  I *do* care about 
hardware-equivalent interfaces, because they, too, keep getting 
reinvented, but it seems reasonable to approach it in a layered fashion 
like you describe.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: A set of "standard" virtual devices?
  2007-04-03 19:55                 ` Jeremy Fitzhardinge
  2007-04-03 20:03                   ` H. Peter Anvin
@ 2007-04-03 20:50                   ` Arnd Bergmann
  1 sibling, 0 replies; 36+ messages in thread
From: Arnd Bergmann @ 2007-04-03 20:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Virtualization Mailing List, H. Peter Anvin, Cornelia Huck,
	Linux Kernel Mailing List, mathiasen, virtualization

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> > Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
> >   
> 
> OK, interesting.  People had proposed using SCSI as the interface, but I
> wasn't aware of any results from doing that.  How is it not good?
> 

SCSI is really overengineered for something as simple as a block interface.
A large part of the SCSI stack deals only with error handling, which
you don't want to burden the guests with at all, since most error conditions
can be handled fine by the host.
Another big aspect of SCSI is device enumeration and probing. Doing it
the SCSI way is particularly pointless. It's much simpler to have one
device with its own I/O interface at the hcall layer, and one interrupt
number for the block device, instead of faking the full hca/bus/dev/lun
hierarchy.
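
For comparison, a block interface along these lines could be as small as
the sketch below: one request structure, one submit call, and one
completion signal, with the guest seeing only success or failure. The
structure layout, the 512-byte sector size and the in-memory "disk" are
assumptions for illustration; a real implementation would issue an hcall
and the host would raise the device's single interrupt on completion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

enum vblk_op { VBLK_READ, VBLK_WRITE };

/* Hypothetical request: no host adapter, bus, target or LUN. */
struct vblk_req {
    enum vblk_op op;
    uint64_t sector;
    uint32_t nsectors;
    void *buf;
};

/* Host-side state, simulated in memory for the example. */
struct vblk_host {
    uint8_t *disk;
    uint64_t nsectors;
    unsigned irqs;   /* completions signalled on the one interrupt */
};

/* Stand-in for the hypercall. Returns 0 on success, -1 on failure;
 * any retry or recovery logic would stay on the host side. */
int vblk_hcall_submit(struct vblk_host *h, const struct vblk_req *r)
{
    assert(h != NULL && r != NULL && r->buf != NULL);
    if (r->sector > h->nsectors || r->nsectors > h->nsectors - r->sector)
        return -1;                       /* out of range */

    uint8_t *p = h->disk + r->sector * SECTOR_SIZE;
    size_t n = (size_t)r->nsectors * SECTOR_SIZE;
    if (r->op == VBLK_WRITE)
        memcpy(p, r->buf, n);
    else
        memcpy(r->buf, p, n);
    h->irqs++;                           /* "interrupt": request done */
    return 0;
}
```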

	Arnd <><

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2007-04-04 15:50 UTC | newest]

Thread overview: 36+ messages
     [not found] <4611652F.700@zytor.com>
2007-04-02 20:56 ` A set of "standard" virtual devices? Jeremy Fitzhardinge
2007-04-02 21:12   ` Andi Kleen
2007-04-02 21:33     ` Jeff Garzik
2007-04-02 21:36       ` Andi Kleen
2007-04-02 21:42         ` Jeremy Fitzhardinge
2007-04-02 21:53           ` Anthony Liguori
2007-04-02 22:04             ` Jeremy Fitzhardinge
2007-04-02 22:10           ` H. Peter Anvin
2007-04-02 22:25             ` Jeff Garzik
2007-04-02 22:30               ` H. Peter Anvin
2007-04-03  9:41             ` Arnd Bergmann
2007-04-03 10:41               ` Cornelia Huck
2007-04-03 12:15                 ` Arnd Bergmann
2007-04-03 13:39                   ` Cornelia Huck
2007-04-03 14:03                     ` Arnd Bergmann
2007-04-03 16:07                       ` Cornelia Huck
2007-04-03  8:29     ` Christian Borntraeger
2007-04-03  8:30       ` Andi Kleen
2007-04-03  9:17         ` Cornelia Huck
2007-04-03  9:26           ` Andi Kleen
2007-04-03 10:51             ` Cornelia Huck
2007-04-03 15:00             ` Adrian Bunk
2007-04-03 17:50           ` Arnd Bergmann
2007-04-03 19:07             ` Jeremy Fitzhardinge
2007-04-03 19:42               ` Arnd Bergmann
2007-04-03 19:55                 ` Jeremy Fitzhardinge
2007-04-03 20:03                   ` H. Peter Anvin
2007-04-03 21:00                     ` Jeremy Fitzhardinge
2007-04-03 21:45                       ` H. Peter Anvin
2007-04-03 21:51                       ` Arnd Bergmann
2007-04-03 22:10                         ` H. Peter Anvin
2007-04-03 22:49                           ` Arnd Bergmann
2007-04-04  0:52                             ` H. Peter Anvin
2007-04-04 13:11                               ` Arnd Bergmann
2007-04-04 15:50                                 ` H. Peter Anvin
2007-04-03 20:50                   ` Arnd Bergmann
