From: Arnd Bergmann <arnd@arndb.de>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>, Andi Kleen <ak@suse.de>,
	Christian Borntraeger <borntrae@de.ibm.com>,
	virtualization@lists.linux-foundation.org,
	"H. Peter Anvin" <hpa@zytor.com>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	mathiasen@gmail.com
Subject: Re: A set of "standard" virtual devices?
Date: Tue, 3 Apr 2007 21:42:51 +0200
Message-ID: <200704032142.51976.arnd@arndb.de>
In-Reply-To: <4612A5F0.2080609@goop.org>

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> Arnd Bergmann wrote:
> > I think we need to separate two problems here:
> >
> > 1. Probing:
> > That's really what triggered the discussion, PCI probing is well-understood
> > and implemented on _most_ platforms, so there is some value in reusing it.
> > When you talk about 'very simple probing', I'm not sure what the most simple
> > approach could be. 
> 
> Is probing an interesting problem to consider on its own?  If there's
> some hypervisor-agnostic device driver in Linux, then obviously it needs
> some way to find the corresponding (virtual) hardware for it to talk
> to.  But that probing mechanism will depend on the actual interface
> structure, and is just one of the many problems that need to be solved. 
> There's no point in overloading PCI to probe for the device unless
> you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. EHCI USB is one example: the same host controller
driver can sit behind PCI, OF or an on-chip bus. Moreover, you can layer
an abstracted device on top that does not need to know about the
transport at all, just as the SCSI disk driver does not care whether it
is talking to an ATA, parallel SCSI or SAS chip, or even which
controller that is.
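
In rough code the split looks something like the following -- all the
vdev_* names are invented for the example and error handling is
trimmed, but the shape is the usual one: a thin bus binding does only
discovery and resource mapping, then hands off to a common core.

#include <linux/pci.h>
#include <linux/slab.h>

/* Transport-agnostic core: knows the device protocol, not the bus.
 * All vdev_* names here are made up for illustration. */
struct vdev_core {
	void __iomem *regs;	/* mapped by whichever bus glue found us */
};

int vdev_core_probe(struct vdev_core *core);	/* the common logic */

/* PCI glue: discovery and resource mapping only.  An OF or platform
 * binding would do the same job from the device tree instead. */
static int vdev_pci_probe(struct pci_dev *pdev,
			  const struct pci_device_id *id)
{
	struct vdev_core *core;
	int err;

	err = pci_enable_device(pdev);
	if (err)
		return err;

	core = kzalloc(sizeof(*core), GFP_KERNEL);
	if (!core)
		return -ENOMEM;

	core->regs = pci_iomap(pdev, 0, 0);
	pci_set_drvdata(pdev, core);
	return vdev_core_probe(core);	/* hand off to the common part */
}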

> Let me say up front that I'm skeptical that we can come up with a single
> bus-like abstraction which can be both a simple and efficient interface
> to all the virtual architectures.  I think a more fruitful path is to
> find what pieces of functionality can be made common, with the aim of
> having small, simple and self-contained hypervisor-specific backends.
> 
> I think this needs to be considered on a class by class basis.  This
> thread started with a discussion about entropy sources.  In theory you
> could implement it as simply as exposing an mmapped ringbuffer.  There are
> some extra complexities deriving from the security requirements though;
> for example, all the entropy needs to be kept strictly private to the
> domain that consumes it.
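
Right, and the data path for that can be trivial: a buffer shared with
exactly one guest (which also covers the privacy requirement), with a
single-producer ring inside.  Guest side only, layout and names
invented for the sketch:

#include <stdint.h>
#include <stddef.h>

#define ENTROPY_BUF 2048	/* power of two, so the free-running
				 * 32-bit counters wrap cleanly */

/* Shared between the host and a single guest only, so the entropy
 * stays private to the domain consuming it.  The host is the only
 * writer of head and data, the guest the only writer of tail. */
struct entropy_ring {
	volatile uint32_t head;	/* bytes produced, free-running */
	volatile uint32_t tail;	/* bytes consumed, free-running */
	uint8_t data[ENTROPY_BUF];
};

/* Guest side: drain up to len bytes of fresh entropy. */
static size_t entropy_read(struct entropy_ring *r, uint8_t *buf, size_t len)
{
	size_t n = 0;

	/* real code wants a read barrier between loading head and
	 * reading the data it covers */
	while (n < len && r->tail != r->head)
		buf[n++] = r->data[r->tail++ % ENTROPY_BUF];
	return n;
}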
> 
> But beyond that, there are 3 other important classes of device:
> 
>     * console
>     * disk
>     * networking
> 
> (There are obviously more, but these are the must-have.)
> 
> Console already provides us with a model to work on, in the form of
> hvc-console.  The hvc-console code itself has the bulk of the common
> console code, along with a set of very small hypervisor-specific
> backends. The Xen console implementation shrunk considerably when we
> switched to using it.

Console is also the least problematic interface; you can run it over
practically anything.
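
An hvc backend really boils down to a pair of character-transfer
hooks, roughly like the following -- the hook names and signatures
here are from memory and may not match the current tree exactly, and
the my_hypercall_* calls are invented stand-ins for whatever the
hypervisor actually provides:

/* needs drivers/char/hvc_console.h for struct hv_ops */
static int my_put_chars(uint32_t vtermno, const char *buf, int count)
{
	return my_hypercall_console_write(vtermno, buf, count);
}

static int my_get_chars(uint32_t vtermno, char *buf, int count)
{
	return my_hypercall_console_read(vtermno, buf, count);
}

static struct hv_ops my_hv_ops = {
	.get_chars	= my_get_chars,
	.put_chars	= my_put_chars,
};
/* plus a call to register the ops with hvc_console, elided here */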
 
> If we could do the same thing with disk and net, I would be very happy.
> 
> For example, if we wanted to change the Xen frontend/backend disk
> interface, we could use SCSI as the basic protocol, and then convert
> blkfront into a relatively simple SCSI driver.  There would still be a
> Xen-specific piece, but it should be fairly small and have a clean
> interface.  Though the existing interface is a pretty simple
> shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should (a rough ring
sketch follows these two lists):

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
  reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

Minor optimizations could be:
- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media
  (e.g. DVD writer)
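
To illustrate: with free-running request/response counters, tagged
responses and an event threshold, the three requirements and the first
optimization all fall out of one shared structure.  The layout below
is invented for the example, loosely modelled on the existing Xen
shared rings:

#include <stdint.h>

#define RING_SIZE 128	/* power of two, so free-running indices wrap cleanly */

struct blk_req {
	uint64_t id;		/* echoed back in the response */
	uint64_t sector;
	uint32_t nsect;
	uint32_t op;		/* read, write, barrier, ... */
	uint64_t buf_addr;	/* guest-physical address of the data */
};

struct blk_rsp {
	uint64_t id;		/* which request this completes */
	int32_t  status;
};

union blk_slot {		/* responses reuse consumed request slots */
	struct blk_req req;
	struct blk_rsp rsp;
};

struct blk_ring {
	volatile uint32_t req_prod;	/* guest increments per request */
	volatile uint32_t rsp_prod;	/* host increments per completion */
	volatile uint32_t rsp_event;	/* guest: "interrupt me once
					 * rsp_prod reaches this value" */
	union blk_slot ring[RING_SIZE];
};

/* Guest side: post a request and return at once; the completion
 * arrives later via interrupt, so many requests stay in flight and
 * the host can reorder them.  (A real implementation also keeps a
 * guest-private response-consumer index; elided here.) */
static int blk_submit(struct blk_ring *r, const struct blk_req *req)
{
	if (r->req_prod - r->rsp_prod == RING_SIZE)
		return -1;	/* ring full; retry after the next irq */
	r->ring[r->req_prod % RING_SIZE].req = *req;
	/* real code: write barrier here, then notify the host */
	r->req_prod++;
	return 0;
}

The id field is what lets the host complete requests in any order it
likes, and rsp_event is the interrupt mitigation: the host raises the
interrupt only when rsp_prod crosses it, so a burst of completions
costs a single interrupt, while a guest that wants to hear about the
first completion early simply sets the threshold lower.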

> I'm not sure what similar common code could be extracted for network
> devices.  I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.
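
Sketched for one direction of such a link, with GCC's __sync builtins
standing in for whatever atomic and barrier primitives the hypervisor
interface would really provide (names and layout invented):

#include <stdint.h>
#include <string.h>

#define SLOTS		64	/* power of two */
#define SLOT_SIZE	2048

/* One direction of a guest-to-guest link: a single producer and a
 * single consumer, so ordered index updates are the only
 * synchronization needed; a full mesh gets one ring per direction. */
struct net_ring {
	volatile uint32_t prod;		/* free-running, sender writes */
	volatile uint32_t cons;		/* free-running, receiver writes */
	uint16_t len[SLOTS];
	uint8_t frame[SLOTS][SLOT_SIZE];
};

static int net_send(struct net_ring *r, const void *pkt, uint16_t len)
{
	uint32_t p = r->prod;

	if (len > SLOT_SIZE)
		return -1;
	if (p - r->cons == SLOTS)
		return -1;	/* full: wait for the receiver's "drained" irq */

	memcpy(r->frame[p % SLOTS], pkt, len);
	r->len[p % SLOTS] = len;
	__sync_synchronize();	/* publish the data before the index */
	r->prod = p + 1;
	/* kick the receiver's interrupt here for flow control */
	return 0;
}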

	Arnd <><
