qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Stuart Yoder <b08248@gmail.com>
Cc: kvm@vger.kernel.org, Benjamin Herrenschmidt <benh@au.ibm.com>,
	qemu-devel@nongnu.org, agraf@suse.de, alex.williamson@redhat.com,
	avi@redhat.com, Scott Wood <scottwood@freescale.com>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Date: Mon, 26 Sep 2011 17:51:44 +1000
Message-ID: <20110926075144.GT12286@yookeroo.fritz.box>
In-Reply-To: <CALRxmdCmJ8u7913iBnfdWfo3G_O1ij7iJ8bxiC7L+8Ne5_482A@mail.gmail.com>

On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> 
> ===============================================================
> 
> 1.  Overview
> 
>   This specification describes the layout of device files
>   used in the context of vfio, which gives user space
>   direct access to I/O devices that have been bound to
>   vfio.
> 
>   When a device fd is opened and read, offset 0x0 contains
>   a fixed sized header followed by a number of variable length
>   records that describe different characteristics
>   of the device-- addressable regions, interrupts, etc.
> 
>   0x0  +-------------+-------------+
>        |         magic             | u32  // identifies this as a vfio device file
>        +---------------------------+         and identifies the type of bus
>        |         version           | u32  // specifies the version of this
>        +---------------------------+
>        |         flags             | u32  // encodes any flags
>        +---------------------------+
>        |  dev info record 0        |
>        |    type                   | u32   // type of record
>        |    rec_len                | u32   // length in bytes of record
>        |                           |          (including record header)
>        |    flags                  | u32   // type specific flags
>        |    ...content...          |       // record content, which could
>        +---------------------------+       // include sub-records
>        |  dev info record 1        |
>        +---------------------------+
>        |  dev info record N        |
>        +---------------------------+

I really should have chimed in on this earlier, but I've been very
busy.

Um, not to put too fine a point on it, this is madness.

Yes, it's very flexible and can thereby cover a very wide range of
cases.  But it's much, much too complex.  Userspace has to parse a
complex, multilayered data structure, with variable length elements
just to get an address at which to do IO.  I can pretty much guarantee
that if we went with this, most userspace programs using this
interface would just ignore this metadata and directly map the
offsets at which they happen to know the kernel will put things for
the type of device they care about.

_At least_ for PCI, I think the original VFIO layout of each BAR at a
fixed, well known offset is much better.  Despite its limitations,
just advertising a "device type" ID which describes one of a few fixed
layouts would be preferable to this.  I'm still hoping that we can do
a bit better than that.  But we should try really hard, at the very
least, to force the metadata into a simple array of resources, each
with a fixed-size record describing it, even if that wastes some space
on occasionally-used fields.  Anything more complex than that and
userspace is just never going to use it properly.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Thread overview: 24+ messages
2011-09-09 13:11 [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files Stuart Yoder
2011-09-09 13:16 ` Stuart Yoder
2011-09-19 15:16 ` Alex Williamson
2011-09-19 19:37   ` Scott Wood
2011-09-19 21:07     ` Alex Williamson
2011-09-19 21:15       ` Scott Wood
2011-09-26  7:51 ` David Gibson [this message]
2011-09-26 10:04   ` Alexander Graf
2011-09-26 18:34     ` Alex Williamson
2011-09-26 20:03       ` Stuart Yoder
2011-09-26 20:42         ` Alex Williamson
2011-09-26 23:59       ` Scott Wood
2011-09-27  0:45         ` Alex Williamson
2011-09-27 21:28           ` Scott Wood
2011-09-28  2:40             ` Alex Williamson
2011-09-28  8:58               ` Alexander Graf
2011-09-30  8:55                 ` David Gibson
2011-09-30  8:50         ` David Gibson
2011-09-30  8:46       ` David Gibson
2011-09-30 16:37         ` Alex Williamson
2011-09-30 21:59         ` Alex Williamson
2011-09-30  8:40     ` David Gibson
2011-09-26 19:57   ` Stuart Yoder
2011-09-27  0:25     ` Scott Wood
