qemu-devel.nongnu.org archive mirror
From: Alex Williamson <alex.williamson@redhat.com>
To: Stuart Yoder <b08248@gmail.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, agraf@suse.de,
	avi@redhat.com, Scott Wood <scottwood@freescale.com>
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Date: Mon, 19 Sep 2011 09:16:00 -0600
Message-ID: <1316445361.4443.29.camel@bling.home>
In-Reply-To: <CALRxmdCmJ8u7913iBnfdWfo3G_O1ij7iJ8bxiC7L+8Ne5_482A@mail.gmail.com>

On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> 
> ===============================================================
> 
> 1.  Overview
> 
>   This specification describes the layout of device files
>   used in the context of vfio, which gives user space
>   direct access to I/O devices that have been bound to
>   vfio.
> 
>   When a device fd is opened and read, offset 0x0 contains
>   a fixed sized header followed by a number of variable length
>   records that describe different characteristics
>   of the device-- addressable regions, interrupts, etc.
> 
>   0x0  +-------------+-------------+
>        |         magic             | u32  // identifies this as a vfio device file
>        +---------------------------+      // and identifies the type of bus
>        |         version           | u32  // specifies the version of this
>        +---------------------------+
>        |         flags             | u32  // encodes any flags
>        +---------------------------+
>        |  dev info record 0        |
>        |    type                   | u32   // type of record
>        |    rec_len                | u32   // length in bytes of record
>        |                           |          (including record header)
>        |    flags                  | u32   // type specific flags
>        |    ...content...          |       // record content, which could
>        +---------------------------+       // include sub-records
>        |  dev info record 1        |
>        +---------------------------+
>        |  dev info record N        |
>        +---------------------------+
> 
>   The device info records following the file header may have
>   the following record types each with content encoded in
>   a record specific way:
> 
>   ------------+-------+------------------------------------------------------
>               |  type |
>    Region     |  num  | Description
>   ---------------------------------------------------------------------------
>   REGION           1    describes an addressable address range for the device
>   DTPATH           2    describes the device tree path for the device
>   DTINDEX          3    describes the index into the related device tree
>                           property (reg,ranges,interrupts,interrupt-map)
>   INTERRUPT        4    describes an interrupt for the device
>   PCI_CONFIG_SPACE 5    property identifying a region as PCI config space
>   PCI_BAR_INDEX    6    describes the BAR index for a PCI region
>   PHYS_ADDR        7    describes the physical address of the region
>   ---------------------------------------------------------------------------
> 
> 2. Header
> 
> The header is located at offset 0x0 in the device fd
> and has the following format:
> 
>     struct devfd_header {
>         __u32 magic;
>         __u32 version;
>         __u32 flags;
>     };
> 
>     The 'magic' field contains a magic value that will
>     identify the type of bus the device is on.  Valid values
>     are:
> 
>         0x70636900   // "pci" - PCI device
>         0x64740000   // "dt" - device tree (system bus)
> 
> 3. Region
> 
>   A REGION record describes an addressable address region for the device.
> 
>     struct devfd_region {
>         __u32 type;   // must be 0x1
>         __u32 record_len;
>         __u32 flags;
>         __u64 offset; // seek offset to region from beginning
>                       // of file
>         __u64 len   ; // length of the region
>     };
> 
>   The 'flags' field supports one flag:
> 
>       IS_MMAPABLE
> 
> 4. Device Tree Path (DTPATH)
> 
>   A DTPATH record is a sub-record of a REGION and describes
>   the path to a device tree node for the region

Can we better distinguish sub-records from records?  I assume we're
trying to be as versatile as possible by having a single "type" address
space, but is this going to lead to implementation problems?  A DTPATH
as a record, an INTERRUPT as a sub-record, etc.  Should we instead have
a "subtype" address space per "type" and per device type?  For a "dt"
device, it looks like we really have:

      * REGION (type 0)
              * DTPATH (subtype 0)
              * DTINDEX (subtype 1)
              * PHYS_ADDR (subtype 2)
      * INTERRUPT (type 1)
              * DTPATH (subtype 0)
              * DTINDEX (subtype 1)

While "pci" is:

      * REGION (type 0)
              * PCI_CONFIG_SPACE (subtype 0)
              * PCI_BAR_INDEX (subtype 1)
      * INTERRUPT (type 1)

>     struct devfd_dtpath {
>         __u32 type;   // must be 0x2
>         __u32 record_len;
>         char path[];  // device tree path string
>     };
> 
> 5. Device Tree Index (DTINDEX)
> 
>   A DTINDEX record is a sub-record of a REGION and specifies
>   the index into the resource list encoded in the associated
>   device tree property-- "reg", "ranges", "interrupts", or
>   "interrupt-map".
> 
>     struct devfd_dtindex {
>         __u32 type;   // must be 0x3
>         __u32 record_len;
>         __u32 prop_type;
>         __u32 prop_index;  // index into the resource list
>     };
> 
>     prop_type must have one of the following values:
>        1   // "reg" property
>        2   // "ranges" property
>        3   // "interrupts" property
>        4   // "interrupt-map" property
> 
>     Note: prop_index is not the byte offset into the property,
>     but the logical index.
> 
> 6. Interrupts (INTERRUPT)
> 
>   An INTERRUPT record describes one of a device's interrupts.
>   The handle field is an argument to VFIO_DEVICE_GET_IRQ_FD
>   which user space can use to receive device interrupts.
> 
>     struct devfd_interrupts {
>         __u32 type;   // must be 0x4
>         __u32 record_len;
>         __u32 flags;
>         __u32 handle;  // parameter to VFIO_DEVICE_GET_IRQ_FD
>     };

I'm still on the fence about whether we should implement INTERRUPT for
PCI, or only assume handle 0x0, or maybe assume handle == interrupt pin.
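
Roughly, the two implicit options would look like the sketch below.  The
function and enum names are hypothetical; the only thing taken from PCI
itself is that the Interrupt Pin register lives at config space offset
0x3d and reads 1-4 for INTA-INTD, or 0 if the device uses no INTx pin:

```c
#include <stdint.h>

#define PCI_INTERRUPT_PIN 0x3d  /* standard config space offset */

/* Hypothetical policy names, for illustration only. */
enum devfd_pci_irq_policy {
    DEVFD_PCI_IRQ_HANDLE_ZERO,    /* always use handle 0x0 */
    DEVFD_PCI_IRQ_HANDLE_IS_PIN,  /* handle == interrupt pin */
};

/*
 * Pick the handle to pass to VFIO_DEVICE_GET_IRQ_FD when no INTERRUPT
 * record is present.  cfg points at a copy of the first 64 bytes of
 * config space.  Returns -1 if the pin policy is selected and the
 * device reports no INTx pin.
 */
int devfd_pci_irq_handle(const uint8_t *cfg, enum devfd_pci_irq_policy policy)
{
    uint8_t pin = cfg[PCI_INTERRUPT_PIN];

    if (policy == DEVFD_PCI_IRQ_HANDLE_ZERO)
        return 0;

    return (pin >= 1 && pin <= 4) ? pin : -1;
}
```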

> 
> 7.  PCI Config Space (PCI_CONFIG_SPACE)
> 
>     A PCI_CONFIG_SPACE record is a sub-record of a REGION record
>     and identifies the region as PCI configuration space.
> 
>     struct devfd_cfgspace {
>         __u32 type;   // must be 0x5
>         __u32 record_len;
>         __u32 flags;
>     };
> 
> 8.  PCI Bar Index (PCI_BAR_INDEX)
> 
>     A PCI_BAR_INDEX record is a sub-record of a REGION record
>     and identifies the PCI BAR index for the region.
> 
>     struct devfd_barindex {
>         __u32 type;   // must be 0x6
>         __u32 record_len;
>         __u32 flags;
>         __u32 bar_index;
>     };

I suppose we're more concerned with easy parsing and alignment than
compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably ok.
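
For illustration, user space would only ever see seven valid values.  A
minimal sketch, assuming indexes 0-5 name the BARs and 6 names the ROM
(the proposal doesn't actually say where the ROM goes, and the helper
names are made up):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 0-5 are the six BARs, 6 is the expansion ROM. */
#define DEVFD_PCI_NUM_BARS   6u
#define DEVFD_PCI_ROM_INDEX  6u

/* True if bar_index names one of the six BARs or the ROM. */
bool devfd_bar_index_valid(uint32_t bar_index)
{
    return bar_index < DEVFD_PCI_NUM_BARS ||
           bar_index == DEVFD_PCI_ROM_INDEX;
}

/* True if bar_index names the expansion ROM rather than a BAR. */
bool devfd_bar_index_is_rom(uint32_t bar_index)
{
    return bar_index == DEVFD_PCI_ROM_INDEX;
}
```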

> 
> 9.  Physical Address (PHYS_ADDR)
> 
>     A PHYS_ADDR record is a sub-record of a REGION record
>     and specifies the physical address of the region.
> 
>     struct devfd_physaddr {
>         __u32 type;   // must be 0x7
>         __u32 record_len;
>         __u32 flags;
>         __u64 phys_addr;
>     };
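
As a sanity check on the overall layout, walking the top-level records
only needs the header plus each record's type and rec_len.  A rough
sketch under a few assumptions of mine: the caller has already read the
whole description into a buffer of known length, fields are in host byte
order, and a parent's rec_len covers its sub-records.  The names
devfd_rec_hdr and devfd_count_records are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVFD_MAGIC_PCI 0x70636900u  /* "pci", from section 2 */
#define DEVFD_MAGIC_DT  0x64740000u  /* "dt",  from section 2 */

struct devfd_header {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
};

/* Common prefix shared by every record (hypothetical name). */
struct devfd_rec_hdr {
    uint32_t type;
    uint32_t rec_len;  /* length in bytes, including this prefix */
};

/* Count top-level records in buf; return -1 if the layout is malformed. */
int devfd_count_records(const uint8_t *buf, size_t len)
{
    struct devfd_header hdr;
    size_t off = sizeof(hdr);
    int n = 0;

    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != DEVFD_MAGIC_PCI && hdr.magic != DEVFD_MAGIC_DT)
        return -1;

    while (off < len) {
        struct devfd_rec_hdr rec;

        if (len - off < sizeof(rec))
            return -1;
        memcpy(&rec, buf + off, sizeof(rec));

        /* rec_len must cover its own prefix and stay inside the buffer. */
        if (rec.rec_len < sizeof(rec) || rec.rec_len > len - off)
            return -1;

        off += rec.rec_len;  /* skips any sub-records as well */
        n++;
    }
    return n;
}
```

Copying fields out with memcpy sidesteps the padding a compiler would
insert before the __u64 members in structs like devfd_region.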

Thanks,

Alex
