* RFC: vfio interface for platform devices
From: Yoder Stuart-B08248 @ 2013-07-02 23:25 UTC
  To: Alex Williamson, Alexander Graf, Wood Scott-B07421
  Cc: Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

The write-up below is the first draft of a proposal for how the kernel can expose
platform devices to user space using vfio.

In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
allows user space to correlate regions and interrupts to the corresponding
device tree node structure that is defined for most platform devices.

Regards,
Stuart Yoder

------------------------------------------------------------------------------
VFIO for Platform Devices

The existing infrastructure for vfio-pci is pretty close to what we need:
   -mechanism to create a container
   -add groups/devices to a container
   -set the IOMMU model
   -map DMA regions
   -get an fd for a specific device, which allows user space to determine
    info about device regions (e.g. registers) and interrupt info
   -support for mmapping device regions
   -mechanism to set how interrupts are signaled
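For reference, a minimal sketch of that existing flow, modeled on the example in the kernel's vfio documentation. The helper name vfio_open_device() is hypothetical, and the Type1 IOMMU model is an assumption (a Power host would use a different IOMMU model). DMA mapping is only marked by a comment.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: open a device through the existing container/group
 * interface. Returns the device fd, or -1 on any failure. On success the
 * container and group fds are intentionally left open for the caller.
 */
int vfio_open_device(const char *group_path, const char *dev_name)
{
    int container, group = -1, device;
    struct vfio_group_status status = { .argsz = sizeof(status) };

    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    /* Check the API version and that the assumed IOMMU model is supported. */
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        !ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
        goto fail;

    group = open(group_path, O_RDWR);
    if (group < 0)
        goto fail;
    /* The group is only viable if all its devices are bound to vfio. */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto fail;

    /* Add the group to the container, then set the IOMMU model. */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
        goto fail;

    /* DMA regions would be mapped here with VFIO_IOMMU_MAP_DMA. */

    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
    if (device >= 0)
        return device;
fail:
    if (group >= 0)
        close(group);
    close(container);
    return -1;
}
```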

Platform devices can get complicated-- potentially with a tree hierarchy
of nodes, and links/phandles pointing to other platform
devices.  The kernel doesn't expose relationships between
devices; it just exposes mappable register regions and interrupts.
It's up to user space to work out relationships between devices
if it needs to, using the device tree exposed in /proc/device-tree.
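As an illustration of the user space side, a sketch of reading a raw property from /proc/device-tree and decoding it. Properties there are stored as big-endian 32-bit cells; the helper names dt_read_property() and dt_decode_cells() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode up to max big-endian 32-bit cells; returns the number decoded. */
size_t dt_decode_cells(const uint8_t *buf, size_t len, uint32_t *cells, size_t max)
{
    size_t n = len / 4;
    if (n > max)
        n = max;
    for (size_t i = 0; i < n; i++)
        cells[i] = (uint32_t)buf[4 * i] << 24 | (uint32_t)buf[4 * i + 1] << 16 |
                   (uint32_t)buf[4 * i + 2] << 8 | (uint32_t)buf[4 * i + 3];
    return n;
}

/*
 * Read a raw property file, e.g.
 * "/proc/device-tree/soc@ffe000000/sata@220000/reg".
 * Returns the number of bytes read, or -1 if the file can't be opened.
 */
long dt_read_property(const char *path, uint8_t *buf, size_t max)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, max, f);
    fclose(f);
    return (long)n;
}
```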

I think the changes needed for vfio center on the device-tree-related
information that needs to be available through the device fd.

1.  VFIO_GROUP_GET_DEVICE_FD

  User space has to know which device it is accessing and will call
  VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
  get the device information:

  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");

  (whether the path is a device tree path or a sysfs path is up for
  discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
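  If the sysfs form is chosen, the example name "ffe210000.usb" appears
  to combine the translated physical address with the node name minus its
  unit address. A sketch of building such a name, assuming that convention;
  dt_to_platform_name() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "ffe210000.usb" from the physical address
 * 0xffe210000 and the node name "usb@210000" (unit address stripped).
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int dt_to_platform_name(uint64_t phys, const char *node, char *out, size_t outlen)
{
    const char *at = strchr(node, '@');
    int base_len = at ? (int)(at - node) : (int)strlen(node);
    int n = snprintf(out, outlen, "%" PRIx64 ".%.*s", phys, base_len, node);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```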

2.  VFIO_DEVICE_GET_INFO

   I don't think any changes are needed to VFIO_DEVICE_GET_INFO other
   than adding a new flag identifying a device as a 'platform'
   device.

   This ioctl simply returns the number of regions and number of irqs.

   The number of regions is the number of regions that can be mapped
   for the device, i.e. the entries defined by the "reg" and "ranges"
   properties in the device tree.
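   A sketch of deriving that count from the property lengths, following
   the device tree rules for entry sizes: a "reg" entry uses the parent's
   #address-cells and #size-cells, and a "ranges" entry uses the node's
   #address-cells, the parent's #address-cells and the node's #size-cells.
   The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of entries in a property made of cells_per_entry 32-bit cells. */
static size_t dt_entry_count(size_t prop_len, unsigned cells_per_entry)
{
    size_t entry_bytes = (size_t)cells_per_entry * 4;
    return entry_bytes ? prop_len / entry_bytes : 0;
}

/* Regions for a node: one per "reg" entry plus one per "ranges" entry. */
size_t dt_region_count(size_t reg_len, size_t ranges_len,
                       unsigned parent_addr_cells, unsigned parent_size_cells,
                       unsigned child_addr_cells, unsigned child_size_cells)
{
    return dt_entry_count(reg_len, parent_addr_cells + parent_size_cells) +
           dt_entry_count(ranges_len,
                          child_addr_cells + parent_addr_cells + child_size_cells);
}
```

   For the DMA engine in Example 3, "reg" is 8 bytes and "ranges" is 12
   bytes with one cell per field, giving 2 regions.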

3a. VFIO_DEVICE_GET_REGION_INFO

   No changes needed, except perhaps a new flag indicating that a region
   must be mapped cacheable; Freescale has some devices with such regions.

3b. VFIO_DEVICE_GET_IRQ_INFO

   No changes needed.

4.  VFIO_DEVICE_GET_DEVTREE_INFO

   The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
   expose device regions and interrupts, but it's not enough to know
   that there are X regions and Y interrupts.  User space needs to
   know what each resource is for, so it can correlate the regions and
   interrupts with the device tree structure that drivers use.  That
   structure may consist of multiple nodes, so user space must be able
   to identify the node corresponding to each region or interrupt
   exposed by VFIO.

   The following information is needed:
      -the device tree path to the node corresponding to the
       region or interrupt
      -for a region, whether it corresponds to a "reg" or "ranges"
       property
      -for a region, the index of the entry within "reg" or "ranges",
       since either property can describe multiple sub-regions

   The VFIO_DEVICE_GET_DEVTREE_INFO ioctl operates on a device fd.

   ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
   
   struct vfio_path_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
        __u32   index;          /* input: index of region or irq for which we are getting info */
        __u32   type;           /* input: 0 - get devtree info for a region
                                          1 - get devtree info for an irq
                                 */
        __u32   start;          /* output: identifies the index within the reg/ranges */
        __u8    path[];         /* output: Full path to associated device tree node */
   };

   User space allocates enough space for the device tree path, sets
   the index field and the type field (region or irq), and sets argsz
   to the total size of the buffer, including the path.
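   A sketch of that allocation in C. Since the structure exists only in
   this proposal, it is mirrored locally with <stdint.h> types instead of
   __u32/__u8; the TYPE_* names for the 0/1 values of type and the helper
   name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the proposed structure; it is not in any kernel header. */
struct vfio_path_info {
    uint32_t argsz;
    uint32_t flags;
#define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
    uint32_t index;   /* input: region or irq index */
    uint32_t type;    /* input: region or irq */
    uint32_t start;   /* output: index within reg/ranges */
    uint8_t  path[];  /* output: full device tree path */
};

#define VFIO_DEVTREE_INFO_TYPE_REGION 0  /* hypothetical name for type 0 */
#define VFIO_DEVTREE_INFO_TYPE_IRQ    1  /* hypothetical name for type 1 */

/* Allocate a zeroed request with room for a path of up to path_max bytes. */
struct vfio_path_info *vfio_path_info_alloc(uint32_t type, uint32_t index,
                                            size_t path_max)
{
    struct vfio_path_info *info = calloc(1, sizeof(*info) + path_max);
    if (!info)
        return NULL;
    info->argsz = (uint32_t)(sizeof(*info) + path_max);
    info->type = type;
    info->index = index;
    /* The caller would then issue
     * ioctl(device_fd, VFIO_DEVICE_GET_DEVTREE_INFO, info). */
    return info;
}
```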

5.  EXAMPLE 1

    Example, Freescale SATA controller:

     sata@220000 {
         compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
         reg = <0x220000 0x1000>;
         interrupts = <0x44 0x2 0x0 0x0>;
     };

    The request to get the device fd would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -1 region
      -1 interrupt

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0, size=0x1000 -- allows mmap of physical 0xffe220000

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
    for the single interrupt.

    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:

      -for region index 0:
          flags: 0x0     // i.e. this is a "reg" property
          start: 0x0     // i.e. index 0x0 in "reg"
          path: "/soc@ffe000000/sata@220000"

      -for interrupt index 0:
          path: "/soc@ffe000000/sata@220000"
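    The mmap address above is the "reg" address offset by the soc bus
    base (0xffe000000, taken from the /soc@ffe000000 unit address). A
    sketch of that arithmetic and of where a page-granular mmap would
    start, assuming a simple offset translation and 4 KiB pages; the
    names are hypothetical.

```c
#include <stdint.h>

struct mmap_window {
    uint64_t phys;      /* physical address of the region */
    uint64_t page_base; /* page-aligned address an mmap would start at */
    uint64_t page_off;  /* offset of the region within that page */
};

/* Assumes the parent bus maps child address 0 to bus_base.
 * page_size must be a power of two. */
struct mmap_window region_window(uint64_t bus_base, uint64_t child_addr,
                                 uint64_t page_size)
{
    struct mmap_window w;
    w.phys = bus_base + child_addr;
    w.page_off = w.phys & (page_size - 1);
    w.page_base = w.phys - w.page_off;
    return w;
}
```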

6.  EXAMPLE 2

    Example, Freescale crypto device (modified to illustrate):

     crypto@300000 {
        compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
        #address-cells = <0x1>;
        #size-cells = <0x1>;
        reg = <0x300000 0x10000>;
        interrupts = <0x5c 0x2 0x0 0x0>;
  
        jr@1000 {
           compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
           interrupts = <0x58 0x2 0x0 0x0>;
        };
  
        jr@2000 {
           compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
           interrupts = <0x59 0x2 0x0 0x0>;
        };
     };

    The request to get the device fd would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -1 region
      -3 interrupts

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0, size=0x10000 -- allows mmap of physical 0xffe300000

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
    for each of the IRQs-- indexes 0-2.

    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:

      -for region index 0:
          flags: 0x0     // i.e. this is a "reg" property
          start: 0x0     // i.e. index 0x0 in "reg"
          path: "/soc@ffe000000/crypto@300000"

      -for interrupt index 0:
          path: "/soc@ffe000000/crypto@300000/jr@1000"

      -for interrupt index 1:
          path: "/soc@ffe000000/crypto@300000/jr@2000"

      -for interrupt index 2:
          path: "/soc@ffe000000/crypto@300000"
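    The interrupt count returned by VFIO_DEVICE_GET_INFO spans the crypto
    node and its job-ring children. A sketch of counting interrupt
    specifiers over a subtree, assuming 4-cell specifiers as in the
    "interrupts" properties above; the in-memory dt_node type is
    hypothetical, not a kernel structure.

```c
#include <stddef.h>

/* Hypothetical in-memory view of a device tree node. */
struct dt_node {
    const char *name;
    size_t interrupts_len;          /* "interrupts" length in bytes, 0 if absent */
    const struct dt_node *children;
    size_t nr_children;
};

/* Count interrupt specifiers in a node and all its descendants.
 * interrupt_cells must be non-zero. */
size_t dt_count_irqs(const struct dt_node *n, unsigned interrupt_cells)
{
    size_t count = n->interrupts_len / (interrupt_cells * 4u);
    for (size_t i = 0; i < n->nr_children; i++)
        count += dt_count_irqs(&n->children[i], interrupt_cells);
    return count;
}
```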

7.  EXAMPLE 3

    Example, Freescale DMA engine (modified to illustrate):

    dma@101300 {
       cell-index = <0x1>;
       ranges = <0x0 0x101100 0x200>;
       reg = <0x101300 0x4>;
       compatible = "fsl,eloplus-dma";
       #size-cells = <0x1>;
       #address-cells = <0x1>;
       fsl,liodn = <0xc6>;
    
       dma-channel@180 {
          interrupts = <0x23 0x2 0x0 0x0>;
          cell-index = <0x3>;
          reg = <0x180 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };
    
       dma-channel@100 {
          interrupts = <0x22 0x2 0x0 0x0>;
          cell-index = <0x2>;
          reg = <0x100 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };

    };

    The request to get the device fd would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -2 regions
      -2 interrupts

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
      -for index 1:
           offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
    for each of the IRQs-- indexes 0-1.

    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:

      -for region index 0:
          flags: 0x8     // i.e. VFIO_DEVTREE_INFO_RANGES: this is a "ranges" property
          start: 0x0     // i.e. index 0x0 in "ranges"
          path: "/soc@ffe000000/dma@101300"

      -for region index 1:
          flags: 0x0     // i.e. this is a "reg" property
          start: 0x0     // i.e. index 0x0 in "reg"
          path: "/soc@ffe000000/dma@101300"

      -for interrupt index 0:
          path: "/soc@ffe000000/dma@101300/dma-channel@180"

      -for interrupt index 1:
          path: "/soc@ffe000000/dma@101300/dma-channel@100"
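    Region index 0 comes from "ranges", which maps child address 0x0 to
    0x101100 on the soc bus, so the dma-channel "reg" entries fall inside
    it. A sketch of the standard translation through a single "ranges"
    entry; the names are hypothetical.

```c
#include <stdint.h>

/* One "ranges" entry: <child-bus-address parent-bus-address length>. */
struct dt_range {
    uint64_t child;
    uint64_t parent;
    uint64_t size;
};

/* Translate a child bus address to the parent bus.
 * Returns 0 on success, -1 if the address is outside the range. */
int dt_translate(const struct dt_range *r, uint64_t child_addr,
                 uint64_t *parent_addr)
{
    if (child_addr < r->child || child_addr - r->child >= r->size)
        return -1;
    *parent_addr = r->parent + (child_addr - r->child);
    return 0;
}
```

    With the soc base of 0xffe000000, dma-channel@100 therefore sits at
    physical 0xffe101200, inside region index 0.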

8.  Open Issues

   -how to handle cases where VFIO is requested to handle
    a device whose valid, mappable range for a region
    is less than a page.  See the example above, where an
    advertised region in the DMA node is 4 bytes.  Since mmap
    works in whole pages, exposing such a region to a guest VM
    would map the full page of I/O space around it, including any
    other registers in that page, which opens a potential security
    issue.
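   To quantify the issue, a sketch of computing how many bytes beyond a
   region a page-granular mapping exposes (names hypothetical, 4 KiB
   pages assumed). For the 4-byte DMA register this is 4092 bytes, and in
   Example 3 both regions lie in the page at 0xffe101000, so mapping
   either one exposes the other.

```c
#include <stdint.h>

/*
 * Bytes of I/O space that an mmap covering [phys, phys + size) exposes
 * beyond the region itself. page_size must be a power of two.
 */
uint64_t mmap_exposed_bytes(uint64_t phys, uint64_t size, uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    uint64_t start = phys & ~mask;                /* round down */
    uint64_t end = (phys + size + mask) & ~mask;  /* round up */
    return (end - start) - size;
}
```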


* Re: RFC: vfio interface for platform devices
From: Alexander Graf @ 2013-07-03  1:07 UTC
  To: Yoder Stuart-B08248
  Cc: Alex Williamson, Wood Scott-B07421, Bhushan Bharat-R65777,
	Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
	Antonios Motakis, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu


On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:

> 8.  Open Issues
> 
>   -how to handle cases where VFIO is requested to handle 
>    a device where the valid, mappable range for a region
>    is less than a page size.   See example above where an 
>    advertised region in the DMA node is 4 bytes.  If exposed
>    to a guest VM, the guest has to be able to map a full page
>    of I/O space which opens a potential security issue.

The way we solved this for legacy PCI device assignment was, IIRC, by going through QEMU for emulation and falling back to read/write access instead of mmap. We could probably do the same here. IIRC there was also a way for a normal Linux mmap'ed device region to trap individual accesses, so we could use that too.

The slow-path emulation would then happen magically in QEMU, since MMIO writes get reinjected into the normal QEMU MMIO handling path, which just issues a read/write on the mmap'ed region if it's not declared as emulated.
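A sketch of that read/write slow path, assuming the platform driver supports read/write on the device fd at a region's offset (as reported by VFIO_DEVICE_GET_REGION_INFO) the way vfio-pci does. Each trapped access is bounds-checked against the advertised region size before touching the device. This is an illustration only, not QEMU's code; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Nonzero if a width-byte access at off lies inside a region of region_size bytes. */
int region_access_ok(uint64_t region_size, uint64_t off, unsigned width)
{
    return width != 0 && off <= region_size && width <= region_size - off;
}

/*
 * Emulate a trapped 32-bit load by reading the device fd at
 * region_offset + off, refusing anything outside the advertised region.
 * Returns 0 on success, -1 on error.
 */
int region_read32(int device_fd, off_t region_offset, uint64_t region_size,
                  uint64_t off, uint32_t *val)
{
    if (!region_access_ok(region_size, off, sizeof(*val))) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = pread(device_fd, val, sizeof(*val), region_offset + (off_t)off);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```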


Alex


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-02 23:25 ` Yoder Stuart-B08248
  (?)
  (?)
@ 2013-07-03  1:07 ` Alexander Graf
  -1 siblings, 0 replies; 51+ messages in thread
From: Alexander Graf @ 2013-07-03  1:07 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu


On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:

> The write-up below is the first draft of a proposal for how the kernel can expose
> platform devices to user space using vfio.
> 
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the corresponding
> device tree node structure that is defined for most platform devices.
> 
> Regards,
> Stuart Yoder
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing infrastructure for vfio-pci is pretty close to what we need:
>   -mechanism to create a container
>   -add groups/devices to a container
>   -set the IOMMU model
>   -map DMA regions
>   -get an fd for a specific device, which allows user space to determine
>    info about device regions (e.g. registers) and interrupt info
>   -support for mmapping device regions
>   -mechanism to set how interrupts are signaled
> 
> Platform devices can get complicated-- potentially with a tree hierarchy
> of nodes, and links/phandles pointing to other platform 
> devices.   The kernel doesn't expose relationships between
> devices.  The kernel just exposes mappable register regions and interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
> 
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>  User space has to know which device it is accessing and will call
>  VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>  get the device information:
> 
>  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
> 
>  (whether the path is a device tree path or a sysfs path is up for
>  discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
> 
> 2.  VFIO_DEVICE_GET_INFO
> 
>   Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>   than adding a new flag identifying a device as a 'platform'
>   device.
> 
>   This ioctl simply returns the number of regions and number of irqs.
> 
>   The number of regions corresponds to the number of regions
>   that can be mapped for the device-- corresponds to the regions defined
>   in "reg" and "ranges" in the device tree.  
> 
> 3.  VFIO_DEVICE_GET_REGION_INFO
> 
>   No changes needed, except perhaps adding a new flag.  Freescale has some
>   devices with regions that must be mapped cacheable.
> 
> 3.  VFIO_DEVICE_GET_IRQ_INFO
> 
>   No changes needed.
> 
> 4. VFIO_DEVICE_GET_DEVTREE_INFO
> 
>   The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>   expose device regions and interrupts, but it's not enough to know
>   that there are X regions and Y interrupts.  User space needs to
>   know what the resources are for-- to correlate those regions/interrupts
>   to the device tree structure that drivers use.  The device tree
>   structure could consist of multiple nodes and it is necessary to
>   identify the node corresponding to the region/interrupt exposed
>   by VFIO.
> 
>   The following information is needed:
>      -the device tree path to the node corresponding to the
>       region or interrupt
>      -for a region, whether it corresponds to a "reg" or "ranges"
>       property
>      -there could be multiple sub-regions per "reg" or "ranges" and
>       the sub-index within the reg/ranges is needed
> 
>   The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> 
>   ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> 
>   struct vfio_path_info {
>        __u32   argsz;
>        __u32   flags;
>   #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
>        __u32   index;          /* input: index of region or irq for which we are getting info */
>        __u32   type;           /* input: 0 - get devtree info for a region
>                                          1 - get devtree info for an irq
>                                 */
>        __u32   start;          /* output: identifies the index within the reg/ranges */
>        __u8    path[];         /* output: Full path to associated device tree node */
>   };
> 
>   User space allocates enough space for the device tree path, sets
>   the type field identifying whether this is a region, or irq,
>   and sets argsz appropriately.
> 
> 5.  EXAMPLE 1
> 
>    Example, Freescale SATA controller:
> 
>     sata@220000 {
>         compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>         reg = <0x220000 0x1000>;
>         interrupts = <0x44 0x2 0x0 0x0>;
>     };
> 
>    request to get device FD would look like:
>      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
> 
>    The VFIO_DEVICE_GET_INFO ioctl would return:
>      -1 region
>      -1 interrupt
> 
>    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>      -for index 0:
>           offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
> 
>    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>    for the single interrupt.
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>      -for region index 0:
>          flags: 0x0     // i.e. this is a "reg" property
>          start: 0x0     // i.e. index 0x0 in "reg"
>          path: "/soc@ffe000000/sata@220000"
> 
>      -for interrupt index 0:
>          path: "/soc@ffe000000/sata@220000"
> 
> 6.  EXAMPLE 2
> 
>    Example, Freescale crypto device (modified to illustrate):
> 
>     crypto@300000 {
>        compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
>        #address-cells = <0x1>;
>        #size-cells = <0x1>;
>        reg = <0x300000 0x10000>;
>        interrupts = <0x5c 0x2 0x0 0x0>;
> 
>        jr@1000 {
>           compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>           interrupts = <0x58 0x2 0x0 0x0>;
>        };
> 
>        jr@2000 {
>           compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>           interrupts = <0x59 0x2 0x0 0x0>;
>        };
>     };
> 
>    request to get device FD would look like:
>      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
> 
>    The VFIO_DEVICE_GET_INFO ioctl would return:
>      -1 region
>      -3 interrupts
> 
>    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>      -for index 0:
>           offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
> 
>    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>    for each of the IRQs-- indexes 0-2.
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>      -for region index 0:
>          flags: 0x0     // i.e. this is a "reg" property
>          start: 0x0     // i.e. index 0x0 in "reg"
>          path: "/soc@ffe000000/crypto@300000"
> 
>      -for interrupt index 0:
>          path: "/soc@ffe000000/crypto@300000/jr@1000"
> 
>      -for interrupt index 1:
>          path: "/soc@ffe000000/crypto@300000/jr@2000"
> 
> 7.  EXAMPLE 3
> 
>    Example, Freescale DMA engine (modified to illustrate):
> 
>    dma@101300 {
>       cell-index = <0x1>;
>       ranges = <0x0 0x101100 0x200>;
>       reg = <0x101300 0x4>;
>       compatible = "fsl,eloplus-dma";
>       #size-cells = <0x1>;
>       #address-cells = <0x1>;
>       fsl,liodn = <0xc6>;
> 
>       dma-channel@180 {
>          interrupts = <0x23 0x2 0x0 0x0>;
>          cell-index = <0x3>;
>          reg = <0x180 0x80>;
>          compatible = "fsl,eloplus-dma-channel";
>       };
> 
>       dma-channel@100 {
>          interrupts = <0x22 0x2 0x0 0x0>;
>          cell-index = <0x2>;
>          reg = <0x100 0x80>;
>          compatible = "fsl,eloplus-dma-channel";
>       };
> 
>    };
> 
>    request to get device FD would look like:
>      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
> 
>    The VFIO_DEVICE_GET_INFO ioctl would return:
>      -2 regions
>      -2 interrupts
> 
>    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>      -for index 0:
>           offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>      -for index 1:
>           offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
> 
>    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>    for each of the IRQs-- indexes 0-1.
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>      -for region index 0:
>          flags: 0x1     // i.e. this is a "ranges" property
>          start: 0x0     // i.e. index 0x0 in "ranges"
>          path: "/soc@ffe000000/dma@101300"
> 
>      -for region index 1:
>          flags: 0x0     // i.e. this is a "reg" property
>          start: 0x0     // i.e. index 0x0 in "reg"
>          path: "/soc@ffe000000/dma@101300"
> 
>      -for interrupt index 0:
>          path: "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>      -for interrupt index 1:
>          path: "/soc@ffe000000/dma@101300/dma-channel@100"
> 
> 8.  Open Issues
> 
>   -how to handle cases where VFIO is requested to handle 
>    a device where the valid, mappable range for a region
>    is less than a page size.   See example above where an 
>    advertised region in the DMA node is 4 bytes.  If exposed
>    to a guest VM, the guest has to be able to map a full page
>    of I/O space which opens a potential security issue.

The way we solved this for legacy PCI device assignment was by going through QEMU for emulation and falling back to plain read/write accesses, IIRC. We could probably do the same here. There was also a way for a normal Linux mmap'ed device region to trap individual accesses, so we could use that as well.

The slow path emulation would then happen magically in QEMU, since MMIO writes will get reinjected into the normal QEMU MMIO handling path which will just issue a read/write on the mmap'ed region if it's not declared as emulated.
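For illustration, the read/write slow path amounts to pread()/pwrite() on the vfio device fd at the region's offset instead of loads and stores through an mmap, so the kernel sees, and can bounds-check, every access. A minimal sketch, assuming the region offset comes from a GET_REGION_INFO-style query; reg_read32()/reg_write32() are hypothetical helper names:

```c
#define _XOPEN_SOURCE 700      /* for pread()/pwrite() under strict -std */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a 32-bit register through the device fd instead of an mmap.
 * region_offset: offset of the region within the device fd (as a
 * GET_REGION_INFO-style query would report it).
 * reg: byte offset of the register inside the region.
 * Returns 0 on success, -1 on error or short read. */
static int reg_read32(int fd, uint64_t region_offset, uint64_t reg,
                      uint32_t *val)
{
    assert(val != NULL);
    ssize_t n = pread(fd, val, sizeof(*val), (off_t)(region_offset + reg));
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Write counterpart: each access is a syscall the kernel can check. */
static int reg_write32(int fd, uint64_t region_offset, uint64_t reg,
                       uint32_t val)
{
    ssize_t n = pwrite(fd, &val, sizeof(val), (off_t)(region_offset + reg));
    return n == (ssize_t)sizeof(val) ? 0 : -1;
}
```

Because every access is a system call, this path is much slower than mmap, which is why it suits small or low-performance regions such as the 4-byte DMA register in the quoted example.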


Alex

* Re: RFC: vfio interface for platform devices
  2013-07-02 23:25 ` Yoder Stuart-B08248
@ 2013-07-03  3:07   ` Alex Williamson
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2013-07-03  3:07 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	Antonios Motakis, kvmarm@lists.cs.columbia.edu

On Tue, 2013-07-02 at 23:25 +0000, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the kernel can expose
> platform devices to user space using vfio.
> 
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the corresponding
> device tree node structure that is defined for most platform devices.
> 
> Regards,
> Stuart Yoder
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing infrastructure for vfio-pci is pretty close to what we need:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Platform devices can get complicated-- potentially with a tree hierarchy
> of nodes, and links/phandles pointing to other platform 
> devices.   The kernel doesn't expose relationships between
> devices.  The kernel just exposes mappable register regions and interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
> 
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space has to know which device it is accessing and will call
>   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>   get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
> 
>   (whether the path is a device tree path or a sysfs path is up for
>   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
> 
> 2.  VFIO_DEVICE_GET_INFO
> 
>    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>    than adding a new flag identifying a device as a 'platform'
>    device.
> 
>    This ioctl simply returns the number of regions and number of irqs.
> 
>    The number of regions corresponds to the number of regions
>    that can be mapped for the device-- corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.  
> 
> 3.  VFIO_DEVICE_GET_REGION_INFO
> 
>    No changes needed, except perhaps adding a new flag.  Freescale has some
>    devices with regions that must be mapped cacheable.
> 
> 3.  VFIO_DEVICE_GET_IRQ_INFO
> 
>    No changes needed.
> 
> 4. VFIO_DEVICE_GET_DEVTREE_INFO
> 
>    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>    expose device regions and interrupts, but it's not enough to know
>    that there are X regions and Y interrupts.  User space needs to
>    know what the resources are for-- to correlate those regions/interrupts
>    to the device tree structure that drivers use.  The device tree
>    structure could consist of multiple nodes and it is necessary to
>    identify the node corresponding to the region/interrupt exposed
>    by VFIO.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region or interrupt
>       -for a region, whether it corresponds to a "reg" or "ranges"
>        property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> 
>    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>    
>    struct vfio_path_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */

(1 << 0)?

Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
suggest a bit for each.  Otherwise, what does it mean when this returns
flags = 0x0 for an irq?
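A sketch of the one-bit-per-property suggestion, with hypothetical bit values and names: with separate REG and RANGES bits, flags == 0 can only mean "neither" (e.g. an irq), and both bits set is detectably invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments: one bit per property type. */
#define VFIO_DEVTREE_INFO_REG     (1u << 0) /* region comes from "reg" */
#define VFIO_DEVTREE_INFO_RANGES  (1u << 1) /* region comes from "ranges" */

enum devtree_src { SRC_NONE, SRC_REG, SRC_RANGES, SRC_INVALID };

/* Classify the source of a returned entry from its flags word. */
static enum devtree_src devtree_info_src(uint32_t flags)
{
    uint32_t src = flags & (VFIO_DEVTREE_INFO_REG | VFIO_DEVTREE_INFO_RANGES);

    switch (src) {
    case 0:                        return SRC_NONE;    /* e.g. an irq */
    case VFIO_DEVTREE_INFO_REG:    return SRC_REG;
    case VFIO_DEVTREE_INFO_RANGES: return SRC_RANGES;
    default:                       return SRC_INVALID; /* both set */
    }
}
```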

>         __u32   index;          /* input: index of region or irq for which we are getting info */
>         __u32   type;           /* input: 0 - get devtree info for a region
>                                           1 - get devtree info for an irq
>                                  */
>         __u32   start;          /* output: identifies the index within the reg/ranges */
>         __u8    path[];         /* output: Full path to associated device tree node */
>    };
> 
>    User space allocates enough space for the device tree path, sets
>    the type field identifying whether this is a region, or irq,
>    and sets argsz appropriately.
> 
> 5.  EXAMPLE 1
> 
>     Example, Freescale SATA controller:
> 
>      sata@220000 {
>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>          reg = <0x220000 0x1000>;
>          interrupts = <0x44 0x2 0x0 0x0>;
>      };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -1 interrupt
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>     for the single interrupt.
> 
>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>       -for region index 0:
>           flags: 0x0     // i.e. this is a "reg" property
>           start: 0x0     // i.e. index 0x0 in "reg"
>           path: "/soc@ffe000000/sata@220000"
> 
>       -for interrupt index 0:
>           path: "/soc@ffe000000/sata@220000"
> 
> 6.  EXAMPLE 2
> 
>     Example, Freescale crypto device (modified to illustrate):
> 
>      crypto@300000 {
>         compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
>         #address-cells = <0x1>;
>         #size-cells = <0x1>;
>         reg = <0x300000 0x10000>;
>         interrupts = <0x5c 0x2 0x0 0x0>;
>   
>         jr@1000 {
>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>            interrupts = <0x58 0x2 0x0 0x0>;
>         };
>   
>         jr@2000 {
>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>            interrupts = <0x59 0x2 0x0 0x0>;
>         };
>      };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -3 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>     for each of the IRQs-- indexes 0-2.
> 
>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>       -for region index 0:
>           flags: 0x0     // i.e. this is a "reg" property
>           start: 0x0     // i.e. index 0x0 in "reg"
>           path: "/soc@ffe000000/crypto@300000"
> 
>       -for interrupt index 0:
>           path: "/soc@ffe000000/crypto@300000/jr@1000"
> 
>       -for interrupt index 1:
>           path: "/soc@ffe000000/crypto@300000/jr@2000"
> 
> 7.  EXAMPLE 3
> 
>     Example, Freescale DMA engine (modified to illustrate):
> 
>     dma@101300 {
>        cell-index = <0x1>;
>        ranges = <0x0 0x101100 0x200>;
>        reg = <0x101300 0x4>;
>        compatible = "fsl,eloplus-dma";
>        #size-cells = <0x1>;
>        #address-cells = <0x1>;
>        fsl,liodn = <0xc6>;
>     
>        dma-channel@180 {
>           interrupts = <0x23 0x2 0x0 0x0>;
>           cell-index = <0x3>;
>           reg = <0x180 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
>     
>        dma-channel@100 {
>           interrupts = <0x22 0x2 0x0 0x0>;
>           cell-index = <0x2>;
>           reg = <0x100 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
> 
>     };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -2 regions
>       -2 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>       -for index 1:
>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>     for each of the IRQs-- indexes 0-1.
> 
>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>       -for region index 0:
>           flags: 0x1     // i.e. this is a "ranges" property
>           start: 0x0     // i.e. index 0x0 in "ranges"
>           path: "/soc@ffe000000/dma@101300"
> 
>       -for region index 1:
>           flags: 0x0     // i.e. this is a "reg" property
>           start: 0x0     // i.e. index 0x0 in "reg"
>           path: "/soc@ffe000000/dma@101300"
> 
>       -for interrupt index 0:
>           path: "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>       -for interrupt index 1:
>           path: "/soc@ffe000000/dma@101300/dma-channel@100"
> 
> 8.  Open Issues
> 
>    -how to handle cases where VFIO is requested to handle 
>     a device where the valid, mappable range for a region
>     is less than a page size.   See example above where an 
>     advertised region in the DMA node is 4 bytes.  If exposed
>     to a guest VM, the guest has to be able to map a full page
>     of I/O space which opens a potential security issue.

As AlexG points out, we solve that on vfio-pci by not supporting mmap on
those regions and only allowing read/write.  If you could make the
platform map regions on page size boundaries and there's nothing bad a
guest can do by accessing the empty space, you could still support mmap.
We can't make such requirements or guarantees on PCI though.  The PCI
spec also suggests for devices to use page size regions and high
performance devices generally follow that request, so it has become a
fallback for low performance devices and I/O port space, which we can't
mmap on x86 anyway.
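The condition for safely allowing mmap can be written as an alignment check. A minimal sketch, assuming 'base' is the region's physical address; region_mmap_safe() is a hypothetical helper, and it checks alignment only, since whether the unused part of a page is harmless to access is a platform property code cannot verify:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the region [base, base + size) consists of whole pages only,
 * so an mmap of it exposes nothing outside the region.
 * page_size must be a nonzero power of two. */
static bool region_mmap_safe(uint64_t base, uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t mask = page_size - 1;
    return size != 0 && (base & mask) == 0 && (size & mask) == 0;
}
```

With 4 KiB pages, the 0x10000-byte crypto region at 0xffe300000 passes, while the 4-byte DMA region at 0xffe101300 fails and would fall back to read/write.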

So overall the interface and extension makes sense.  My only question is
whether it's better to get complete reuse out of GET_REGION_INFO and
GET_IRQ_INFO and then add another device tree specific ioctl or is it
better to add a device tree index and path to the existing GET_*_INFO
ioctls?  Getting some information from one ioctl and passing pieces of
it back to another ioctl feels a little clunky.

DEVICE_GET_INFO will identify the device as device tree, which gives you
the opportunity to extend or replace vfio_region_info and vfio_irq_info.
It seems like it could even be done in a compatible way.  For example,
if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz set to
sizeof(struct vfio_region_info), the kernel could fill in all the info
up to that size and fill argsz with the size needed for the remaining
info.  You could then realloc the buffer and the kernel would add the
extra info on the next call, setting a flag for each additional field
returned.  Userspace could also just be sloppy and call it with a lot of
padding and get everything in one shot.
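The two-call argsz negotiation can be sketched as follows. The kernel side is simulated by a hypothetical fake_get_region_info(), and demo_region_info and DEMO_INFO_FLAG_PATH are stand-ins, not the real vfio_region_info or any existing flag:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_INFO_FLAG_PATH (1u << 0)   /* hypothetical: path[] is valid */

/* Stand-in for a region info struct with an optional trailing path. */
struct demo_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t pad;
    uint64_t size;
    uint64_t offset;
    char     path[];      /* filled only if DEMO_INFO_FLAG_PATH is set */
};

/* Simulated kernel: always fills the fixed part, appends the path only
 * if the caller's buffer is big enough, and always reports the total
 * size needed in argsz. */
static int fake_get_region_info(struct demo_region_info *info,
                                const char *path)
{
    size_t need = sizeof(*info) + strlen(path) + 1;

    if (info->argsz < sizeof(*info))
        return -1;                 /* fixed part must always fit */
    info->size = 0x1000;           /* made-up values */
    info->offset = 0;
    if (info->argsz >= need) {
        memcpy(info->path, path, strlen(path) + 1);
        info->flags |= DEMO_INFO_FLAG_PATH;
    }
    info->argsz = (uint32_t)need;
    return 0;
}

/* User side: start with the fixed size, grow if the "kernel" asks for
 * more, and retry. The caller frees the result. */
static struct demo_region_info *get_region_info_all(const char *path)
{
    size_t sz = sizeof(struct demo_region_info);
    struct demo_region_info *info = NULL;

    for (;;) {
        struct demo_region_info *tmp = realloc(info, sz);
        if (!tmp) {
            free(info);
            return NULL;
        }
        info = tmp;
        memset(info, 0, sz);
        info->argsz = (uint32_t)sz;
        if (fake_get_region_info(info, path) != 0) {
            free(info);
            return NULL;
        }
        if (info->argsz <= sz)
            return info;           /* everything fit */
        sz = info->argsz;          /* need more room: grow and retry */
    }
}
```

A caller that over-allocates up front gets everything on the first call, which is the "sloppy" option.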

We'd need to define which flags have associated structures and define
those structures.  For instance, some require no space:

#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)

Others imply a structure added to the end:

#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)

struct vfio_devtree_region_info_index
{
	__u32	index;
};

#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)

struct vfio_devtree_region_info_path
{
	__u32	len;
	__u8	path[];
};

The order of the flags indicates the order of the structures at the end.
We'd need to have some rules about alignment, probably always dword
aligned.  I'm not sure if it would be necessary for each structure to have a
length.  It would only be needed if we want to let userspace skip over
structures they don't understand how to parse.
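A sketch of a parser for the flag-ordered trailing structures, assuming hypothetical bit values (the text leaves them as (1 << ?)), a path length that counts the terminating NUL, and 4-byte (dword) alignment for each structure. dt_parse() and the dt_* types are illustrative names only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values. */
#define DT_FLAG_INDEX (1u << 4)
#define DT_FLAG_PATH  (1u << 5)

#define ALIGN4(x) (((x) + 3u) & ~(size_t)3u)

struct dt_info_hdr {          /* stand-in for the fixed info struct */
    uint32_t argsz;
    uint32_t flags;
};

struct dt_parsed {
    int         has_index;
    uint32_t    index;
    const char *path;         /* points into the buffer, or NULL */
    uint32_t    path_len;     /* includes the terminating NUL */
};

/* Walk the trailing structures in flag (bit) order; each one starts on
 * a 4-byte boundary. Returns 0 on success, -1 if the buffer is shorter
 * than the flags advertise. */
static int dt_parse(const uint8_t *buf, size_t len, struct dt_parsed *out)
{
    struct dt_info_hdr hdr;
    size_t off = sizeof(hdr);

    memset(out, 0, sizeof(*out));
    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));

    if (hdr.flags & DT_FLAG_INDEX) {
        if (off + 4 > len)
            return -1;
        memcpy(&out->index, buf + off, 4);
        out->has_index = 1;
        off = ALIGN4(off + 4);
    }
    if (hdr.flags & DT_FLAG_PATH) {
        if (off + 4 > len)
            return -1;
        memcpy(&out->path_len, buf + off, 4);
        off += 4;
        if (off + out->path_len > len)
            return -1;
        out->path = (const char *)(buf + off);
        off = ALIGN4(off + out->path_len);
    }
    return 0;
}
```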

Another idea is that the space after struct vfio_region/irq_info could
be a self describing capabilities area, much like PCI config space.
Starting immediately after the static structure we'd have:

struct vfio_info_cap_header
{
	__u16	type;
	__u16	next;
};

Where type defines the structure that follows and next indicates the
offset of the next header (could also be the len of the current cap).
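A sketch of walking such a capability chain, assuming 'next' is an absolute offset from the start of the info buffer with 0 ending the chain (the text leaves open whether it is instead the length of the current cap); find_cap() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vfio_info_cap_header {
    uint16_t type;
    uint16_t next;   /* offset of next header from buffer start; 0 = end */
};

/* Return the offset of the first capability with the given type, or 0
 * if it is absent or the chain runs out of bounds or loops. 'first' is
 * the offset of the first header, i.e. the size of the fixed struct. */
static size_t find_cap(const uint8_t *buf, size_t len, size_t first,
                       uint16_t type)
{
    size_t off = first;
    size_t hops = 0;

    while (off != 0) {
        struct vfio_info_cap_header h;
        if (off + sizeof(h) > len || ++hops > len / sizeof(h))
            return 0;          /* out of bounds, or too many hops (loop) */
        memcpy(&h, buf + off, sizeof(h));
        if (h.type == type)
            return off;
        off = h.next;
    }
    return 0;
}
```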

Anyway, it seems like there are possibilities that would allow us to
extend the info ioctls in ways that would be generic for any device
type.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-03  3:07   ` Alex Williamson
@ 2013-07-03 10:44     ` Antonios Motakis
  -1 siblings, 0 replies; 51+ messages in thread
From: Antonios Motakis @ 2013-07-03 10:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yoder Stuart-B08248, Alexander Graf, Wood Scott-B07421,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On Wed, Jul 3, 2013 at 5:07 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-07-02 at 23:25 +0000, Yoder Stuart-B08248 wrote:
>> The write-up below is the first draft of a proposal for how the kernel can expose
>> platform devices to user space using vfio.
>>
>> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
>> allows user space to correlate regions and interrupts to the corresponding
>> device tree node structure that is defined for most platform devices.
>>
>> Regards,
>> Stuart Yoder
>>
>> ------------------------------------------------------------------------------
>> VFIO for Platform Devices
>>
>> The existing infrastructure for vfio-pci is pretty close to what we need:
>>    -mechanism to create a container
>>    -add groups/devices to a container
>>    -set the IOMMU model
>>    -map DMA regions
>>    -get an fd for a specific device, which allows user space to determine
>>     info about device regions (e.g. registers) and interrupt info
>>    -support for mmapping device regions
>>    -mechanism to set how interrupts are signaled
>>
>> Platform devices can get complicated-- potentially with a tree hierarchy
>> of nodes, and links/phandles pointing to other platform
>> devices.   The kernel doesn't expose relationships between
>> devices.  The kernel just exposes mappable register regions and interrupts.
>> It's up to user space to work out relationships between devices
>> if it needs to-- this can be determined in the device tree exposed in
>> /proc/device-tree.
>>
>> I think the changes needed for vfio are around some of the device tree
>> related info that needs to be available with the device fd.
>>
>> 1.  VFIO_GROUP_GET_DEVICE_FD
>>
>>   User space has to know which device it is accessing and will call
>>   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>>   get the device information:
>>
>>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
>>
>>   (whether the path is a device tree path or a sysfs path is up for
>>   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>>
>> 2.  VFIO_DEVICE_GET_INFO
>>
>>    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>>    than adding a new flag identifying a device as a 'platform'
>>    device.
>>
>>    This ioctl simply returns the number of regions and number of irqs.
>>
>>    The number of regions corresponds to the number of regions
>>    that can be mapped for the device-- corresponds to the regions defined
>>    in "reg" and "ranges" in the device tree.
>>
>> 3.  VFIO_DEVICE_GET_REGION_INFO
>>
>>    No changes needed, except perhaps adding a new flag.  Freescale has some
>>    devices with regions that must be mapped cacheable.
>>
>> 3.  VFIO_DEVICE_GET_IRQ_INFO
>>
>>    No changes needed.
>>
>> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>>
>>    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>>    expose device regions and interrupts, but it's not enough to know
>>    that there are X regions and Y interrupts.  User space needs to
>>    know what the resources are for-- to correlate those regions/interrupts
>>    to the device tree structure that drivers use.  The device tree
>>    structure could consist of multiple nodes and it is necessary to
>>    identify the node corresponding to the region/interrupt exposed
>>    by VFIO.
>>
>>    The following information is needed:
>>       -the device tree path to the node corresponding to the
>>        region or interrupt
>>       -for a region, whether it corresponds to a "reg" or "ranges"
>>        property
>>       -there could be multiple sub-regions per "reg" or "ranges" and
>>        the sub-index within the reg/ranges is needed
>>
>>    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>>
>>    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>>
>>    struct vfio_path_info {
>>         __u32   argsz;
>>         __u32   flags;
>>    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
>
> (1 << 0)?
>
> Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
> suggest a bit for each.  Otherwise, what does it mean when this returns
> flags = 0x0 for an irq?
>
>>         __u32   index;          /* input: index of region or irq for which we are getting info */
>>         __u32   type;           /* input: 0 - get devtree info for a region
>>                                           1 - get devtree info for an irq
>>                                  */
>>         __u32   start;          /* output: identifies the index within the reg/ranges */
>>         __u8    path[];         /* output: Full path to associated device tree node */
>>    };
>>
>>    User space allocates enough space for the device tree path, sets
>>    the type field identifying whether this is a region, or irq,
>>    and sets argsz appropriately.
>>
>> 5.  EXAMPLE 1
>>
>>     Example, Freescale SATA controller:
>>
>>      sata@220000 {
>>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>>          reg = <0x220000 0x1000>;
>>          interrupts = <0x44 0x2 0x0 0x0>;
>>      };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -1 region
>>       -1 interrupt
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for the single interrupt.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/sata@220000"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/sata@220000"
>>
>> 6.  EXAMPLE 2
>>
>>     Example, Freescale crypto device (modified to illustrate):
>>
>>      crypto@300000 {
>>         compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
>>         #address-cells = <0x1>;
>>         #size-cells = <0x1>;
>>         reg = <0x300000 0x10000>;
>>         interrupts = <0x5c 0x2 0x0 0x0>;
>>
>>         jr@1000 {
>>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>>            interrupts = <0x58 0x2 0x0 0x0>;
>>         };
>>
>>         jr@2000 {
>>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>>            interrupts = <0x59 0x2 0x0 0x0>;
>>         };
>>      };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -1 region
>>       -3 interrupts
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for each of the IRQs-- indexes 0-2.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/crypto@300000"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/crypto@300000/jr@1000"
>>
>>       -for interrupt index 1:
>>           path: "/soc@ffe000000/crypto@300000/jr@2000"
>>
>> 7.  EXAMPLE 3
>>
>>     Example, Freescale DMA engine (modified to illustrate):
>>
>>     dma@101300 {
>>        cell-index = <0x1>;
>>        ranges = <0x0 0x101100 0x200>;
>>        reg = <0x101300 0x4>;
>>        compatible = "fsl,eloplus-dma";
>>        #size-cells = <0x1>;
>>        #address-cells = <0x1>;
>>        fsl,liodn = <0xc6>;
>>
>>        dma-channel@180 {
>>           interrupts = <0x23 0x2 0x0 0x0>;
>>           cell-index = <0x3>;
>>           reg = <0x180 0x80>;
>>           compatible = "fsl,eloplus-dma-channel";
>>        };
>>
>>        dma-channel@100 {
>>           interrupts = <0x22 0x2 0x0 0x0>;
>>           cell-index = <0x2>;
>>           reg = <0x100 0x80>;
>>           compatible = "fsl,eloplus-dma-channel";
>>        };
>>
>>     };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -2 regions
>>       -2 interrupts
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>>       -for index 1:
>>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for each of the IRQs-- indexes 0-1.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x1     // i.e. this is a "ranges" property
>>           start: 0x0     // i.e. index 0x0 in "ranges"
>>           path: "/soc@ffe000000/dma@101300"
>>
>>       -for region index 1:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/dma@101300"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/dma@101300/dma-channel@180"
>>
>>       -for interrupt index 1:
>>           path: "/soc@ffe000000/dma@101300/dma-channel@100"
>>
>> 8.  Open Issues
>>
>>    -how to handle cases where VFIO is requested to handle
>>     a device where the valid, mappable range for a region
>>     is less than a page size.   See example above where an
>>     advertised region in the DMA node is 4 bytes.  If exposed
>>     to a guest VM, the guest has to be able to map a full page
>>     of I/O space which opens a potential security issue.
>
> As AlexG points out, we solve that on vfio-pci by not supporting mmap on
> those regions and only allowing read/write.  If you could make the
> platform map regions on page size boundaries and there's nothing bad a
> guest can do by accessing the empty space, you could still support mmap.
> We can't make such requirements or guarantees on PCI though.  The PCI
> spec also suggests for devices to use page size regions and high
> performance devices generally follow that request, so it has become a
> fallback for low performance devices and I/O port space, which we can't
> mmap on x86 anyway.
>
> So overall the interface and extension makes sense.  My only question is
> whether it's better to get complete reuse out of GET_REGION_INFO and
> GET_IRQ_INFO and then add another device tree specific ioctl or is it
> better to add a device tree index and path to the existing GET_*_INFO
> ioctls?  Getting some information from one ioctl and passing pieces of
> it back to another ioctl feels a little clunky.
>
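
The page-alignment condition raised in the open issue and in the reply quoted above can be written as a small check. This is a sketch: region_mmap_safe() and mmap_overexposure() are hypothetical helper names, and the test assumes 4 KiB pages (page size varies by platform). Using the physical addresses from the DMA example, both regions fail the check, so under the vfio-pci approach they would be offered for read/write only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an mmap of the region would expose no bytes outside it: both the
 * physical start and the size must be multiples of the page size. */
static bool region_mmap_safe(uint64_t phys_start, uint64_t size, uint64_t page_size)
{
    uint64_t mask = page_size - 1;

    assert(page_size != 0 && (page_size & mask) == 0); /* power of two */
    return size != 0 && (phys_start & mask) == 0 && (size & mask) == 0;
}

/* Bytes a page-granular mapping would expose beyond the region itself. */
static uint64_t mmap_overexposure(uint64_t phys_start, uint64_t size, uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    uint64_t first = phys_start & ~mask;                /* round start down */
    uint64_t end = (phys_start + size + mask) & ~mask;  /* round end up */

    assert(page_size != 0 && (page_size & mask) == 0);
    return (end - first) - size;
}
```

Note that both DMA regions (0xffe101100 and 0xffe101300) fall in the same 4 KiB page at 0xffe101000, so mapping either one would also expose the other.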

I think at this point we should clearly separate the info we need to
pass for the core functionality (assigning the device's resources)
from the information we want to pass in order to generate a guest DT.
For ARM, QEMU does not generate a DT yet; instead, the user needs to
pass a proper DTB (granted, this will not be the case forever). So
even if we treat them the same in code, I think we should discuss
them separately.

Other than that I think it is preferable to extend the existing ioctls
rather than add new ones.

> DEVICE_GET_INFO will identify the device as device tree, which gives you
> the opportunity to extend or replace vfio_region_info and vfio_irq_info.
> It seems like it could even be done in a compatible way.  For example,
> if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
> sizeof(struct vfio_region_info), the kernel could fill in all the info
> up to that size and fill argsz with the size needed for the remaining
> info.  You could then realloc the buffer and the kernel would add the
> extra info on the next call, setting a flag for each additional field
> returned.  Userspace could also just be sloppy and call it with a lot of
> padding and get everything in one shot.
>
> We'd need to define which flags have associated structures and define
> those structures.  For instance, some require no space:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>
> Others imply a structure added to the end:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>
> struct vfio_devtree_region_info_index
> {
>         u32     index;
> };
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>
> struct vfio_devtree_region_info_path
> {
>         u32     len;
>         u8      path[];
> };
>
> The order of the flags indicates the order of the structures at the end.
> We'd need to have some rules about alignment, probably always dword
> aligned.  I'm not sure if it would be necessary for each structure to have a
> length.  It would only be needed if we want to let userspace skip over
> structures they don't understand how to parse.
>
> Another idea is that the space after struct vfio_region/irq_info could
> be a self describing capabilities area, much like PCI config space.
> Starting immediately after the static structure we'd have:
>
> struct vfio_info_cap_header
> {
>         u16     type;
>         u16     next;
> };
>
> Where type defines the structure that follows and next indicates the
> offset of the next header (could also be len of current cap).
>
> Anyway, it seems like there are possibilities that would allow us to
> extend the info ioctls in ways that would be generic for any device
> type.  Thanks,
>
> Alex
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-03  3:07   ` Alex Williamson
  (?)
@ 2013-07-03 10:44   ` Antonios Motakis
  -1 siblings, 0 replies; 51+ messages in thread
From: Antonios Motakis @ 2013-07-03 10:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Yoder Stuart-B08248,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	kvmarm@lists.cs.columbia.edu

On Wed, Jul 3, 2013 at 5:07 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-07-02 at 23:25 +0000, Yoder Stuart-B08248 wrote:
>> The write-up below is the first draft of a proposal for how the kernel can expose
>> platform devices to user space using vfio.
>>
>> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
>> allows user space to correlate regions and interrupts to the corresponding
>> device tree node structure that is defined for most platform devices.
>>
>> Regards,
>> Stuart Yoder
>>
>> ------------------------------------------------------------------------------
>> VFIO for Platform Devices
>>
>> The existing infrastructure for vfio-pci is pretty close to what we need:
>>    -mechanism to create a container
>>    -add groups/devices to a container
>>    -set the IOMMU model
>>    -map DMA regions
>>    -get an fd for a specific device, which allows user space to determine
>>     info about device regions (e.g. registers) and interrupt info
>>    -support for mmapping device regions
>>    -mechanism to set how interrupts are signaled
>>
>> Platform devices can get complicated-- potentially with a tree hierarchy
>> of nodes, and links/phandles pointing to other platform
>> devices.   The kernel doesn't expose relationships between
>> devices.  The kernel just exposes mappable register regions and interrupts.
>> It's up to user space to work out relationships between devices
>> if it needs to-- this can be determined in the device tree exposed in
>> /proc/device-tree.
>>
>> I think the changes needed for vfio are around some of the device tree
>> related info that needs to be available with the device fd.
>>
>> 1.  VFIO_GROUP_GET_DEVICE_FD
>>
>>   User space has to know which device it is accessing and will call
>>   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>>   get the device information:
>>
>>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
>>
>>   (whether the path is a device tree path or a sysfs path is up for
>>   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>>
>> 2.  VFIO_DEVICE_GET_INFO
>>
>>    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>>    than adding a new flag identifying a device as a 'platform'
>>    device.
>>
>>    This ioctl simply returns the number of regions and number of irqs.
>>
>>    The number of regions corresponds to the number of regions
>>    that can be mapped for the device-- corresponds to the regions defined
>>    in "reg" and "ranges" in the device tree.
>>
>> 3.  VFIO_DEVICE_GET_REGION_INFO
>>
>>    No changes needed, except perhaps adding a new flag.  Freescale has some
>>    devices with regions that must be mapped cacheable.
>>
>> 3.  VFIO_DEVICE_GET_IRQ_INFO
>>
>>    No changes needed.
>>
>> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>>
>>    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>>    expose device regions and interrupts, but it's not enough to know
>>    that there are X regions and Y interrupts.  User space needs to
>>    know what the resources are for-- to correlate those regions/interrupts
>>    to the device tree structure that drivers use.  The device tree
>>    structure could consist of multiple nodes and it is necessary to
>>    identify the node corresponding to the region/interrupt exposed
>>    by VFIO.
>>
>>    The following information is needed:
>>       -the device tree path to the node corresponding to the
>>        region or interrupt
>>       -for a region, whether it corresponds to a "reg" or "ranges"
>>        property
>>       -there could be multiple sub-regions per "reg" or "ranges" and
>>        the sub-index within the reg/ranges is needed
>>
>>    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>>
>>    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>>
>>    struct vfio_path_info {
>>         __u32   argsz;
>>         __u32   flags;
>>    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
>
> (1 << 0)?
>
> Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward.  I'd
> suggest a bit for each.  Otherwise, what does it mean when this returns
> flags = 0x0 for an irq?
>
>>         __u32   index;          /* input: index of region or irq for which we are getting info */
>>         __u32   type;           /* input: 0 - get devtree info for a region
>>                                           1 - get devtree info for an irq
>>                                  */
>>         __u32   start;          /* output: identifies the index within the reg/ranges */
>>         __u8    path[];         /* output: Full path to associated device tree node */
>>    };
>>
>>    User space allocates enough space for the device tree path, sets
>>    the type field identifying whether this is a region or an irq,
>>    sets the index, and sets argsz appropriately.
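Below is a minimal user-space sketch of that calling convention. VFIO_DEVICE_GET_DEVTREE_INFO is only proposed, so a hypothetical `mock_get_devtree_info` stands in for the ioctl and returns the EXAMPLE 1 answer. The `__u32`/`__u8` fields are spelled with <stdint.h> types. The caller sizes the allocation to hold the path, sets `type` and `index` as inputs, and reports the total buffer size in `argsz`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Proposed (not merged) layout from the RFC. */
struct vfio_path_info {
    uint32_t argsz;
    uint32_t flags;
#define VFIO_DEVTREE_INFO_RANGES (1 << 3)  /* value as written in the RFC */
    uint32_t index;   /* input: region or irq index */
    uint32_t type;    /* input: 0 = region, 1 = irq */
    uint32_t start;   /* output: sub-index within reg/ranges */
    uint8_t  path[];  /* output: NUL-terminated device tree path */
};

/* Hypothetical kernel stand-in, hard-coded to EXAMPLE 1 (one region, one irq). */
static int mock_get_devtree_info(struct vfio_path_info *info)
{
    static const char path[] = "/soc@ffe000000/sata@220000";

    if (info->type > 1 || info->index != 0)
        return -1;                           /* only index 0 exists */
    if (info->argsz < sizeof(*info) + sizeof(path))
        return -1;                           /* caller's buffer too small */
    info->flags = 0;                         /* a "reg" entry, not "ranges" */
    info->start = 0;
    memcpy(info->path, path, sizeof(path));
    return 0;
}

/* User space: allocate room for the path, fill in the inputs, set argsz. */
static struct vfio_path_info *query_devtree(uint32_t type, uint32_t index, size_t path_max)
{
    struct vfio_path_info *info = calloc(1, sizeof(*info) + path_max);

    if (!info)
        return NULL;
    info->argsz = (uint32_t)(sizeof(*info) + path_max);
    info->type = type;
    info->index = index;
    if (mock_get_devtree_info(info) != 0) {
        free(info);
        return NULL;
    }
    return info;
}
```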
>>
>> 5.  EXAMPLE 1
>>
>>     Example, Freescale SATA controller:
>>
>>      sata@220000 {
>>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>>          reg = <0x220000 0x1000>;
>>          interrupts = <0x44 0x2 0x0 0x0>;
>>      };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -1 region
>>       -1 interrupt
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for the single interrupt.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/sata@220000"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/sata@220000"
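The region and "start" values above come from the raw "reg" property, which /proc/device-tree exposes as big-endian 32-bit cells. The sketch below decodes one entry, assuming the parent bus uses one address cell and one size cell. `decode_reg_1_1` is a hypothetical name. Entry `i` corresponds to the `start` sub-index in the proposed ioctl. Adding the soc base 0xffe000000 gives the physical address 0xffe220000 quoted above, on the assumption that this base is the soc node's translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell, as stored in /proc/device-tree files. */
static uint32_t be32_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode entry i of a "reg" property with one address and one size cell
 * (ASSUMPTION: parent bus has #address-cells = #size-cells = 1). */
static int decode_reg_1_1(const uint8_t *prop, size_t len, size_t i,
                          uint64_t *addr, uint64_t *size)
{
    const size_t entry = 8;                 /* 2 cells x 4 bytes */

    if ((i + 1) * entry > len)
        return -1;                          /* no such entry */
    *addr = be32_cell(prop + i * entry);
    *size = be32_cell(prop + i * entry + 4);
    return 0;
}
```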
>>
>> 6.  EXAMPLE 2
>>
>>     Example, Freescale crypto device (modified to illustrate):
>>
>>      crypto@300000 {
>>         compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
>>         #address-cells = <0x1>;
>>         #size-cells = <0x1>;
>>         reg = <0x300000 0x10000>;
>>         interrupts = <0x5c 0x2 0x0 0x0>;
>>
>>         jr@1000 {
>>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>>            interrupts = <0x58 0x2 0x0 0x0>;
>>         };
>>
>>         jr@2000 {
>>            compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>>            interrupts = <0x59 0x2 0x0 0x0>;
>>         };
>>      };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -1 region
>>       -3 interrupts
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for each of the IRQs-- indexes 0-2.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/crypto@300000"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/crypto@300000/jr@1000"
>>
>>       -for interrupt index 1:
>>           path: "/soc@ffe000000/crypto@300000/jr@2000"
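Once the per-interrupt paths are known, user space can find the interrupt that belongs to a particular child node by comparing paths. This is a hypothetical sketch. The table holds only the two entries listed above, because the RFC does not spell out the entry for the crypto node's own interrupt.

```c
#include <assert.h>
#include <string.h>

/* vfio irq index -> device tree node, as returned for EXAMPLE 2. */
static const char *const crypto_irq_paths[] = {
    "/soc@ffe000000/crypto@300000/jr@1000",
    "/soc@ffe000000/crypto@300000/jr@2000",
};

/* Return the vfio irq index whose device tree node is `path`, or -1. */
static int irq_index_for_node(const char *const *paths, int n, const char *path)
{
    for (int i = 0; i < n; i++)
        if (strcmp(paths[i], path) == 0)
            return i;
    return -1;
}
```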
>>
>> 7.  EXAMPLE 3
>>
>>     Example, Freescale DMA engine (modified to illustrate):
>>
>>     dma@101300 {
>>        cell-index = <0x1>;
>>        ranges = <0x0 0x101100 0x200>;
>>        reg = <0x101300 0x4>;
>>        compatible = "fsl,eloplus-dma";
>>        #size-cells = <0x1>;
>>        #address-cells = <0x1>;
>>        fsl,liodn = <0xc6>;
>>
>>        dma-channel@180 {
>>           interrupts = <0x23 0x2 0x0 0x0>;
>>           cell-index = <0x3>;
>>           reg = <0x180 0x80>;
>>           compatible = "fsl,eloplus-dma-channel";
>>        };
>>
>>        dma-channel@100 {
>>           interrupts = <0x22 0x2 0x0 0x0>;
>>           cell-index = <0x2>;
>>           reg = <0x100 0x80>;
>>           compatible = "fsl,eloplus-dma-channel";
>>        };
>>
>>     };
>>
>>     request to get device FD would look like:
>>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
>>
>>     The VFIO_DEVICE_GET_INFO ioctl would return:
>>       -2 regions
>>       -2 interrupts
>>
>>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>>       -for index 0:
>>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>>       -for index 1:
>>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>>
>>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>>     for each of the IRQs-- indexes 0-1.
>>
>>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>>       -for region index 0:
>>           flags: 0x1     // i.e. this is a "ranges" property
>>           start: 0x0     // i.e. index 0x0 in "ranges"
>>           path: "/soc@ffe000000/dma@101300"
>>
>>       -for region index 1:
>>           flags: 0x0     // i.e. this is a "reg" property
>>           start: 0x0     // i.e. index 0x0 in "reg"
>>           path: "/soc@ffe000000/dma@101300"
>>
>>       -for interrupt index 0:
>>           path: "/soc@ffe000000/dma@101300/dma-channel@180"
>>
>>       -for interrupt index 1:
>>           path: "/soc@ffe000000/dma@101300/dma-channel@100"
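The "ranges" entry <0x0 0x101100 0x200> maps child-bus addresses 0x0-0x1ff of dma@101300 onto soc-bus addresses 0x101100-0x1012ff. This is how the dma-channel "reg" values fall inside region index 0. The sketch below shows that translation using a hypothetical `translate_ranges` helper over already-decoded entries. As in the examples, it assumes the soc bus sits at 0xffe000000.

```c
#include <assert.h>
#include <stdint.h>

/* One decoded "ranges" entry: child-bus address, parent-bus address, length. */
struct dt_range {
    uint64_t child;
    uint64_t parent;
    uint64_t size;
};

/* Translate a child-bus address through a node's "ranges".
 * Returns 0 and stores the parent-bus address, or -1 if nothing covers it. */
static int translate_ranges(const struct dt_range *r, int n, uint64_t child_addr,
                            uint64_t *parent_addr)
{
    for (int i = 0; i < n; i++) {
        if (child_addr >= r[i].child && child_addr - r[i].child < r[i].size) {
            *parent_addr = r[i].parent + (child_addr - r[i].child);
            return 0;
        }
    }
    return -1;
}
```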
>>
>> 8.  Open Issues
>>
>>    -how to handle cases where VFIO is requested to handle
>>     a device where the valid, mappable range for a region
>>     is less than a page size.   See example above where an
>>     advertised region in the DMA node is 4 bytes.  If exposed
>>     to a guest VM, the guest has to be able to map a full page
>>     of I/O space which opens a potential security issue.
>
> As AlexG points out, we solve that on vfio-pci by not supporting mmap on
> those regions and only allowing read/write.  If you could make the
> platform map regions on page size boundaries and there's nothing bad a
> guest can do by accessing the empty space, you could still support mmap.
> We can't make such requirements or guarantees on PCI though.  The PCI
> spec also suggests for devices to use page size regions and high
> performance devices generally follow that request, so it has become a
> fallback for low performance devices and I/O port space, which we can't
> mmap on x86 anyway.
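The policy described above reduces to a simple test before mmap is advertised for a region. The sketch is a hypothetical simplification: the name `region_mmap_ok` is illustrative, and the usage assumes a 4 KiB page. A region is treated as mappable only if both its physical start and its size are whole pages. Otherwise user space falls back to read()/write() on the device fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* true if [phys, phys + size) covers whole, page-aligned pages only. */
static bool region_mmap_ok(uint64_t phys, uint64_t size, uint64_t page_size)
{
    uint64_t mask = page_size - 1;

    assert(page_size != 0 && (page_size & mask) == 0);  /* power of two */
    return size != 0 && (phys & mask) == 0 && (size & mask) == 0;
}
```

With a 4 KiB page, the 4-byte "reg" region of EXAMPLE 3 fails this test. So does its 0x200-byte "ranges" region at 0xffe101100. Under this policy, both regions would be read/write only.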
>
> So overall the interface and extension make sense.  My only question is
> whether it's better to get complete reuse out of GET_REGION_INFO and
> GET_IRQ_INFO and then add another device tree specific ioctl or is it
> better to add a device tree index and path to the existing GET_*_INFO
> ioctls?  Getting some information from one ioctl and passing pieces of
> it back to another ioctl feels a little clunky.
>

I think at this point we should clearly separate the info we need to
pass for the core functionality (assigning the device's resources),
and the information we want to pass in order to generate a guest DT.
For ARM a DT is not generated by QEMU yet, but instead a proper DTB
needs to be passed by the user (granted, this will not be the case
forever). So I think even if we treat them the same in code, we should be
discussing them separately.

Other than that I think it is preferable to extend the existing ioctls
rather than add new ones.

> DEVICE_GET_INFO will identify the device as device tree, which gives you
> the opportunity to extend or replace vfio_region_info and vfio_irq_info.
> It seems like it could even be done in a compatible way.  For example,
> if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
> sizeof(struct vfio_region_info), the kernel could fill in all the info
> up to that size and fill argsz with the size needed for the remaining
> info.  You could then realloc the buffer and the kernel would add the
> extra info on the next call, setting a flag for each additional field
> returned.  Userspace could also just be sloppy and call it with a lot of
> padding and get everything in one shot.
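The two-call pattern can be sketched as follows. `struct region_info` is a simplified, hypothetical stand-in for vfio_region_info, and `REGION_INFO_FLAG_PATH` is an invented flag. `mock_get_region_info` plays the kernel's role in place of the real ioctl. The first call passes only the fixed size, and the kernel reports the size it needs in `argsz`. The second call, made with a larger buffer, gets the appended path and the flag announcing it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical fixed part of a region info structure. */
struct region_info {
    uint32_t argsz;
    uint32_t flags;
#define REGION_INFO_FLAG_PATH (1u << 31)  /* hypothetical: path follows struct */
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

/* Hypothetical kernel side: fills the fixed part, reports the total size it
 * wants in argsz, and appends extra data only if the caller made room. */
static int mock_get_region_info(struct region_info *info)
{
    static const char path[] = "/soc@ffe000000/dma@101300";
    uint32_t given = info->argsz;
    uint32_t needed = (uint32_t)(sizeof(*info) + sizeof(path));

    if (given < sizeof(*info))
        return -1;
    info->size = 0x200;
    info->offset = 0;
    info->flags = 0;
    info->argsz = needed;                      /* tell the caller what it needs */
    if (given >= needed) {
        memcpy(info + 1, path, sizeof(path));  /* data right after the struct */
        info->flags |= REGION_INFO_FLAG_PATH;
    }
    return 0;
}

/* User space: probe with the fixed size, then grow the buffer and ask again. */
static struct region_info *get_region_info(uint32_t index)
{
    struct region_info *info = calloc(1, sizeof(*info));

    if (!info)
        return NULL;
    info->argsz = (uint32_t)sizeof(*info);
    info->index = index;
    if (mock_get_region_info(info) != 0)
        goto fail;
    if (info->argsz > sizeof(*info)) {
        uint32_t want = info->argsz;
        struct region_info *bigger = realloc(info, want);

        if (!bigger)
            goto fail;                         /* old buffer is still valid */
        info = bigger;
        info->argsz = want;
        if (mock_get_region_info(info) != 0)
            goto fail;
    }
    return info;
fail:
    free(info);
    return NULL;
}
```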
>
> We'd need to define which flags have associated structures and define
> those structures.  For instance, some require no space:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>
> Others imply a structure added to the end:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>
> struct vfio_devtree_region_info_index
> {
>         u32     index;
> };
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>
> struct vfio_devtree_region_info_path
> {
>         u32     len;
>         u8      path[];
> };
>
> The order of the flags indicates the order of the structures at the end.
> We'd need to have some rules about alignment, probably always dword
> aligned.  I'm not sure if it would be necessary for each structure to have a
> length.  It would only be needed if we want to let userspace skip over
> structures they don't understand how to parse.
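Here is one way the flag-ordered layout could be parsed. The sketch assumes concrete bit numbers for the `(1 << ?)` placeholders and 4-byte alignment between structures. Giving REG and RANGE separate bits also addresses the earlier concern that flags = 0x0 is ambiguous. All bit values and helper names here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for the "?" placeholders. REG/RANGE carry no payload;
 * INDEX and PATH each append a structure, in ascending bit order. */
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG   (1u << 0)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1u << 1)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1u << 2)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH  (1u << 3)

struct vfio_devtree_region_info_index {
    uint32_t index;
};

struct vfio_devtree_region_info_path {
    uint32_t len;     /* bytes in path[], including the NUL */
    uint8_t  path[];
};

static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Walk the trailing area buf[0..len) in flag order; outputs are untouched
 * when the matching flag is clear. */
static int parse_devtree_tail(uint32_t flags, const uint8_t *buf, size_t len,
                              uint32_t *index, const char **path)
{
    const size_t hdr = offsetof(struct vfio_devtree_region_info_path, path);
    size_t off = 0;

    if (flags & VFIO_DEVTREE_REGION_INFO_FLAG_INDEX) {
        struct vfio_devtree_region_info_index idx;

        if (off + sizeof(idx) > len)
            return -1;
        memcpy(&idx, buf + off, sizeof(idx));
        *index = idx.index;
        off = align4(off + sizeof(idx));
    }
    if (flags & VFIO_DEVTREE_REGION_INFO_FLAG_PATH) {
        uint32_t plen;

        if (off + hdr > len)
            return -1;
        memcpy(&plen, buf + off, sizeof(plen));        /* the "len" field */
        if (plen == 0 || off + hdr + plen > len || buf[off + hdr + plen - 1] != '\0')
            return -1;                                 /* truncated or unterminated */
        *path = (const char *)(buf + off + hdr);
        off = align4(off + hdr + plen);                /* where a later struct would start */
    }
    return 0;
}
```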
>
> Another idea is that the space after struct vfio_region/irq_info could
> be a self describing capabilities area, much like PCI config space.
> Starting immediately after the static structure we'd have:
>
> struct vfio_info_cap_header
> {
>         u16     type;
>         u16     next;
> };
>
> Where type defines the structure that follows and next indicates the
> offset of the next header (could also be len of current cap).
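A walk over such a capability chain might look like the sketch below. It assumes `next` is the byte offset of the next header from the start of the info buffer, and that 0 terminates the chain. The `find_cap` helper and the type values in the usage are hypothetical. Because each `next` must move forward, the walk terminates even on a malformed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout from the proposal. */
struct vfio_info_cap_header {
    uint16_t type;
    uint16_t next;   /* ASSUMED: offset of next header from buffer start, 0 = end */
};

/* Return the offset of the first capability of `type`, or 0 if absent.
 * `first` is the offset just past the fixed region/irq info structure. */
static size_t find_cap(const uint8_t *buf, size_t len, size_t first, uint16_t type)
{
    size_t off = first;

    while (off != 0 && off + sizeof(struct vfio_info_cap_header) <= len) {
        struct vfio_info_cap_header h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.type == type)
            return off;
        if (h.next != 0 && h.next <= off)
            return 0;                  /* offsets must move forward: reject loops */
        off = h.next;
    }
    return 0;
}
```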
>
> Anyway, it seems like there are possibilities that would allow us to
> extend the info ioctls in ways that would be generic for any device
> type.  Thanks,
>
> Alex
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
@ 2013-07-03 17:20     ` Yoder Stuart-B08248
  0 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 17:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	Antonios Motakis, kvmarm@lists.cs.columbia.edu

[cut]

> So overall the interface and extension make sense.  My only question is
> whether it's better to get complete reuse out of GET_REGION_INFO and
> GET_IRQ_INFO and then add another device tree specific ioctl or is it
> better to add a device tree index and path to the existing GET_*_INFO
> ioctls?  Getting some information from one ioctl and passing pieces of
> it back to another ioctl feels a little clunky.

Heh...extending region/irq info is the direction I started with, but 
because of the variable nature of the device tree data thought maybe
it was better to not add complexity to those APIs and leave them
alone.

Many or most platform devices will have 1 region and 1 interrupt, and
so in most cases device tree info wouldn't be needed at all, since
there is no ambiguity.  So I was thinking that, for the rarer, more
complicated devices, a bit would advertise the existence of the
device tree info and a separate ioctl would be used to access it.

But, I'm completely open to extending the get region/irq info
ioctls if that direction is what you prefer...which seems to be
the case.

> DEVICE_GET_INFO will identify the device as device tree, which gives you
> the opportunity to extend or replace vfio_region_info and vfio_irq_info.
> It seems like it could even be done in a compatible way.  For example,
> if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
> sizeof(struct vfio_region_info), the kernel could fill in all the info
> up to that size and fill argsz with the size needed for the remaining
> info.  You could then realloc the buffer and the kernel would add the
> extra info on the next call, setting a flag for each additional field
> returned.  Userspace could also just be sloppy and call it with a lot of
> padding and get everything in one shot.
> 
> We'd need to define which flags have associated structures and define
> those structures.  For instance, some require no space:
> 
> #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
> 
> Others imply a structure added to the end:
> 
> #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
> 
> struct vfio_devtree_region_info_index
> {
> 	u32	index;
> };
> 
> #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
> 
> struct vfio_devtree_region_info_path
> {
> 	u32	len;
> 	u8	path[];
> };
> 
> The order of the flags indicates the order of the structures at the end.
> We'd need to have some rules about alignment, probably always dword
> aligned.  I'm not sure if it would be necessary for each structure to have a
> length.  It would only be needed if we want to let userspace skip over
> structures they don't understand how to parse.
> 
> Another idea is that the space after struct vfio_region/irq_info could
> be a self describing capabilities area, much like PCI config space.
> Starting immediately after the static structure we'd have:
> 
> struct vfio_info_cap_header
> {
> 	u16	type;
> 	u16	next;
> };
> 
> Where type defines the structure that follows and next indicates the
> offset of the next header (could also be len of current cap).
> 
> Anyway, it seems like there are possibilities that would allow us to
> extend the info ioctls in ways that would be generic for any device
> type.  Thanks,

I think I like the approach using the flags and struct
extensions.
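For comparison, the capability-chain alternative quoted above (a type/next
header placed after the fixed struct, much like PCI capabilities) would be
walked roughly as in this minimal C sketch.  It assumes that `next` is a byte
offset from the start of the info buffer and that a `next` of 0 ends the
chain; neither detail was pinned down in the thread, and `find_cap` is a
hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header from the quoted proposal: the type of the structure that follows
 * and the offset of the next header. */
struct vfio_info_cap_header {
    uint16_t type;
    uint16_t next;
};

/* Return the offset of the first capability of the given type, or 0 if it
 * is absent. Assumes 'first' and every 'next' are byte offsets from the
 * start of buf and that next == 0 terminates the chain. */
static size_t find_cap(const unsigned char *buf, size_t buflen,
                       size_t first, uint16_t type)
{
    size_t off = first;

    assert(buf != NULL);
    while (off != 0 && off < buflen &&
           buflen - off >= sizeof(struct vfio_info_cap_header)) {
        struct vfio_info_cap_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr)); /* header may be unaligned */
        if (hdr.type == type)
            return off;
        if (hdr.next <= off) /* end of chain, or a backwards link that could loop */
            return 0;
        off = hdr.next;
    }
    return 0;
}
```

A caller would pass the size of the fixed struct as `first`.  Requiring each
`next` to point forward is one cheap way to keep a malformed chain from
looping forever.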

Stuart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-03  1:07   ` Alexander Graf
@ 2013-07-03 18:51     ` Scott Wood
  -1 siblings, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-03 18:51 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Yoder Stuart-B08248, Alex Williamson, Wood Scott-B07421,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On 07/02/2013 08:07:53 PM, Alexander Graf wrote:
> 
> On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
> 
> > 8.  Open Issues
> >
> >   -how to handle cases where VFIO is requested to handle
> >    a device where the valid, mappable range for a region
> >    is less than a page size.   See example above where an
> >    advertised region in the DMA node is 4 bytes.  If exposed
> >    to a guest VM, the guest has to be able to map a full page
> >    of I/O space which opens a potential security issue.
> 
> The way we solved this for legacy PCI device assignment was by going  
> through QEMU for emulation and falling back to legacy read/write  
> IIRC. We could probably do the same here. IIRC there was a way for a  
> normal Linux mmap'ed device region to trap individual accesses too,  
> so we could just use that one too.
> 
> The slow path emulation would then happen magically in QEMU, since  
> MMIO writes will get reinjected into the normal QEMU MMIO handling  
> path which will just issue a read/write on the mmap'ed region if it's  
> not declared as emulated.

I agree that's what should happen by default, but there should be a way  
for root to tell vfio that a device is allowed to overmap, in order to  
get the performance benefit of direct access in cases where root knows  
(or explicitly doesn't care) that it is safe.

-Scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
  2013-07-03 18:51     ` Scott Wood
@ 2013-07-03 19:08       ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 19:08 UTC (permalink / raw)
  To: Wood Scott-B07421, Alexander Graf
  Cc: kvm@vger.kernel.org list, Bhushan Bharat-R65777,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 1:52 PM
> To: Alexander Graf
> Cc: Yoder Stuart-B08248; Alex Williamson; Wood Scott-B07421; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list;
> kvm-ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/02/2013 08:07:53 PM, Alexander Graf wrote:
> >
> > On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
> >
> > > 8.  Open Issues
> > >
> > >   -how to handle cases where VFIO is requested to handle
> > >    a device where the valid, mappable range for a region
> > >    is less than a page size.   See example above where an
> > >    advertised region in the DMA node is 4 bytes.  If exposed
> > >    to a guest VM, the guest has to be able to map a full page
> > >    of I/O space which opens a potential security issue.
> >
> > The way we solved this for legacy PCI device assignment was by going
> > through QEMU for emulation and falling back to legacy read/write
> > IIRC. We could probably do the same here. IIRC there was a way for a
> > normal Linux mmap'ed device region to trap individual accesses too,
> > so we could just use that one too.
> >
> > The slow path emulation would then happen magically in QEMU, since
> > MMIO writes will get reinjected into the normal QEMU MMIO handling
> > path which will just issue a read/write on the mmap'ed region if it's
> > not declared as emulated.
> 
> I agree that's what should happen by default, but there should be a way
> for root to tell vfio that a device is allowed to overmap, in order to
> get the performance benefit of direct access in cases where root knows
> (or explicitly doesn't care) that it is safe.

Perhaps a sysfs mechanism like this:

echo "/sys/bus/platform/devices/ffe210000.usb" > /sys/bus/platform/drivers/vfio-platform/allow_overmap
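
Whether a region needs overmapping at all can be decided from its physical
address and size: mmap works on whole pages, so a region that does not both
start and end on a page boundary (like the 4-byte DMA "reg" region in the
open issue) drags neighboring registers into the mapping.  A minimal sketch
of that check, assuming a power-of-two page size; `region_needs_overmap` is
a hypothetical name, not an existing vfio function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if mmap of [phys, phys + size) would also expose bytes outside the
 * region, because mappings are made in whole pages. */
static bool region_needs_overmap(uint64_t phys, uint64_t size, uint64_t page_size)
{
    uint64_t mask = page_size - 1;

    assert(page_size != 0 && (page_size & mask) == 0); /* power of two */
    return (phys & mask) != 0 || (size & mask) != 0;
}
```

With 4 KiB pages, the 4-byte region at 0xffe101300 needs overmapping, while
a 0x1000-byte region at 0xffe220000 is page aligned and would not need an
allow_overmap setting.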

Stuart





^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
  2013-07-03 10:44     ` Antonios Motakis
@ 2013-07-03 19:23       ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 19:23 UTC (permalink / raw)
  To: Antonios Motakis, Alex Williamson
  Cc: Alexander Graf, Wood Scott-B07421, Bhushan Bharat-R65777,
	Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

[cut]
> > So overall the interface and extension makes sense.  My only question is
> > whether it's better to get complete reuse out of GET_REGION_INFO and
> > GET_IRQ_INFO and then add another device tree specific ioctl or is it
> > better to add a device tree index and path to the existing GET_*_INFO
> > ioctls?  Getting some information from one ioctl and passing pieces of
> > it back to another ioctl feels a little clunky.
> >
> 
> I think at this point we should clearly separate the info we need to
> pass for the core functionality (assigning the device's resources),
> and the information we want to pass in order to generate a guest DT.
> For ARM a DT is not generated by QEMU yet, but instead a proper DTB
> needs to be passed by the user (granted, this will not be the case
> forever). So I think even if we treat them the same in code, we should be
> discussing them separately.

We do need to keep core resources separate from what it takes
to generate a guest DT, but note the purpose of the devtree info
is not primarily to help generate a guest DT.

User space (not just QEMU) needs to know what the regions
and interrupts advertised by DEVICE_GET_INFO correspond to.
If there are 4 interrupts and 2 register regions, how does user
space know the purpose/function of each?
Apart from something like the devtree info I don't see
how a user space driver can know how to use the regions
and interrupts.   The kernel is not guaranteeing any
particular ordering of resources.

So in the DMA engine example I gave, the devtree info
lets user space know which interrupt corresponds to
which DMA channel.

QEMU is a special case in that it is going to expose
the device to a virtual machine and needs to generate
a normal device tree node...but that is a separate problem
that needs to be solved in QEMU.

Stuart


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RFC: vfio interface for platform devices (v2)
@ 2013-07-03 21:40   ` Yoder Stuart-B08248
  0 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 21:40 UTC (permalink / raw)
  To: Alex Williamson, Alexander Graf, Wood Scott-B07421
  Cc: Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

Version 2
  -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
  -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
  -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
  -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
   defined 5 new flags and associated structs
  -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
   defined 1 new flag and associated struct
  -removed redundant example

------------------------------------------------------------------------------
VFIO for Platform Devices

The existing kernel interface for vfio-pci is pretty close to what is needed
for platform devices:
   -mechanism to create a container
   -add groups/devices to a container
   -set the IOMMU model
   -map DMA regions
   -get an fd for a specific device, which allows user space to determine
    info about device regions (e.g. registers) and interrupt info
   -support for mmapping device regions
   -mechanism to set how interrupts are signaled

Many platform devices are simple and consist of a single register
region and a single interrupt.  For these types of devices the
existing vfio interfaces should be sufficient.

However, platform devices can get complicated-- logically represented
as a device tree hierarchy of nodes.  For devices with multiple regions
and interrupts, new mechanisms are needed in vfio to correlate the
regions/interrupts with the device tree structure that drivers use
to determine the meaning of device resources.

In some cases there are relationships between devices, and devices
reference other devices using phandle links.  The kernel won't expose
relationships between devices; it just exposes mappable register
regions and interrupts.

The changes needed for vfio are around some of the device tree
related info that needs to be available with the device fd.

1.  VFIO_GROUP_GET_DEVICE_FD

  User space knows by out-of-band means which device it is accessing
  and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
  to get the device information:

  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
             "/sys/bus/platform/devices/ffe210000.usb");

2.  VFIO_DEVICE_GET_INFO

   The number of regions corresponds to the regions defined
   in "reg" and "ranges" in the device tree.  

   Two new flags are added to struct vfio_device_info:

   #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
   #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */

   It is possible that there could be platform bus devices 
   that are not in the device tree, so we use 2 flags to
   allow for that.

   If only VFIO_DEVICE_FLAGS_PLATFORM is set, the device has
   regions and IRQs but no device tree info is available.

   If VFIO_DEVICE_FLAGS_DEVTREE is also set, device tree
   info is available for the regions and IRQs.

3. VFIO_DEVICE_GET_REGION_INFO

   For platform devices with multiple regions, information
   is needed to correlate the regions with the device 
   tree structure that drivers use to determine the meaning
   of device resources.
   
   The VFIO_DEVICE_GET_REGION_INFO is extended to provide
   device tree information.

   The following information is needed:
      -the device tree path to the node corresponding to the
       region
      -whether it corresponds to a "reg" or "ranges" property
      -there could be multiple sub-regions per "reg" or "ranges" and
       the sub-index within the reg/ranges is needed
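
   How many sub-regions one property contributes follows from the usual
   device tree cell sizes (each cell is 32 bits).  A "reg" entry is
   #address-cells + #size-cells cells, taken from the parent node.  A
   "ranges" entry is a (child address, parent address, length) triple:
   the node's own #address-cells, its parent's #address-cells, and the
   node's own #size-cells.  A minimal sketch, assuming the caller has
   already read the property length in bytes and the relevant cell
   counts; the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Entries in a "reg" property of prop_len bytes; the cell counts come from
 * the parent node's #address-cells and #size-cells. Cells are 32 bits. */
static size_t dt_reg_entries(size_t prop_len, unsigned addr_cells,
                             unsigned size_cells)
{
    size_t entry_bytes = 4u * ((size_t)addr_cells + size_cells);

    assert(entry_bytes != 0);
    return prop_len / entry_bytes;
}

/* Entries in a "ranges" property: (child address, parent address, length)
 * triples sized by the node's own #address-cells, the parent's
 * #address-cells and the node's own #size-cells. */
static size_t dt_ranges_entries(size_t prop_len, unsigned child_addr_cells,
                                unsigned parent_addr_cells,
                                unsigned child_size_cells)
{
    size_t entry_bytes = 4u * ((size_t)child_addr_cells + parent_addr_cells +
                               child_size_cells);

    assert(entry_bytes != 0);
    return prop_len / entry_bytes;
}
```

   For the DMA engine example in section 6, and assuming the parent soc
   node uses one address cell and one size cell, ranges = <0x0 0x101100
   0x200> is 12 bytes and yields one entry, as does reg = <0x101300 0x4>.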

   Five new flags are added to vfio_region_info:

   struct vfio_region_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
        __u32   index;          /* Region index */
        __u32   resv;           /* Reserved for alignment */
        __u64   size;           /* Region size (bytes) */
        __u64   offset;         /* Region offset from start of device fd */
   };
 
   VFIO_REGION_INFO_FLAG_CACHEABLE
       -if set indicates that the region must be mapped as cacheable

   VFIO_DEVTREE_REGION_INFO_FLAG_REG
       -if set indicates that the region corresponds to a "reg" property
        in the device tree representation of the device

   VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
       -if set indicates that the region corresponds to a "ranges" property
        in the device tree representation of the device

   VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_region_info_index appended to the
        end of vfio_region_info:

        struct vfio_devtree_region_info_index
        {
            u32 index;
        };

        A reg or ranges property may describe multiple regions.  The
        index specifies the entry within the "reg" or "ranges"
        property that this region corresponds to.

   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_info_path appended to the
        end of vfio_region_info:

        struct vfio_devtree_info_path
        {
            u32 len;
            u8 path[];
        };

        The path is the full path to the corresponding device
        tree node.  The len field specifies the length of the
        path string.

   If multiple flags are set that indicate that there is
   an appended struct, the order of the flags indicates
   the order of the structs.

   argsz is set by the kernel specifying the total size of
   struct vfio_region_info and all appended structs.

   Suggested usage:
      -call VFIO_DEVICE_GET_REGION_INFO with argsz = sizeof(struct vfio_region_info)
      -realloc the buffer to the argsz value returned by the kernel
      -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
       structs will be returned
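
   The suggested two-call sequence, and the walk over the appended structs
   in flag order, might look like the following C sketch.  Several things
   here are assumptions rather than part of the proposal: the flag bit
   values (the RFC leaves them as (1 << ?)), flag order taken as ascending
   bit order with INDEX before PATH, "dword" alignment taken to mean 4
   bytes, and mock_get_region_info, a hypothetical stand-in for
   ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, info) that returns region 0 of
   the SATA example (size taken from its reg property) so the sketch runs
   without a device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* struct vfio_region_info as laid out in this RFC (32 bytes, no padding). */
struct rfc_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

/* Hypothetical bit positions: the RFC leaves them as (1 << ?). */
#define DT_FLAG_REG   (1u << 4)
#define DT_FLAG_INDEX (1u << 6)
#define DT_FLAG_PATH  (1u << 7)
#define DT_ALIGN      4u /* assumption: "dword" means 4 bytes */

static size_t align_up(size_t n) { return (n + DT_ALIGN - 1) & ~(size_t)(DT_ALIGN - 1); }

/* Mock of the ioctl: always reports the full size needed in argsz, but
 * appends the path struct (and sets its flag) only if it fits. */
static void mock_get_region_info(struct rfc_region_info *info)
{
    static const char path[] = "/soc@ffe000000/sata@220000";
    uint32_t len = (uint32_t)strlen(path);
    uint32_t given = info->argsz; /* caller passes at least sizeof(*info) */
    uint32_t need = (uint32_t)(sizeof(*info) + align_up(sizeof(len) + len));

    info->flags = DT_FLAG_REG;
    info->size = 0x1000;
    info->offset = 0;
    info->argsz = need;
    if (given >= need) {
        unsigned char *p = (unsigned char *)(info + 1);
        memcpy(p, &len, sizeof(len));
        memcpy(p + sizeof(len), path, len);
        info->flags |= DT_FLAG_PATH;
    }
}

/* First call with only the fixed struct; if argsz grew, realloc and call
 * again. Caller frees the result. */
static struct rfc_region_info *get_region_info(uint32_t index)
{
    struct rfc_region_info *info = calloc(1, sizeof(*info)), *big;

    if (!info)
        return NULL;
    info->argsz = (uint32_t)sizeof(*info);
    info->index = index;
    mock_get_region_info(info);
    if (info->argsz > sizeof(*info)) {
        uint32_t need = info->argsz;

        big = realloc(info, need);
        if (!big) {
            free(info);
            return NULL;
        }
        info = big;
        memset(info, 0, need);
        info->argsz = need;
        info->index = index;
        mock_get_region_info(info);
    }
    return info;
}

/* Walk the appended structs in flag order, each starting on a DT_ALIGN
 * boundary. Returns 0 on success, -1 if argsz is inconsistent. */
static int parse_extras(const struct rfc_region_info *info,
                        uint32_t *sub_index, char *path, size_t path_cap)
{
    const unsigned char *base = (const unsigned char *)info;
    size_t off = sizeof(*info), total;
    uint32_t len;

    assert(info != NULL);
    total = info->argsz;
    if (total < off)
        return -1;
    if (info->flags & DT_FLAG_INDEX) {
        if (total - off < sizeof(*sub_index))
            return -1;
        memcpy(sub_index, base + off, sizeof(*sub_index));
        off += align_up(sizeof(*sub_index));
    }
    if (info->flags & DT_FLAG_PATH) {
        if (total - off < sizeof(len))
            return -1;
        memcpy(&len, base + off, sizeof(len));
        if (total - off - sizeof(len) < len || len >= path_cap)
            return -1;
        memcpy(path, base + off + sizeof(len), len);
        path[len] = '\0'; /* len excludes any NUL, as in the examples */
    }
    return 0;
}
```

   With a real device, each mock call becomes an ioctl() whose -1 return
   must be checked, and production code would repeat the second call until
   argsz stops growing.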

4.  VFIO_DEVICE_GET_IRQ_INFO

   For platform devices with multiple interrupts that
   correspond to different subnodes in the device tree,
   information is needed to correlate the interrupts
   to the device tree structure.

   The VFIO_DEVICE_GET_IRQ_INFO ioctl is extended to provide
   device tree information.

   One new flag is added to vfio_irq_info:

   struct vfio_irq_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
        __u32   index;    /* IRQ index */
        __u32   count;    /* Number of IRQs within this index */
   };

   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH 
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_info_path appended to the
        end of vfio_irq_info :

        struct vfio_devtree_info_path
        {
            u32 len;
            u8 path[];
        };

        The path is the full path to the corresponding device
        tree node.  The len field specifies the length of the
        path string.

   argsz is set by the kernel specifying the total size of
   struct vfio_irq_info and all appended structs.

5.  EXAMPLE 1

    Example, Freescale SATA controller:

     sata@220000 {
         compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
         reg = <0x220000 0x1000>;
         interrupts = <0x44 0x2 0x0 0x0>;
     };

    request to get device FD would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -1 region
      -1 interrupt

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 26
              path = "/soc@ffe000000/sata@220000"

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
      -for index 0:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 26
              path = "/soc@ffe000000/sata@220000"

6.  EXAMPLE 2

    Example, Freescale DMA engine (modified to illustrate):

    dma@101300 {
       cell-index = <0x1>;
       ranges = <0x0 0x101100 0x200>;
       reg = <0x101300 0x4>;
       compatible = "fsl,eloplus-dma";
       #size-cells = <0x1>;
       #address-cells = <0x1>;
       fsl,liodn = <0xc6>;
    
       dma-channel@180 {
          interrupts = <0x23 0x2 0x0 0x0>;
          cell-index = <0x3>;
          reg = <0x180 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };
    
       dma-channel@100 {
          interrupts = <0x22 0x2 0x0 0x0>;
          cell-index = <0x2>;
          reg = <0x100 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };

    };

    request to get device FD would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -2 regions
      -2 interrupts

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 25
              path = "/soc@ffe000000/dma@101300"

      -for index 1:
           offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 25
              path = "/soc@ffe000000/dma@101300"

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
      -for index 0:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 41
              path = "/soc@ffe000000/dma@101300/dma-channel@180"

      -for index 1:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 41
              path = "/soc@ffe000000/dma@101300/dma-channel@100"
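
    With the paths above, a user space DMA driver could find the IRQ
    index for a given channel by matching the last path component.  A
    minimal sketch, assuming the paths returned by
    VFIO_DEVICE_GET_IRQ_INFO have already been copied into an array
    indexed by IRQ index; irq_index_for_node is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the IRQ index whose device tree path ends in "/<node>", or -1. */
static int irq_index_for_node(const char *const paths[], size_t count,
                              const char *node)
{
    size_t nlen = strlen(node);

    assert(nlen > 0);
    for (size_t i = 0; i < count; i++) {
        size_t plen = strlen(paths[i]);

        /* require a '/' right before the match so "x@180" != "dma-x@180" */
        if (plen > nlen && paths[i][plen - nlen - 1] == '/' &&
            strcmp(paths[i] + plen - nlen, node) == 0)
            return (int)i;
    }
    return -1;
}
```

    Matching on the full node name, including the unit address, rather
    than on position matters because the kernel does not guarantee any
    particular ordering of resources.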


Regards,
Stuart


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RFC: vfio interface for platform devices (v2)
@ 2013-07-03 21:40   ` Yoder Stuart-B08248
  0 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 21:40 UTC (permalink / raw)
  To: Alex Williamson, Alexander Graf, Wood Scott-B07421
  Cc: Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

Version 2
  -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
  -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
  -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
  -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
   defined 5 new flags and associated structs
  -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
   defined 1 new flag and associated struct
  -removed redundant example

------------------------------------------------------------------------------
VFIO for Platform Devices

The existing kernel interface for vfio-pci is pretty close to what is needed
for platform devices:
   -mechanism to create a container
   -add groups/devices to a container
   -set the IOMMU model
   -map DMA regions
   -get an fd for a specific device, which allows user space to determine
    info about device regions (e.g. registers) and interrupt info
   -support for mmapping device regions
   -mechanism to set how interrupts are signaled

Many platform devices are simple and consist of a single register
region and a single interrupt.  For these types of devices the
existing vfio interfaces should be sufficient.

However, platform devices can get complicated-- logically represented
as a device tree hierarchy of nodes.  For devices with multiple regions
and interrupts, new mechanisms are needed in vfio to correlate the
regions/interrupts with the device tree structure that drivers use
to determine the meaning of device resources.

In some cases there are relationships between devices, and devices
reference other devices using phandle links.  The kernel won't expose
relationships between devices; it just exposes mappable register
regions and interrupts.

The changes needed in vfio concern the device tree related
info that needs to be available with the device fd.

1.  VFIO_GROUP_GET_DEVICE_FD

  User space knows by out-of-band means which device it is accessing
  and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
  to get the device information:

  fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
             "/sys/bus/platform/devices/ffe210000.usb");
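
  For comparison: in the existing vfio core, the string passed to
  VFIO_GROUP_GET_DEVICE_FD is matched against the kernel's device
  name (for PCI, something like "0000:06:0d.0").  For a platform
  device that name is the last component of the sysfs path.  The
  sketch below is a hypothetical user-space helper, sysfs_dev_name(),
  that derives it, should the final interface take the name rather
  than the full path.  It is an illustration only, not part of the
  proposal.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the final component of a sysfs device
 * path into `out`, e.g.
 *   "/sys/bus/platform/devices/ffe210000.usb" -> "ffe210000.usb".
 * Trailing '/' characters are ignored.  Returns 0 on success, -1 if
 * there is no final component or `out` is too small.
 */
static int sysfs_dev_name(const char *path, char *out, size_t outsz)
{
    size_t end = strlen(path);

    /* Skip any trailing slashes. */
    while (end > 0 && path[end - 1] == '/')
        end--;

    /* Walk back to the character just after the previous '/'. */
    size_t start = end;
    while (start > 0 && path[start - 1] != '/')
        start--;

    size_t n = end - start;
    if (n == 0 || n + 1 > outsz)
        return -1;

    memcpy(out, path + start, n);
    out[n] = '\0';
    return 0;
}
```

  With the path above, the helper yields "ffe210000.usb".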

2.  VFIO_DEVICE_GET_INFO

   The number of regions corresponds to the regions defined
   in "reg" and "ranges" in the device tree.  

   Two new flags are added to struct vfio_device_info:

   #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
   #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */

   There could be platform bus devices that are not in the
   device tree, so two flags are used to allow for that.

   If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
   that there are regions and IRQs but no device tree info
   available.

   If VFIO_DEVICE_FLAGS_DEVTREE is set as well, it means
   there is device tree info available.
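
   A minimal sketch of how user space might act on these two flags.
   The bit positions are hypothetical (the RFC leaves them as
   "(1 << ?)"); bits 0 and 1 are already used by the existing
   VFIO_DEVICE_FLAGS_RESET and VFIO_DEVICE_FLAGS_PCI.  The RFC does
   not say what DEVTREE without PLATFORM would mean, so that case is
   simply reported separately.

```c
#include <stdint.h>

/* Hypothetical values: the RFC leaves these as (1 << ?).  Bits 0 and 1
 * are VFIO_DEVICE_FLAGS_RESET and VFIO_DEVICE_FLAGS_PCI. */
#define VFIO_DEVICE_FLAGS_PLATFORM (1u << 2)
#define VFIO_DEVICE_FLAGS_DEVTREE  (1u << 3)

enum dev_kind {
    DEV_NOT_PLATFORM,    /* neither flag set */
    DEV_PLATFORM_NO_DT,  /* regions and IRQs, but no device tree info */
    DEV_PLATFORM_DT,     /* regions and IRQs plus device tree info */
    DEV_DT_ONLY,         /* DEVTREE without PLATFORM: not defined by the RFC */
};

/* Classify a device from the flags returned by VFIO_DEVICE_GET_INFO. */
static enum dev_kind classify_device(uint32_t flags)
{
    int platform = (flags & VFIO_DEVICE_FLAGS_PLATFORM) != 0;
    int devtree  = (flags & VFIO_DEVICE_FLAGS_DEVTREE) != 0;

    if (!platform)
        return devtree ? DEV_DT_ONLY : DEV_NOT_PLATFORM;
    return devtree ? DEV_PLATFORM_DT : DEV_PLATFORM_NO_DT;
}
```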

3.  VFIO_DEVICE_GET_REGION_INFO

   For platform devices with multiple regions, information
   is needed to correlate the regions with the device 
   tree structure that drivers use to determine the meaning
   of device resources.
   
   The VFIO_DEVICE_GET_REGION_INFO ioctl is extended to provide
   device tree information.

   The following information is needed:
      -the device tree path to the node corresponding to the
       region
      -whether it corresponds to a "reg" or "ranges" property
      -there could be multiple sub-regions per "reg" or "ranges" and
       the sub-index within the reg/ranges is needed

   Five new flags are added to struct vfio_region_info:

   struct vfio_region_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
   #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
        __u32   index;          /* Region index */
        __u32   resv;           /* Reserved for alignment */
        __u64   size;           /* Region size (bytes) */
        __u64   offset;         /* Region offset from start of device fd */
   };
 
   VFIO_REGION_INFO_FLAG_CACHEABLE
       -if set indicates that the region must be mapped as cacheable

   VFIO_DEVTREE_REGION_INFO_FLAG_REG
       -if set indicates that the region corresponds to a "reg" property
        in the device tree representation of the device

   VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
       -if set indicates that the region corresponds to a "ranges" property
        in the device tree representation of the device

   VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_region_info_index appended to the
        end of vfio_region_info:

        struct vfio_devtree_region_info_index {
            __u32 index;
        };

        A "reg" or "ranges" property may describe multiple regions.
        The index specifies the entry within "reg" or "ranges"
        that this region corresponds to.

   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_info_path appended to the
        end of vfio_region_info:

        struct vfio_devtree_info_path {
            __u32 len;
            __u8  path[];
        };

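        The path is the full path to the corresponding device
        tree node.  The len field specifies the length of the
        path string in bytes, excluding any terminating NUL
        (the examples below use len = 26 for
        "/soc@ffe000000/sata@220000").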

   If more than one flag indicating an appended struct is set,
   the structs are appended in the same order as their flags
   are listed above (the index struct before the path struct).
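
   The layout rules above can be illustrated with a parser for the
   appended structs.  This is a sketch under stated assumptions: the
   flag values are hypothetical (the RFC leaves them unassigned),
   "dword aligned" is taken to mean 8-byte alignment, the path is not
   assumed to be NUL-terminated, and the rfc_* names are illustrative
   mirrors of the proposed structs, not real uapi types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; bits 0-2 are READ/WRITE/MMAP in linux/vfio.h. */
#define VFIO_REGION_INFO_FLAG_CACHEABLE     (1u << 3)
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG   (1u << 4)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1u << 5)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1u << 6)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH  (1u << 7)

#define RFC_ALIGN 8   /* assumption: "dword aligned" means 8 bytes */
#define ALIGN_UP(x) (((x) + RFC_ALIGN - 1) & ~(size_t)(RFC_ALIGN - 1))

/* Mirror of the proposed struct vfio_region_info (32 bytes). */
struct rfc_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

/* Decoded device tree information for one region. */
struct rfc_dt_region {
    int has_index;
    uint32_t dt_index;
    int has_path;
    char path[256];
};

/* Walk the structs appended after the fixed header, INDEX before PATH.
 * Returns 0 on success, -1 if the buffer is truncated or the path does
 * not fit in out->path. */
static int parse_region_extras(const void *buf, size_t buflen,
                               struct rfc_dt_region *out)
{
    const unsigned char *p = buf;
    struct rfc_region_info hdr;
    size_t off = sizeof(hdr);

    if (buflen < sizeof(hdr))
        return -1;
    memcpy(&hdr, p, sizeof(hdr));
    if (hdr.argsz > buflen)
        return -1;      /* kernel needs more room than the caller gave */

    memset(out, 0, sizeof(*out));

    if (hdr.flags & VFIO_DEVTREE_REGION_INFO_FLAG_INDEX) {
        if (off + sizeof(uint32_t) > hdr.argsz)
            return -1;
        memcpy(&out->dt_index, p + off, sizeof(uint32_t));
        out->has_index = 1;
        off = ALIGN_UP(off + sizeof(uint32_t));   /* next struct is aligned */
    }

    if (hdr.flags & VFIO_DEVTREE_REGION_INFO_FLAG_PATH) {
        uint32_t len;

        if (off + sizeof(len) > hdr.argsz)
            return -1;
        memcpy(&len, p + off, sizeof(len));
        if (len >= sizeof(out->path) || off + sizeof(len) + len > hdr.argsz)
            return -1;
        memcpy(out->path, p + off + sizeof(len), len);
        out->path[len] = '\0';                    /* terminate our copy */
        out->has_path = 1;
    }
    return 0;
}
```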

   argsz is set by the kernel specifying the total size of
   struct vfio_region_info and all appended structs.

   Suggested usage:
      -call VFIO_DEVICE_GET_REGION_INFO with argsz =
       sizeof(struct vfio_region_info)
      -if the returned argsz is larger, realloc the buffer to
       that size and set argsz accordingly
      -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
       structs will be returned
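
   The suggested two-call usage might look like the following.  Since
   the ioctl extension does not exist yet, a hypothetical
   fake_get_region_info() stands in for
   ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, ...); it always reports the
   full size in argsz and fills the appended path struct only when the
   caller's buffer is large enough.  Flag values and 8-byte alignment
   are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values; the RFC leaves them as (1 << ?). */
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG  (1u << 4)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1u << 7)

#define RFC_ALIGN 8   /* assumption: "dword aligned" means 8 bytes */
#define ALIGN_UP(x) (((x) + RFC_ALIGN - 1) & ~(size_t)(RFC_ALIGN - 1))

struct rfc_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

static const char fake_path[] = "/soc@ffe000000/sata@220000";

/* Hypothetical stand-in for the kernel side of the ioctl. */
static int fake_get_region_info(struct rfc_region_info *info)
{
    uint32_t len = (uint32_t)strlen(fake_path);
    size_t need = ALIGN_UP(sizeof(*info) + sizeof(len) + len);
    uint32_t given = info->argsz;

    if (given < sizeof(*info))
        return -1;                  /* a real ioctl would fail with EINVAL */

    info->flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                  VFIO_DEVTREE_REGION_INFO_FLAG_PATH;
    info->size = 0x1000;
    info->offset = 0;
    info->argsz = (uint32_t)need;   /* always report the full size */

    if (given >= need) {            /* room for the appended struct */
        unsigned char *p = (unsigned char *)info + sizeof(*info);
        memcpy(p, &len, sizeof(len));
        memcpy(p + sizeof(len), fake_path, len);
    }
    return 0;
}

/* Two-call pattern: query the size, grow the buffer, query again. */
static struct rfc_region_info *get_region_info(uint32_t index)
{
    struct rfc_region_info *info = calloc(1, sizeof(*info));

    if (!info)
        return NULL;
    info->argsz = sizeof(*info);
    info->index = index;

    if (fake_get_region_info(info) != 0)
        goto fail;

    if (info->argsz > sizeof(*info)) {
        uint32_t need = info->argsz;
        struct rfc_region_info *bigger = realloc(info, need);

        if (!bigger)
            goto fail;              /* original block is still valid */
        info = bigger;
        info->argsz = need;
        if (fake_get_region_info(info) != 0)
            goto fail;
    }
    return info;

fail:
    free(info);
    return NULL;
}
```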

4.  VFIO_DEVICE_GET_IRQ_INFO

   For platform devices with multiple interrupts that
   correspond to different subnodes in the device tree,
   information is needed to correlate the interrupts
   to the device tree structure.

   The VFIO_DEVICE_GET_IRQ_INFO ioctl is extended to provide
   device tree information.

   One new flag is added to struct vfio_irq_info:

   struct vfio_irq_info {
        __u32   argsz;
        __u32   flags;
   #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
        __u32   index;    /* IRQ index */
        __u32   count;    /* Number of IRQs within this index */
   };

   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH 
       -if set indicates that there is a dword aligned struct
        struct vfio_devtree_info_path appended to the
        end of vfio_irq_info :

        struct vfio_devtree_info_path {
            __u32 len;
            __u8  path[];
        };

        The path is the full path to the corresponding device
        tree node.  The len field specifies the length of the
        path string in bytes, excluding any terminating NUL.

   argsz is set by the kernel specifying the total size of
   struct vfio_irq_info and all appended structs.
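
   Seen from the producer side, the IRQ reply is the 16-byte
   vfio_irq_info header followed by a padded vfio_devtree_info_path.
   The sketch below is a hypothetical fill_irq_info() helper, not
   actual kernel code.  It computes argsz and writes the path struct
   only when the buffer is big enough.  The PATH flag value and
   8-byte alignment are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VFIO_IRQ_INFO_EVENTFD           (1u << 0)  /* existing vfio flag */
#define VFIO_IRQ_INFO_MASKABLE          (1u << 1)  /* existing vfio flag */
#define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1u << 4)  /* hypothetical value */

#define RFC_ALIGN 8   /* assumption: "dword aligned" means 8 bytes */
#define ALIGN_UP(x) (((x) + RFC_ALIGN - 1) & ~(size_t)(RFC_ALIGN - 1))

/* Mirror of struct vfio_irq_info (16 bytes). */
struct rfc_irq_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t count;
};

/* Write the header (if it fits) and the appended path struct (if the
 * whole reply fits).  Always returns the total size the reply needs,
 * which is also what ends up in argsz. */
static size_t fill_irq_info(void *buf, size_t buflen, uint32_t index,
                            uint32_t count, const char *path)
{
    struct rfc_irq_info hdr;
    uint32_t len = (uint32_t)strlen(path);
    size_t need = ALIGN_UP(sizeof(hdr) + sizeof(len) + len);

    if (buflen < sizeof(hdr))
        return need;

    hdr.argsz = (uint32_t)need;
    hdr.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_MASKABLE |
                VFIO_DEVTREE_IRQ_INFO_FLAG_PATH;
    hdr.index = index;
    hdr.count = count;
    memcpy(buf, &hdr, sizeof(hdr));

    if (buflen >= need) {
        unsigned char *p = (unsigned char *)buf + sizeof(hdr);

        memcpy(p, &len, sizeof(len));
        memcpy(p + sizeof(len), path, len);
        /* Zero the alignment padding after the path bytes. */
        memset(p + sizeof(len) + len, 0,
               need - sizeof(hdr) - sizeof(len) - len);
    }
    return need;
}
```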

5.  EXAMPLE 1

    Example, Freescale SATA controller:

     sata@220000 {
         compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
         reg = <0x220000 0x1000>;
         interrupts = <0x44 0x2 0x0 0x0>;
     };

    The request to get the device fd would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -1 region
      -1 interrupt

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 26
              path = "/soc@ffe000000/sata@220000"

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
      -for index 0:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 26
              path = "/soc@ffe000000/sata@220000"

6.  EXAMPLE 2

    Example, Freescale DMA engine (modified to illustrate):

    dma@101300 {
       cell-index = <0x1>;
       ranges = <0x0 0x101100 0x200>;
       reg = <0x101300 0x4>;
       compatible = "fsl,eloplus-dma";
       #size-cells = <0x1>;
       #address-cells = <0x1>;
       fsl,liodn = <0xc6>;
    
       dma-channel@180 {
          interrupts = <0x23 0x2 0x0 0x0>;
          cell-index = <0x3>;
          reg = <0x180 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };
    
       dma-channel@100 {
          interrupts = <0x22 0x2 0x0 0x0>;
          cell-index = <0x2>;
          reg = <0x100 0x80>;
          compatible = "fsl,eloplus-dma-channel";
       };

    };

    The request to get the device fd would look like:
      fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");

    The VFIO_DEVICE_GET_INFO ioctl would return:
      -2 regions
      -2 interrupts

    The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
      -for index 0:
           offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 25
              path = "/soc@ffe000000/dma@101300"

      -for index 1:
           offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
           flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                   VFIO_DEVTREE_REGION_INFO_FLAG_PATH
           vfio_devtree_info_path
              len = 25
              path = "/soc@ffe000000/dma@101300"
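
    The physical addresses above follow from translating through the
    "ranges" properties.  The sketch below shows single-window
    translation for #address-cells = #size-cells = 1.  The
    soc@ffe000000 window (child 0x0 -> parent 0xffe000000, length
    0x1000000) is an assumption inferred from the node name, since
    the soc node is not shown in this example.

```c
#include <stdint.h>

/* One "ranges" entry: child address, parent address, length. */
struct dt_range {
    uint64_t child;
    uint64_t parent;
    uint64_t len;
};

/* Translate a child bus address through one ranges entry.  Returns 0
 * and stores the parent address in *out, or -1 if addr is outside the
 * window. */
static int dt_translate(const struct dt_range *r, uint64_t addr,
                        uint64_t *out)
{
    if (addr < r->child || addr - r->child >= r->len)
        return -1;
    *out = r->parent + (addr - r->child);
    return 0;
}
```

    In this example, translating dma-channel@180's reg (0x180) the
    same way gives 0xffe101280, which lies inside region 0, so the
    "ranges" window covers the registers of both channels.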

    The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
      -for index 0:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 41
              path = "/soc@ffe000000/dma@101300/dma-channel@180"

      -for index 1:
          flags = VFIO_IRQ_INFO_EVENTFD | 
                  VFIO_IRQ_INFO_MASKABLE |
                  VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
          vfio_devtree_info_path
              len = 41
              path = "/soc@ffe000000/dma@101300/dma-channel@100"
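
    With per-IRQ paths, user space can find the interrupt for a given
    child node without assuming that IRQ indices follow unit-address
    order (here index 0 is dma-channel@180).  A hypothetical lookup
    helper, matching on the last path component:

```c
#include <stddef.h>
#include <string.h>

/* Return the IRQ index whose device tree path ends in node name
 * `node`, or -1 if none does.  paths[i] is the path reported by
 * VFIO_DEVICE_GET_IRQ_INFO for index i. */
static int irq_index_for_node(const char *const paths[], size_t n,
                              const char *node)
{
    for (size_t i = 0; i < n; i++) {
        const char *slash = strrchr(paths[i], '/');
        const char *last = slash ? slash + 1 : paths[i];

        if (strcmp(last, node) == 0)
            return (int)i;
    }
    return -1;
}
```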


Regards,
Stuart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-02 23:25 ` Yoder Stuart-B08248
@ 2013-07-03 22:31   ` Scott Wood
  -1 siblings, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-03 22:31 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Alex Williamson, Alexander Graf, Wood Scott-B07421,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the kernel can expose
> platform devices to user space using vfio.
> 
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the corresponding
> device tree node structure that is defined for most platform devices.
> 
> Regards,
> Stuart Yoder
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing infrastructure for vfio-pci is pretty close to what we need:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Platform devices can get complicated-- potentially with a tree hierarchy
> of nodes, and links/phandles pointing to other platform
> devices.   The kernel doesn't expose relationships between
> devices.  The kernel just exposes mappable register regions and interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
> 
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space has to know which device it is accessing and will call
>   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>   get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
> 
>   (whether the path is a device tree path or a sysfs path is up for
>   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")

Doesn't VFIO need to operate on an actual Linux device, rather than  
just an OF node?

Are we going to have a fixed assumption that you always want all the  
children of the node corresponding to the assigned device, or will it  
be possible to exclude some?

> 2.  VFIO_DEVICE_GET_INFO
> 
>    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>    than adding a new flag identifying a devices as a 'platform'
>    device.
> 
>    This ioctl simply returns the number of regions and number of irqs.
> 
>    The number of regions corresponds to the number of regions
>    that can be mapped for the device-- corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.
> 
> 3.  VFIO_DEVICE_GET_REGION_INFO
> 
>    No changes needed, except perhaps adding a new flag.  Freescale has some
>    devices with regions that must be mapped cacheable.

While I don't object to making the information available to the user  
just in case, the main thing we need here is to influence what the  
kernel does when the user tries to map it.  At least on PPC it's not up  
to userspace to select whether a mmap is cacheable.

> 4. VFIO_DEVICE_GET_DEVTREE_INFO
> 
>    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>    expose device regions and interrupts, but it's not enough to know
>    that there are X regions and Y interrupts.  User space needs to
>    know what the resources are for-- to correlate those regions/interrupts
>    to the device tree structure that drivers use.  The device tree
>    structure could consist of multiple nodes and it is necessary to
>    identify the node corresponding to the region/interrupt exposed
>    by VFIO.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region or interrupt
>       -for a region, whether it corresponds to a "reg" or "ranges"
>        property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> 
>    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> 
>    struct vfio_path_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */

What about distinguishing a normal interrupt from one found in an  
interrupt-map?

In the case of both ranges and interrupt-maps, we'll also want to  
decide what the policy is for when to expose them directly, versus just  
using them to translate regs and interrupts of child nodes.

>         __u32   index;          /* input: index of region or irq for which we are getting info */
>         __u32   type;           /* input: 0 - get devtree info for a region
>                                           1 - get devtree info for an irq
>                                  */
>         __u32   start;          /* output: identifies the index within the reg/ranges */

"start" is an odd name for this.  I'd rename "index" to "vfio_index"  
and this to "dt_index".

>         __u8    path[];         /* output: Full path to associated device tree node */

How does the caller know what size buffer to supply for this?

>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>       -for region index 0:
>           flags: 0x0     // i.e. this is a "reg" property
>           start: 0x0     // i.e. index 0x0 in "reg"
>           path: "/soc@ffe000000/crypto@300000"
> 
>       -for interrupt index 0:
>           path: "/soc@ffe000000/crypto@300000/jr@1000"
> 
>       -for interrupt index 1:
>           path: "/soc@ffe000000/crypto@300000/jr@2000"

Where is "start" for the interrupts?

-Scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-02 23:25 ` Yoder Stuart-B08248
                   ` (4 preceding siblings ...)
  (?)
@ 2013-07-03 22:31 ` Scott Wood
  -1 siblings, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-03 22:31 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list, Antonios Motakis,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Bhushan Bharat-R65777,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu

On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the  
> kernel can expose
> platform devices to user space using vfio.
> 
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the  
> corresponding
> device tree node structure that is defined for most platform devices.
> 
> Regards,
> Stuart Yoder
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing infrastructure for vfio-pci is pretty close to what we  
> need:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to  
> determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Platform devices can get complicated-- potentially with a tree  
> hierarchy
> of nodes, and links/phandles pointing to other platform
> devices.   The kernel doesn't expose relationships between
> devices.  The kernel just exposes mappable register regions and  
> interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
> 
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space has to know which device it is accessing and will call
>   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>   get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,  
> "/soc@ffe000000/usb@210000");
> 
>   (whether the path is a device tree path or a sysfs path is up for
>   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")

Doesn't VFIO need to operate on an actual Linux device, rather than  
just an OF node?

Are we going to have a fixed assumption that you always want all the  
children of the node corresponding to the assigned device, or will it  
be possible to exclude some?

> 2.  VFIO_DEVICE_GET_INFO
> 
>    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>    than adding a new flag identifying a devices as a 'platform'
>    device.
> 
>    This ioctl simply returns the number of regions and number of irqs.
> 
>    The number of regions corresponds to the number of regions
>    that can be mapped for the device-- corresponds to the regions  
> defined
>    in "reg" and "ranges" in the device tree.
> 
> 3.  VFIO_DEVICE_GET_REGION_INFO
> 
>    No changes needed, except perhaps adding a new flag.  Freescale  
> has some
>    devices with regions that must be mapped cacheable.

While I don't object to making the information available to the user  
just in case, the main thing we need here is to influence what the  
kernel does when the user tries to map it.  At least on PPC it's not up  
to userspace to select whether a mmap is cacheable.

> 4. VFIO_DEVICE_GET_DEVTREE_INFO
> 
>    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>    expose device regions and interrupts, but it's not enough to know
>    that there are X regions and Y interrupts.  User space needs to
>    know what the resources are for-- to correlate those  
> regions/interrupts
>    to the device tree structure that drivers use.  The device tree
>    structure could consist of multiple nodes and it is necessary to
>    identify the node corresponding to the region/interrupt exposed
>    by VFIO.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region or interrupt
>       -for a region, whether it corresponds to a "reg" or "ranges"
>        property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> 
>    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> 
>    struct vfio_path_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a  
> "ranges" property */

What about distinguishing a normal interrupt from one found in an  
interrupt-map?

In the case of both ranges and interrupt-maps, we'll also want to  
decide what the policy is for when to expose them directly, versus just  
using them to translate regs and interrupts of child nodes.

>         __u32   index;          /* input: index of region or irq for  
> which we are getting info */
>         __u32   type;           /* input: 0 - get devtree info for a  
> region
>                                           1 - get devtree info for an  
> irq
>                                  */
>         __u32   start;          /* output: identifies the index  
> within the reg/ranges */

"start" is an odd name for this.  I'd rename "index" to "vfio_index"  
and this to "dt_index".

>         __u8    path[];         /* output: Full path to associated device tree node */

How does the caller know what size buffer to supply for this?
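One possible answer is sketched below under assumptions that are not in the proposal. The kernel could compute the argsz needed to hold the path and report it back, so the caller can retry with a larger buffer. The helper name is hypothetical, uint32_t/uint8_t stand in for __u32/__u8, and the path is assumed to be returned NUL-terminated.

```c
#include <stdint.h>
#include <string.h>

/* The struct as proposed, with uint32_t/uint8_t standing in for __u32/__u8. */
struct vfio_path_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;   /* input: index of region or irq */
    uint32_t type;    /* input: 0 = region, 1 = irq */
    uint32_t start;   /* output: index within the reg/ranges */
    uint8_t  path[];  /* output: device tree path, assumed NUL-terminated */
};

/* Hypothetical helper: the argsz needed for the whole path, including
 * its NUL terminator, to fit in the trailing path[] array. */
static uint32_t vfio_path_info_required_argsz(const char *path)
{
    return (uint32_t)(sizeof(struct vfio_path_info) + strlen(path) + 1);
}
```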

>     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> 
>       -for region index 0:
>           flags: 0x0     // i.e. this is a "reg" property
>           start: 0x0     // i.e. index 0x0 in "reg"
>           path: "/soc@ffe000000/crypto@300000"
> 
>       -for interrupt index 0:
>           path: "/soc@ffe000000/crypto@300000/jr@1000"
> 
>       -for interrupt index 1:
>           path: "/soc@ffe000000/crypto@300000/jr@2000"

Where is "start" for the interrupts?

-Scott

* Re: RFC: vfio interface for platform devices (v2)
  2013-07-03 21:40   ` Yoder Stuart-B08248
@ 2013-07-03 22:53     ` Alex Williamson
  -1 siblings, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2013-07-03 22:53 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Alexander Graf, Wood Scott-B07421, Bhushan Bharat-R65777,
	Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
	Antonios Motakis, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu

On Wed, 2013-07-03 at 21:40 +0000, Yoder Stuart-B08248 wrote:
> Version 2
>   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
>   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
>   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
>   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
>    defined 5 new flags and associated structs
>   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
>    defined 1 new flag and associated struct
>   -removed redundant example
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing kernel interface for vfio-pci is pretty close to what is needed
> for platform devices:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Many platform devices are simple and consist of a single register
> region and a single interrupt.  For these types of devices the
> existing vfio interfaces should be sufficient.
> 
> However, platform devices can get complicated-- logically represented
> as a device tree hierarchy of nodes.  For devices with multiple regions
> and interrupts, new mechanisms are needed in vfio to correlate the
> regions/interrupts with the device tree structure that drivers use
> to determine the meaning of device resources.
> 
> In some cases there are relationships between devices, and devices
> reference other devices using phandle links.  The kernel won't expose
> relationships between devices, but just exposes mappable register
> regions and interrupts.
> 
> The changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space knows by out-of-band means which device it is accessing
>   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
>   to get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
>              "/sys/bus/platform/devices/ffe210000.usb");

FWIW, I'm in favor of whichever way works out cleaner in the code for
pre-pending "/sys/bus" or not.  It sort of seems like it's unnecessary.
It's also a little inconsistent that the returned path doesn't
pre-pend /sys in the examples below.
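The point above can be illustrated with a small helper. vfio-pci identifies a device by its name; the VFIO documentation's example passes the PCI address "0000:06:0d.0". A platform equivalent could therefore accept just "ffe210000.usb", and user space would drop the sysfs prefix before the call. This is a minimal sketch. platform_dev_name() is a hypothetical helper and is not part of vfio.

```c
#include <string.h>

/* Hypothetical helper: reduce a sysfs path to the bare platform device
 * name.  A path without the prefix is returned unchanged. */
static const char *platform_dev_name(const char *path)
{
    static const char prefix[] = "/sys/bus/platform/devices/";
    const size_t n = sizeof(prefix) - 1;    /* prefix length without NUL */

    return strncmp(path, prefix, n) == 0 ? path + n : path;
}
```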

> 2.  VFIO_DEVICE_GET_INFO
> 
>    The number of regions corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.  
> 
>    Two new flags are added to struct vfio_device_info:
> 
>    #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
>    #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
> 
>    It is possible that there could be platform bus devices 
>    that are not in the device tree, so we use 2 flags to
>    allow for that.
> 
>    If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
>    that there are regions and IRQs but no device tree info
>    available.
> 
>    If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
>    there is device tree info available.

But it would be invalid to only have DEVTREE w/o PLATFORM for now,
right?
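The constraint raised above says DEVTREE is only meaningful together with PLATFORM. That rule is simple to enforce when validating what VFIO_DEVICE_GET_INFO returns. This is a sketch only. The proposal writes the bit positions as (1 << ?), so the values below are placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; the proposal writes them as (1 << ?). */
#define VFIO_DEVICE_FLAGS_PLATFORM (1u << 2)
#define VFIO_DEVICE_FLAGS_DEVTREE  (1u << 3)

/* Reject DEVTREE without PLATFORM.  Every other combination is accepted,
 * including PLATFORM alone (regions and IRQs, but no device tree info). */
static bool platform_flags_valid(uint32_t flags)
{
    bool platform = flags & VFIO_DEVICE_FLAGS_PLATFORM;
    bool devtree  = flags & VFIO_DEVICE_FLAGS_DEVTREE;

    return platform || !devtree;
}
```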

> 3. VFIO_DEVICE_GET_REGION_INFO
> 
>    For platform devices with multiple regions, information
>    is needed to correlate the regions with the device 
>    tree structure that drivers use to determine the meaning
>    of device resources.
>    
>    The VFIO_DEVICE_GET_REGION_INFO is extended to provide
>    device tree information.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region
>       -whether it corresponds to a "reg" or "ranges" property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    Five new flags are added to vfio_region_info:
> 
>    struct vfio_region_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>         __u32   index;          /* Region index */
>         __u32   resv;           /* Reserved for alignment */
>         __u64   size;           /* Region size (bytes) */
>         __u64   offset;         /* Region offset from start of device fd */
>    };
>  
>    VFIO_REGION_INFO_FLAG_CACHEABLE
>        -if set indicates that the region must be mapped as cacheable
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_REG
>        -if set indicates that the region corresponds to a "reg" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
>        -if set indicates that the region corresponds to a "ranges" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_region_info_index appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_region_info_index
>         {
>             __u32 index;
>         };
> 
>         A reg or ranges property may have multiple regions.  The index
>         specifies the index within the "reg" or "ranges"
>         that this region corresponds to.
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_info_path
>         {
>             __u32 len;
>             __u8  path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    If multiple flags are set that indicate that there is
>    an appended struct, the order of the flags indicates
>    the order of the structs.
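The appended structs follow the flag order, so a consumer has to know the size of each struct to reach the next one. The sketch below shows that walk for the two region flags that carry appended data. It makes three assumptions: "dword aligned" means 4-byte alignment, the flag bits take placeholder values, and uint32_t/uint64_t stand in for the __u types. parse_region_devtree() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit values; the proposal leaves them as (1 << ?).  Only the
 * relative order matters, since a lower flag bit means an earlier struct. */
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1u << 4)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH  (1u << 5)

struct vfio_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

/* "dword aligned" is read here as 4-byte alignment (an assumption). */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Walk the structs appended after vfio_region_info, in flag order.
 * Sets *dt_index (UINT32_MAX if the INDEX struct is absent) and copies
 * the path into path[path_sz], NUL-terminated and truncated to fit.
 * Returns 0, or -1 if an appended struct would run past argsz. */
static int parse_region_devtree(const uint8_t *buf, uint32_t *dt_index,
                                char *path, size_t path_sz)
{
    struct vfio_region_info info;
    size_t off = sizeof(info);

    memcpy(&info, buf, sizeof(info));
    *dt_index = UINT32_MAX;
    path[0] = '\0';

    if (info.flags & VFIO_DEVTREE_REGION_INFO_FLAG_INDEX) {
        if (off + 4 > info.argsz)
            return -1;
        memcpy(dt_index, buf + off, 4);
        off = align4(off + 4);
    }
    if (info.flags & VFIO_DEVTREE_REGION_INFO_FLAG_PATH) {
        uint32_t len;
        size_t n;

        if (off + 4 > info.argsz)
            return -1;
        memcpy(&len, buf + off, 4);          /* vfio_devtree_info_path.len */
        if (off + 4 + len > info.argsz)
            return -1;
        n = len < path_sz - 1 ? len : path_sz - 1;
        memcpy(path, buf + off + 4, n);
        path[n] = '\0';
        off = align4(off + 4 + len);         /* where a later struct would start */
    }
    return 0;
}
```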
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_region_info and all appended structs.
> 
>    Suggested usage:
>       -call VFIO_DEVICE_GET_REGION_INFO with argsz =
>        sizeof(struct vfio_region_info)
>       -realloc the buffer to the argsz value returned by the kernel
>       -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
>        structs will be returned
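The suggested two-call sequence can be sketched in C. A real device fd is needed to demonstrate it, so mock_get_region_info() below is a hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, buf). It behaves as the proposal describes: it always reports the full size in argsz, and it fills in the appended structs only when the buffer is large enough. The sketch models only the leading argsz/flags words of vfio_region_info.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Only the leading words of vfio_region_info are modeled here. */
struct region_info_hdr {
    uint32_t argsz;
    uint32_t flags;
};

/* Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, buf):
 * it always reports the total size it needs in argsz, and writes the
 * appended data only if the caller said (via argsz) that it fits. */
static int mock_get_region_info(void *buf)
{
    static const char payload[] = "appended-structs";
    struct region_info_hdr *hdr = buf;
    uint32_t have = hdr->argsz;
    uint32_t need = (uint32_t)(sizeof(*hdr) + sizeof(payload));

    hdr->argsz = need;
    if (have >= need)
        memcpy((char *)buf + sizeof(*hdr), payload, sizeof(payload));
    return 0;
}

/* Call with the base size, grow the buffer to the reported argsz, then
 * call again so the appended structs are returned.  Returns a malloc'd
 * buffer the caller frees, or NULL on failure. */
static void *get_region_info_full(int (*get_info)(void *))
{
    struct region_info_hdr *info = calloc(1, sizeof(*info));
    uint32_t need;

    if (!info)
        return NULL;
    info->argsz = sizeof(*info);
    if (get_info(info) != 0)
        goto fail;

    need = info->argsz;
    if (need > sizeof(*info)) {
        void *bigger = realloc(info, need);
        if (!bigger)
            goto fail;
        info = bigger;
        info->argsz = need;      /* tell the callee how much room there is */
        if (get_info(info) != 0)
            goto fail;
    }
    return info;
fail:
    free(info);
    return NULL;
}
```

With a real device, get_info would wrap the ioctl on the device fd. The retry logic stays the same.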
> 
> 4.  VFIO_DEVICE_GET_IRQ_INFO
> 
>    For platform devices with multiple interrupts that 
>    correspond to different subnodes in the device tree,
>    information is needed to correlate the interrupts
>    to the device tree structure.
> 
>    The VFIO_DEVICE_GET_IRQ_INFO is extended to provide
>    device tree information.
> 
>    One new flag is added to vfio_irq_info:
> 
>    struct vfio_irq_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
>         __u32   index;    /* IRQ index */
>         __u32   count;    /* Number of IRQs within this index */
>    };
> 
>    VFIO_DEVTREE_IRQ_INFO_FLAG_PATH 
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_irq_info:
> 
>         struct vfio_devtree_info_path
>         {
>             __u32 len;
>             __u8  path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_irq_info and all appended structs.
> 
> 5.  EXAMPLE 1
> 
>     Example, Freescale SATA controller:
> 
>      sata@220000 {
>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>          reg = <0x220000 0x1000>;
>          interrupts = <0x44 0x2 0x0 0x0>;
>      };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -1 interrupt
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
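The numbers in Example 1 can be cross-checked. Assume the soc node maps bus address 0 to physical 0xffe000000, which the stated physical address implies. Then reg = <0x220000 0x1000> covers 0xffe220000 up to, but not including, 0xffe221000. The len values equal strlen() of the path, so they do not count a terminating NUL. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the soc node maps bus address 0 to physical 0xffe000000,
 * as the example's "physical 0xffe220000" implies. */
#define SOC_PHYS_BASE 0xffe000000ULL

struct dt_reg {
    uint64_t addr;
    uint64_t size;
};

/* First physical byte of a "reg" entry under the soc node. */
static uint64_t reg_phys_start(struct dt_reg r) { return SOC_PHYS_BASE + r.addr; }

/* First physical byte past the end of the entry. */
static uint64_t reg_phys_end(struct dt_reg r) { return SOC_PHYS_BASE + r.addr + r.size; }

/* The example's len values equal strlen(), i.e. no NUL is counted. */
static uint32_t devtree_path_len(const char *path) { return (uint32_t)strlen(path); }
```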
> 
> 6.  EXAMPLE 2
> 
>     Example, Freescale DMA engine (modified to illustrate):
> 
>     dma@101300 {
>        cell-index = <0x1>;
>        ranges = <0x0 0x101100 0x200>;
>        reg = <0x101300 0x4>;
>        compatible = "fsl,eloplus-dma";
>        #size-cells = <0x1>;
>        #address-cells = <0x1>;
>        fsl,liodn = <0xc6>;
>     
>        dma-channel@180 {
>           interrupts = <0x23 0x2 0x0 0x0>;
>           cell-index = <0x3>;
>           reg = <0x180 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
>     
>        dma-channel@100 {
>           interrupts = <0x22 0x2 0x0 0x0>;
>           cell-index = <0x2>;
>           reg = <0x100 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
> 
>     };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -2 regions
>       -2 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>       -for index 1:
>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>       -for index 1:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@100"
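In Example 2, the "ranges" region (index 0) spans both dma-channel register blocks. User space can locate each channel's registers by translating the child "reg" through the parent's ranges entry <child parent size> = <0x0 0x101100 0x200>. The sketch below covers only the #address-cells = #size-cells = 1 case used by this node. dt_translate() is a hypothetical helper.

```c
#include <stdint.h>

/* One "ranges" entry with one address cell and one size cell:
 * <child-bus-addr parent-bus-addr size>. */
struct dt_range {
    uint32_t child;
    uint32_t parent;
    uint32_t size;
};

/* Translate a child bus address through one ranges entry.
 * Returns 0 and sets *out on success, -1 if addr is outside the window. */
static int dt_translate(const struct dt_range *r, uint32_t addr, uint32_t *out)
{
    if (addr < r->child || addr - r->child >= r->size)
        return -1;
    *out = r->parent + (addr - r->child);
    return 0;
}
```

For dma-channel@180 this gives parent bus address 0x101280, which lies 0x180 bytes into the ranges window that maps physical 0xffe101100.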


Seems like it should work.  My only API concern with this model of
appending structs is that a user needs to know the size of each struct
even if they don't otherwise care about it in order to step over it.  In
some cases, like the path, the size is variable and the user needs to
look into it.  The structs must also be strictly ordered based on the
order of the flags or all hope is lost.  If we assign flags sequentially
there should be no case where the user needs to step over something that
they don't know the size of.  Even so, we may still be better off defining
the first word of each struct as the length (I'm guessing a byte might
be too limiting).  It would sure make walking it easier.  Thanks,

Alex
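The length-prefix idea can be sketched as follows. This layout is hypothetical and is not part of the v2 proposal. Each appended struct starts with a 32-bit length that covers the whole struct, so a consumer can step over entries it does not recognize without knowing their type. count_appended() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: every appended struct starts with a 32-bit length
 * that covers the whole struct (length word and padding included). */

/* Count the appended structs in buf[start, argsz), stepping over each one
 * by its length alone.  Returns the count, or -1 if an entry is malformed
 * or would run past argsz. */
static int count_appended(const uint8_t *buf, size_t start, size_t argsz)
{
    size_t off = start;
    int n = 0;

    while (off < argsz) {
        uint32_t len;

        if (argsz - off < sizeof(len))
            return -1;                  /* no room for a length word */
        memcpy(&len, buf + off, sizeof(len));
        if (len < sizeof(len) || len > argsz - off)
            return -1;                  /* too short, or overruns argsz */
        off += len;
        n++;
    }
    return n;
}
```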


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices (v2)
  2013-07-03 21:40   ` Yoder Stuart-B08248
  (?)
@ 2013-07-03 22:53   ` Alex Williamson
  -1 siblings, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2013-07-03 22:53 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	Antonios Motakis, kvmarm@lists.cs.columbia.edu

On Wed, 2013-07-03 at 21:40 +0000, Yoder Stuart-B08248 wrote:
> Version 2
>   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
>   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
>   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
>   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
>    defined 5 new flags and associated structs
>   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
>    defined 1 new flag and associated struct
>   -removed redundant example
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing kernel interface for vfio-pci is pretty close to what is needed
> for platform devices:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Many platform device are simple and consist of a single register
> region and a single interrupt.  For these types of devices the
> existing vfio interfaces should be sufficient.
> 
> However, platform devices can get complicated-- logically represented
> as a device tree hierarchy of nodes.  For devices with multiple regions
> and interrupts, new mechanisms are needed in vfio to correlate the
> regions/interrupts with the device tree structure that drivers use
> to determine the meaning of device resources.
> 
> In some cases there are relationships between device, and devices
> reference other devices using phandle links.  The kernel won't expose
> relationships between devices, but just exposes mappable register
> regions and interrupts.
> 
> The changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space knows by out-of-band means which device it is accessing
>   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
>   to get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
>              "/sys/bus/platform/devices/ffe210000.usb"));

FWIW, I'm in favor of whichever way works out cleaner in the code for
pre-pending "/sys/bus" or not.  It sort of seems like it's unnecessary.
It's also a little inconsistent that the returned path doesn't
pre-pend /sys in the examples below.

> 2.  VFIO_DEVICE_GET_INFO
> 
>    The number of regions corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.  
> 
>    Two new flags are added to struct vfio_device_info:
> 
>    #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
>    #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
> 
>    It is possible that there could be platform bus devices 
>    that are not in the device tree, so we use 2 flags to
>    allow for that.
> 
>    If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
>    that there are regions and IRQs but no device tree info
>    available.
> 
>    If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
>    there is device tree info available.

But it would be invalid to only have DEVTREE w/o PLATFORM for now,
right?

> 3. VFIO_DEVICE_GET_REGION_INFO
> 
>    For platform devices with multiple regions, information
>    is needed to correlate the regions with the device 
>    tree structure that drivers use to determine the meaning
>    of device resources.
>    
>    The VFIO_DEVICE_GET_REGION_INFO is extended to provide
>    device tree information.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region
>       -whether it corresponds to a "reg" or "ranges" property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    There are 5 new flags added to vfio_region_info :
> 
>    struct vfio_region_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>         __u32   index;          /* Region index */
>         __u32   resv;           /* Reserved for alignment */
>         __u64   size;           /* Region size (bytes) */
>         __u64   offset;         /* Region offset from start of device fd */
>    };
>  
>    VFIO_REGION_INFO_FLAG_CACHEABLE
>        -if set indicates that the region must be mapped as cacheable
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_REG
>        -if set indicates that the region corresponds to a "reg" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
>        -if set indicates that the region corresponds to a "ranges" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_region_info_index appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_region_info_index
>         {
> 	      u32 index;
>         }
> 
>         A reg or ranges property may have multiple regsion.  The index
>         specifies the index within the "reg" or "ranges"
>         that this region corresponds to.
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_info_path
>         {
>             u32 len;
>             u8 path[];
>         } 
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    If multiple flags are set that indicate that there is
>    an appended struct, the order of the flags indicates
>    the order of the structs.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_region_info and all appended structs.
> 
>    Suggested usage:
>       -call VFIO_DEVICE_GET_REGION_INFO with argsz =
>        sizeof(struct vfio_region_info)
>       -realloc the buffer
>       -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
>        structs will be returned
> 
> 4.  VFIO_DEVICE_GET_IRQ_INFO
> 
>    For platform devices with multiple interrupts that 
>    correspond to different subnodes in the device tree,
>    information is needed to correlate the interrupts
>    to the the device tree structure.
> 
>    The VFIO_DEVICE_GET_REGION_INFO is extended to provide
>    device tree information.
> 
>    1 new flag is added to vfio_irq_info :
> 
>    struct vfio_irq_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
>         __u32   index;    /* IRQ index */
>         __u32   count;    /* Number of IRQs within this index */
>     };
> 
>    VFIO_DEVTREE_IRQ_INFO_FLAG_PATH 
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_irq_info :
> 
>         struct vfio_devtree_info_path
>         {
>             u32 len;
>             u8 path[];
>         } 
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_region_info and all appended structs.
> 
> 5.  EXAMPLE 1
> 
>     Example, Freescale SATA controller:
> 
>      sata@220000 {
>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>          reg = <0x220000 0x1000>;
>          interrupts = <0x44 0x2 0x0 0x0>;
>      };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -1 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x10000 -- allows mmap of physical 0xffe220000
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
> 6.  EXAMPLE 2
> 
>     Example, Freescale DMA engine (modified to illustrate):
> 
>     dma@101300 {
>        cell-index = <0x1>;
>        ranges = <0x0 0x101100 0x200>;
>        reg = <0x101300 0x4>;
>        compatible = "fsl,eloplus-dma";
>        #size-cells = <0x1>;
>        #address-cells = <0x1>;
>        fsl,liodn = <0xc6>;
>     
>        dma-channel@180 {
>           interrupts = <0x23 0x2 0x0 0x0>;
>           cell-index = <0x3>;
>           reg = <0x180 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
>     
>        dma-channel@100 {
>           interrupts = <0x22 0x2 0x0 0x0>;
>           cell-index = <0x2>;
>           reg = <0x100 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
> 
>     };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -2 regions
>       -2 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGES |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>       -for index 1:
>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@100"


Seems like it should work.  My only API concern with this model of
appending structs is that a user needs to know the size of each struct
even if they don't otherwise care about it in order to step over it.  In
some cases, like the path, the size is variable and the user needs to
look into it.  The structs must also be strictly ordered based on the
order of the flags or all hope is lost.  If we assign flags sequentially
there should be no case where the user needs to step over something that
they don't know the size of.  Even so, we may still be better off defining
the first word of each struct as the length (I'm guessing a byte might
be too limiting).  It would sure make walking it easier.  Thanks,

Alex
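For illustration only, here is a minimal C sketch of walking appended structs if each one led with a 32-bit length word, as suggested above. The record layout (length counts the whole record, length word included), the 4-byte padding and the name hyp_walk_records() are assumptions, not part of the RFC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYP_REC_ALIGN 4u   /* assumption: records are padded to 4 bytes */

typedef void (*hyp_visit_fn)(const uint8_t *rec, uint32_t len, void *ctx);

/* Walk records in buf[start..total), where each record begins with a
 * 32-bit length covering the whole record, length word included.
 * Calls visit (if non-NULL) for each record and returns how many were
 * found, or -1 if a length field is malformed or runs past the buffer. */
static int hyp_walk_records(const uint8_t *buf, size_t start, size_t total,
                            hyp_visit_fn visit, void *ctx)
{
    size_t off = start;
    int count = 0;

    while (off + sizeof(uint32_t) <= total) {
        uint32_t len;

        memcpy(&len, buf + off, sizeof(len));   /* avoids unaligned loads */
        if (len < sizeof(uint32_t) || len > total - off)
            return -1;
        if (visit != NULL)
            visit(buf + off, len, ctx);
        count++;
        /* Skip the record without looking inside it, then realign. */
        off += ((size_t)len + HYP_REC_ALIGN - 1) & ~(size_t)(HYP_REC_ALIGN - 1);
    }
    return count;
}
```

With the length up front, a caller can step over records it does not understand without knowing their layout, which is the point of the suggestion.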


* Re: RFC: vfio interface for platform devices (v2)
  2013-07-03 22:53     ` Alex Williamson
@ 2013-07-03 23:06       ` Scott Wood
  -1 siblings, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-03 23:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yoder Stuart-B08248, Alexander Graf, Wood Scott-B07421,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On 07/03/2013 05:53:09 PM, Alex Williamson wrote:
> Seems like it should work.  My only API concern with this model of
> appending structs is that a user needs to know the size of each struct
> even if they don't otherwise care about it in order to step over it.

In that case, it might be better to make the struct grow linearly  
rather than with options, and just have a version number on the struct  
indicating how far the caller thinks the struct has grown.  The kernel
could respond back with a lower version to reflect that it only filled  
in the fields it knows about.  Flags could still be used to indicate  
which portions of the struct are relevant, but not the physical layout  
of the struct.
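A rough C sketch of the linearly growing, versioned struct described above. The struct name, field order, version numbers and the fake "kernel" fill routine are all hypothetical; they only show how the version handshake lets an older kernel and a newer caller agree on how much of the struct is valid.

```c
#include <stdint.h>

/* Hypothetical layout: fields are only ever appended at the end. */
struct hyp_region_info_v {
    uint32_t argsz;
    uint32_t version;   /* in: newest layout the caller understands;
                           out: newest layout the kernel actually filled */
    uint32_t flags;     /* which filled fields are meaningful for this region */
    uint32_t index;
    uint64_t size;      /* version 1 */
    uint64_t offset;    /* version 1 */
    uint32_t dt_index;  /* version 2: index within "reg"/"ranges" */
    uint32_t resv;      /* version 2: padding */
};

/* Stand-in for the kernel handler: it fills fields up to the smaller of
 * its own version and the caller's, then reports that version back. */
static void hyp_kernel_fill(struct hyp_region_info_v *info,
                            uint32_t kernel_version)
{
    uint32_t v = info->version < kernel_version ? info->version
                                                : kernel_version;

    if (v >= 1) {
        info->size = 0x1000;   /* made-up values for illustration */
        info->offset = 0;
    }
    if (v >= 2)
        info->dt_index = 0;
    info->version = v;
}
```

A caller that asks for version 2 from a kernel that only knows version 1 gets version 1 back and simply ignores dt_index.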

> In some cases, like the path, the size is variable and the user needs
> to look into it.

For things like path, maybe the caller should just pass in a string  
buffer that is separate from the struct buffer?
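A sketch of the separate-buffer idea for the path, in the style of a two-call size query: the caller passes a pointer and a size, and the kernel reports the size it needs. The struct layout and hyp_kernel_fill_path() are hypothetical stand-ins, not a proposed ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the caller supplies its own path buffer by pointer. */
struct hyp_irq_info_pathbuf {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t count;
    uint64_t path_ptr;  /* address of a caller-owned char buffer (may be 0) */
    uint32_t path_len;  /* in: buffer size; out: bytes needed incl. the NUL */
    uint32_t resv;
};

/* Stand-in for the kernel side: copy the node path only if it fits,
 * and always report the size needed so the caller can retry. */
static void hyp_kernel_fill_path(struct hyp_irq_info_pathbuf *info,
                                 const char *node_path)
{
    uint32_t need = (uint32_t)strlen(node_path) + 1;
    char *dst = (char *)(uintptr_t)info->path_ptr;

    if (dst != NULL && info->path_len >= need)
        memcpy(dst, node_path, need);
    info->path_len = need;
}
```

The fixed-size struct never changes shape, and only the string needs a second call with a larger buffer.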

-Scott


* Re: RFC: vfio interface for platform devices (v2)
  2013-07-03 21:40   ` Yoder Stuart-B08248
@ 2013-07-04 14:44     ` Mario Smarduch
  -1 siblings, 0 replies; 51+ messages in thread
From: Mario Smarduch @ 2013-07-04 14:44 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Alex Williamson, Alexander Graf, Wood Scott-B07421,
	kvm@vger.kernel.org list, Bhushan Bharat-R65777,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	kvmarm@lists.cs.columbia.edu


I'm having trouble understanding how this works where
the Guest Device Model != Host. How do you inform the guest
where the device is mapped in its physical address space,
and handle GPA faults?

- Mario

On 7/3/2013 11:40 PM, Yoder Stuart-B08248 wrote:
> Version 2
>   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
>   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
>   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
>   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
>    defined 5 new flags and associated structs
>   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
>    defined 1 new flag and associated struct
>   -removed redundant example
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing kernel interface for vfio-pci is pretty close to what is needed
> for platform devices:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Many platform devices are simple and consist of a single register
> region and a single interrupt.  For these types of devices the
> existing vfio interfaces should be sufficient.
> 
> However, platform devices can get complicated-- logically represented
> as a device tree hierarchy of nodes.  For devices with multiple regions
> and interrupts, new mechanisms are needed in vfio to correlate the
> regions/interrupts with the device tree structure that drivers use
> to determine the meaning of device resources.
> 
> In some cases there are relationships between devices, and devices
> reference other devices using phandle links.  The kernel won't expose
> relationships between devices, but just exposes mappable register
> regions and interrupts.
> 
> The changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space knows by out-of-band means which device it is accessing
>   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
>   to get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
>              "/sys/bus/platform/devices/ffe210000.usb");
> 
> 2.  VFIO_DEVICE_GET_INFO
> 
>    The number of regions corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.  
> 
>    Two new flags are added to struct vfio_device_info:
> 
>    #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
>    #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
> 
>    It is possible that there could be platform bus devices 
>    that are not in the device tree, so we use 2 flags to
>    allow for that.
> 
>    If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
>    that there are regions and IRQs but no device tree info
>    available.
> 
>    If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
>    there is device tree info available.
> 
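As a hedged illustration of section 2, this C sketch maps the two proposed VFIO_DEVICE_GET_INFO flags onto the cases listed there. The bit positions are placeholders, since the RFC leaves them as (1 << ?), and the enum and function names are made up.

```c
#include <stdint.h>

/* Hypothetical bit positions: the RFC writes these as (1 << ?). */
#define HYP_VFIO_DEVICE_FLAGS_PLATFORM (1u << 2)
#define HYP_VFIO_DEVICE_FLAGS_DEVTREE  (1u << 3)

enum hyp_dev_kind {
    HYP_DEV_OTHER,             /* neither flag set (e.g. a PCI device) */
    HYP_DEV_PLATFORM,          /* regions and IRQs, no device tree info */
    HYP_DEV_PLATFORM_DEVTREE,  /* regions and IRQs plus device tree info */
    HYP_DEV_DEVTREE_ONLY       /* DEVTREE without PLATFORM: meaning still open */
};

/* Map the flags word returned by VFIO_DEVICE_GET_INFO onto the cases above. */
static enum hyp_dev_kind hyp_classify(uint32_t flags)
{
    int platform = (flags & HYP_VFIO_DEVICE_FLAGS_PLATFORM) != 0;
    int devtree  = (flags & HYP_VFIO_DEVICE_FLAGS_DEVTREE) != 0;

    if (platform && devtree)
        return HYP_DEV_PLATFORM_DEVTREE;
    if (platform)
        return HYP_DEV_PLATFORM;
    if (devtree)
        return HYP_DEV_DEVTREE_ONLY;
    return HYP_DEV_OTHER;
}
```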
> 3. VFIO_DEVICE_GET_REGION_INFO
> 
>    For platform devices with multiple regions, information
>    is needed to correlate the regions with the device 
>    tree structure that drivers use to determine the meaning
>    of device resources.
>    
>    The VFIO_DEVICE_GET_REGION_INFO is extended to provide
>    device tree information.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region
>       -whether it corresponds to a "reg" or "ranges" property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    There are 5 new flags added to vfio_region_info :
> 
>    struct vfio_region_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>         __u32   index;          /* Region index */
>         __u32   resv;           /* Reserved for alignment */
>         __u64   size;           /* Region size (bytes) */
>         __u64   offset;         /* Region offset from start of device fd */
>    };
>  
>    VFIO_REGION_INFO_FLAG_CACHEABLE
>        -if set indicates that the region must be mapped as cacheable
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_REG
>        -if set indicates that the region corresponds to a "reg" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
>        -if set indicates that the region corresponds to a "ranges" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_region_info_index appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_region_info_index
>         {
>             u32 index;
>         };
> 
>         A reg or ranges property may have multiple regions.  The index
>         specifies the index within the "reg" or "ranges"
>         that this region corresponds to.
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_info_path
>         {
>             u32 len;
>             u8 path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    If multiple flags are set that indicate that there is
>    an appended struct, the order of the flags indicates
>    the order of the structs.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_region_info and all appended structs.
> 
>    Suggested usage:
>       -call VFIO_DEVICE_GET_REGION_INFO with argsz =
>        sizeof(struct vfio_region_info)
>       -realloc the buffer
>       -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
>        structs will be returned
> 
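A minimal C sketch of how user space might parse the structs appended by VFIO_DEVICE_GET_REGION_INFO in section 3. It assumes INDEX comes before PATH (the order the flags are listed in), 4-byte alignment for "dword aligned", placeholder bit values, and a path len that excludes any NUL, as the example lengths (25, 26, 41) suggest. Everything prefixed hyp_ is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit values: the RFC leaves them as (1 << ?).  Only their
 * relative order matters here, since INDEX is listed before PATH. */
#define HYP_REGION_FLAG_INDEX (1u << 6)
#define HYP_REGION_FLAG_PATH  (1u << 7)
#define HYP_ALIGN 4u   /* assumption: "dword aligned" means 4-byte aligned */

/* Mirrors struct vfio_region_info as written in the RFC (32 bytes). */
struct hyp_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t resv;
    uint64_t size;
    uint64_t offset;
};

static size_t hyp_align_up(size_t n)
{
    return (n + HYP_ALIGN - 1) & ~(size_t)(HYP_ALIGN - 1);
}

/* Parse the structs appended after struct hyp_region_info, in flag order.
 * buf must hold at least argsz bytes as reported in the header.
 * Stores the reg/ranges sub-index (if present) and copies the node path
 * (if present) into path_out as a NUL-terminated string.
 * Returns 0 on success, -1 if data is truncated or path_out is too small. */
static int hyp_parse_region_extras(const uint8_t *buf, uint32_t *dt_index,
                                   char *path_out, size_t path_out_len)
{
    struct hyp_region_info info;
    size_t off = sizeof(info);

    memcpy(&info, buf, sizeof(info));   /* the fixed header is always first */
    if (path_out_len > 0)
        path_out[0] = '\0';

    if (info.flags & HYP_REGION_FLAG_INDEX) {
        if (off + sizeof(uint32_t) > info.argsz)
            return -1;
        memcpy(dt_index, buf + off, sizeof(uint32_t));
        off = hyp_align_up(off + sizeof(uint32_t));
    }

    if (info.flags & HYP_REGION_FLAG_PATH) {
        uint32_t len;

        if (off + sizeof(len) > info.argsz)
            return -1;
        memcpy(&len, buf + off, sizeof(len));
        off += sizeof(len);
        /* len counts path bytes only, so reserve one extra byte for the NUL. */
        if (len > info.argsz - off || (size_t)len + 1 > path_out_len)
            return -1;
        memcpy(path_out, buf + off, len);
        path_out[len] = '\0';
    }
    return 0;
}
```

In practice buf would be the buffer from the second call in the suggested usage, sized from the argsz the kernel reported on the first call.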
> 4.  VFIO_DEVICE_GET_IRQ_INFO
> 
>    For platform devices with multiple interrupts that 
>    correspond to different subnodes in the device tree,
>    information is needed to correlate the interrupts
>    to the device tree structure.
> 
>    The VFIO_DEVICE_GET_IRQ_INFO ioctl is extended to provide
>    device tree information.
> 
>    1 new flag is added to vfio_irq_info :
> 
>    struct vfio_irq_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
>         __u32   index;    /* IRQ index */
>         __u32   count;    /* Number of IRQs within this index */
>     };
> 
>    VFIO_DEVTREE_IRQ_INFO_FLAG_PATH 
>        -if set indicates that there is a dword aligned struct
>         struct vfio_devtree_info_path appended to the
>         end of vfio_irq_info :
> 
>         struct vfio_devtree_info_path
>         {
>             u32 len;
>             u8 path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_irq_info and all appended structs.
> 
> 5.  EXAMPLE 1
> 
>     Example, Freescale SATA controller:
> 
>      sata@220000 {
>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>          reg = <0x220000 0x1000>;
>          interrupts = <0x44 0x2 0x0 0x0>;
>      };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -1 interrupt
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
> 6.  EXAMPLE 2
> 
>     Example, Freescale DMA engine (modified to illustrate):
> 
>     dma@101300 {
>        cell-index = <0x1>;
>        ranges = <0x0 0x101100 0x200>;
>        reg = <0x101300 0x4>;
>        compatible = "fsl,eloplus-dma";
>        #size-cells = <0x1>;
>        #address-cells = <0x1>;
>        fsl,liodn = <0xc6>;
>     
>        dma-channel@180 {
>           interrupts = <0x23 0x2 0x0 0x0>;
>           cell-index = <0x3>;
>           reg = <0x180 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
>     
>        dma-channel@100 {
>           interrupts = <0x22 0x2 0x0 0x0>;
>           cell-index = <0x2>;
>           reg = <0x100 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
> 
>     };
> 
>     request to get device FD would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -2 regions
>       -2 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>       -for index 1:
>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>       -for index 1:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@100"
> 
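To make the physical addresses in Example 2 concrete, here is a small worked sketch of "ranges" translation in C. It assumes the soc node maps child address 0 to 0xffe000000 (inferred from the node name soc@ffe000000) with a made-up 16 MiB window; hyp_dt_range and hyp_dt_translate() are illustrative names, and each entry is held in 64-bit fields regardless of how many cells encode it.

```c
#include <stddef.h>
#include <stdint.h>

/* One "ranges" entry: <child-address parent-address size>. */
struct hyp_dt_range {
    uint64_t child;   /* address on the child bus */
    uint64_t parent;  /* corresponding address on the parent bus */
    uint64_t size;    /* length of the window */
};

/* Translate addr from the child bus to the parent bus.
 * Returns 0 and writes *out, or -1 if no range covers addr. */
static int hyp_dt_translate(const struct hyp_dt_range *r, size_t n,
                            uint64_t addr, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].child && addr - r[i].child < r[i].size) {
            *out = r[i].parent + (addr - r[i].child);
            return 0;
        }
    }
    return -1;
}
```

Translating dma-channel@180 (reg 0x180) through the dma node's ranges <0x0 0x101100 0x200> gives 0x101280, and then 0xffe101280 through the assumed soc window. Translating 0x101100 and 0x101300 the same way gives 0xffe101100 and 0xffe101300, matching the region descriptions in Example 2.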
> 
> Regards,
> Stuart
> 




* Re: RFC: vfio interface for platform devices (v2)
  2013-07-03 21:40   ` Yoder Stuart-B08248
@ 2013-07-04 14:44   ` Mario Smarduch
  -1 siblings, 0 replies; 51+ messages in thread
From: Mario Smarduch @ 2013-07-04 14:44 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Bhushan Bharat-R65777,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu


I'm having trouble understanding how this works where
the Guest Device Model != Host. How do you inform the guest
where the device is mapped in its physical address space,
and handle GPA faults?

- Mario

On 7/3/2013 11:40 PM, Yoder Stuart-B08248 wrote:
> Version 2
>   -VFIO_GROUP_GET_DEVICE_FD-- specified that the path is a sysfs path
>   -VFIO_DEVICE_GET_INFO-- defined 2 flags instead of 1
>   -deleted VFIO_DEVICE_GET_DEVTREE_INFO ioctl
>   -VFIO_DEVICE_GET_REGION_INFO-- updated as per AlexW's suggestion,
>    defined 5 new flags and associated structs
>   -VFIO_DEVICE_GET_IRQ_INFO-- updated as per AlexW's suggestion,
>    defined 1 new flag and associated struct
>   -removed redundant example
> 
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
> 
> The existing kernel interface for vfio-pci is pretty close to what is needed
> for platform devices:
>    -mechanism to create a container
>    -add groups/devices to a container
>    -set the IOMMU model
>    -map DMA regions
>    -get an fd for a specific device, which allows user space to determine
>     info about device regions (e.g. registers) and interrupt info
>    -support for mmapping device regions
>    -mechanism to set how interrupts are signaled
> 
> Many platform devices are simple and consist of a single register
> region and a single interrupt.  For these types of devices the
> existing vfio interfaces should be sufficient.
> 
> However, platform devices can get complicated-- logically represented
> as a device tree hierarchy of nodes.  For devices with multiple regions
> and interrupts, new mechanisms are needed in vfio to correlate the
> regions/interrupts with the device tree structure that drivers use
> to determine the meaning of device resources.
> 
> In some cases there are relationships between devices, and devices
> reference other devices using phandle links.  The kernel won't expose
> relationships between devices, but just exposes mappable register
> regions and interrupts.
> 
> The changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
> 
> 1.  VFIO_GROUP_GET_DEVICE_FD
> 
>   User space knows by out-of-band means which device it is accessing
>   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
>   to get the device information:
> 
>   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
>              "/sys/bus/platform/devices/ffe210000.usb");
> 
> 2.  VFIO_DEVICE_GET_INFO
> 
>    The number of regions corresponds to the regions defined
>    in "reg" and "ranges" in the device tree.  
> 
>    Two new flags are added to struct vfio_device_info:
> 
>    #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
>    #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
> 
>    Platform bus devices are not necessarily described in the
>    device tree, so two flags are used to allow for that.
> 
>    If only VFIO_DEVICE_FLAGS_PLATFORM is set, the device has
>    regions and IRQs but no device tree info is available.
> 
>    If VFIO_DEVICE_FLAGS_DEVTREE is set, device tree info is
>    available.
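A minimal sketch of how user space might decode the two proposed flags. The bit positions are placeholders, since the RFC leaves them as (1 << ?), and struct platform_caps and decode_device_flags() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions: the RFC leaves both as (1 << ?). */
#define VFIO_DEVICE_FLAGS_PLATFORM (1u << 2) /* a platform bus device */
#define VFIO_DEVICE_FLAGS_DEVTREE  (1u << 3) /* device tree info available */

/* What user space can expect from a device, per the flags above. */
struct platform_caps {
    bool platform;  /* regions and IRQs belong to a platform bus device */
    bool devtree;   /* region/IRQ info may carry device tree paths */
};

/* Decode vfio_device_info.flags as returned by VFIO_DEVICE_GET_INFO. */
static struct platform_caps decode_device_flags(uint32_t flags)
{
    struct platform_caps caps;

    caps.platform = (flags & VFIO_DEVICE_FLAGS_PLATFORM) != 0;
    caps.devtree = (flags & VFIO_DEVICE_FLAGS_DEVTREE) != 0;
    return caps;
}
```

A caller would only need to look for appended path structs in the region and IRQ info when devtree is true.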
> 
> 3.  VFIO_DEVICE_GET_REGION_INFO
> 
>    For platform devices with multiple regions, information
>    is needed to correlate the regions with the device
>    tree structure that drivers use to determine the meaning
>    of device resources.
> 
>    The VFIO_DEVICE_GET_REGION_INFO ioctl is extended to provide
>    device tree information.
> 
>    The following information is needed:
>       -the device tree path to the node corresponding to the
>        region
>       -whether it corresponds to a "reg" or "ranges" property
>       -there could be multiple sub-regions per "reg" or "ranges" and
>        the sub-index within the reg/ranges is needed
> 
>    Five new flags are added to struct vfio_region_info:
> 
>    struct vfio_region_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>    #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>         __u32   index;          /* Region index */
>         __u32   resv;           /* Reserved for alignment */
>         __u64   size;           /* Region size (bytes) */
>         __u64   offset;         /* Region offset from start of device fd */
>    };
>  
>    VFIO_REGION_INFO_FLAG_CACHEABLE
>        -if set indicates that the region must be mapped as cacheable
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_REG
>        -if set indicates that the region corresponds to a "reg" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
>        -if set indicates that the region corresponds to a "ranges" property
>         in the device tree representation of the device
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
>        -if set indicates that a dword-aligned struct
>         vfio_devtree_region_info_index is appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_region_info_index {
>             __u32 index;
>         };
> 
>         A "reg" or "ranges" property may describe multiple regions.
>         The index specifies which entry within the "reg" or "ranges"
>         property this region corresponds to.
> 
>    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>        -if set indicates that a dword-aligned struct
>         vfio_devtree_info_path is appended to the
>         end of vfio_region_info:
> 
>         struct vfio_devtree_info_path {
>             __u32 len;
>             __u8 path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    If more than one flag indicating an appended struct is set,
>    the structs are appended in the same order as the flags.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_region_info and all appended structs.
> 
>    Suggested usage:
>       -call VFIO_DEVICE_GET_REGION_INFO with argsz =
>        sizeof(struct vfio_region_info)
>       -realloc the buffer to the argsz value returned by the kernel
>       -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
>        structs will be returned
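A sketch of the suggested two-call usage plus a walk over the appended structs, assuming the struct layouts given above. Everything here is hypothetical: the flag bit values are placeholders, "dword aligned" is assumed to mean 8-byte alignment (APPENDED_ALIGN), and fake_sata_kernel() stands in for ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) so the sketch runs without hardware. It reports the SATA region from Example 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder bit values: the RFC leaves these as (1 << ?). */
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG   (1u << 5)
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1u << 7)
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH  (1u << 8)
#define APPENDED_ALIGN 8u /* assumption: "dword aligned" means 8 bytes */

struct vfio_region_info {            /* layout as given in the RFC */
    uint32_t argsz, flags, index, resv;
    uint64_t size, offset;
};

/* Stand-in for ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info). */
typedef int (*region_info_fn)(void *ctx, struct vfio_region_info *info);

static size_t align_up(size_t n)
{
    return (n + APPENDED_ALIGN - 1) & ~(size_t)(APPENDED_ALIGN - 1);
}

/* Call once to learn argsz, allocate that much, call again. Caller frees. */
static struct vfio_region_info *get_region_info(region_info_fn fn, void *ctx,
                                                uint32_t index)
{
    struct vfio_region_info probe = { 0 };
    struct vfio_region_info *full;
    size_t need;

    probe.argsz = sizeof(probe);
    probe.index = index;
    if (fn(ctx, &probe) < 0)
        return NULL;
    need = probe.argsz > sizeof(probe) ? probe.argsz : sizeof(probe);
    full = calloc(1, need);
    if (full == NULL)
        return NULL;
    full->argsz = (uint32_t)need;
    full->index = index;
    if (fn(ctx, full) < 0) {
        free(full);
        return NULL;
    }
    return full;
}

/* Find the appended path; structs follow in flag order (INDEX, PATH). */
static int region_dt_path(const struct vfio_region_info *info,
                          const char **path, uint32_t *len)
{
    const unsigned char *base = (const unsigned char *)info;
    size_t pos = align_up(sizeof(*info));

    if (info->flags & VFIO_DEVTREE_REGION_INFO_FLAG_INDEX)
        pos = align_up(pos + sizeof(uint32_t));  /* skip the reg/ranges index */
    if (!(info->flags & VFIO_DEVTREE_REGION_INFO_FLAG_PATH) ||
        pos + sizeof(*len) > info->argsz)
        return -1;
    memcpy(len, base + pos, sizeof(*len));
    if (pos + sizeof(*len) + *len > info->argsz)
        return -1;
    *path = (const char *)(base + pos + sizeof(*len));  /* not NUL-terminated */
    return 0;
}

/* Illustration only: a fake "kernel" reporting Example 1's SATA region. */
static int fake_sata_kernel(void *ctx, struct vfio_region_info *info)
{
    static const char p[] = "/soc@ffe000000/sata@220000";
    uint32_t len = sizeof(p) - 1;                     /* 26, no NUL */
    size_t hdr = align_up(sizeof(*info));
    uint32_t total = (uint32_t)(hdr + sizeof(len) + len);
    uint32_t room = info->argsz;

    (void)ctx;
    info->flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
                  VFIO_DEVTREE_REGION_INFO_FLAG_PATH;
    info->size = 0x1000;
    info->offset = 0;
    info->argsz = total;                 /* always report the full size */
    if (room >= total) {                 /* append only if it fits */
        unsigned char *b = (unsigned char *)info + hdr;
        memcpy(b, &len, sizeof(len));
        memcpy(b + sizeof(len), p, len);
    }
    return 0;
}
```

The path is treated as not NUL-terminated, which matches the len values in the examples (26 for a 26-character path). If "dword" means 4 bytes, only APPENDED_ALIGN changes; that only matters when VFIO_DEVTREE_REGION_INFO_FLAG_INDEX is set, because the 32-byte header is aligned either way.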
> 
> 4.  VFIO_DEVICE_GET_IRQ_INFO
> 
>    For platform devices with multiple interrupts that
>    correspond to different subnodes in the device tree,
>    information is needed to correlate the interrupts
>    to the device tree structure.
> 
>    The VFIO_DEVICE_GET_IRQ_INFO ioctl is extended to provide
>    device tree information.
> 
>    One new flag is added to struct vfio_irq_info:
> 
>    struct vfio_irq_info {
>         __u32   argsz;
>         __u32   flags;
>    #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
>         __u32   index;    /* IRQ index */
>         __u32   count;    /* Number of IRQs within this index */
>    };
> 
>    VFIO_DEVTREE_IRQ_INFO_FLAG_PATH
>        -if set indicates that a dword-aligned struct
>         vfio_devtree_info_path is appended to the
>         end of vfio_irq_info:
> 
>         struct vfio_devtree_info_path {
>             __u32 len;
>             __u8 path[];
>         };
> 
>         The path is the full path to the corresponding device
>         tree node.  The len field specifies the length of the
>         path string.
> 
>    argsz is set by the kernel specifying the total size of
>    struct vfio_irq_info and all appended structs.
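A sketch of how user space might correlate an IRQ with a subnode, using the DMA engine from Example 2. It assumes the appended struct vfio_devtree_info_path starts directly after the 16-byte struct vfio_irq_info (already aligned under either reading of "dword"); the flag bit is a placeholder and irq_node_name() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1u << 4) /* placeholder bit */

struct vfio_irq_info {               /* layout as given in the RFC */
    uint32_t argsz, flags, index, count;
};

/*
 * Copy the name of the device tree node an IRQ belongs to (the last
 * path component, e.g. "dma-channel@180") into name.  Returns the
 * name length, or -1 if there is no usable path.
 */
static int irq_node_name(const unsigned char *buf, size_t buflen,
                         char *name, size_t namesz)
{
    struct vfio_irq_info info;
    const size_t hdr = sizeof(info);       /* 16 bytes, already aligned */
    const char *path, *base_name;
    uint32_t len, i;
    size_t n;

    if (buflen < hdr)
        return -1;
    memcpy(&info, buf, hdr);
    if (!(info.flags & VFIO_DEVTREE_IRQ_INFO_FLAG_PATH) ||
        info.argsz > buflen || hdr + sizeof(len) > info.argsz)
        return -1;
    memcpy(&len, buf + hdr, sizeof(len));
    if (hdr + sizeof(len) + len > info.argsz)
        return -1;
    path = (const char *)(buf + hdr + sizeof(len));  /* not NUL-terminated */

    base_name = NULL;                      /* character after the last '/' */
    for (i = 0; i < len; i++) {
        if (path[i] == '/')
            base_name = path + i + 1;
    }
    if (base_name == NULL)
        return -1;
    n = (size_t)(path + len - base_name);
    if (n + 1 > namesz)
        return -1;
    memcpy(name, base_name, n);
    name[n] = '\0';
    return (int)n;
}
```

For Example 2 this yields "dma-channel@180" and "dma-channel@100" for the two IRQs, which is the correlation the proposal is after.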
> 
> 5.  EXAMPLE 1
> 
>     Example, Freescale SATA controller:
> 
>      sata@220000 {
>          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>          reg = <0x220000 0x1000>;
>          interrupts = <0x44 0x2 0x0 0x0>;
>      };
> 
>     The request to get the device fd would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -1 region
>       -1 interrupt
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 26
>               path = "/soc@ffe000000/sata@220000"
> 
> 6.  EXAMPLE 2
> 
>     Example, Freescale DMA engine (modified to illustrate):
> 
>     dma@101300 {
>        cell-index = <0x1>;
>        ranges = <0x0 0x101100 0x200>;
>        reg = <0x101300 0x4>;
>        compatible = "fsl,eloplus-dma";
>        #size-cells = <0x1>;
>        #address-cells = <0x1>;
>        fsl,liodn = <0xc6>;
>     
>        dma-channel@180 {
>           interrupts = <0x23 0x2 0x0 0x0>;
>           cell-index = <0x3>;
>           reg = <0x180 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
>     
>        dma-channel@100 {
>           interrupts = <0x22 0x2 0x0 0x0>;
>           cell-index = <0x2>;
>           reg = <0x100 0x80>;
>           compatible = "fsl,eloplus-dma-channel";
>        };
> 
>     };
> 
>     The request to get the device fd would look like:
>       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");
> 
>     The VFIO_DEVICE_GET_INFO ioctl would return:
>       -2 regions
>       -2 interrupts
> 
>     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>       -for index 0:
>            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>       -for index 1:
>            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
>                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
>            vfio_devtree_info_path
>               len = 25
>               path = "/soc@ffe000000/dma@101300"
> 
>     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
>       -for index 0:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@180"
> 
>       -for index 1:
>           flags = VFIO_IRQ_INFO_EVENTFD | 
>                   VFIO_IRQ_INFO_MASKABLE |
>                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH  
>           vfio_devtree_info_path
>               len = 41
>               path = "/soc@ffe000000/dma@101300/dma-channel@100"
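The physical addresses in these examples follow from ordinary device tree address translation: an address inside a "ranges" entry <child-addr parent-addr size> maps to parent-addr + (addr - child-addr). The sketch below assumes one cell per address and size, which matches the three-cell ranges = <0x0 0x101100 0x200>, and assumes the soc@ffe000000 node maps bus address 0 to 0xffe000000; that node is not shown, so the mapping is inferred from the physical addresses quoted above, and the window size used in the test is arbitrary. dt_range and the helper functions are hypothetical names. Both dma-channel register blocks translate into the 0xffe101100-0xffe1012ff window exposed as the "ranges" region.

```c
#include <stdint.h>

/* One "ranges" entry: <child-addr parent-addr size>, one cell each. */
struct dt_range {
    uint64_t child;   /* base address in the child bus space */
    uint64_t parent;  /* corresponding address in the parent bus space */
    uint64_t size;    /* length of the window */
};

/* Translate addr through one ranges entry; returns 0 if it falls outside. */
static int dt_translate(const struct dt_range *r, uint64_t addr, uint64_t *out)
{
    if (addr < r->child || addr - r->child >= r->size)
        return 0;
    *out = r->parent + (addr - r->child);
    return 1;
}

/* Walk from the innermost bus outward: ranges[0] is the device's own. */
static int dt_translate_chain(const struct dt_range *ranges, int levels,
                              uint64_t addr, uint64_t *out)
{
    for (int i = 0; i < levels; i++) {
        if (!dt_translate(&ranges[i], addr, &addr))
            return 0;
    }
    *out = addr;
    return 1;
}
```

With dma@101300's ranges followed by the assumed soc mapping, child address 0x0 lands on 0xffe101100 and dma-channel@180's registers on 0xffe101280, both inside the 0x200-byte region.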
> 
> 
> Regards,
> Stuart
> 
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices (v2)
  2013-07-04 14:44     ` Mario Smarduch
@ 2013-07-04 14:47       ` Alexander Graf
  -1 siblings, 0 replies; 51+ messages in thread
From: Alexander Graf @ 2013-07-04 14:47 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: Yoder Stuart-B08248, Alex Williamson, Wood Scott-B07421,
	kvm@vger.kernel.org list, Bhushan Bharat-R65777,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	kvmarm@lists.cs.columbia.edu


On 04.07.2013, at 16:44, Mario Smarduch wrote:

> 
> I'm having trouble understanding how this works where
> the Guest Device Model != Host. How do you inform the guest
> where the device is mapped in its physical address space,
> and handle GPA faults?

The same way as you would for emulated devices.


Alex


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices (v2)
  2013-07-04 14:44     ` Mario Smarduch
                       ` (2 preceding siblings ...)
  (?)
@ 2013-07-16 15:25     ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 15:25 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Bhushan Bharat-R65777,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu



> -----Original Message-----
> From: Mario Smarduch [mailto:mario.smarduch@huawei.com]
> Sent: Thursday, July 04, 2013 9:45 AM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; kvm@vger.kernel.org list; Bhushan Bharat-R65777;
> kvm-ppc@vger.kernel.org; virtualization@lists.linux-foundation.org; Sethi Varun-B16395;
> kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices (v2)
> 
> 
> I'm having trouble understanding how this works where
> the Guest Device Model != Host. How do you inform the guest
> where the device is mapped in its physical address space,
> and handle GPA faults?

The vfio mechanisms just expose hardware to user space,
and the user space app may or may not be QEMU.  So there
may be no 'guest' at all.

The intent of this RFC is to provide enough info to user space so
an application can use the device, or in the case of QEMU expose
the device to a VM.  Platform devices are typically exposed via
the device tree and that is how I envision them being presented
to a guest.

Are there real cases you see where guest device model != host?
I don't envision ever presenting a platform device as a PCI device
or vice versa.

Stuart


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
  2013-07-03 22:31   ` Scott Wood
  (?)
@ 2013-07-16 21:51   ` Yoder Stuart-B08248
  2013-07-16 22:01     ` Scott Wood
  2013-07-16 22:01       ` Scott Wood
  -1 siblings, 2 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 21:51 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: Alex Williamson, Alexander Graf, Bhushan Bharat-R65777,
	Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
	Antonios Motakis, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 5:32 PM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list; kvm-
> ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> > The write-up below is the first draft of a proposal for how the kernel can expose
> > platform devices to user space using vfio.
> >
> > In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> > allows user space to correlate regions and interrupts to the corresponding
> > device tree node structure that is defined for most platform devices.
> >
> > Regards,
> > Stuart Yoder
> >
> > ------------------------------------------------------------------------------
> > VFIO for Platform Devices
> >
> > The existing infrastructure for vfio-pci is pretty close to what we need:
> >    -mechanism to create a container
> >    -add groups/devices to a container
> >    -set the IOMMU model
> >    -map DMA regions
> >    -get an fd for a specific device, which allows user space to determine
> >     info about device regions (e.g. registers) and interrupt info
> >    -support for mmapping device regions
> >    -mechanism to set how interrupts are signaled
> >
> > Platform devices can get complicated-- potentially with a tree hierarchy
> > of nodes, and links/phandles pointing to other platform
> > devices.   The kernel doesn't expose relationships between
> > devices.  The kernel just exposes mappable register regions and interrupts.
> > It's up to user space to work out relationships between devices
> > if it needs to-- this can be determined in the device tree exposed in
> > /proc/device-tree.
> >
> > I think the changes needed for vfio are around some of the device tree
> > related info that needs to be available with the device fd.
> >
> > 1.  VFIO_GROUP_GET_DEVICE_FD
> >
> >   User space has to know which device it is accessing and will call
> >   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> >   get the device information:
> >
> >   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> >              "/soc@ffe000000/usb@210000");
> >
> >   (whether the path is a device tree path or a sysfs path is up for
> >   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
> 
> Doesn't VFIO need to operate on an actual Linux device, rather than
> just an OF node?
> 
> Are we going to have a fixed assumption that you always want all the
> children of the node corresponding to the assigned device, or will it
> be possible to exclude some?

My assumption is that you always get all the children of the
node corresponding to the assigned device.

> > 2.  VFIO_DEVICE_GET_INFO
> >
> >    Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> >    than adding a new flag identifying a device as a 'platform'
> >    device.
> >
> >    This ioctl simply returns the number of regions and number of irqs.
> >
> >    The number of regions corresponds to the number of regions
> >    that can be mapped for the device-- corresponds to the regions defined
> >    in "reg" and "ranges" in the device tree.
> >
> > 3.  VFIO_DEVICE_GET_REGION_INFO
> >
> >    No changes needed, except perhaps adding a new flag.  Freescale has some
> >    devices with regions that must be mapped cacheable.
> 
> While I don't object to making the information available to the user
> just in case, the main thing we need here is to influence what the
> kernel does when the user tries to map it.  At least on PPC it's not up
> to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.
 
> > 4. VFIO_DEVICE_GET_DEVTREE_INFO
> >
> >    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> >    expose device regions and interrupts, but it's not enough to know
> >    that there are X regions and Y interrupts.  User space needs to
> >    know what the resources are for-- to correlate those regions/interrupts
> >    to the device tree structure that drivers use.  The device tree
> >    structure could consist of multiple nodes and it is necessary to
> >    identify the node corresponding to the region/interrupt exposed
> >    by VFIO.
> >
> >    The following information is needed:
> >       -the device tree path to the node corresponding to the
> >        region or interrupt
> >       -for a region, whether it corresponds to a "reg" or "ranges"
> >        property
> >       -there could be multiple sub-regions per "reg" or "ranges" and
> >        the sub-index within the reg/ranges is needed
> >
> >    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> >
> >    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> >
> >    struct vfio_path_info {
> >         __u32   argsz;
> >         __u32   flags;
> >    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
> 
> What about distinguishing a normal interrupt from one found in an
> interrupt-map?

I'm not sure we need that.  The kernel needs to use the interrupt
map to get interrupts hooked up right, but all user space needs to
know is that there are N interrupts and possibly device tree
paths to help user space interpret which interrupt is which.

> In the case of both ranges and interrupt-maps, we'll also want to
> decide what the policy is for when to expose them directly, versus just
> using them to translate regs and interrupts of child nodes

Yes, I'm not sure of the best approach there... but I guess we can
cross that bridge when we implement this.  It doesn't affect this
interface.

> >         __u32   index;          /* input: index of region or irq for which we are getting info */
> >         __u32   type;           /* input: 0 - get devtree info for a region
> >                                           1 - get devtree info for an irq
> >                                  */
> >         __u32   start;          /* output: identifies the index within the reg/ranges */
> 
> "start" is an odd name for this.  I'd rename "index" to "vfio_index"
> and this to "dt_index".
> 
> >         __u8    path[];         /* output: Full path to associated device tree node */
> 
> How does the caller know what size buffer to supply for this?
> 
> >     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> >
> >       -for region index 0:
> >           flags: 0x0     // i.e. this is a "reg" property
> >           start: 0x0     // i.e. index 0x0 in "reg"
> >           path: "/soc@ffe000000/crypto@300000"
> >
> >       -for interrupt index 0:
> >           path: "/soc@ffe000000/crypto@300000/jr@1000"
> >
> >       -for interrupt index 1:
> >           path: "/soc@ffe000000/crypto@300000/jr@2000"
> 
> Where is "start" for the interrupts?

v2 of the proposal made changes that got rid of that stuff.

Stuart


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
  2013-07-03 22:31   ` Scott Wood
@ 2013-07-16 21:51   ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 21:51 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: kvm@vger.kernel.org list, Bhushan Bharat-R65777,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 5:32 PM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list;
> kvm-ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> > The write-up below is the first draft of a proposal for how the
> > kernel can expose platform devices to user space using vfio.
> >
> > In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> > allows user space to correlate regions and interrupts to the corresponding
> > device tree node structure that is defined for most platform devices.
> >
> > Regards,
> > Stuart Yoder
> >
> > ------------------------------------------------------------------------------
> > VFIO for Platform Devices
> >
> > The existing infrastructure for vfio-pci is pretty close to what we need:
> >    -mechanism to create a container
> >    -add groups/devices to a container
> >    -set the IOMMU model
> >    -map DMA regions
> >    -get an fd for a specific device, which allows user space to determine
> >     info about device regions (e.g. registers) and interrupt info
> >    -support for mmapping device regions
> >    -mechanism to set how interrupts are signaled
> >
> > Platform devices can get complicated-- potentially with a tree
> > hierarchy of nodes, and links/phandles pointing to other platform
> > devices.  The kernel doesn't expose relationships between devices.
> > The kernel just exposes mappable register regions and interrupts.
> > It's up to user space to work out relationships between devices
> > if it needs to-- this can be determined in the device tree exposed in
> > /proc/device-tree.
> >
> > I think the changes needed for vfio are around some of the device tree
> > related info that needs to be available with the device fd.
> >
> > 1.  VFIO_GROUP_GET_DEVICE_FD
> >
> >   User space has to know which device it is accessing and will call
> >   VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> >   get the device information:
> >
> >   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> >              "/soc@ffe000000/usb@210000");
> >
> >   (whether the path is a device tree path or a sysfs path is up for
> >   discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
> 
> Doesn't VFIO need to operate on an actual Linux device, rather than
> just an OF node?
> 
> Are we going to have a fixed assumption that you always want all the
> children of the node corresponding to the assigned device, or will it
> be possible to exclude some?

My assumption is that you always get all the children of the
node corresponding to the assigned device.

> > 2.  VFIO_DEVICE_GET_INFO
> >
> >    I don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> >    than adding a new flag identifying a device as a 'platform'
> >    device.
> >
> >    This ioctl simply returns the number of regions and number of irqs.
> >
> >    The number of regions corresponds to the number of regions
> >    that can be mapped for the device, i.e. the regions defined
> >    in "reg" and "ranges" in the device tree.
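As a rough illustration of that count: a "reg" property is a flat array of big-endian 32-bit cells grouped into (address, size) tuples, and a "ranges" entry is a (child address, parent address, size) tuple. The tuple widths come from the relevant #address-cells and #size-cells. The sketch below derives the number of entries from a raw property length, for example the file size under /proc/device-tree. The helper names are hypothetical, and the cell counts are assumed to be known already.

```c
#include <assert.h>
#include <stddef.h>

/* Number of (address, size) tuples in a "reg" property of prop_len bytes.
 * addr_cells/size_cells are the parent's #address-cells/#size-cells.
 * Returns -1 if the length is not a whole number of tuples. */
static int dt_reg_count(size_t prop_len, unsigned addr_cells,
                        unsigned size_cells)
{
    size_t tuple = (size_t)(addr_cells + size_cells) * 4; /* 4 bytes/cell */

    if (tuple == 0 || prop_len % tuple != 0)
        return -1;
    return (int)(prop_len / tuple);
}

/* Number of (child address, parent address, size) tuples in a "ranges"
 * property.  The child address and size widths come from this node's
 * #address-cells/#size-cells; the parent address width comes from the
 * parent's #address-cells.  An empty "ranges;" (identity mapping) counts
 * as zero mappable regions here. */
static int dt_ranges_count(size_t prop_len, unsigned child_addr_cells,
                           unsigned parent_addr_cells, unsigned size_cells)
{
    size_t tuple = (size_t)(child_addr_cells + parent_addr_cells +
                            size_cells) * 4;

    if (tuple == 0 || prop_len % tuple != 0)
        return -1;
    return (int)(prop_len / tuple);
}
```

Take the SATA node in the v2 example, with reg = <0x220000 0x1000>. That property is 8 bytes if the parent uses one address cell and one size cell, which is an assumption because the parent node is not shown. On that basis, dt_reg_count(8, 1, 1) yields one region.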
> >
> > 3.  VFIO_DEVICE_GET_REGION_INFO
> >
> >    No changes needed, except perhaps adding a new flag.  Freescale has
> >    some devices with regions that must be mapped cacheable.
> 
> While I don't object to making the information available to the user
> just in case, the main thing we need here is to influence what the
> kernel does when the user tries to map it.  At least on PPC it's not up
> to userspace to select whether a mmap is cacheable.

If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it?   Will it
help any decision that user space makes?  Maybe we should just
drop it.
 
> > 4. VFIO_DEVICE_GET_DEVTREE_INFO
> >
> >    The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> >    expose device regions and interrupts, but it's not enough to know
> >    that there are X regions and Y interrupts.  User space needs to
> >    know what the resources are for-- to correlate those regions/interrupts
> >    to the device tree structure that drivers use.  The device tree
> >    structure could consist of multiple nodes and it is necessary to
> >    identify the node corresponding to the region/interrupt exposed
> >    by VFIO.
> >
> >    The following information is needed:
> >       -the device tree path to the node corresponding to the
> >        region or interrupt
> >       -for a region, whether it corresponds to a "reg" or "ranges"
> >        property
> >       -there could be multiple sub-regions per "reg" or "ranges" and
> >        the sub-index within the reg/ranges is needed
> >
> >    The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> >
> >    ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> >
> >    struct vfio_path_info {
> >         __u32   argsz;
> >         __u32   flags;
> >    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
> 
> What about distinguishing a normal interrupt from one found in an
> interrupt-map?

I'm not sure we need that.  The kernel needs to use the interrupt
map to get interrupts hooked up right, but all user space needs to
know is that there are N interrupts and possibly device tree
paths to help user space interpret which interrupt is which.

> In the case of both ranges and interrupt-maps, we'll also want to
> decide what the policy is for when to expose them directly, versus just
> using them to translate regs and interrupts of child nodes

Yes, I'm not sure of the best approach there... but I guess we can
cross that bridge when we implement this.  It doesn't affect this
interface.

> >         __u32   index;          /* input: index of region or irq for which we are getting info */
> >         __u32   type;           /* input: 0 - get devtree info for a region
> >                                           1 - get devtree info for an irq
> >                                  */
> >         __u32   start;          /* output: identifies the index within the reg/ranges */
> 
> "start" is an odd name for this.  I'd rename "index" to "vfio_index"
> and this to "dt_index".
> 
> >         __u8    path[];         /* output: Full path to associated device tree node */
> 
> How does the caller know what size buffer to supply for this?
> 
> >     The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> >
> >       -for region index 0:
> >           flags: 0x0     // i.e. this is a "reg" property
> >           start: 0x0     // i.e. index 0x0 in "reg"
> >           path: "/soc@ffe000000/crypto@300000"
> >
> >       -for interrupt index 0:
> >           path: "/soc@ffe000000/crypto@300000/jr@1000"
> >
> >       -for interrupt index 1:
> >           path: "/soc@ffe000000/crypto@300000/jr@2000"
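On the question of how big path[] can be: one option is the argsz round trip that the v2 proposal later adopts, in which the kernel reports the total size it needs. The sketch below shows that size calculation, assuming a NUL-terminated path. struct vfio_path_info_sketch is a hypothetical mirror of the v1 layout. It uses the vfio_index/dt_index names suggested in review and is not a real kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the v1 proposal with the renamed fields:
 * five 32-bit fields followed by a flexible array for the path. */
struct vfio_path_info_sketch {
    uint32_t argsz;
    uint32_t flags;
    uint32_t vfio_index;   /* input: region or irq index */
    uint32_t type;         /* input: 0 = region, 1 = irq */
    uint32_t dt_index;     /* output: index within "reg"/"ranges" */
    uint8_t  path[];       /* output: NUL-terminated device tree path */
};

/* Bytes a caller must supply so the kernel can return `path`
 * including its terminating NUL. */
static size_t path_info_argsz(const char *path)
{
    return offsetof(struct vfio_path_info_sketch, path) + strlen(path) + 1;
}
```

Under this scheme, a caller would first pass argsz = sizeof(struct vfio_path_info_sketch). That is 20 bytes, because the flexible array adds nothing to the size. The caller would then read back the required argsz, reallocate, and repeat the call.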
> 
> Where is "start" for the interrupts?

v2 of the proposal made changes that got rid of that stuff.

Stuart


* RE: RFC: vfio interface for platform devices (v2)
  2013-07-03 22:53     ` Alex Williamson
@ 2013-07-16 21:57       ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 21:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexander Graf, Wood Scott-B07421, Bhushan Bharat-R65777,
	Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
	Antonios Motakis, kvm@vger.kernel.org list,
	kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu



* RE: RFC: vfio interface for platform devices (v2)
  2013-07-03 22:53     ` Alex Williamson
                       ` (2 preceding siblings ...)
@ 2013-07-16 21:57     ` Yoder Stuart-B08248
  -1 siblings, 0 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 21:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
	Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
	Antonios Motakis, kvmarm@lists.cs.columbia.edu

(sorry for the delayed response, but I've been on PTO)

> > 1.  VFIO_GROUP_GET_DEVICE_FD
> >
> >   User space knows by out-of-band means which device it is accessing
> >   and will call VFIO_GROUP_GET_DEVICE_FD passing a specific sysfs path
> >   to get the device information:
> >
> >   fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> >              "/sys/bus/platform/devices/ffe210000.usb");
> 
> FWIW, I'm in favor of whichever way works out cleaner in the code for
> pre-pending "/sys/bus" or not.  It sort of seems like it's unnecessary.
> It's also a little inconsistent that the returned path doesn't
> pre-pend /sys in the examples below.

Ok.  For the returned path in the examples I have the actual device tree
path which is slightly different from the path in /sys.  The device
tree path is what user space would need to interpret /proc/device-tree.
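The two names are related. For platform devices created from the device tree, the kernel's OF code of this era (of_device_make_bus_id() in drivers/of/platform.c) names the device "<translated address in hex>.<node name>". That is how /soc@ffe000000/usb@210000 shows up as ffe210000.usb under /sys/bus/platform/devices. The sketch below shows that derivation and assumes the translated CPU address is already known. platform_dev_name() is a hypothetical helper, and the naming rule should be checked against the kernel in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build "<addr>.<name>" from a DT path such as "/soc@ffe000000/usb@210000"
 * and the already-translated CPU physical address of its first "reg" entry.
 * The node name is the last path component up to (not including) '@'.
 * Returns 0 on success, -1 if the path is malformed or buf is too small. */
static int platform_dev_name(const char *dt_path, uint64_t translated_addr,
                             char *buf, size_t buflen)
{
    const char *base = strrchr(dt_path, '/');
    size_t name_len;
    int n;

    if (base == NULL || base[1] == '\0')
        return -1;
    base++;                             /* skip the '/' */
    name_len = strcspn(base, "@");      /* stop at the unit address */

    n = snprintf(buf, buflen, "%llx.%.*s",
                 (unsigned long long)translated_addr, (int)name_len, base);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Prepending /sys/bus/platform/devices/ to the result gives the sysfs path used in the VFIO_GROUP_GET_DEVICE_FD examples.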

> > 2.  VFIO_DEVICE_GET_INFO
> >
> >    The number of regions corresponds to the regions defined
> >    in "reg" and "ranges" in the device tree.
> >
> >    Two new flags are added to struct vfio_device_info:
> >
> >    #define VFIO_DEVICE_FLAGS_PLATFORM (1 << ?) /* A platform bus device */
> >    #define VFIO_DEVICE_FLAGS_DEVTREE  (1 << ?) /* device tree info available */
> >
> >    It is possible that there could be platform bus devices
> >    that are not in the device tree, so we use 2 flags to
> >    allow for that.
> >
> >    If just VFIO_DEVICE_FLAGS_PLATFORM is set, it means
> >    that there are regions and IRQs but no device tree info
> >    available.
> >
> >    If just VFIO_DEVICE_FLAGS_DEVTREE is set, it means
> >    there is device tree info available.
> 
> But it would be invalid to only have DEVTREE w/o PLATFORM for now,
> right?

Right.  The way I stated it is incorrect. DEVTREE would never
be set by itself.
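With that correction, there are three valid combinations of the two flags and one invalid one. The sketch below shows how user space might interpret them. The bit positions are hypothetical, because the proposal leaves them as (1 << ?).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the proposal leaves them unassigned. */
#define SKETCH_DEVICE_FLAGS_PLATFORM (1u << 2)
#define SKETCH_DEVICE_FLAGS_DEVTREE  (1u << 3)

enum platform_info_kind {
    NOT_PLATFORM,        /* neither flag: not a platform device */
    PLATFORM_NO_DEVTREE, /* regions and IRQs, but no device tree info */
    PLATFORM_DEVTREE,    /* regions and IRQs plus device tree info */
    INVALID_FLAGS        /* DEVTREE without PLATFORM is never valid */
};

static enum platform_info_kind classify_device_flags(uint32_t flags)
{
    int platform = (flags & SKETCH_DEVICE_FLAGS_PLATFORM) != 0;
    int devtree  = (flags & SKETCH_DEVICE_FLAGS_DEVTREE) != 0;

    if (devtree && !platform)
        return INVALID_FLAGS;
    if (!platform)
        return NOT_PLATFORM;
    return devtree ? PLATFORM_DEVTREE : PLATFORM_NO_DEVTREE;
}
```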

> > 3. VFIO_DEVICE_GET_REGION_INFO
> >
> >    For platform devices with multiple regions, information
> >    is needed to correlate the regions with the device
> >    tree structure that drivers use to determine the meaning
> >    of device resources.
> >
> >    The VFIO_DEVICE_GET_REGION_INFO ioctl is extended to provide
> >    device tree information.
> >
> >    The following information is needed:
> >       -the device tree path to the node corresponding to the
> >        region
> >       -whether it corresponds to a "reg" or "ranges" property
> >       -there could be multiple sub-regions per "reg" or "ranges" and
> >        the sub-index within the reg/ranges is needed
> >
> >    Five new flags are added to struct vfio_region_info:
> >
> >    struct vfio_region_info {
> >         __u32   argsz;
> >         __u32   flags;
> >    #define VFIO_REGION_INFO_FLAG_CACHEABLE (1 << ?)
> >    #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> >    #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
> >    #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
> >    #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
> >         __u32   index;          /* Region index */
> >         __u32   resv;           /* Reserved for alignment */
> >         __u64   size;           /* Region size (bytes) */
> >         __u64   offset;         /* Region offset from start of device fd */
> >    };
> >
> >    VFIO_REGION_INFO_FLAG_CACHEABLE
> >        -if set indicates that the region must be mapped as cacheable
> >
> >    VFIO_DEVTREE_REGION_INFO_FLAG_REG
> >        -if set indicates that the region corresponds to a "reg" property
> >         in the device tree representation of the device
> >
> >    VFIO_DEVTREE_REGION_INFO_FLAG_RANGE
> >        -if set indicates that the region corresponds to a "ranges" property
> >         in the device tree representation of the device
> >
> >    VFIO_DEVTREE_REGION_INFO_FLAG_INDEX
> >        -if set, indicates that a dword-aligned struct
> >         vfio_devtree_region_info_index is appended to the
> >         end of vfio_region_info:
> >
> >         struct vfio_devtree_region_info_index
> >         {
> >             __u32 index;
> >         };
> >
> >         A reg or ranges property may have multiple regions.  The index
> >         specifies the entry within the "reg" or "ranges" property
> >         that this region corresponds to.
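User space might use that index to look up the corresponding (address, size) tuple in the node's raw "reg" property. Its cells are big-endian 32-bit values, as exposed under /proc/device-tree. The sketch below assumes the parent's #address-cells and #size-cells are already known and are at most 2. dt_reg_entry() and dt_read_cells() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine `cells` big-endian 32-bit cells starting at p into one value.
 * Callers must limit cells to 2 so the result fits in 64 bits. */
static uint64_t dt_read_cells(const uint8_t *p, unsigned cells)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < cells * 4; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode entry `index` of a raw "reg" property.
 * Returns 0 and fills *addr/*size on success, -1 if unsupported or out of range. */
static int dt_reg_entry(const uint8_t *prop, size_t prop_len,
                        unsigned addr_cells, unsigned size_cells,
                        unsigned index, uint64_t *addr, uint64_t *size)
{
    size_t tuple;

    if (addr_cells == 0 || addr_cells > 2 || size_cells > 2)
        return -1;
    tuple = (size_t)(addr_cells + size_cells) * 4;
    if ((size_t)index >= prop_len / tuple)
        return -1;
    prop += (size_t)index * tuple;
    *addr = dt_read_cells(prop, addr_cells);
    *size = dt_read_cells(prop + addr_cells * 4, size_cells);
    return 0;
}
```

The same lookup would apply to a "ranges" entry, using a three-part tuple instead of two.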
> >
> >    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
> >        -if set, indicates that a dword-aligned struct
> >         vfio_devtree_info_path is appended to the
> >         end of vfio_region_info:
> >
> >         struct vfio_devtree_info_path
> >         {
> >             __u32 len;
> >             __u8  path[];
> >         };
> >
> >         The path is the full path to the corresponding device
> >         tree node.  The len field specifies the length of the
> >         path string.
> >
> >    If more than one flag indicating an appended struct is set,
> >    the structs are appended in the order of their flags.
> >
> >    The kernel sets argsz to the total size of struct
> >    vfio_region_info plus all appended structs.
> >
> >    Suggested usage:
> >       -call VFIO_DEVICE_GET_REGION_INFO with argsz =
> >        sizeof(struct vfio_region_info)
> >       -realloc the buffer to the argsz value returned by the kernel
> >       -call VFIO_DEVICE_GET_REGION_INFO again, and the appended
> >        structs will be returned
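The sketch below walks the appended structs after the second call. It makes three assumptions. First, the INDEX and PATH flags take hypothetical bit values, since the proposal leaves them as (1 << ?). Second, "dword aligned" means 8-byte alignment. Third, "the order of the flags" means ascending bit order. parse_region_info() is a hypothetical helper, not part of any VFIO header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for flags the proposal leaves as (1 << ?),
 * assigned in listed order after the existing READ/WRITE/MMAP bits. */
#define SK_FLAG_INDEX (1u << 6)
#define SK_FLAG_PATH  (1u << 7)
#define SK_ALIGN      8      /* assumption: "dword aligned" = 8 bytes */

/* Same layout as struct vfio_region_info (32 bytes, no padding). */
struct region_info_hdr {
    uint32_t argsz, flags, index, resv;
    uint64_t size, offset;
};

static size_t align_up(size_t n)
{
    return (n + SK_ALIGN - 1) & ~(size_t)(SK_ALIGN - 1);
}

/* Walk the appended structs in flag-bit order.  Sets *dt_index if INDEX
 * is present and *path/*path_len if PATH is present.  Returns 0 on
 * success, -1 if the buffer is shorter than argsz (realloc and retry)
 * or if a struct would run past argsz. */
static int parse_region_info(const uint8_t *buf, size_t buflen,
                             uint32_t *dt_index, const char **path,
                             uint32_t *path_len)
{
    struct region_info_hdr h;
    size_t off = sizeof h;

    if (buflen < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.argsz < sizeof h || h.argsz > buflen)
        return -1;

    if (h.flags & SK_FLAG_INDEX) {              /* lower bit comes first */
        if (off + 4 > h.argsz)
            return -1;
        memcpy(dt_index, buf + off, 4);
        off = align_up(off + 4);
    }
    if (h.flags & SK_FLAG_PATH) {
        uint32_t len;

        if (off + 4 > h.argsz)
            return -1;
        memcpy(&len, buf + off, 4);
        if (off + 4 + (size_t)len > h.argsz)
            return -1;
        *path = (const char *)(buf + off + 4);
        *path_len = len;
        off = align_up(off + 4 + len);
    }
    return 0;
}
```

Each struct's size must be known before the next one can be found. As a result, the walk only works if user space recognizes every flag that precedes the struct it is looking for.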
> >
> > 4.  VFIO_DEVICE_GET_IRQ_INFO
> >
> >    For platform devices with multiple interrupts that
> >    correspond to different subnodes in the device tree,
> >    information is needed to correlate the interrupts
> >    to the device tree structure.
> >
> >    The VFIO_DEVICE_GET_IRQ_INFO ioctl is extended to provide
> >    device tree information.
> >
> >    One new flag is added to struct vfio_irq_info:
> >
> >    struct vfio_irq_info {
> >         __u32   argsz;
> >         __u32   flags;
> >    #define VFIO_DEVTREE_IRQ_INFO_FLAG_PATH (1 << ?)
> >         __u32   index;    /* IRQ index */
> >         __u32   count;    /* Number of IRQs within this index */
> >     };
> >
> >    VFIO_DEVTREE_IRQ_INFO_FLAG_PATH
> >        -if set indicates that there is a dword aligned struct
> >         struct vfio_devtree_info_path appended to the
> >         end of vfio_irq_info:
> >
> >         struct vfio_devtree_info_path
> >         {
> >             u32 len;
> >             u8 path[];
> >         }
> >
> >         The path is the full path to the corresponding device
> >         tree node.  The len field specifies the length of the
> >         path string.
> >
> >    argsz is set by the kernel specifying the total size of
> >    struct vfio_irq_info and all appended structs.
> >
> > 5.  EXAMPLE 1
> >
> >     Example, Freescale SATA controller:
> >
> >      sata@220000 {
> >          compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
> >          reg = <0x220000 0x1000>;
> >          interrupts = <0x44 0x2 0x0 0x0>;
> >      };
> >
> >     request to get device FD would look like:
> >       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe220000.sata");
> >
> >     The VFIO_DEVICE_GET_INFO ioctl would return:
> >       -1 region
> >       -1 interrupt
> >
> >     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> >       -for index 0:
> >            offset=0, size=0x1000 -- allows mmap of physical 0xffe220000
> >            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
> >                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
> >            vfio_devtree_info_path
> >               len = 26
> >               path = "/soc@ffe000000/sata@220000"
> >
> >     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
> >       -for index 0:
> >           flags = VFIO_IRQ_INFO_EVENTFD |
> >                   VFIO_IRQ_INFO_MASKABLE |
> >                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH
> >           vfio_devtree_info_path
> >               len = 26
> >               path = "/soc@ffe000000/sata@220000"
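As a worked check of what argsz would contain for this example, here is a small C sketch. It assumes "dword aligned" means 8-byte alignment, and it assumes the 32-byte header (four u32 fields plus two u64 fields) is followed by an optional index struct and then the path struct. The helper name is hypothetical. len = 26 is exactly the length of "/soc@ffe000000/sata@220000", so the path appears not to include a terminating NUL. The region reply would therefore be 32 + (4 + 26), padded to 64 bytes.

```c
#include <stdint.h>

#define APPEND_ALIGN 8u      /* assumption: "dword" read as 8 bytes */
#define REGION_INFO_SIZE 32u /* 4 x u32 + 2 x u64 in vfio_region_info */

static uint32_t align_up(uint32_t x)
{
    return (x + APPEND_ALIGN - 1) / APPEND_ALIGN * APPEND_ALIGN;
}

/* Expected argsz for a region reply carrying an optional
 * vfio_devtree_region_info_index and a vfio_devtree_info_path. */
uint32_t expected_region_argsz(int has_index, uint32_t path_len)
{
    uint32_t sz = align_up(REGION_INFO_SIZE);

    if (has_index)
        sz = align_up(sz + 4);        /* u32 index */
    sz = align_up(sz + 4 + path_len); /* u32 len + path bytes */
    return sz;
}
```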
> >
> > 6.  EXAMPLE 2
> >
> >     Example, Freescale DMA engine (modified to illustrate):
> >
> >     dma@101300 {
> >        cell-index = <0x1>;
> >        ranges = <0x0 0x101100 0x200>;
> >        reg = <0x101300 0x4>;
> >        compatible = "fsl,eloplus-dma";
> >        #size-cells = <0x1>;
> >        #address-cells = <0x1>;
> >        fsl,liodn = <0xc6>;
> >
> >        dma-channel@180 {
> >           interrupts = <0x23 0x2 0x0 0x0>;
> >           cell-index = <0x3>;
> >           reg = <0x180 0x80>;
> >           compatible = "fsl,eloplus-dma-channel";
> >        };
> >
> >        dma-channel@100 {
> >           interrupts = <0x22 0x2 0x0 0x0>;
> >           cell-index = <0x2>;
> >           reg = <0x100 0x80>;
> >           compatible = "fsl,eloplus-dma-channel";
> >        };
> >
> >     };
> >
> >     request to get device FD would look like:
> >       fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/sys/bus/platform/devices/ffe101300.dma");
> >
> >     The VFIO_DEVICE_GET_INFO ioctl would return:
> >       -2 regions
> >       -2 interrupts
> >
> >     The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> >       -for index 0:
> >            offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
> >            flags = VFIO_DEVTREE_REGION_INFO_FLAG_RANGE |
> >                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
> >            vfio_devtree_info_path
> >               len = 25
> >               path = "/soc@ffe000000/dma@101300"
> >
> >       -for index 1:
> >            offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
> >            flags = VFIO_DEVTREE_REGION_INFO_FLAG_REG |
> >                    VFIO_DEVTREE_REGION_INFO_FLAG_PATH
> >            vfio_devtree_info_path
> >               len = 25
> >               path = "/soc@ffe000000/dma@101300"
> >
> >     The VFIO_DEVICE_GET_IRQ_INFO ioctl would return:
> >       -for index 0:
> >           flags = VFIO_IRQ_INFO_EVENTFD |
> >                   VFIO_IRQ_INFO_MASKABLE |
> >                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH
> >           vfio_devtree_info_path
> >               len = 41
> >               path = "/soc@ffe000000/dma@101300/dma-channel@180"
> >
> >       -for index 1:
> >           flags = VFIO_IRQ_INFO_EVENTFD |
> >                   VFIO_IRQ_INFO_MASKABLE |
> >                   VFIO_DEVTREE_IRQ_INFO_FLAG_PATH
> >           vfio_devtree_info_path
> >               len = 41
> >               path = "/soc@ffe000000/dma@101300/dma-channel@100"
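In this example, user space tells the two interrupts apart by the node at the end of each returned path. A minimal C sketch follows. It assumes the caller has already collected one (path, len) pair per IRQ index from vfio_devtree_info_path. The irq_path type and find_irq_for_node are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One device tree path per IRQ index, as returned by GET_IRQ_INFO. */
struct irq_path {
    const char *path;   /* not necessarily NUL-terminated */
    uint32_t len;
};

/* Returns the IRQ index whose path ends in "/<node>", or -1 if none does. */
int find_irq_for_node(const struct irq_path *irqs, size_t n, const char *node)
{
    size_t nlen = strlen(node);

    for (size_t i = 0; i < n; i++) {
        const char *p = irqs[i].path;
        size_t len = irqs[i].len;

        /* Require a '/' before the match so "@100" cannot match "x@100". */
        if (len > nlen && p[len - nlen - 1] == '/' &&
            memcmp(p + len - nlen, node, nlen) == 0)
            return (int)i;
    }
    return -1;
}
```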
> 
> 
> Seems like it should work.  My only API concern with this model of
> appending structs is that a user needs to know the size of each struct
> even if they don't otherwise care about it in order to step over it.  In
> some cases, like the path, the size is variable and the user needs to
> look into it.  The structs must also be strictly ordered based on the
> order of the flags or all hope is lost.  If we assign flags sequentially
> there should be no case where the user needs to step over something that
> they don't know the size of.  Even so, we may still be better off defining
> the first word of each struct as the length (I'm guessing a byte might
> be too limiting).  It would sure make walking it easier.

The 'path' structs already start with the length, so the only change
would be to add a length to the vfio_devtree_region_info_index
struct, right?  I guess I will make it a u32.
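Here is a sketch of the walking that a leading length word would allow. It rests on two assumptions: each appended struct begins with a u32 giving the number of payload bytes that follow, and structs are padded to 8-byte alignment. The first assumption matches the existing path struct and the length proposed here for vfio_devtree_region_info_index. walk_appendix and appendix_cb are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APPEND_ALIGN 8u   /* assumption: "dword aligned" read as 8 bytes */

/* Called with the i-th appended struct's payload (after its length word). */
typedef void (*appendix_cb)(unsigned i, const uint8_t *payload,
                            uint32_t len, void *ctx);

/* Walks structs of the form { u32 len; u8 payload[len]; } in buf[start, total).
 * Returns the number of structs visited, or -1 if a length overruns. */
int walk_appendix(const uint8_t *buf, size_t start, size_t total,
                  appendix_cb cb, void *ctx)
{
    size_t off = start;
    int n = 0;

    while (off + 4 <= total) {
        uint32_t len;

        memcpy(&len, buf + off, 4);
        if (len > total - off - 4)
            return -1;
        if (cb != NULL)
            cb((unsigned)n, buf + off + 4, len, ctx);
        n++;
        /* Step over the struct without needing to understand its contents. */
        off += 4 + (size_t)len;
        off = (off + APPEND_ALIGN - 1) / APPEND_ALIGN * APPEND_ALIGN;
    }
    return n;
}
```

With this layout, code written against an older flag set can still step past structs for flags it does not recognize. It still matches structs to flags by position, so assigning new flags sequentially, as Alex suggests, keeps the known structs first.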

Stuart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-16 21:51   ` Yoder Stuart-B08248
@ 2013-07-16 22:01       ` Scott Wood
  2013-07-16 22:01       ` Scott Wood
  1 sibling, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-16 22:01 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, Alex Williamson, Alexander Graf,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
> > > 3.  VFIO_DEVICE_GET_REGION_INFO
> > >
> > >    No changes needed, except perhaps adding a new flag.  Freescale has some
> > >    devices with regions that must be mapped cacheable.
> >
> > While I don't object to making the information available to the user
> > just in case, the main thing we need here is to influence what the
> > kernel does when the user tries to map it.  At least on PPC it's not up
> > to userspace to select whether a mmap is cacheable.
> 
> If user space really can't do anything with the 'cacheable'
> flag, do you think there is good reason to keep it?   Will it
> help any decision that user space makes?  Maybe we should just
> drop it.

As long as we can be sure all architectures will map things correctly  
without any flags needing to be specified, that's fine.

> > >    struct vfio_path_info {
> > >         __u32   argsz;
> > >         __u32   flags;
> > >    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
> >
> > What about distinguishing a normal interrupt from one found in an
> > interrupt-map?
> 
> I'm not sure we need that.  The kernel needs to use the interrupt
> map to get interrupts hooked up right, but all user space needs to
> know is that there are N interrupts and possibly device tree
> paths to help user space interpret which interrupt is which.

What if the interrupt map is for devices without explicit nodes, such  
as with a PCI controller (ignore the fact that we would normally use  
vfio_pci for the individual PCI devices instead)?

You could say the same thing about ranges -- why expose ranges instead  
of the individual child node regs after translation?
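To make this alternative concrete: exposing a child node's reg "after translation" means walking it up through each parent's ranges property. Below is a minimal C sketch that uses the DMA example from the RFC. dma-channel@180 has reg <0x180 0x80>, and dma@101300 has ranges <0x0 0x101100 0x200>, so the channel lands at 0x101280 in the parent's address space. The soc node's ranges are not shown in the thread. The sketch assumes they map child address 0 to 0xffe000000, inferred from the soc@ffe000000 unit address, with a made-up size. It handles single-cell addresses only and omits #address-cells parsing; dt_range and dt_translate are hypothetical names.

```c
#include <stdint.h>

/* One entry of a device tree "ranges" property: child base -> parent base. */
struct dt_range {
    uint64_t child;
    uint64_t parent;
    uint64_t size;
};

/* Translates addr through one ranges table. Returns 0 and sets *out on
 * success, or -1 if no entry covers addr. */
int dt_translate(const struct dt_range *r, int n, uint64_t addr, uint64_t *out)
{
    for (int i = 0; i < n; i++) {
        if (addr >= r[i].child && addr - r[i].child < r[i].size) {
            *out = r[i].parent + (addr - r[i].child);
            return 0;
        }
    }
    return -1;
}
```

Applying dma@101300's ranges and then the assumed soc ranges gives 0xffe101280 for dma-channel@180. That is the physical address a per-node reg region would describe, in place of a single region covering the whole ranges window.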

> > In the case of both ranges and interrupt-maps, we'll also want to
> > decide what the policy is for when to expose them directly, versus just
> > using them to translate regs and interrupts of child nodes
> 
> Yes, not sure the best approach there...but guess we can cross
> that bridge when we implement this.  It doesn't affect this
> interface.

It does affect the interface, because if you allow either of them to be  
mapped directly (rather than implicitly used when mapping a child  
node), you need a way to indicate which type of resource it is you're  
describing (as you already do for reg/ranges).

It also affects how vfio device binding is done, even if only to the  
point of specifying default behavior in the absence of knobs which  
change whether interrupt maps and/or ranges are mapped.

> > >         __u8    path[];         /* output: Full path to associated device tree node */
> >
> > How does the caller know what size buffer to supply for this?

Ping

-Scott

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: RFC: vfio interface for platform devices
  2013-07-16 22:01       ` Scott Wood
  (?)
@ 2013-07-16 22:41       ` Yoder Stuart-B08248
  2013-07-16 22:50         ` Scott Wood
  2013-07-16 22:50           ` Scott Wood
  -1 siblings, 2 replies; 51+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 22:41 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: kvm@vger.kernel.org list, Bhushan Bharat-R65777,
	kvm-ppc@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, July 16, 2013 5:01 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list;
> kvm-ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
> 
> On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
> > > > 3.  VFIO_DEVICE_GET_REGION_INFO
> > > >
> > > >    No changes needed, except perhaps adding a new flag.  Freescale has some
> > > >    devices with regions that must be mapped cacheable.
> > >
> > > While I don't object to making the information available to the user
> > > just in case, the main thing we need here is to influence what the
> > > kernel does when the user tries to map it.  At least on PPC it's not up
> > > to userspace to select whether a mmap is cacheable.
> >
> > If user space really can't do anything with the 'cacheable'
> > flag, do you think there is good reason to keep it?   Will it
> > help any decision that user space makes?  Maybe we should just
> > drop it.
> 
> As long as we can be sure all architectures will map things correctly
> without any flags needing to be specified, that's fine.
> 
> > > >    struct vfio_path_info {
> > > >         __u32   argsz;
> > > >         __u32   flags;
> > > >    #define VFIO_DEVTREE_INFO_RANGES      (1 << 3) /* the region is a "ranges" property */
> > >
> > > What about distinguishing a normal interrupt from one found in an
> > > interrupt-map?
> >
> > I'm not sure we need that.  The kernel needs to use the interrupt
> > map to get interrupts hooked up right, but all user space needs to
> > know is that there are N interrupts and possibly device tree
> > paths to help user space interpret which interrupt is which.
> 
> What if the interrupt map is for devices without explicit nodes, such
> as with a PCI controller (ignore the fact that we would normally use
> vfio_pci for the individual PCI devices instead)?
> 
> You could say the same thing about ranges -- why expose ranges instead
> of the individual child node regs after translation?

Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category.  I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.

So the question is whether we future-proof by adding flags
for both ranges and interrupt-map, or wait until there is
an actual need for it.

> > > In the case of both ranges and interrupt-maps, we'll also want to
> > > decide what the policy is for when to expose them directly, versus just
> > > using them to translate regs and interrupts of child nodes
> >
> > Yes, not sure the best approach there...but guess we can cross
> > that bridge when we implement this.  It doesn't affect this
> > interface.
> 
> It does affect the interface, because if you allow either of them to be
> mapped directly (rather than implicitly used when mapping a child
> node), you need a way to indicate which type of resource it is you're
> describing (as you already do for reg/ranges).
>
> It also affects how vfio device binding is done, even if only to the
> point of specifying default behavior in the absence of knobs which
> change whether interrupt maps and/or ranges are mapped.

My opinion is that we want to expose the regs and interrupts for
individual nodes by default, not ranges (or interrupt maps).   When someone
needs ranges/interrupt-map in the future they'll need to figure out some
means for the vfio layer to do the right thing.  It's complicated
and I would be surprised to see someone need it.
 
> > > >         __u8    path[];         /* output: Full path to associated
> > > > device tree node */
> > >
> > > How does the caller know what size buffer to supply for this?
> 
> Ping

This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again.  Or, as Alex suggested, just use a sufficiently large
buffer to start with.

Stuart


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: RFC: vfio interface for platform devices
  2013-07-16 22:41       ` Yoder Stuart-B08248
@ 2013-07-16 22:50           ` Scott Wood
  2013-07-16 22:50           ` Scott Wood
  1 sibling, 0 replies; 51+ messages in thread
From: Scott Wood @ 2013-07-16 22:50 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, Alex Williamson, Alexander Graf,
	Bhushan Bharat-R65777, Sethi Varun-B16395,
	virtualization@lists.linux-foundation.org, Antonios Motakis,
	kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu

On 07/16/2013 05:41:04 PM, Yoder Stuart-B08248 wrote:
> 
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, July 16, 2013 5:01 PM
> > To: Yoder Stuart-B08248
> > Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan Bharat-R65777; Sethi Varun-B16395;
> > virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list; kvm-ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> > Subject: Re: RFC: vfio interface for platform devices
> >
> > What if the interrupt map is for devices without explicit nodes, such
> > as with a PCI controller (ignore the fact that we would normally use
> > vfio_pci for the individual PCI devices instead)?
> >
> > You could say the same thing about ranges -- why expose ranges instead
> > of the individual child node regs after translation?
> 
> Hmm...yes, I guess ranges and interrupt-map fall into the same
> basic type of resource category.  I'm not sure it's realistic
> to pass entire bus controllers through to user space vs
> just individual devices on a bus, but I guess it's theoretically
> possible.

Where "theoretically possible" means "we've done it before in other  
contexts". :-)

> So the question is whether we future proof by adding flags
> for both ranges and interrupt-map, or wait until there is
> an actual need for it.

We don't need to actually add a flag for it, but we should have a
flag/type for the resources we do support, so that code written to the
current API can tell that an interrupt-map entry, if one is added later,
is something it doesn't recognize.
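The point about tagging the resources that are supported today can be illustrated with a type code on each exposed interrupt. Everything in this sketch (the enum, the struct, and `handle_irq_entry()`) is hypothetical and not part of the RFC; it only shows that because current entries carry an explicit type, a consumer built against today's API sees an unfamiliar value on a future interrupt-map entry and can skip it, rather than misreading it as an ordinary interrupt:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical type codes for where an IRQ index came from.  Only the node's
 * own "interrupts" property exists today; a later kernel might add an
 * interrupt-map type that this build does not know about. */
enum dt_irq_type {
    DT_IRQ_INTERRUPTS = 1,
    /* e.g. DT_IRQ_INTERRUPT_MAP = 2 could be added later */
};

/* Hypothetical per-IRQ device tree info. */
struct dt_irq_info {
    uint32_t index;  /* VFIO IRQ index */
    uint32_t type;   /* enum dt_irq_type value, possibly newer than this build */
};

/* Returns 1 if the entry is a type this code understands, or 0 if the type
 * is unknown and the entry was skipped instead of being misinterpreted. */
static int handle_irq_entry(const struct dt_irq_info *irq)
{
    switch (irq->type) {
    case DT_IRQ_INTERRUPTS:
        return 1;
    default:
        fprintf(stderr, "irq index %u: unknown type %u, skipping\n",
                (unsigned)irq->index, (unsigned)irq->type);
        return 0;
    }
}
```

No flag for interrupt-map has to exist yet; the safety comes entirely from the type that today's entries already carry.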

> > > > >         __u8    path[];         /* output: Full path to  
> associated
> > > > > device tree node */
> > > >
> > > > How does the caller know what size buffer to supply for this?
> >
> > Ping
> 
> This is in the v2 RFC... the caller invokes the ioctl which returns
> the complete/full size, then re-allocs the buffer and calls the
> ioctl again.

OK.

> Or, as Alex suggested, just use a sufficiently large buffer to start with.

It's fine for a user of the API to simplify things by using a large  
fixed buffer, but the API shouldn't force that approach.
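Both approaches can coexist if the ioctl always reports the full size it needs. The sketch below assumes an `argsz` convention modeled loosely on how VFIO_DEVICE_GET_REGION_INFO reports the space needed for its capability chain: the caller passes the number of bytes it allocated, and the kernel writes back the number of bytes required. The struct layout apart from `path[]`, and `fake_get_devtree_info()` standing in for the proposed VFIO_DEVICE_GET_DEVTREE_INFO ioctl, are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; only path[] comes from the RFC text. */
struct devtree_info {
    uint32_t argsz;   /* in: bytes allocated; out: bytes required */
    uint32_t flags;
    uint8_t  path[];  /* NUL-terminated full path to the device tree node */
};

/* Stand-in for ioctl(fd, VFIO_DEVICE_GET_DEVTREE_INFO, info): always reports
 * the required size in argsz, and copies the path only if it fits. */
static int fake_get_devtree_info(const char *node_path, struct devtree_info *info)
{
    size_t len = strlen(node_path) + 1;
    uint32_t need = (uint32_t)(sizeof(*info) + len);
    uint32_t have = info->argsz;

    info->argsz = need;
    if (have < need)
        return 0;                    /* too small: caller grows and retries */
    info->flags = 0;
    memcpy(info->path, node_path, len);
    return 0;
}

/* Try an initial buffer of 'guess' bytes; if the reported size is larger,
 * reallocate to exactly that size and call again.  A large guess gives the
 * "fixed buffer" behaviour, a small one gives the two-call pattern. */
static struct devtree_info *get_devtree_info(const char *node_path, uint32_t guess)
{
    if (guess < sizeof(struct devtree_info))
        guess = sizeof(struct devtree_info);

    struct devtree_info *info = malloc(guess);
    if (!info)
        return NULL;
    info->argsz = guess;
    if (fake_get_devtree_info(node_path, info) != 0) {
        free(info);
        return NULL;
    }

    if (info->argsz > guess) {
        uint32_t need = info->argsz;
        struct devtree_info *bigger = realloc(info, need);

        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;
        info->argsz = need;
        if (fake_get_devtree_info(node_path, info) != 0 || info->argsz > need) {
            free(info);
            return NULL;
        }
    }
    return info;
}
```

A caller that prefers a fixed buffer simply passes a generous `guess`, and the retry path never runs; because the size is always reported, the API itself does not force either choice.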

-Scott


end of thread

Thread overview: 51+ messages
2013-07-02 23:25 RFC: vfio interface for platform devices Yoder Stuart-B08248
2013-07-02 23:25 ` Yoder Stuart-B08248
2013-07-03  1:07 ` Alexander Graf
2013-07-03  1:07   ` Alexander Graf
2013-07-03 18:51   ` Scott Wood
2013-07-03 18:51     ` Scott Wood
2013-07-03 19:08     ` Yoder Stuart-B08248
2013-07-03 19:08       ` Yoder Stuart-B08248
2013-07-03 18:51   ` Scott Wood
2013-07-03  1:07 ` Alexander Graf
2013-07-03  3:07 ` Alex Williamson
2013-07-03  3:07   ` Alex Williamson
2013-07-03 10:44   ` Antonios Motakis
2013-07-03 10:44   ` Antonios Motakis
2013-07-03 10:44     ` Antonios Motakis
2013-07-03 19:23     ` Yoder Stuart-B08248
2013-07-03 19:23       ` Yoder Stuart-B08248
2013-07-03 19:23     ` Yoder Stuart-B08248
2013-07-03 17:20   ` Yoder Stuart-B08248
2013-07-03 17:20     ` Yoder Stuart-B08248
2013-07-03 21:40 ` RFC: vfio interface for platform devices (v2) Yoder Stuart-B08248
2013-07-03 21:40   ` Yoder Stuart-B08248
2013-07-03 22:53   ` Alex Williamson
2013-07-03 22:53   ` Alex Williamson
2013-07-03 22:53     ` Alex Williamson
2013-07-03 23:06     ` Scott Wood
2013-07-03 23:06       ` Scott Wood
2013-07-03 23:06     ` Scott Wood
2013-07-16 21:57     ` Yoder Stuart-B08248
2013-07-16 21:57     ` Yoder Stuart-B08248
2013-07-16 21:57       ` Yoder Stuart-B08248
2013-07-04 14:44   ` Mario Smarduch
2013-07-04 14:44   ` Mario Smarduch
2013-07-04 14:44     ` Mario Smarduch
2013-07-04 14:47     ` Alexander Graf
2013-07-04 14:47     ` Alexander Graf
2013-07-04 14:47       ` Alexander Graf
2013-07-16 15:25     ` Yoder Stuart-B08248
2013-07-03 22:31 ` RFC: vfio interface for platform devices Scott Wood
2013-07-03 22:31 ` Scott Wood
2013-07-03 22:31   ` Scott Wood
2013-07-16 21:51   ` Yoder Stuart-B08248
2013-07-16 22:01     ` Scott Wood
2013-07-16 22:01     ` Scott Wood
2013-07-16 22:01       ` Scott Wood
2013-07-16 22:41       ` Yoder Stuart-B08248
2013-07-16 22:50         ` Scott Wood
2013-07-16 22:50         ` Scott Wood
2013-07-16 22:50           ` Scott Wood
2013-07-16 21:51   ` Yoder Stuart-B08248
  -- strict thread matches above, loose matches on Subject: below --
2013-07-03 21:40 RFC: vfio interface for platform devices (v2) Yoder Stuart-B08248
