* RFC: vfio interface for platform devices
@ 2013-07-02 23:25 Yoder Stuart-B08248
2013-07-03 1:07 ` Alexander Graf
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-02 23:25 UTC (permalink / raw)
To: Alex Williamson, Alexander Graf, Wood Scott-B07421
Cc: Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org, Antonios Motakis,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
The write-up below is the first draft of a proposal for how the kernel can expose
platform devices to user space using vfio.
In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
allows user space to correlate regions and interrupts to the corresponding
device tree node structure that is defined for most platform devices.
Regards,
Stuart Yoder
------------------------------------------------------------------------------
VFIO for Platform Devices
The existing infrastructure for vfio-pci is pretty close to what we need:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine
info about device regions (e.g. registers) and interrupt info
-support for mmapping device regions
-mechanism to set how interrupts are signaled
Platform devices can get complicated-- potentially with a tree hierarchy
of nodes, and links/phandles pointing to other platform
devices. The kernel doesn't expose relationships between
devices. The kernel just exposes mappable register regions and interrupts.
It's up to user space to work out relationships between devices
if it needs to-- this can be determined in the device tree exposed in
/proc/device-tree.
I think the changes needed for vfio are around some of the device tree
related info that needs to be available with the device fd.
1. VFIO_GROUP_GET_DEVICE_FD
User space has to know which device it is accessing and will call
VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
get the device information:
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
(whether the path is a device tree path or a sysfs path is up for
discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
2. VFIO_DEVICE_GET_INFO
Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
than adding a new flag identifying a devices as a 'platform'
device.
This ioctl simply returns the number of regions and number of irqs.
The number of regions corresponds to the number of regions
that can be mapped for the device-- corresponds to the regions defined
in "reg" and "ranges" in the device tree.
3. VFIO_DEVICE_GET_REGION_INFO
No changes needed, except perhaps adding a new flag. Freescale has some
devices with regions that must be mapped cacheable.
3. VFIO_DEVICE_GET_IRQ_INFO
No changes needed.
4. VFIO_DEVICE_GET_DEVTREE_INFO
The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
expose device regions and interrupts, but it's not enough to know
that there are X regions and Y interrupts. User space needs to
know what the resources are for-- to correlate those regions/interrupts
to the device tree structure that drivers use. The device tree
structure could consist of multiple nodes and it is necessary to
identify the node corresponding to the region/interrupt exposed
by VFIO.
The following information is needed:
-the device tree path to the node corresponding to the
region or interrupt
-for a region, whether it corresponds to a "reg" or "ranges"
property
-there could be multiple sub-regions per "reg" or "ranges" and
the sub-index within the reg/ranges is needed
The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
struct vfio_path_info {
__u32 argsz;
__u32 flags;
#define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
__u32 index; /* input: index of region or irq for which we are getting info */
__u32 type; /* input: 0 - get devtree info for a region
1 - get devtree info for an irq
*/
__u32 start; /* output: identifies the index within the reg/ranges */
__u8 path[]; /* output: Full path to associated device tree node */
};
User space allocates enough space for the device tree path, sets
the type field identifying whether this is a region, or irq,
and sets argsz appropriately.
5. EXAMPLE 1
Example, Freescale SATA controller:
sata@220000 {
compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
reg = <0x220000 0x1000>;
interrupts = <0x44 0x2 0x0 0x0>;
};
request to get device FD would look like:
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
The VFIO_DEVICE_GET_INFO ioctl would return:
-1 region
-1 interrupts
The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
-for index 0:
offset=0, size=0x10000 -- allows mmap of physical 0xffe220000
The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
for the single interrupt.
The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
-for region index 0:
flags: 0x0 // i.e. this is a "reg" property
start: 0x0 // i.e. index 0x0 in "reg"
path: "/soc@ffe000000/sata@220000"
-for interrupt index 0:
path: "/soc@ffe000000/sata@220000"
6. EXAMPLE 2
Example, Freescale crypto device (modified to illustrate):
crypto@300000 {
compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
#address-cells = <0x1>;
#size-cells = <0x1>;
reg = <0x300000 0x10000>;
interrupts = <0x5c 0x2 0x0 0x0>;
jr@1000 {
compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
interrupts = <0x58 0x2 0x0 0x0>;
};
jr@2000 {
compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
interrupts = <0x59 0x2 0x0 0x0>;
};
};
request to get device FD would look like:
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
The VFIO_DEVICE_GET_INFO ioctl would return:
-1 region
-3 interrupts
The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
-for index 0:
offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
for each of the IRQs-- indexes 0-4.
The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
-for region index 0:
flags: 0x0 // i.e. this is a "reg" property
start: 0x0 // i.e. index 0x0 in "reg"
path: "/soc@ffe000000/crypto@300000"
-for interrupt index 0:
path: "/soc@ffe000000/crypto@300000/jr@1000"
-for interrupt index 1:
path: "/soc@ffe000000/crypto@300000/jr@2000"
7. EXAMPLE 3
Example, Freescale DMA engine (modified to illustrate):
dma@101300 {
cell-index = <0x1>;
ranges = <0x0 0x101100 0x200>;
reg = <0x101300 0x4>;
compatible = "fsl,eloplus-dma";
#size-cells = <0x1>;
#address-cells = <0x1>;
fsl,liodn = <0xc6>;
dma-channel@180 {
interrupts = <0x23 0x2 0x0 0x0>;
cell-index = <0x3>;
reg = <0x180 0x80>;
compatible = "fsl,eloplus-dma-channel";
};
dma-channel@100 {
interrupts = <0x22 0x2 0x0 0x0>;
cell-index = <0x2>;
reg = <0x100 0x80>;
compatible = "fsl,eloplus-dma-channel";
};
};
request to get device FD would look like:
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
The VFIO_DEVICE_GET_INFO ioctl would return:
-2 regions
-2 interrupts
The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
-for index 0:
offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
-for index 1:
offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
for each of the IRQs-- indexes 0-3.
The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
-for region index 0:
flags: 0x1 // i.e. this is a "ranges" property
start: 0x0 // i.e. index 0x0 in "ranges"
path: "/soc@ffe000000/dma@101300"
-for region index 1:
flags: 0x0 // i.e. this is a "reg" property
start: 0x0 // i.e. index 0x0 in "ranges"
path: "/soc@ffe000000/dma@101300"
-for interrupt index 0:
path: "/soc@ffe000000/dma@101300/dma-channel@180"
-for interrupt index 1:
path: "/soc@ffe000000/dma@101300/dma-channel@100"
8. Open Issues
-how to handle cases where VFIO is requested to handle
a device where the valid, mappable range for a region
is less than a page size. See example above where an
advertised region in the DMA node is 4 bytes. If exposed
to a guest VM, the guest has to be able to map a full page
of I/O space which opens a potential security issue.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-02 23:25 RFC: vfio interface for platform devices Yoder Stuart-B08248
@ 2013-07-03 1:07 ` Alexander Graf
2013-07-03 18:51 ` Scott Wood
2013-07-03 3:07 ` Alex Williamson
2013-07-03 22:31 ` Scott Wood
2 siblings, 1 reply; 13+ messages in thread
From: Alexander Graf @ 2013-07-03 1:07 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Alex Williamson, Wood Scott-B07421, Bhushan Bharat-R65777,
Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
Antonios Motakis, kvm@vger.kernel.org list,
kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu
On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the kernel can expose
> platform devices to user space using vfio.
>
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the corresponding
> device tree node structure that is defined for most platform devices.
>
> Regards,
> Stuart Yoder
>
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
>
> The existing infrastructure for vfio-pci is pretty close to what we need:
> -mechanism to create a container
> -add groups/devices to a container
> -set the IOMMU model
> -map DMA regions
> -get an fd for a specific device, which allows user space to determine
> info about device regions (e.g. registers) and interrupt info
> -support for mmapping device regions
> -mechanism to set how interrupts are signaled
>
> Platform devices can get complicated-- potentially with a tree hierarchy
> of nodes, and links/phandles pointing to other platform
> devices. The kernel doesn't expose relationships between
> devices. The kernel just exposes mappable register regions and interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
>
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
>
> 1. VFIO_GROUP_GET_DEVICE_FD
>
> User space has to know which device it is accessing and will call
> VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> get the device information:
>
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
>
> (whether the path is a device tree path or a sysfs path is up for
> discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>
> 2. VFIO_DEVICE_GET_INFO
>
> Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> than adding a new flag identifying a devices as a 'platform'
> device.
>
> This ioctl simply returns the number of regions and number of irqs.
>
> The number of regions corresponds to the number of regions
> that can be mapped for the device-- corresponds to the regions defined
> in "reg" and "ranges" in the device tree.
>
> 3. VFIO_DEVICE_GET_REGION_INFO
>
> No changes needed, except perhaps adding a new flag. Freescale has some
> devices with regions that must be mapped cacheable.
>
> 3. VFIO_DEVICE_GET_IRQ_INFO
>
> No changes needed.
>
> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>
> The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> expose device regions and interrupts, but it's not enough to know
> that there are X regions and Y interrupts. User space needs to
> know what the resources are for-- to correlate those regions/interrupts
> to the device tree structure that drivers use. The device tree
> structure could consist of multiple nodes and it is necessary to
> identify the node corresponding to the region/interrupt exposed
> by VFIO.
>
> The following information is needed:
> -the device tree path to the node corresponding to the
> region or interrupt
> -for a region, whether it corresponds to a "reg" or "ranges"
> property
> -there could be multiple sub-regions per "reg" or "ranges" and
> the sub-index within the reg/ranges is needed
>
> The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>
> ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>
> struct vfio_path_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
> __u32 index; /* input: index of region or irq for which we are getting info */
> __u32 type; /* input: 0 - get devtree info for a region
> 1 - get devtree info for an irq
> */
> __u32 start; /* output: identifies the index within the reg/ranges */
> __u8 path[]; /* output: Full path to associated device tree node */
> };
>
> User space allocates enough space for the device tree path, sets
> the type field identifying whether this is a region, or irq,
> and sets argsz appropriately.
>
> 5. EXAMPLE 1
>
> Example, Freescale SATA controller:
>
> sata@220000 {
> compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
> reg = <0x220000 0x1000>;
> interrupts = <0x44 0x2 0x0 0x0>;
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -1 region
> -1 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0, size=0x10000 -- allows mmap of physical 0xffe220000
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for the single interrupt.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "reg"
> path: "/soc@ffe000000/sata@220000"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/sata@220000"
>
> 6. EXAMPLE 2
>
> Example, Freescale crypto device (modified to illustrate):
>
> crypto@300000 {
> compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
> #address-cells = <0x1>;
> #size-cells = <0x1>;
> reg = <0x300000 0x10000>;
> interrupts = <0x5c 0x2 0x0 0x0>;
>
> jr@1000 {
> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
> interrupts = <0x58 0x2 0x0 0x0>;
> };
>
> jr@2000 {
> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
> interrupts = <0x59 0x2 0x0 0x0>;
> };
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -1 region
> -3 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for each of the IRQs-- indexes 0-4.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "reg"
> path: "/soc@ffe000000/crypto@300000"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/crypto@300000/jr@1000"
>
> -for interrupt index 1:
> path: "/soc@ffe000000/crypto@300000/jr@2000"
>
> 7. EXAMPLE 3
>
> Example, Freescale DMA engine (modified to illustrate):
>
> dma@101300 {
> cell-index = <0x1>;
> ranges = <0x0 0x101100 0x200>;
> reg = <0x101300 0x4>;
> compatible = "fsl,eloplus-dma";
> #size-cells = <0x1>;
> #address-cells = <0x1>;
> fsl,liodn = <0xc6>;
>
> dma-channel@180 {
> interrupts = <0x23 0x2 0x0 0x0>;
> cell-index = <0x3>;
> reg = <0x180 0x80>;
> compatible = "fsl,eloplus-dma-channel";
> };
>
> dma-channel@100 {
> interrupts = <0x22 0x2 0x0 0x0>;
> cell-index = <0x2>;
> reg = <0x100 0x80>;
> compatible = "fsl,eloplus-dma-channel";
> };
>
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -2 regions
> -2 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
> -for index 1:
> offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for each of the IRQs-- indexes 0-3.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x1 // i.e. this is a "ranges" property
> start: 0x0 // i.e. index 0x0 in "ranges"
> path: "/soc@ffe000000/dma@101300"
>
> -for region index 1:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "ranges"
> path: "/soc@ffe000000/dma@101300"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/dma@101300/dma-channel@180"
>
> -for interrupt index 1:
> path: "/soc@ffe000000/dma@101300/dma-channel@100"
>
> 8. Open Issues
>
> -how to handle cases where VFIO is requested to handle
> a device where the valid, mappable range for a region
> is less than a page size. See example above where an
> advertised region in the DMA node is 4 bytes. If exposed
> to a guest VM, the guest has to be able to map a full page
> of I/O space which opens a potential security issue.
The way we solved this for legacy PCI device assignment was by going through QEMU for emulation and falling back to legacy read/write IIRC. We could probably do the same here. IIRC there was a way for a normal Linux mmap'ed device region to trap individual accesses too, so we could just use that one too.
The slow path emulation would then happen magically in QEMU, since MMIO writes will get reinjected into the normal QEMU MMIO handling path which will just issue a read/write on the mmap'ed region if it's not declared as emulated.
Alex
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-02 23:25 RFC: vfio interface for platform devices Yoder Stuart-B08248
2013-07-03 1:07 ` Alexander Graf
@ 2013-07-03 3:07 ` Alex Williamson
2013-07-03 10:44 ` Antonios Motakis
2013-07-03 17:20 ` Yoder Stuart-B08248
2013-07-03 22:31 ` Scott Wood
2 siblings, 2 replies; 13+ messages in thread
From: Alex Williamson @ 2013-07-03 3:07 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
Antonios Motakis, kvmarm@lists.cs.columbia.edu
On Tue, 2013-07-02 at 23:25 +0000, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the kernel can expose
> platform devices to user space using vfio.
>
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the corresponding
> device tree node structure that is defined for most platform devices.
>
> Regards,
> Stuart Yoder
>
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
>
> The existing infrastructure for vfio-pci is pretty close to what we need:
> -mechanism to create a container
> -add groups/devices to a container
> -set the IOMMU model
> -map DMA regions
> -get an fd for a specific device, which allows user space to determine
> info about device regions (e.g. registers) and interrupt info
> -support for mmapping device regions
> -mechanism to set how interrupts are signaled
>
> Platform devices can get complicated-- potentially with a tree hierarchy
> of nodes, and links/phandles pointing to other platform
> devices. The kernel doesn't expose relationships between
> devices. The kernel just exposes mappable register regions and interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
>
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
>
> 1. VFIO_GROUP_GET_DEVICE_FD
>
> User space has to know which device it is accessing and will call
> VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> get the device information:
>
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
>
> (whether the path is a device tree path or a sysfs path is up for
> discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>
> 2. VFIO_DEVICE_GET_INFO
>
> Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> than adding a new flag identifying a devices as a 'platform'
> device.
>
> This ioctl simply returns the number of regions and number of irqs.
>
> The number of regions corresponds to the number of regions
> that can be mapped for the device-- corresponds to the regions defined
> in "reg" and "ranges" in the device tree.
>
> 3. VFIO_DEVICE_GET_REGION_INFO
>
> No changes needed, except perhaps adding a new flag. Freescale has some
> devices with regions that must be mapped cacheable.
>
> 3. VFIO_DEVICE_GET_IRQ_INFO
>
> No changes needed.
>
> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>
> The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> expose device regions and interrupts, but it's not enough to know
> that there are X regions and Y interrupts. User space needs to
> know what the resources are for-- to correlate those regions/interrupts
> to the device tree structure that drivers use. The device tree
> structure could consist of multiple nodes and it is necessary to
> identify the node corresponding to the region/interrupt exposed
> by VFIO.
>
> The following information is needed:
> -the device tree path to the node corresponding to the
> region or interrupt
> -for a region, whether it corresponds to a "reg" or "ranges"
> property
> -there could be multiple sub-regions per "reg" or "ranges" and
> the sub-index within the reg/ranges is needed
>
> The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>
> ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>
> struct vfio_path_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
(1 << 0)?
Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward. I'd
suggest a bit for each. Otherwise, what does it mean when this returns
flags = 0x0 for an irq?
> __u32 index; /* input: index of region or irq for which we are getting info */
> __u32 type; /* input: 0 - get devtree info for a region
> 1 - get devtree info for an irq
> */
> __u32 start; /* output: identifies the index within the reg/ranges */
> __u8 path[]; /* output: Full path to associated device tree node */
> };
>
> User space allocates enough space for the device tree path, sets
> the type field identifying whether this is a region, or irq,
> and sets argsz appropriately.
>
> 5. EXAMPLE 1
>
> Example, Freescale SATA controller:
>
> sata@220000 {
> compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
> reg = <0x220000 0x1000>;
> interrupts = <0x44 0x2 0x0 0x0>;
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -1 region
> -1 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0, size=0x10000 -- allows mmap of physical 0xffe220000
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for the single interrupt.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "reg"
> path: "/soc@ffe000000/sata@220000"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/sata@220000"
>
> 6. EXAMPLE 2
>
> Example, Freescale crypto device (modified to illustrate):
>
> crypto@300000 {
> compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
> #address-cells = <0x1>;
> #size-cells = <0x1>;
> reg = <0x300000 0x10000>;
> interrupts = <0x5c 0x2 0x0 0x0>;
>
> jr@1000 {
> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
> interrupts = <0x58 0x2 0x0 0x0>;
> };
>
> jr@2000 {
> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
> interrupts = <0x59 0x2 0x0 0x0>;
> };
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -1 region
> -3 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for each of the IRQs-- indexes 0-4.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "reg"
> path: "/soc@ffe000000/crypto@300000"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/crypto@300000/jr@1000"
>
> -for interrupt index 1:
> path: "/soc@ffe000000/crypto@300000/jr@2000"
>
> 7. EXAMPLE 3
>
> Example, Freescale DMA engine (modified to illustrate):
>
> dma@101300 {
> cell-index = <0x1>;
> ranges = <0x0 0x101100 0x200>;
> reg = <0x101300 0x4>;
> compatible = "fsl,eloplus-dma";
> #size-cells = <0x1>;
> #address-cells = <0x1>;
> fsl,liodn = <0xc6>;
>
> dma-channel@180 {
> interrupts = <0x23 0x2 0x0 0x0>;
> cell-index = <0x3>;
> reg = <0x180 0x80>;
> compatible = "fsl,eloplus-dma-channel";
> };
>
> dma-channel@100 {
> interrupts = <0x22 0x2 0x0 0x0>;
> cell-index = <0x2>;
> reg = <0x100 0x80>;
> compatible = "fsl,eloplus-dma-channel";
> };
>
> };
>
> request to get device FD would look like:
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
>
> The VFIO_DEVICE_GET_INFO ioctl would return:
> -2 regions
> -2 interrupts
>
> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
> -for index 0:
> offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
> -for index 1:
> offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>
> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
> for each of the IRQs-- indexes 0-3.
>
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x1 // i.e. this is a "ranges" property
> start: 0x0 // i.e. index 0x0 in "ranges"
> path: "/soc@ffe000000/dma@101300"
>
> -for region index 1:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "ranges"
> path: "/soc@ffe000000/dma@101300"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/dma@101300/dma-channel@180"
>
> -for interrupt index 1:
> path: "/soc@ffe000000/dma@101300/dma-channel@100"
>
> 8. Open Issues
>
> -how to handle cases where VFIO is requested to handle
> a device where the valid, mappable range for a region
> is less than a page size. See example above where an
> advertised region in the DMA node is 4 bytes. If exposed
> to a guest VM, the guest has to be able to map a full page
> of I/O space which opens a potential security issue.
As AlexG points out, we solve that on vfio-pci by not supporting mmap on
those regions and only allowing read/write. If you could make the
platform map regions on page size boundaries and there's nothing bad a
guest can do by accessing the empty space, you could still support mmap.
We can't make such requirements or guarantees on PCI though. The PCI
spec also suggests for devices to use page size regions and high
performance devices generally follow that request, so it has become a
fallback for low performance devices and I/O port space, which we can't
mmap on x86 anyway.
So overall the interface and extension makes sense. My only question is
whether it's better to get complete reuse out of GET_REGION_INFO and
GET_IRQ_INFO and then add another device tree specific ioctl or is it
better to add a device tree index and path to the existing GET_*_INFO
ioctls? Getting some information from one ioctl and passing pieces of
it back to another ioctl feels a little clunky.
DEVICE_GET_INFO will identify the device as device tree, which gives you
the opportunity to extend or replace vfio_region_info and vfio_irq_info.
It seems like it could even be done in a compatible way. For example,
if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
sizeof(struct vfio_region_info), the kernel could fill in all the info
up to that size and fill argsz with the size needed for the remaining
info. You could then realloc the buffer and the kernel would add the
extra info on the next call, setting a flag for each additional field
returned. Userspace could also just be sloppy and call it with a lot of
padding and get everything in one shot.
We'd need to define which flags have associated structures and define
those structures. For instance, some require no space:
#define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
#define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
Others imply a structure added to the end:
#define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
struct vfio_devtree_region_info_index
{
u32 index;
}
#define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
struct vfio_devtree_region_info_path
{
u32 len;
u8 path[];
}
The order of the flags indicates the order of the structures at the end.
We'd need to have some rules about alignment, probably always dword
aligned. I'm not sure if it would be necessary each structure to have a
length. It would only be needed if we want to let userspace skip over
structures they don't understand how to parse.
Another idea is that the space after struct vfio_region/irq_info could
be a self describing capabilities area, much like PCI config space.
Starting immediately after the static structure we'd have:
struct vfio_info_cap_header
{
u16 type;
u16 next;
};
Where type defines the structure that follows and next indicates the
offset of then next header (could also be len of current cap).
Anyway, it seems like there are possibilities that would allow us to
extend the info ioctls in ways that would be generic for any device
type. Thanks,
Alex
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-03 3:07 ` Alex Williamson
@ 2013-07-03 10:44 ` Antonios Motakis
2013-07-03 19:23 ` Yoder Stuart-B08248
2013-07-03 17:20 ` Yoder Stuart-B08248
1 sibling, 1 reply; 13+ messages in thread
From: Antonios Motakis @ 2013-07-03 10:44 UTC (permalink / raw)
To: Alex Williamson
Cc: Yoder Stuart-B08248, Alexander Graf, Wood Scott-B07421,
Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
On Wed, Jul 3, 2013 at 5:07 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-07-02 at 23:25 +0000, Yoder Stuart-B08248 wrote:
>> The write-up below is the first draft of a proposal for how the kernel can expose
>> platform devices to user space using vfio.
>>
>> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
>> allows user space to correlate regions and interrupts to the corresponding
>> device tree node structure that is defined for most platform devices.
>>
>> Regards,
>> Stuart Yoder
>>
>> ------------------------------------------------------------------------------
>> VFIO for Platform Devices
>>
>> The existing infrastructure for vfio-pci is pretty close to what we need:
>> -mechanism to create a container
>> -add groups/devices to a container
>> -set the IOMMU model
>> -map DMA regions
>> -get an fd for a specific device, which allows user space to determine
>> info about device regions (e.g. registers) and interrupt info
>> -support for mmapping device regions
>> -mechanism to set how interrupts are signaled
>>
>> Platform devices can get complicated-- potentially with a tree hierarchy
>> of nodes, and links/phandles pointing to other platform
>> devices. The kernel doesn't expose relationships between
>> devices. The kernel just exposes mappable register regions and interrupts.
>> It's up to user space to work out relationships between devices
>> if it needs to-- this can be determined in the device tree exposed in
>> /proc/device-tree.
>>
>> I think the changes needed for vfio are around some of the device tree
>> related info that needs to be available with the device fd.
>>
>> 1. VFIO_GROUP_GET_DEVICE_FD
>>
>> User space has to know which device it is accessing and will call
>> VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
>> get the device information:
>>
>> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
>>
>> (whether the path is a device tree path or a sysfs path is up for
>> discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>>
>> 2. VFIO_DEVICE_GET_INFO
>>
>> Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
>> than adding a new flag identifying a devices as a 'platform'
>> device.
>>
>> This ioctl simply returns the number of regions and number of irqs.
>>
>> The number of regions corresponds to the number of regions
>> that can be mapped for the device-- corresponds to the regions defined
>> in "reg" and "ranges" in the device tree.
>>
>> 3. VFIO_DEVICE_GET_REGION_INFO
>>
>> No changes needed, except perhaps adding a new flag. Freescale has some
>> devices with regions that must be mapped cacheable.
>>
>> 3. VFIO_DEVICE_GET_IRQ_INFO
>>
>> No changes needed.
>>
>> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>>
>> The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
>> expose device regions and interrupts, but it's not enough to know
>> that there are X regions and Y interrupts. User space needs to
>> know what the resources are for-- to correlate those regions/interrupts
>> to the device tree structure that drivers use. The device tree
>> structure could consist of multiple nodes and it is necessary to
>> identify the node corresponding to the region/interrupt exposed
>> by VFIO.
>>
>> The following information is needed:
>> -the device tree path to the node corresponding to the
>> region or interrupt
>> -for a region, whether it corresponds to a "reg" or "ranges"
>> property
>> -there could be multiple sub-regions per "reg" or "ranges" and
>> the sub-index within the reg/ranges is needed
>>
>> The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>>
>> ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>>
>> struct vfio_path_info {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
>
> (1 << 0)?
>
> Having flags = 0x0 for regs and 0x1 for ranges is a bit awkward. I'd
> suggest a bit for each. Otherwise, what does it mean when this returns
> flags = 0x0 for an irq?
>
>> __u32 index; /* input: index of region or irq for which we are getting info */
>> __u32 type; /* input: 0 - get devtree info for a region
>> 1 - get devtree info for an irq
>> */
>> __u32 start; /* output: identifies the index within the reg/ranges */
>> __u8 path[]; /* output: Full path to associated device tree node */
>> };
>>
>> User space allocates enough space for the device tree path, sets
>> the type field identifying whether this is a region, or irq,
>> and sets argsz appropriately.
>>
>> 5. EXAMPLE 1
>>
>> Example, Freescale SATA controller:
>>
>> sata@220000 {
>> compatible = "fsl,p2041-sata", "fsl,pq-sata-v2";
>> reg = <0x220000 0x1000>;
>> interrupts = <0x44 0x2 0x0 0x0>;
>> };
>>
>> request to get device FD would look like:
>> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/sata@220000");
>>
>> The VFIO_DEVICE_GET_INFO ioctl would return:
>> -1 region
>> -1 interrupts
>>
>> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>> -for index 0:
>> offset=0, size=0x10000 -- allows mmap of physical 0xffe220000
>>
>> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>> for the single interrupt.
>>
>> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>> -for region index 0:
>> flags: 0x0 // i.e. this is a "reg" property
>> start: 0x0 // i.e. index 0x0 in "reg"
>> path: "/soc@ffe000000/sata@220000"
>>
>> -for interrupt index 0:
>> path: "/soc@ffe000000/sata@220000"
>>
>> 6. EXAMPLE 2
>>
>> Example, Freescale crypto device (modified to illustrate):
>>
>> crypto@300000 {
>> compatible = "fsl,sec-v4.2", "fsl,sec-v4.0";
>> #address-cells = <0x1>;
>> #size-cells = <0x1>;
>> reg = <0x300000 0x10000>;
>> interrupts = <0x5c 0x2 0x0 0x0>;
>>
>> jr@1000 {
>> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>> interrupts = <0x58 0x2 0x0 0x0>;
>> };
>>
>> jr@2000 {
>> compatible = "fsl,sec-v4.2-job-ring", "fsl,sec-v4.0-job-ring";
>> interrupts = <0x59 0x2 0x0 0x0>;
>> };
>> };
>>
>> request to get device FD would look like:
>> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/crypto@300000");
>>
>> The VFIO_DEVICE_GET_INFO ioctl would return:
>> -1 region
>> -3 interrupts
>>
>> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>> -for index 0:
>> offset=0, size=0x10000 -- allows mmap of physical 0xffe300000
>>
>> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>> for each of the IRQs-- indexes 0-4.
>>
>> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>> -for region index 0:
>> flags: 0x0 // i.e. this is a "reg" property
>> start: 0x0 // i.e. index 0x0 in "reg"
>> path: "/soc@ffe000000/crypto@300000"
>>
>> -for interrupt index 0:
>> path: "/soc@ffe000000/crypto@300000/jr@1000"
>>
>> -for interrupt index 1:
>> path: "/soc@ffe000000/crypto@300000/jr@2000"
>>
>> 7. EXAMPLE 3
>>
>> Example, Freescale DMA engine (modified to illustrate):
>>
>> dma@101300 {
>> cell-index = <0x1>;
>> ranges = <0x0 0x101100 0x200>;
>> reg = <0x101300 0x4>;
>> compatible = "fsl,eloplus-dma";
>> #size-cells = <0x1>;
>> #address-cells = <0x1>;
>> fsl,liodn = <0xc6>;
>>
>> dma-channel@180 {
>> interrupts = <0x23 0x2 0x0 0x0>;
>> cell-index = <0x3>;
>> reg = <0x180 0x80>;
>> compatible = "fsl,eloplus-dma-channel";
>> };
>>
>> dma-channel@100 {
>> interrupts = <0x22 0x2 0x0 0x0>;
>> cell-index = <0x2>;
>> reg = <0x100 0x80>;
>> compatible = "fsl,eloplus-dma-channel";
>> };
>>
>> };
>>
>> request to get device FD would look like:
>> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/dma@101300");
>>
>> The VFIO_DEVICE_GET_INFO ioctl would return:
>> -2 regions
>> -2 interrupts
>>
>> The VFIO_DEVICE_GET_REGION_INFO ioctl would return:
>> -for index 0:
>> offset=0x100, size=0x200 -- allows mmap of physical 0xffe101100
>> -for index 1:
>> offset=0x300, size=0x4 -- allows mmap of physical 0xffe101300
>>
>> The VFIO_DEVICE_GET_IRQ_INFO ioctl would return appropriate info
>> for each of the IRQs-- indexes 0-3.
>>
>> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>>
>> -for region index 0:
>> flags: 0x1 // i.e. this is a "ranges" property
>> start: 0x0 // i.e. index 0x0 in "ranges"
>> path: "/soc@ffe000000/dma@101300"
>>
>> -for region index 1:
>> flags: 0x0 // i.e. this is a "reg" property
>> start: 0x0 // i.e. index 0x0 in "ranges"
>> path: "/soc@ffe000000/dma@101300"
>>
>> -for interrupt index 0:
>> path: "/soc@ffe000000/dma@101300/dma-channel@180"
>>
>> -for interrupt index 1:
>> path: "/soc@ffe000000/dma@101300/dma-channel@100"
>>
>> 8. Open Issues
>>
>> -how to handle cases where VFIO is requested to handle
>> a device where the valid, mappable range for a region
>> is less than a page size. See example above where an
>> advertised region in the DMA node is 4 bytes. If exposed
>> to a guest VM, the guest has to be able to map a full page
>> of I/O space which opens a potential security issue.
>
> As AlexG points out, we solve that on vfio-pci by not supporting mmap on
> those regions and only allowing read/write. If you could make the
> platform map regions on page size boundaries and there's nothing bad a
> guest can do by accessing the empty space, you could still support mmap.
> We can't make such requirements or guarantees on PCI though. The PCI
> spec also suggests for devices to use page size regions and high
> performance devices generally follow that request, so it has become a
> fallback for low performance devices and I/O port space, which we can't
> mmap on x86 anyway.
>
> So overall the interface and extension makes sense. My only question is
> whether it's better to get complete reuse out of GET_REGION_INFO and
> GET_IRQ_INFO and then add another device tree specific ioctl or is it
> better to add a device tree index and path to the existing GET_*_INFO
> ioctls? Getting some information from one ioctl and passing pieces of
> it back to another ioctl feels a little clunky.
>
I thing at this point we should clearly separate the info we need to
pass for the core functionality (assigning the device's resources),
and the information we want to pass in order to generate a guest DT.
For ARM a DT is not generated by QEMU yet, but instead a proper DTB
needs to be passed by the user (granted, this will not be the case for
ever). So I think even if we treat them the same in code, we should be
discussing them separately.
Other than that I think it is preferable to extend the existing ioctls
rather than add new ones.
> DEVICE_GET_INFO will identify the device as device tree, which gives you
> the opportunity to extend or replace vfio_region_info and vfio_irq_info.
> It seems like it could even be done in a compatible way. For example,
> if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
> sizeof(struct vfio_region_info), the kernel could fill in all the info
> up to that size and fill argsz with the size needed for the remaining
> info. You could then realloc the buffer and the kernel would add the
> extra info on the next call, setting a flag for each additional field
> returned. Userspace could also just be sloppy and call it with a lot of
> padding and get everything in one shot.
>
> We'd need to define which flags have associated structures and define
> those structures. For instance, some require no space:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>
> Others imply a structure added to the end:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>
> struct vfio_devtree_region_info_index
> {
> u32 index;
> }
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>
> struct vfio_devtree_region_info_path
> {
> u32 len;
> u8 path[];
> }
>
> The order of the flags indicates the order of the structures at the end.
> We'd need to have some rules about alignment, probably always dword
> aligned. I'm not sure if it would be necessary each structure to have a
> length. It would only be needed if we want to let userspace skip over
> structures they don't understand how to parse.
>
> Another idea is that the space after struct vfio_region/irq_info could
> be a self describing capabilities area, much like PCI config space.
> Starting immediately after the static structure we'd have:
>
> struct vfio_info_cap_header
> {
> u16 type;
> u16 next;
> };
>
> Where type defines the structure that follows and next indicates the
> offset of then next header (could also be len of current cap).
>
> Anyway, it seems like there are possibilities that would allow us to
> extend the info ioctls in ways that would be generic for any device
> type. Thanks,
>
> Alex
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: RFC: vfio interface for platform devices
2013-07-03 3:07 ` Alex Williamson
2013-07-03 10:44 ` Antonios Motakis
@ 2013-07-03 17:20 ` Yoder Stuart-B08248
1 sibling, 0 replies; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 17:20 UTC (permalink / raw)
To: Alex Williamson
Cc: Wood Scott-B07421, kvm@vger.kernel.org list,
Bhushan Bharat-R65777, kvm-ppc@vger.kernel.org,
virtualization@lists.linux-foundation.org, Sethi Varun-B16395,
Antonios Motakis, kvmarm@lists.cs.columbia.edu
[cut]
> So overall the interface and extension makes sense. My only question is
> whether it's better to get complete reuse out of GET_REGION_INFO and
> GET_IRQ_INFO and then add another device tree specific ioctl or is it
> better to add a device tree index and path to the existing GET_*_INFO
> ioctls? Getting some information from one ioctl and passing pieces of
> it back to another ioctl feels a little clunky.
Heh...extending region/irq info is the direction I started with, but
because of the variable nature of the device tree data thought maybe
it was better to not add complexity to those APIs and leave them
alone.
Many or most platform devices will have 1 region and 1 interrupt, and
so it wouldn't be necessary in most cases to need device tree info at
all since there is no ambiguity. So, was thinking that for the more
rare, complicated devices that a bit would advertise the existence of
the device tree info and the separate ioctl would be used to access it.
But, I'm completely open to extending the get region/irq info
ioctls if that direction is what you prefer...which seems to be
the case.
> DEVICE_GET_INFO will identify the device as device tree, which gives you
> the opportunity to extend or replace vfio_region_info and vfio_irq_info.
> It seems like it could even be done in a compatible way. For example,
> if you were to call VFIO_DEVICE_GET_REGION_INFO with argsz =
> sizeof(struct vfio_region_info), the kernel could fill in all the info
> up to that size and fill argsz with the size needed for the remaining
> info. You could then realloc the buffer and the kernel would add the
> extra info on the next call, setting a flag for each additional field
> returned. Userspace could also just be sloppy and call it with a lot of
> padding and get everything in one shot.
>
> We'd need to define which flags have associated structures and define
> those structures. For instance, some require no space:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_REG (1 << ?)
> #define VFIO_DEVTREE_REGION_INFO_FLAG_RANGE (1 << ?)
>
> Others imply a structure added to the end:
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_INDEX (1 << ?)
>
> struct vfio_devtree_region_info_index
> {
> u32 index;
> }
>
> #define VFIO_DEVTREE_REGION_INFO_FLAG_PATH (1 << ?)
>
> struct vfio_devtree_region_info_path
> {
> u32 len;
> u8 path[];
> }
>
> The order of the flags indicates the order of the structures at the end.
> We'd need to have some rules about alignment, probably always dword
> aligned. I'm not sure if it would be necessary each structure to have a
> length. It would only be needed if we want to let userspace skip over
> structures they don't understand how to parse.
>
> Another idea is that the space after struct vfio_region/irq_info could
> be a self describing capabilities area, much like PCI config space.
> Starting immediately after the static structure we'd have:
>
> struct vfio_info_cap_header
> {
> u16 type;
> u16 next;
> };
>
> Where type defines the structure that follows and next indicates the
> offset of then next header (could also be len of current cap).
>
> Anyway, it seems like there are possibilities that would allow us to
> extend the info ioctls in ways that would be generic for any device
> type. Thanks,
I think I like the approach using the flags and struct
extensions.
Stuart
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-03 1:07 ` Alexander Graf
@ 2013-07-03 18:51 ` Scott Wood
2013-07-03 19:08 ` Yoder Stuart-B08248
0 siblings, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-07-03 18:51 UTC (permalink / raw)
To: Alexander Graf
Cc: Yoder Stuart-B08248, Alex Williamson, Wood Scott-B07421,
Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org, Antonios Motakis,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
On 07/02/2013 08:07:53 PM, Alexander Graf wrote:
>
> On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
>
> > 8. Open Issues
> >
> > -how to handle cases where VFIO is requested to handle
> > a device where the valid, mappable range for a region
> > is less than a page size. See example above where an
> > advertised region in the DMA node is 4 bytes. If exposed
> > to a guest VM, the guest has to be able to map a full page
> > of I/O space which opens a potential security issue.
>
> The way we solved this for legacy PCI device assignment was by going
> through QEMU for emulation and falling back to legacy read/write
> IIRC. We could probably do the same here. IIRC there was a way for a
> normal Linux mmap'ed device region to trap individual accesses too,
> so we could just use that one too.
>
> The slow path emulation would then happen magically in QEMU, since
> MMIO writes will get reinjected into the normal QEMU MMIO handling
> path which will just issue a read/write on the mmap'ed region if it's
> not declared as emulated.
I agree that's what should happen by default, but there should be a way
for root to tell vfio that a device is allowed to overmap, in order to
get the performance benefit of direct access in cases where root knows
(or explicitly doesn't care) that it is safe.
-Scott
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: RFC: vfio interface for platform devices
2013-07-03 18:51 ` Scott Wood
@ 2013-07-03 19:08 ` Yoder Stuart-B08248
0 siblings, 0 replies; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 19:08 UTC (permalink / raw)
To: Wood Scott-B07421, Alexander Graf
Cc: kvm@vger.kernel.org list, Bhushan Bharat-R65777,
kvm-ppc@vger.kernel.org,
virtualization@lists.linux-foundation.org, Antonios Motakis,
Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 1:52 PM
> To: Alexander Graf
> Cc: Yoder Stuart-B08248; Alex Williamson; Wood Scott-B07421; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list; kvm-
> ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
>
> On 07/02/2013 08:07:53 PM, Alexander Graf wrote:
> >
> > On 03.07.2013, at 01:25, Yoder Stuart-B08248 wrote:
> >
> > > 8. Open Issues
> > >
> > > -how to handle cases where VFIO is requested to handle
> > > a device where the valid, mappable range for a region
> > > is less than a page size. See example above where an
> > > advertised region in the DMA node is 4 bytes. If exposed
> > > to a guest VM, the guest has to be able to map a full page
> > > of I/O space which opens a potential security issue.
> >
> > The way we solved this for legacy PCI device assignment was by going
> > through QEMU for emulation and falling back to legacy read/write
> > IIRC. We could probably do the same here. IIRC there was a way for a
> > normal Linux mmap'ed device region to trap individual accesses too,
> > so we could just use that one too.
> >
> > The slow path emulation would then happen magically in QEMU, since
> > MMIO writes will get reinjected into the normal QEMU MMIO handling
> > path which will just issue a read/write on the mmap'ed region if it's
> > not declared as emulated.
>
> I agree that's what should happen by default, but there should be a way
> for root to tell vfio that a device is allowed to overmap, in order to
> get the performance benefit of direct access in cases where root knows
> (or explicitly doesn't care) that it is safe.
Perhaps a sysfs mechanism like this:
echo "/sys/bus/platform/devices/ffe210000.usb" > /sys/bus/platform/drivers/vfio-platform/allow_overmap
Stuart
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: RFC: vfio interface for platform devices
2013-07-03 10:44 ` Antonios Motakis
@ 2013-07-03 19:23 ` Yoder Stuart-B08248
0 siblings, 0 replies; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-03 19:23 UTC (permalink / raw)
To: Antonios Motakis, Alex Williamson
Cc: Alexander Graf, Wood Scott-B07421, Bhushan Bharat-R65777,
Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
[cut]
> > So overall the interface and extension makes sense. My only question is
> > whether it's better to get complete reuse out of GET_REGION_INFO and
> > GET_IRQ_INFO and then add another device tree specific ioctl or is it
> > better to add a device tree index and path to the existing GET_*_INFO
> > ioctls? Getting some information from one ioctl and passing pieces of
> > it back to another ioctl feels a little clunky.
> >
>
> I thing at this point we should clearly separate the info we need to
> pass for the core functionality (assigning the device's resources),
> and the information we want to pass in order to generate a guest DT.
> For ARM a DT is not generated by QEMU yet, but instead a proper DTB
> needs to be passed by the user (granted, this will not be the case for
> ever). So I think even if we treat them the same in code, we should be
> discussing them separately.
We do need to keep core resources separate from what it takes
to generate a guest DT, but note the purpose of the devtree info
is not primarily to help generate a guest DT.
User space (not just QEMU) needs to know what the regions
and interrupts advertised by DEVICE_GET_INFO correspond to.
If there are 4 interrupts and 2 register regions, how does user
space know the purpose/function of each?
Apart from something like the devtree info I don't see
how a user space driver can know how to use the regions
and interrupts. The kernel is not guaranteeing any
particular ordering of resources.
So in the DMA engine example I gave, the devtree info
let's user space know which interrupt corresponds to
which DMA channel.
QEMU is a special case in that it is going to expose
the device to a virtual machine and needs to generate
a normal device tree node...but that is a separate problem
that needs to be solved in QEMU.
Stuart
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-02 23:25 RFC: vfio interface for platform devices Yoder Stuart-B08248
2013-07-03 1:07 ` Alexander Graf
2013-07-03 3:07 ` Alex Williamson
@ 2013-07-03 22:31 ` Scott Wood
2013-07-16 21:51 ` Yoder Stuart-B08248
2 siblings, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-07-03 22:31 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Alex Williamson, Alexander Graf, Wood Scott-B07421,
Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org, Antonios Motakis,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> The write-up below is the first draft of a proposal for how the
> kernel can expose
> platform devices to user space using vfio.
>
> In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> allows user space to correlate regions and interrupts to the
> corresponding
> device tree node structure that is defined for most platform devices.
>
> Regards,
> Stuart Yoder
>
> ------------------------------------------------------------------------------
> VFIO for Platform Devices
>
> The existing infrastructure for vfio-pci is pretty close to what we
> need:
> -mechanism to create a container
> -add groups/devices to a container
> -set the IOMMU model
> -map DMA regions
> -get an fd for a specific device, which allows user space to
> determine
> info about device regions (e.g. registers) and interrupt info
> -support for mmapping device regions
> -mechanism to set how interrupts are signaled
>
> Platform devices can get complicated-- potentially with a tree
> hierarchy
> of nodes, and links/phandles pointing to other platform
> devices. The kernel doesn't expose relationships between
> devices. The kernel just exposes mappable register regions and
> interrupts.
> It's up to user space to work out relationships between devices
> if it needs to-- this can be determined in the device tree exposed in
> /proc/device-tree.
>
> I think the changes needed for vfio are around some of the device tree
> related info that needs to be available with the device fd.
>
> 1. VFIO_GROUP_GET_DEVICE_FD
>
> User space has to know which device it is accessing and will call
> VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> get the device information:
>
> fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> "/soc@ffe000000/usb@210000");
>
> (whether the path is a device tree path or a sysfs path is up for
> discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
Doesn't VFIO need to operate on an actual Linux device, rather than
just an OF node?
Are we going to have a fixed assumption that you always want all the
children of the node corresponding to the assigned device, or will it
be possible to exclude some?
> 2. VFIO_DEVICE_GET_INFO
>
> Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> than adding a new flag identifying a devices as a 'platform'
> device.
>
> This ioctl simply returns the number of regions and number of irqs.
>
> The number of regions corresponds to the number of regions
> that can be mapped for the device-- corresponds to the regions
> defined
> in "reg" and "ranges" in the device tree.
>
> 3. VFIO_DEVICE_GET_REGION_INFO
>
> No changes needed, except perhaps adding a new flag. Freescale
> has some
> devices with regions that must be mapped cacheable.
While I don't object to making the information available to the user
just in case, the main thing we need here is to influence what the
kernel does when the user tries to map it. At least on PPC it's not up
to userspace to select whether a mmap is cacheable.
> 4. VFIO_DEVICE_GET_DEVTREE_INFO
>
> The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> expose device regions and interrupts, but it's not enough to know
> that there are X regions and Y interrupts. User space needs to
> know what the resources are for-- to correlate those
> regions/interrupts
> to the device tree structure that drivers use. The device tree
> structure could consist of multiple nodes and it is necessary to
> identify the node corresponding to the region/interrupt exposed
> by VFIO.
>
> The following information is needed:
> -the device tree path to the node corresponding to the
> region or interrupt
> -for a region, whether it corresponds to a "reg" or "ranges"
> property
> -there could be multiple sub-regions per "reg" or "ranges" and
> the sub-index within the reg/ranges is needed
>
> The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
>
> ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
>
> struct vfio_path_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a
> "ranges" property */
What about distinguishing a normal interrupt from one found in an
interrupt-map?
In the case of both ranges and interrupt-maps, we'll also want to
decide what the policy is for when to expose them directly, versus just
using them to translate regs and interrupts of child nodes.
> __u32 index; /* input: index of region or irq for
> which we are getting info */
> __u32 type; /* input: 0 - get devtree info for a
> region
> 1 - get devtree info for an
> irq
> */
> __u32 start; /* output: identifies the index
> within the reg/ranges */
"start" is an odd name for this. I'd rename "index" to "vfio_index"
and this to "dt_index".
> __u8 path[]; /* output: Full path to associated
> device tree node */
How does the caller know what size buffer to supply for this?
> The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
>
> -for region index 0:
> flags: 0x0 // i.e. this is a "reg" property
> start: 0x0 // i.e. index 0x0 in "reg"
> path: "/soc@ffe000000/crypto@300000"
>
> -for interrupt index 0:
> path: "/soc@ffe000000/crypto@300000/jr@1000"
>
> -for interrupt index 1:
> path: "/soc@ffe000000/crypto@300000/jr@2000"
Where is "start" for the interrupts?
-Scott
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: RFC: vfio interface for platform devices
2013-07-03 22:31 ` Scott Wood
@ 2013-07-16 21:51 ` Yoder Stuart-B08248
2013-07-16 22:01 ` Scott Wood
0 siblings, 1 reply; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 21:51 UTC (permalink / raw)
To: Wood Scott-B07421
Cc: Alex Williamson, Alexander Graf, Bhushan Bharat-R65777,
Sethi Varun-B16395, virtualization@lists.linux-foundation.org,
Antonios Motakis, kvm@vger.kernel.org list,
kvm-ppc@vger.kernel.org, kvmarm@lists.cs.columbia.edu
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, July 03, 2013 5:32 PM
> To: Yoder Stuart-B08248
> Cc: Alex Williamson; Alexander Graf; Wood Scott-B07421; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list; kvm-
> ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
>
> On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
> > The write-up below is the first draft of a proposal for how the
> > kernel can expose
> > platform devices to user space using vfio.
> >
> > In short, I'm proposing a new ioctl VFIO_DEVICE_GET_DEVTREE_INFO which
> > allows user space to correlate regions and interrupts to the
> > corresponding
> > device tree node structure that is defined for most platform devices.
> >
> > Regards,
> > Stuart Yoder
> >
> > ------------------------------------------------------------------------------
> > VFIO for Platform Devices
> >
> > The existing infrastructure for vfio-pci is pretty close to what we
> > need:
> > -mechanism to create a container
> > -add groups/devices to a container
> > -set the IOMMU model
> > -map DMA regions
> > -get an fd for a specific device, which allows user space to
> > determine
> > info about device regions (e.g. registers) and interrupt info
> > -support for mmapping device regions
> > -mechanism to set how interrupts are signaled
> >
> > Platform devices can get complicated-- potentially with a tree
> > hierarchy
> > of nodes, and links/phandles pointing to other platform
> > devices. The kernel doesn't expose relationships between
> > devices. The kernel just exposes mappable register regions and
> > interrupts.
> > It's up to user space to work out relationships between devices
> > if it needs to-- this can be determined in the device tree exposed in
> > /proc/device-tree.
> >
> > I think the changes needed for vfio are around some of the device tree
> > related info that needs to be available with the device fd.
> >
> > 1. VFIO_GROUP_GET_DEVICE_FD
> >
> > User space has to know which device it is accessing and will call
> > VFIO_GROUP_GET_DEVICE_FD passing a specific platform device path to
> > get the device information:
> >
> > fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
> > "/soc@ffe000000/usb@210000");
> >
> > (whether the path is a device tree path or a sysfs path is up for
> > discussion, e.g. "/sys/bus/platform/devices/ffe210000.usb")
>
> Doesn't VFIO need to operate on an actual Linux device, rather than
> just an OF node?
>
> Are we going to have a fixed assumption that you always want all the
> children of the node corresponding to the assigned device, or will it
> be possible to exclude some?
My assumption is that you always get all the children of the
node corresponding to the assigned device.
> > 2. VFIO_DEVICE_GET_INFO
> >
> > Don't think any changes are needed to VFIO_DEVICE_GET_INFO other
> > than adding a new flag identifying a devices as a 'platform'
> > device.
> >
> > This ioctl simply returns the number of regions and number of irqs.
> >
> > The number of regions corresponds to the number of regions
> > that can be mapped for the device-- corresponds to the regions
> > defined
> > in "reg" and "ranges" in the device tree.
> >
> > 3. VFIO_DEVICE_GET_REGION_INFO
> >
> > No changes needed, except perhaps adding a new flag. Freescale
> > has some
> > devices with regions that must be mapped cacheable.
>
> While I don't object to making the information available to the user
> just in case, the main thing we need here is to influence what the
> kernel does when the user tries to map it. At least on PPC it's not up
> to userspace to select whether a mmap is cacheable.
If user space really can't do anything with the 'cacheable'
flag, do you think there is good reason to keep it? Will it
help any decision that user space makes? Maybe we should just
drop it.
> > 4. VFIO_DEVICE_GET_DEVTREE_INFO
> >
> > The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs
> > expose device regions and interrupts, but it's not enough to know
> > that there are X regions and Y interrupts. User space needs to
> > know what the resources are for-- to correlate those
> > regions/interrupts
> > to the device tree structure that drivers use. The device tree
> > structure could consist of multiple nodes and it is necessary to
> > identify the node corresponding to the region/interrupt exposed
> > by VFIO.
> >
> > The following information is needed:
> > -the device tree path to the node corresponding to the
> > region or interrupt
> > -for a region, whether it corresponds to a "reg" or "ranges"
> > property
> > -there could be multiple sub-regions per "reg" or "ranges" and
> > the sub-index within the reg/ranges is needed
> >
> > The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
> >
> > ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
> >
> > struct vfio_path_info {
> > __u32 argsz;
> > __u32 flags;
> > #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a
> > "ranges" property */
>
> What about distinguishing a normal interrupt from one found in an
> interrupt-map?
I'm not sure we need that. The kernel needs to use the interrupt
map to get interrupts hooked up right, but all user space needs to
know is that there are N interrupts and possibly device tree
paths to help user space interpret which interrupt is which.
> In the case of both ranges and interrupt-maps, we'll also want to
> decide what the policy is for when to expose them directly, versus just
> using them to translate regs and interrupts of child nodes
Yes, not sure the best approach there...but guess we can cross
that bridge when we implement this. It doesn't affect this
interface.
> > __u32 index; /* input: index of region or irq for
> > which we are getting info */
> > __u32 type; /* input: 0 - get devtree info for a
> > region
> > 1 - get devtree info for an
> > irq
> > */
> > __u32 start; /* output: identifies the index
> > within the reg/ranges */
>
> "start" is an odd name for this. I'd rename "index" to "vfio_index"
> and this to "dt_index".
>
> > __u8 path[]; /* output: Full path to associated
> > device tree node */
>
> How does the caller know what size buffer to supply for this?
>
> > The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
> >
> > -for region index 0:
> > flags: 0x0 // i.e. this is a "reg" property
> > start: 0x0 // i.e. index 0x0 in "reg"
> > path: "/soc@ffe000000/crypto@300000"
> >
> > -for interrupt index 0:
> > path: "/soc@ffe000000/crypto@300000/jr@1000"
> >
> > -for interrupt index 1:
> > path: "/soc@ffe000000/crypto@300000/jr@2000"
>
> Where is "start" for the interrupts?
v2 of the proposal made changes that got rid of that stuff.
Stuart
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-16 21:51 ` Yoder Stuart-B08248
@ 2013-07-16 22:01 ` Scott Wood
2013-07-16 22:41 ` Yoder Stuart-B08248
0 siblings, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-07-16 22:01 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, Alex Williamson, Alexander Graf,
Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org, Antonios Motakis,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
> > > 3. VFIO_DEVICE_GET_REGION_INFO
> > >
> > > No changes needed, except perhaps adding a new flag. Freescale
> > > has some
> > > devices with regions that must be mapped cacheable.
> >
> > While I don't object to making the information available to the user
> > just in case, the main thing we need here is to influence what the
> > kernel does when the user tries to map it. At least on PPC it's
> not up
> > to userspace to select whether a mmap is cacheable.
>
> If user space really can't do anything with the 'cacheable'
> flag, do you think there is good reason to keep it? Will it
> help any decision that user space makes? Maybe we should just
> drop it.
As long as we can be sure all architectures will map things correctly
without any flags needing to be specified, that's fine.
> > > struct vfio_path_info {
> > > __u32 argsz;
> > > __u32 flags;
> > > #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region
> is a
> > > "ranges" property */
> >
> > What about distinguishing a normal interrupt from one found in an
> > interrupt-map?
>
> I'm not sure we need that. The kernel needs to use the interrupt
> map to get interrupts hooked up right, but all user space needs to
> know is that there are N interrupts and possibly device tree
> paths to help user space interpret which interrupt is which.
What if the interrupt map is for devices without explicit nodes, such
as with a PCI controller (ignore the fact that we would normally use
vfio_pci for the indivdual PCI devices instead)?
You could say the same thing about ranges -- why expose ranges instead
of the individual child node regs after translation?
> > In the case of both ranges and interrupt-maps, we'll also want to
> > decide what the policy is for when to expose them directly, versus
> just
> > using them to translate regs and interrupts of child nodes
>
> Yes, not sure the best approach there...but guess we can cross
> that bridge when we implement this. It doesn't affect this
> interface.
It does affect the interface, because if you allow either of them to be
mapped directly (rather than implicitly used when mapping a child
node), you need a way to indicate which type of resource it is you're
describing (as you already do for reg/ranges).
It also affects how vfio device binding is done, even if only to the
point of specifying default behavior in the absence of knobs which
change whether interrupt maps and/or ranges are mapped.
> > > __u8 path[]; /* output: Full path to associated
> > > device tree node */
> >
> > How does the caller know what size buffer to supply for this?
Ping
-Scott
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: RFC: vfio interface for platform devices
2013-07-16 22:01 ` Scott Wood
@ 2013-07-16 22:41 ` Yoder Stuart-B08248
2013-07-16 22:50 ` Scott Wood
0 siblings, 1 reply; 13+ messages in thread
From: Yoder Stuart-B08248 @ 2013-07-16 22:41 UTC (permalink / raw)
To: Wood Scott-B07421
Cc: kvm@vger.kernel.org list, Bhushan Bharat-R65777,
kvm-ppc@vger.kernel.org,
virtualization@lists.linux-foundation.org, Antonios Motakis,
Sethi Varun-B16395, kvmarm@lists.cs.columbia.edu
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, July 16, 2013 5:01 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan Bharat-R65777; Sethi Varun-B16395;
> virtualization@lists.linux-foundation.org; Antonios Motakis; kvm@vger.kernel.org list; kvm-
> ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> Subject: Re: RFC: vfio interface for platform devices
>
> On 07/16/2013 04:51:12 PM, Yoder Stuart-B08248 wrote:
> > > > 3. VFIO_DEVICE_GET_REGION_INFO
> > > >
> > > > No changes needed, except perhaps adding a new flag. Freescale
> > > > has some
> > > > devices with regions that must be mapped cacheable.
> > >
> > > While I don't object to making the information available to the user
> > > just in case, the main thing we need here is to influence what the
> > > kernel does when the user tries to map it. At least on PPC it's
> > not up
> > > to userspace to select whether a mmap is cacheable.
> >
> > If user space really can't do anything with the 'cacheable'
> > flag, do you think there is good reason to keep it? Will it
> > help any decision that user space makes? Maybe we should just
> > drop it.
>
> As long as we can be sure all architectures will map things correctly
> without any flags needing to be specified, that's fine.
>
> > > > struct vfio_path_info {
> > > > __u32 argsz;
> > > > __u32 flags;
> > > > #define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region
> > is a
> > > > "ranges" property */
> > >
> > > What about distinguishing a normal interrupt from one found in an
> > > interrupt-map?
> >
> > I'm not sure we need that. The kernel needs to use the interrupt
> > map to get interrupts hooked up right, but all user space needs to
> > know is that there are N interrupts and possibly device tree
> > paths to help user space interpret which interrupt is which.
>
> What if the interrupt map is for devices without explicit nodes, such
> as with a PCI controller (ignore the fact that we would normally use
> vfio_pci for the indivdual PCI devices instead)?
>
> You could say the same thing about ranges -- why expose ranges instead
> of the individual child node regs after translation?
Hmm...yes, I guess ranges and interrupt-map fall into the same
basic type of resource category. I'm not sure it's realistic
to pass entire bus controllers through to user space vs
just individual devices on a bus, but I guess it's theoretically
possible.
So the question is whether we future proof by adding flags
for both ranges and interrupt-map, or wait until there is
an actual need for it.
> > > In the case of both ranges and interrupt-maps, we'll also want to
> > > decide what the policy is for when to expose them directly, versus
> > just
> > > using them to translate regs and interrupts of child nodes
> >
> > Yes, not sure the best approach there...but guess we can cross
> > that bridge when we implement this. It doesn't affect this
> > interface.
>
> It does affect the interface, because if you allow either of them to be
> mapped directly (rather than implicitly used when mapping a child
> node), you need a way to indicate which type of resource it is you're
> describing (as you already do for reg/ranges).
>
> It also affects how vfio device binding is done, even if only to the
> point of specifying default behavior in the absence of knobs which
> change whether interrupt maps and/or ranges are mapped.
My opinion is that we want to expose the regs and interrupts for
individual nodes by default, not ranges (or interrupt maps). When someone
needs ranges/interrupt-map in the future they'll need to figure out some
means for the vfio layer to do the right thing. It's complicated
and I would be surprised to see someone need it.
> > > > __u8 path[]; /* output: Full path to associated
> > > > device tree node */
> > >
> > > How does the caller know what size buffer to supply for this?
>
> Ping
This is in the v2 RFC... the caller invokes the ioctl which returns
the complete/full size, then re-allocs the buffer and calls the
ioctl again. Or, as Alex suggested, just use a sufficiently large
buffer to start with.
Stuart
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: vfio interface for platform devices
2013-07-16 22:41 ` Yoder Stuart-B08248
@ 2013-07-16 22:50 ` Scott Wood
0 siblings, 0 replies; 13+ messages in thread
From: Scott Wood @ 2013-07-16 22:50 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, Alex Williamson, Alexander Graf,
Bhushan Bharat-R65777, Sethi Varun-B16395,
virtualization@lists.linux-foundation.org, Antonios Motakis,
kvm@vger.kernel.org list, kvm-ppc@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
On 07/16/2013 05:41:04 PM, Yoder Stuart-B08248 wrote:
>
>
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, July 16, 2013 5:01 PM
> > To: Yoder Stuart-B08248
> > Cc: Wood Scott-B07421; Alex Williamson; Alexander Graf; Bhushan
> Bharat-R65777; Sethi Varun-B16395;
> > virtualization@lists.linux-foundation.org; Antonios Motakis;
> kvm@vger.kernel.org list; kvm-
> > ppc@vger.kernel.org; kvmarm@lists.cs.columbia.edu
> > Subject: Re: RFC: vfio interface for platform devices
> >
> > What if the interrupt map is for devices without explicit nodes,
> such
> > as with a PCI controller (ignore the fact that we would normally use
> > vfio_pci for the indivdual PCI devices instead)?
> >
> > You could say the same thing about ranges -- why expose ranges
> instead
> > of the individual child node regs after translation?
>
> Hmm...yes, I guess ranges and interrupt-map fall into the same
> basic type of resource category. I'm not sure it's realistic
> to pass entire bus controllers through to user space vs
> just individual devices on a bus, but I guess it's theoretically
> possible.
Where "theoretically possible" means "we've done it before in other
contexts". :-)
> So the question is whether we future proof by adding flags
> for both ranges and interrupt-map, or wait until there is
> an actual need for it.
We don't need to actually add a flag for it, but we should have a
flag/type for the resources we do support, so that code written to the
current API would recognize that it doesn't recognize an interrupt-map
entry if it's added later.
> > > > > __u8 path[]; /* output: Full path to
> associated
> > > > > device tree node */
> > > >
> > > > How does the caller know what size buffer to supply for this?
> >
> > Ping
>
> This is in the v2 RFC... the caller invokes the ioctl which returns
> the complete/full size, then re-allocs the buffer and calls the
> ioctl again.
OK.
> Or, as Alex suggested, just use a sufficiently large buffer to start
> with.
It's fine for a user of the API to simplify things by using a large
fixed buffer, but the API shouldn't force that approach.
-Scott
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-07-16 22:50 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-02 23:25 RFC: vfio interface for platform devices Yoder Stuart-B08248
2013-07-03 1:07 ` Alexander Graf
2013-07-03 18:51 ` Scott Wood
2013-07-03 19:08 ` Yoder Stuart-B08248
2013-07-03 3:07 ` Alex Williamson
2013-07-03 10:44 ` Antonios Motakis
2013-07-03 19:23 ` Yoder Stuart-B08248
2013-07-03 17:20 ` Yoder Stuart-B08248
2013-07-03 22:31 ` Scott Wood
2013-07-16 21:51 ` Yoder Stuart-B08248
2013-07-16 22:01 ` Scott Wood
2013-07-16 22:41 ` Yoder Stuart-B08248
2013-07-16 22:50 ` Scott Wood
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox