* [PATCH v9 1/7] vfio: introduce a vfio_dma type field
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
We introduce a vfio_dma type since we will need to discriminate
different types of dma slots:
- VFIO_IOVA_USER: IOVA region used to map user vaddr
- VFIO_IOVA_RESERVED: IOVA region reserved to map host device PA such
as MSI doorbells
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
v6 -> v7:
- add VFIO_IOVA_ANY
- do not introduce yet any VFIO_IOVA_RESERVED handling
---
drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 75b24e9..aaf5a6c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,16 @@ module_param_named(disable_hugepages,
MODULE_PARM_DESC(disable_hugepages,
"Disable VFIO IOMMU support for IOMMU hugepages.");
+enum vfio_iova_type {
+ VFIO_IOVA_USER = 0, /* standard IOVA used to map user vaddr */
+ /*
+ * IOVA reserved to map special host physical addresses,
+ * MSI frames for instance
+ */
+ VFIO_IOVA_RESERVED,
+ VFIO_IOVA_ANY, /* matches any IOVA type */
+};
+
struct vfio_iommu {
struct list_head domain_list;
struct mutex lock;
@@ -75,6 +85,7 @@ struct vfio_dma {
unsigned long vaddr; /* Process virtual addr */
size_t size; /* Map size (bytes) */
int prot; /* IOMMU_READ/WRITE */
+ enum vfio_iova_type type; /* type of IOVA */
};
struct vfio_group {
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 1/7] vfio: introduce a vfio_dma type field
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
We introduce a vfio_dma type since we will need to discriminate
different types of dma slots:
- VFIO_IOVA_USER: IOVA region used to map user vaddr
- VFIO_IOVA_RESERVED: IOVA region reserved to map host device PA such
as MSI doorbells
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v6 -> v7:
- add VFIO_IOVA_ANY
- do not introduce yet any VFIO_IOVA_RESERVED handling
---
drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 75b24e9..aaf5a6c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,16 @@ module_param_named(disable_hugepages,
MODULE_PARM_DESC(disable_hugepages,
"Disable VFIO IOMMU support for IOMMU hugepages.");
+enum vfio_iova_type {
+ VFIO_IOVA_USER = 0, /* standard IOVA used to map user vaddr */
+ /*
+ * IOVA reserved to map special host physical addresses,
+ * MSI frames for instance
+ */
+ VFIO_IOVA_RESERVED,
+ VFIO_IOVA_ANY, /* matches any IOVA type */
+};
+
struct vfio_iommu {
struct list_head domain_list;
struct mutex lock;
@@ -75,6 +85,7 @@ struct vfio_dma {
unsigned long vaddr; /* Process virtual addr */
size_t size; /* Map size (bytes) */
int prot; /* IOMMU_READ/WRITE */
+ enum vfio_iova_type type; /* type of IOVA */
};
struct vfio_group {
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 1/7] vfio: introduce a vfio_dma type field
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
We introduce a vfio_dma type since we will need to discriminate
different types of dma slots:
- VFIO_IOVA_USER: IOVA region used to map user vaddr
- VFIO_IOVA_RESERVED: IOVA region reserved to map host device PA such
as MSI doorbells
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v6 -> v7:
- add VFIO_IOVA_ANY
- do not introduce yet any VFIO_IOVA_RESERVED handling
---
drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 75b24e9..aaf5a6c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,16 @@ module_param_named(disable_hugepages,
MODULE_PARM_DESC(disable_hugepages,
"Disable VFIO IOMMU support for IOMMU hugepages.");
+enum vfio_iova_type {
+ VFIO_IOVA_USER = 0, /* standard IOVA used to map user vaddr */
+ /*
+ * IOVA reserved to map special host physical addresses,
+ * MSI frames for instance
+ */
+ VFIO_IOVA_RESERVED,
+ VFIO_IOVA_ANY, /* matches any IOVA type */
+};
+
struct vfio_iommu {
struct list_head domain_list;
struct mutex lock;
@@ -75,6 +85,7 @@ struct vfio_dma {
unsigned long vaddr; /* Process virtual addr */
size_t size; /* Map size (bytes) */
int prot; /* IOMMU_READ/WRITE */
+ enum vfio_iova_type type; /* type of IOVA */
};
struct vfio_group {
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
In our RB-tree we now have slots of different types (USER and RESERVED).
It becomes useful to be able to search for dma slots of a specific type or
any type. This patch proposes an implementation for that modality and also
changes the existing callers using the USER type.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index aaf5a6c..2d769d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -98,23 +98,64 @@ struct vfio_group {
* into DMA'ble space using the IOMMU
*/
-static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
- dma_addr_t start, size_t size)
+/**
+ * vfio_find_dma_from_node: looks for a dma slot intersecting a window
+ * from a given rb tree node
+ * @top: top rb tree node where the search starts (including this node)
+ * @start: window start
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
{
- struct rb_node *node = iommu->dma_list.rb_node;
+ struct rb_node *node = top;
+ struct vfio_dma *dma;
while (node) {
- struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
-
+ dma = rb_entry(node, struct vfio_dma, node);
if (start + size <= dma->iova)
node = node->rb_left;
else if (start >= dma->iova + dma->size)
node = node->rb_right;
else
+ break;
+ }
+ if (!node)
+ return NULL;
+
+ /* a dma slot intersects our window, check the type also matches */
+ if (type == VFIO_IOVA_ANY || dma->type == type)
+ return dma;
+
+ /* restart 2 searches skipping the current node */
+ if (start < dma->iova) {
+ dma = vfio_find_dma_from_node(node->rb_left, start,
+ size, type);
+ if (dma)
return dma;
}
+ if (start + size > dma->iova + dma->size)
+ dma = vfio_find_dma_from_node(node->rb_right, start,
+ size, type);
+ return dma;
+}
+
+/**
+ * vfio_find_dma: find a dma slot intersecting a given window
+ * @iommu: vfio iommu handle
+ * @start: window base iova
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
+{
+ struct rb_node *top_node = iommu->dma_list.rb_node;
- return NULL;
+ return vfio_find_dma_from_node(top_node, start, size, type);
}
static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
@@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
* mappings within the range.
*/
if (iommu->v2) {
- dma = vfio_find_dma(iommu, unmap->iova, 0);
+ dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
if (dma && dma->iova != unmap->iova) {
ret = -EINVAL;
goto unlock;
}
- dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
+ dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
+ VFIO_IOVA_USER);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
goto unlock;
}
}
- while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+ while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
+ VFIO_IOVA_USER))) {
if (!iommu->v2 && unmap->iova > dma->iova)
break;
unmapped += dma->size;
@@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
mutex_lock(&iommu->lock);
- if (vfio_find_dma(iommu, iova, size)) {
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
mutex_unlock(&iommu->lock);
return -EEXIST;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
In our RB-tree we now have slots of different types (USER and RESERVED).
It becomes useful to be able to search for dma slots of a specific type or
any type. This patch proposes an implementation for that modality and also
changes the existing callers using the USER type.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index aaf5a6c..2d769d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -98,23 +98,64 @@ struct vfio_group {
* into DMA'ble space using the IOMMU
*/
-static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
- dma_addr_t start, size_t size)
+/**
+ * vfio_find_dma_from_node: looks for a dma slot intersecting a window
+ * from a given rb tree node
+ * @top: top rb tree node where the search starts (including this node)
+ * @start: window start
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
{
- struct rb_node *node = iommu->dma_list.rb_node;
+ struct rb_node *node = top;
+ struct vfio_dma *dma;
while (node) {
- struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
-
+ dma = rb_entry(node, struct vfio_dma, node);
if (start + size <= dma->iova)
node = node->rb_left;
else if (start >= dma->iova + dma->size)
node = node->rb_right;
else
+ break;
+ }
+ if (!node)
+ return NULL;
+
+ /* a dma slot intersects our window, check the type also matches */
+ if (type == VFIO_IOVA_ANY || dma->type == type)
+ return dma;
+
+ /* restart 2 searches skipping the current node */
+ if (start < dma->iova) {
+ dma = vfio_find_dma_from_node(node->rb_left, start,
+ size, type);
+ if (dma)
return dma;
}
+ if (start + size > dma->iova + dma->size)
+ dma = vfio_find_dma_from_node(node->rb_right, start,
+ size, type);
+ return dma;
+}
+
+/**
+ * vfio_find_dma: find a dma slot intersecting a given window
+ * @iommu: vfio iommu handle
+ * @start: window base iova
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
+{
+ struct rb_node *top_node = iommu->dma_list.rb_node;
- return NULL;
+ return vfio_find_dma_from_node(top_node, start, size, type);
}
static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
@@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
* mappings within the range.
*/
if (iommu->v2) {
- dma = vfio_find_dma(iommu, unmap->iova, 0);
+ dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
if (dma && dma->iova != unmap->iova) {
ret = -EINVAL;
goto unlock;
}
- dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
+ dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
+ VFIO_IOVA_USER);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
goto unlock;
}
}
- while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+ while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
+ VFIO_IOVA_USER))) {
if (!iommu->v2 && unmap->iova > dma->iova)
break;
unmapped += dma->size;
@@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
mutex_lock(&iommu->lock);
- if (vfio_find_dma(iommu, iova, size)) {
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
mutex_unlock(&iommu->lock);
return -EEXIST;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
In our RB-tree we now have slots of different types (USER and RESERVED).
It becomes useful to be able to search for dma slots of a specific type or
any type. This patch proposes an implementation for that modality and also
changes the existing callers using the USER type.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index aaf5a6c..2d769d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -98,23 +98,64 @@ struct vfio_group {
* into DMA'ble space using the IOMMU
*/
-static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
- dma_addr_t start, size_t size)
+/**
+ * vfio_find_dma_from_node: looks for a dma slot intersecting a window
+ * from a given rb tree node
+ * @top: top rb tree node where the search starts (including this node)
+ * @start: window start
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
{
- struct rb_node *node = iommu->dma_list.rb_node;
+ struct rb_node *node = top;
+ struct vfio_dma *dma;
while (node) {
- struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
-
+ dma = rb_entry(node, struct vfio_dma, node);
if (start + size <= dma->iova)
node = node->rb_left;
else if (start >= dma->iova + dma->size)
node = node->rb_right;
else
+ break;
+ }
+ if (!node)
+ return NULL;
+
+ /* a dma slot intersects our window, check the type also matches */
+ if (type == VFIO_IOVA_ANY || dma->type == type)
+ return dma;
+
+ /* restart 2 searches skipping the current node */
+ if (start < dma->iova) {
+ dma = vfio_find_dma_from_node(node->rb_left, start,
+ size, type);
+ if (dma)
return dma;
}
+ if (start + size > dma->iova + dma->size)
+ dma = vfio_find_dma_from_node(node->rb_right, start,
+ size, type);
+ return dma;
+}
+
+/**
+ * vfio_find_dma: find a dma slot intersecting a given window
+ * @iommu: vfio iommu handle
+ * @start: window base iova
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+ dma_addr_t start, size_t size,
+ enum vfio_iova_type type)
+{
+ struct rb_node *top_node = iommu->dma_list.rb_node;
- return NULL;
+ return vfio_find_dma_from_node(top_node, start, size, type);
}
static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
@@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
* mappings within the range.
*/
if (iommu->v2) {
- dma = vfio_find_dma(iommu, unmap->iova, 0);
+ dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
if (dma && dma->iova != unmap->iova) {
ret = -EINVAL;
goto unlock;
}
- dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
+ dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
+ VFIO_IOVA_USER);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
goto unlock;
}
}
- while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+ while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
+ VFIO_IOVA_USER))) {
if (!iommu->v2 && unmap->iova > dma->iova)
break;
unmapped += dma->size;
@@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
mutex_lock(&iommu->lock);
- if (vfio_find_dma(iommu, iova, size)) {
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
mutex_unlock(&iommu->lock);
return -EEXIST;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread[parent not found: <1462362858-2925-3-git-send-email-eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-09 22:49 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On Wed, 4 May 2016 11:54:13 +0000
Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> In our RB-tree we now have slots of different types (USER and RESERVED).
> It becomes useful to be able to search for dma slots of a specific type or
> any type. This patch proposes an implementation for that modality and also
> changes the existing callers using the USER type.
>
> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
> 1 file changed, 53 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index aaf5a6c..2d769d4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -98,23 +98,64 @@ struct vfio_group {
> * into DMA'ble space using the IOMMU
> */
>
> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> - dma_addr_t start, size_t size)
> +/**
> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
> + * from a given rb tree node
> + * @top: top rb tree node where the search starts (including this node)
> + * @start: window start
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> {
> - struct rb_node *node = iommu->dma_list.rb_node;
> + struct rb_node *node = top;
> + struct vfio_dma *dma;
>
> while (node) {
> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
> -
> + dma = rb_entry(node, struct vfio_dma, node);
> if (start + size <= dma->iova)
> node = node->rb_left;
> else if (start >= dma->iova + dma->size)
> node = node->rb_right;
> else
> + break;
> + }
> + if (!node)
> + return NULL;
> +
> + /* a dma slot intersects our window, check the type also matches */
> + if (type == VFIO_IOVA_ANY || dma->type == type)
> + return dma;
> +
> + /* restart 2 searches skipping the current node */
> + if (start < dma->iova) {
> + dma = vfio_find_dma_from_node(node->rb_left, start,
> + size, type);
> + if (dma)
> return dma;
> }
> + if (start + size > dma->iova + dma->size)
> + dma = vfio_find_dma_from_node(node->rb_right, start,
> + size, type);
It might be nice to split this little bit of trickery out to a separate
patch so that you can better explain what it does. It's really not
needed until we start creating user-type entries, so it shouldn't hurt
bisection.
> + return dma;
> +}
> +
> +/**
> + * vfio_find_dma: find a dma slot intersecting a given window
> + * @iommu: vfio iommu handle
> + * @start: window base iova
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> +{
> + struct rb_node *top_node = iommu->dma_list.rb_node;
>
> - return NULL;
> + return vfio_find_dma_from_node(top_node, start, size, type);
> }
>
> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> * mappings within the range.
> */
> if (iommu->v2) {
> - dma = vfio_find_dma(iommu, unmap->iova, 0);
> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
> if (dma && dma->iova != unmap->iova) {
> ret = -EINVAL;
> goto unlock;
> }
> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
> + VFIO_IOVA_USER);
> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
> ret = -EINVAL;
> goto unlock;
> }
> }
>
> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
> + VFIO_IOVA_USER))) {
> if (!iommu->v2 && unmap->iova > dma->iova)
> break;
> unmapped += dma->size;
> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>
> mutex_lock(&iommu->lock);
>
> - if (vfio_find_dma(iommu, iova, size)) {
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> mutex_unlock(&iommu->lock);
> return -EEXIST;
> }
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 11:54:13 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> In our RB-tree we now have slots of different types (USER and RESERVED).
> It becomes useful to be able to search for dma slots of a specific type or
> any type. This patch proposes an implementation for that modality and also
> changes the existing callers using the USER type.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
> 1 file changed, 53 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index aaf5a6c..2d769d4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -98,23 +98,64 @@ struct vfio_group {
> * into DMA'ble space using the IOMMU
> */
>
> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> - dma_addr_t start, size_t size)
> +/**
> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
> + * from a given rb tree node
> + * @top: top rb tree node where the search starts (including this node)
> + * @start: window start
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> {
> - struct rb_node *node = iommu->dma_list.rb_node;
> + struct rb_node *node = top;
> + struct vfio_dma *dma;
>
> while (node) {
> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
> -
> + dma = rb_entry(node, struct vfio_dma, node);
> if (start + size <= dma->iova)
> node = node->rb_left;
> else if (start >= dma->iova + dma->size)
> node = node->rb_right;
> else
> + break;
> + }
> + if (!node)
> + return NULL;
> +
> + /* a dma slot intersects our window, check the type also matches */
> + if (type == VFIO_IOVA_ANY || dma->type == type)
> + return dma;
> +
> + /* restart 2 searches skipping the current node */
> + if (start < dma->iova) {
> + dma = vfio_find_dma_from_node(node->rb_left, start,
> + size, type);
> + if (dma)
> return dma;
> }
> + if (start + size > dma->iova + dma->size)
> + dma = vfio_find_dma_from_node(node->rb_right, start,
> + size, type);
It might be nice to split this little bit of trickery out to a separate
patch so that you can better explain what it does. It's really not
needed until we start creating user-type entries, so it shouldn't hurt
bisection.
> + return dma;
> +}
> +
> +/**
> + * vfio_find_dma: find a dma slot intersecting a given window
> + * @iommu: vfio iommu handle
> + * @start: window base iova
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> +{
> + struct rb_node *top_node = iommu->dma_list.rb_node;
>
> - return NULL;
> + return vfio_find_dma_from_node(top_node, start, size, type);
> }
>
> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> * mappings within the range.
> */
> if (iommu->v2) {
> - dma = vfio_find_dma(iommu, unmap->iova, 0);
> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
> if (dma && dma->iova != unmap->iova) {
> ret = -EINVAL;
> goto unlock;
> }
> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
> + VFIO_IOVA_USER);
> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
> ret = -EINVAL;
> goto unlock;
> }
> }
>
> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
> + VFIO_IOVA_USER))) {
> if (!iommu->v2 && unmap->iova > dma->iova)
> break;
> unmapped += dma->size;
> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>
> mutex_lock(&iommu->lock);
>
> - if (vfio_find_dma(iommu, iova, size)) {
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> mutex_unlock(&iommu->lock);
> return -EEXIST;
> }
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 11:54:13 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> In our RB-tree we now have slots of different types (USER and RESERVED).
> It becomes useful to be able to search for dma slots of a specific type or
> any type. This patch proposes an implementation for that modality and also
> changes the existing callers using the USER type.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
> 1 file changed, 53 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index aaf5a6c..2d769d4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -98,23 +98,64 @@ struct vfio_group {
> * into DMA'ble space using the IOMMU
> */
>
> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> - dma_addr_t start, size_t size)
> +/**
> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
> + * from a given rb tree node
> + * @top: top rb tree node where the search starts (including this node)
> + * @start: window start
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> {
> - struct rb_node *node = iommu->dma_list.rb_node;
> + struct rb_node *node = top;
> + struct vfio_dma *dma;
>
> while (node) {
> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
> -
> + dma = rb_entry(node, struct vfio_dma, node);
> if (start + size <= dma->iova)
> node = node->rb_left;
> else if (start >= dma->iova + dma->size)
> node = node->rb_right;
> else
> + break;
> + }
> + if (!node)
> + return NULL;
> +
> + /* a dma slot intersects our window, check the type also matches */
> + if (type == VFIO_IOVA_ANY || dma->type == type)
> + return dma;
> +
> + /* restart 2 searches skipping the current node */
> + if (start < dma->iova) {
> + dma = vfio_find_dma_from_node(node->rb_left, start,
> + size, type);
> + if (dma)
> return dma;
> }
> + if (start + size > dma->iova + dma->size)
> + dma = vfio_find_dma_from_node(node->rb_right, start,
> + size, type);
It might be nice to split this little bit of trickery out to a separate
patch so that you can better explain what it does. It's really not
needed until we start creating user-type entries, so it shouldn't hurt
bisection.
> + return dma;
> +}
> +
> +/**
> + * vfio_find_dma: find a dma slot intersecting a given window
> + * @iommu: vfio iommu handle
> + * @start: window base iova
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> + dma_addr_t start, size_t size,
> + enum vfio_iova_type type)
> +{
> + struct rb_node *top_node = iommu->dma_list.rb_node;
>
> - return NULL;
> + return vfio_find_dma_from_node(top_node, start, size, type);
> }
>
> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> * mappings within the range.
> */
> if (iommu->v2) {
> - dma = vfio_find_dma(iommu, unmap->iova, 0);
> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
> if (dma && dma->iova != unmap->iova) {
> ret = -EINVAL;
> goto unlock;
> }
> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
> + VFIO_IOVA_USER);
> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
> ret = -EINVAL;
> goto unlock;
> }
> }
>
> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
> + VFIO_IOVA_USER))) {
> if (!iommu->v2 && unmap->iova > dma->iova)
> break;
> unmapped += dma->size;
> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>
> mutex_lock(&iommu->lock);
>
> - if (vfio_find_dma(iommu, iova, size)) {
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> mutex_unlock(&iommu->lock);
> return -EEXIST;
> }
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <20160509164928.518a05a8-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
2016-05-09 22:49 ` Alex Williamson
(?)
@ 2016-05-10 14:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 14:54 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:13 +0000
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> In our RB-tree we now have slots of different types (USER and RESERVED).
>> It becomes useful to be able to search for dma slots of a specific type or
>> any type. This patch proposes an implementation for that modality and also
>> changes the existing callers using the USER type.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 53 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index aaf5a6c..2d769d4 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -98,23 +98,64 @@ struct vfio_group {
>> * into DMA'ble space using the IOMMU
>> */
>>
>> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> - dma_addr_t start, size_t size)
>> +/**
>> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
>> + * from a given rb tree node
>> + * @top: top rb tree node where the search starts (including this node)
>> + * @start: window start
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> {
>> - struct rb_node *node = iommu->dma_list.rb_node;
>> + struct rb_node *node = top;
>> + struct vfio_dma *dma;
>>
>> while (node) {
>> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
>> -
>> + dma = rb_entry(node, struct vfio_dma, node);
>> if (start + size <= dma->iova)
>> node = node->rb_left;
>> else if (start >= dma->iova + dma->size)
>> node = node->rb_right;
>> else
>> + break;
>> + }
>> + if (!node)
>> + return NULL;
>> +
>> + /* a dma slot intersects our window, check the type also matches */
>> + if (type == VFIO_IOVA_ANY || dma->type == type)
>> + return dma;
>> +
>> + /* restart 2 searches skipping the current node */
>> + if (start < dma->iova) {
>> + dma = vfio_find_dma_from_node(node->rb_left, start,
>> + size, type);
>> + if (dma)
>> return dma;
>> }
>> + if (start + size > dma->iova + dma->size)
>> + dma = vfio_find_dma_from_node(node->rb_right, start,
>> + size, type);
>
> It might be nice to split this little bit of trickery out to a separate
> patch so that you can better explain what it does. It's really not
> needed until we start creating user-type entries, so it shouldn't hurt
> bisection.
OK I will
Regards
Eric
>
>> + return dma;
>> +}
>> +
>> +/**
>> + * vfio_find_dma: find a dma slot intersecting a given window
>> + * @iommu: vfio iommu handle
>> + * @start: window base iova
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> +{
>> + struct rb_node *top_node = iommu->dma_list.rb_node;
>>
>> - return NULL;
>> + return vfio_find_dma_from_node(top_node, start, size, type);
>> }
>>
>> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> * mappings within the range.
>> */
>> if (iommu->v2) {
>> - dma = vfio_find_dma(iommu, unmap->iova, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>> if (dma && dma->iova != unmap->iova) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
>> + VFIO_IOVA_USER);
>> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> }
>>
>> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
>> + VFIO_IOVA_USER))) {
>> if (!iommu->v2 && unmap->iova > dma->iova)
>> break;
>> unmapped += dma->size;
>> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>
>> mutex_lock(&iommu->lock);
>>
>> - if (vfio_find_dma(iommu, iova, size)) {
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> mutex_unlock(&iommu->lock);
>> return -EEXIST;
>> }
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-10 14:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 14:54 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:13 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> In our RB-tree we now have slots of different types (USER and RESERVED).
>> It becomes useful to be able to search for dma slots of a specific type or
>> any type. This patch proposes an implementation for that modality and also
>> changes the existing callers using the USER type.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 53 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index aaf5a6c..2d769d4 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -98,23 +98,64 @@ struct vfio_group {
>> * into DMA'ble space using the IOMMU
>> */
>>
>> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> - dma_addr_t start, size_t size)
>> +/**
>> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
>> + * from a given rb tree node
>> + * @top: top rb tree node where the search starts (including this node)
>> + * @start: window start
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> {
>> - struct rb_node *node = iommu->dma_list.rb_node;
>> + struct rb_node *node = top;
>> + struct vfio_dma *dma;
>>
>> while (node) {
>> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
>> -
>> + dma = rb_entry(node, struct vfio_dma, node);
>> if (start + size <= dma->iova)
>> node = node->rb_left;
>> else if (start >= dma->iova + dma->size)
>> node = node->rb_right;
>> else
>> + break;
>> + }
>> + if (!node)
>> + return NULL;
>> +
>> + /* a dma slot intersects our window, check the type also matches */
>> + if (type == VFIO_IOVA_ANY || dma->type == type)
>> + return dma;
>> +
>> + /* restart 2 searches skipping the current node */
>> + if (start < dma->iova) {
>> + dma = vfio_find_dma_from_node(node->rb_left, start,
>> + size, type);
>> + if (dma)
>> return dma;
>> }
>> + if (start + size > dma->iova + dma->size)
>> + dma = vfio_find_dma_from_node(node->rb_right, start,
>> + size, type);
>
> It might be nice to split this little bit of trickery out to a separate
> patch so that you can better explain what it does. It's really not
> needed until we start creating user-type entries, so it shouldn't hurt
> bisection.
OK I will
Regards
Eric
>
>> + return dma;
>> +}
>> +
>> +/**
>> + * vfio_find_dma: find a dma slot intersecting a given window
>> + * @iommu: vfio iommu handle
>> + * @start: window base iova
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> +{
>> + struct rb_node *top_node = iommu->dma_list.rb_node;
>>
>> - return NULL;
>> + return vfio_find_dma_from_node(top_node, start, size, type);
>> }
>>
>> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> * mappings within the range.
>> */
>> if (iommu->v2) {
>> - dma = vfio_find_dma(iommu, unmap->iova, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>> if (dma && dma->iova != unmap->iova) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
>> + VFIO_IOVA_USER);
>> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> }
>>
>> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
>> + VFIO_IOVA_USER))) {
>> if (!iommu->v2 && unmap->iova > dma->iova)
>> break;
>> unmapped += dma->size;
>> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>
>> mutex_lock(&iommu->lock);
>>
>> - if (vfio_find_dma(iommu, iova, size)) {
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> mutex_unlock(&iommu->lock);
>> return -EEXIST;
>> }
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-05-10 14:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 14:54 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:13 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> In our RB-tree we now have slots of different types (USER and RESERVED).
>> It becomes useful to be able to search for dma slots of a specific type or
>> any type. This patch proposes an implementation for that modality and also
>> changes the existing callers using the USER type.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 53 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index aaf5a6c..2d769d4 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -98,23 +98,64 @@ struct vfio_group {
>> * into DMA'ble space using the IOMMU
>> */
>>
>> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> - dma_addr_t start, size_t size)
>> +/**
>> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
>> + * from a given rb tree node
>> + * @top: top rb tree node where the search starts (including this node)
>> + * @start: window start
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> {
>> - struct rb_node *node = iommu->dma_list.rb_node;
>> + struct rb_node *node = top;
>> + struct vfio_dma *dma;
>>
>> while (node) {
>> - struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
>> -
>> + dma = rb_entry(node, struct vfio_dma, node);
>> if (start + size <= dma->iova)
>> node = node->rb_left;
>> else if (start >= dma->iova + dma->size)
>> node = node->rb_right;
>> else
>> + break;
>> + }
>> + if (!node)
>> + return NULL;
>> +
>> + /* a dma slot intersects our window, check the type also matches */
>> + if (type == VFIO_IOVA_ANY || dma->type == type)
>> + return dma;
>> +
>> + /* restart 2 searches skipping the current node */
>> + if (start < dma->iova) {
>> + dma = vfio_find_dma_from_node(node->rb_left, start,
>> + size, type);
>> + if (dma)
>> return dma;
>> }
>> + if (start + size > dma->iova + dma->size)
>> + dma = vfio_find_dma_from_node(node->rb_right, start,
>> + size, type);
>
> It might be nice to split this little bit of trickery out to a separate
> patch so that you can better explain what it does. It's really not
> needed until we start creating user-type entries, so it shouldn't hurt
> bisection.
OK I will
Regards
Eric
>
>> + return dma;
>> +}
>> +
>> +/**
>> + * vfio_find_dma: find a dma slot intersecting a given window
>> + * @iommu: vfio iommu handle
>> + * @start: window base iova
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> + dma_addr_t start, size_t size,
>> + enum vfio_iova_type type)
>> +{
>> + struct rb_node *top_node = iommu->dma_list.rb_node;
>>
>> - return NULL;
>> + return vfio_find_dma_from_node(top_node, start, size, type);
>> }
>>
>> static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> * mappings within the range.
>> */
>> if (iommu->v2) {
>> - dma = vfio_find_dma(iommu, unmap->iova, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>> if (dma && dma->iova != unmap->iova) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> - dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>> + dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
>> + VFIO_IOVA_USER);
>> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>> ret = -EINVAL;
>> goto unlock;
>> }
>> }
>>
>> - while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>> + while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
>> + VFIO_IOVA_USER))) {
>> if (!iommu->v2 && unmap->iova > dma->iova)
>> break;
>> unmapped += dma->size;
>> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>
>> mutex_lock(&iommu->lock);
>>
>> - if (vfio_find_dma(iommu, iova, size)) {
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> mutex_unlock(&iommu->lock);
>> return -EEXIST;
>> }
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
let's implement the expected behavior for removal and replay. As opposed
to user dma slots, IOVAs are not systematically bound to PAs and PAs are
not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
outside of the VFIO framework, typically the MSI layer which is
responsible to free and unmap them. The MSI mapping resources are freeed
by the IOMMU driver on domain destruction.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
v7 -> v8:
- do no destroy anything anymore, just bypass unmap/unpin and iommu_map
on replay
---
drivers/vfio/vfio_iommu_type1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2d769d4..94a9916 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
struct vfio_domain *domain, *d;
long unlocked = 0;
- if (!dma->size)
+ if (!dma->size || dma->type != VFIO_IOVA_USER)
return;
/*
* We use the IOMMU to track the physical addresses, otherwise we'd
@@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
dma = rb_entry(n, struct vfio_dma, node);
iova = dma->iova;
+ if (dma->type == VFIO_IOVA_RESERVED)
+ continue;
+
while (iova < dma->iova + dma->size) {
phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
size_t size;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
let's implement the expected behavior for removal and replay. As opposed
to user dma slots, IOVAs are not systematically bound to PAs and PAs are
not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
outside of the VFIO framework, typically the MSI layer which is
responsible to free and unmap them. The MSI mapping resources are freeed
by the IOMMU driver on domain destruction.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v7 -> v8:
- do no destroy anything anymore, just bypass unmap/unpin and iommu_map
on replay
---
drivers/vfio/vfio_iommu_type1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2d769d4..94a9916 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
struct vfio_domain *domain, *d;
long unlocked = 0;
- if (!dma->size)
+ if (!dma->size || dma->type != VFIO_IOVA_USER)
return;
/*
* We use the IOMMU to track the physical addresses, otherwise we'd
@@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
dma = rb_entry(n, struct vfio_dma, node);
iova = dma->iova;
+ if (dma->type == VFIO_IOVA_RESERVED)
+ continue;
+
while (iova < dma->iova + dma->size) {
phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
size_t size;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
let's implement the expected behavior for removal and replay. As opposed
to user dma slots, IOVAs are not systematically bound to PAs and PAs are
not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
outside of the VFIO framework, typically the MSI layer which is
responsible to free and unmap them. The MSI mapping resources are freeed
by the IOMMU driver on domain destruction.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v7 -> v8:
- do no destroy anything anymore, just bypass unmap/unpin and iommu_map
on replay
---
drivers/vfio/vfio_iommu_type1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2d769d4..94a9916 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
struct vfio_domain *domain, *d;
long unlocked = 0;
- if (!dma->size)
+ if (!dma->size || dma->type != VFIO_IOVA_USER)
return;
/*
* We use the IOMMU to track the physical addresses, otherwise we'd
@@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
dma = rb_entry(n, struct vfio_dma, node);
iova = dma->iova;
+ if (dma->type == VFIO_IOVA_RESERVED)
+ continue;
+
while (iova < dma->iova + dma->size) {
phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
size_t size;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* Re: [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
2016-05-04 11:54 ` Eric Auger
@ 2016-05-09 22:49 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 11:54:14 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
> let's implement the expected behavior for removal and replay. As opposed
> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
> outside of the VFIO framework, typically the MSI layer which is
> responsible to free and unmap them. The MSI mapping resources are freeed
> by the IOMMU driver on domain destruction.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
>
> v7 -> v8:
> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
> on replay
> ---
> drivers/vfio/vfio_iommu_type1.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2d769d4..94a9916 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> struct vfio_domain *domain, *d;
> long unlocked = 0;
>
> - if (!dma->size)
> + if (!dma->size || dma->type != VFIO_IOVA_USER)
> return;
> /*
> * We use the IOMMU to track the physical addresses, otherwise we'd
> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> dma = rb_entry(n, struct vfio_dma, node);
> iova = dma->iova;
>
> + if (dma->type == VFIO_IOVA_RESERVED)
> + continue;
> +
But you do still need some sort of replay mechanism, right? Not to
replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
the new domain. How will you know that this entry is an MSI reserved
range or something else? Perhaps we can't have a generic "reserved"
type here.
> while (iova < dma->iova + dma->size) {
> phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
> size_t size;
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 11:54:14 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
> let's implement the expected behavior for removal and replay. As opposed
> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
> outside of the VFIO framework, typically the MSI layer which is
> responsible to free and unmap them. The MSI mapping resources are freeed
> by the IOMMU driver on domain destruction.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
>
> v7 -> v8:
> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
> on replay
> ---
> drivers/vfio/vfio_iommu_type1.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2d769d4..94a9916 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> struct vfio_domain *domain, *d;
> long unlocked = 0;
>
> - if (!dma->size)
> + if (!dma->size || dma->type != VFIO_IOVA_USER)
> return;
> /*
> * We use the IOMMU to track the physical addresses, otherwise we'd
> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> dma = rb_entry(n, struct vfio_dma, node);
> iova = dma->iova;
>
> + if (dma->type == VFIO_IOVA_RESERVED)
> + continue;
> +
But you do still need some sort of replay mechanism, right? Not to
replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
the new domain. How will you know that this entry is an MSI reserved
range or something else? Perhaps we can't have a generic "reserved"
type here.
> while (iova < dma->iova + dma->size) {
> phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
> size_t size;
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <20160509164936.1a5ae402-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
2016-05-09 22:49 ` Alex Williamson
(?)
@ 2016-05-11 12:58 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 12:58 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:14 +0000
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay. As opposed
>> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
>> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
>> outside of the VFIO framework, typically the MSI layer which is
>> responsible to free and unmap them. The MSI mapping resources are freeed
>> by the IOMMU driver on domain destruction.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>
>> ---
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>> on replay
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2d769d4..94a9916 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> struct vfio_domain *domain, *d;
>> long unlocked = 0;
>>
>> - if (!dma->size)
>> + if (!dma->size || dma->type != VFIO_IOVA_USER)
>> return;
>> /*
>> * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> dma = rb_entry(n, struct vfio_dma, node);
>> iova = dma->iova;
>>
>> + if (dma->type == VFIO_IOVA_RESERVED)
>> + continue;
>> +
>
> But you do still need some sort of replay mechanism, right? Not to
> replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
> the new domain. How will you know that this entry is an MSI reserved
> range or something else? Perhaps we can't have a generic "reserved"
> type here.
Thanks for spotting this bug. I was not testing this case yet and
effectively I need to replay the set_aperture.
Thanks for the review
Best Regards
Eric
>
>> while (iova < dma->iova + dma->size) {
>> phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>> size_t size;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-05-11 12:58 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 12:58 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:14 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay. As opposed
>> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
>> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
>> outside of the VFIO framework, typically the MSI layer which is
>> responsible to free and unmap them. The MSI mapping resources are freeed
>> by the IOMMU driver on domain destruction.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>> on replay
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2d769d4..94a9916 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> struct vfio_domain *domain, *d;
>> long unlocked = 0;
>>
>> - if (!dma->size)
>> + if (!dma->size || dma->type != VFIO_IOVA_USER)
>> return;
>> /*
>> * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> dma = rb_entry(n, struct vfio_dma, node);
>> iova = dma->iova;
>>
>> + if (dma->type == VFIO_IOVA_RESERVED)
>> + continue;
>> +
>
> But you do still need some sort of replay mechanism, right? Not to
> replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
> the new domain. How will you know that this entry is an MSI reserved
> range or something else? Perhaps we can't have a generic "reserved"
> type here.
Thanks for spotting this bug. I was not testing this case yet and
effectively I need to replay the set_aperture.
Thanks for the review
Best Regards
Eric
>
>> while (iova < dma->iova + dma->size) {
>> phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>> size_t size;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-05-11 12:58 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 12:58 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:14 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay. As opposed
>> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
>> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
>> outside of the VFIO framework, typically the MSI layer which is
>> responsible to free and unmap them. The MSI mapping resources are freeed
>> by the IOMMU driver on domain destruction.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>> on replay
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2d769d4..94a9916 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> struct vfio_domain *domain, *d;
>> long unlocked = 0;
>>
>> - if (!dma->size)
>> + if (!dma->size || dma->type != VFIO_IOVA_USER)
>> return;
>> /*
>> * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> dma = rb_entry(n, struct vfio_dma, node);
>> iova = dma->iova;
>>
>> + if (dma->type == VFIO_IOVA_RESERVED)
>> + continue;
>> +
>
> But you do still need some sort of replay mechanism, right? Not to
> replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
> the new domain. How will you know that this entry is an MSI reserved
> range or something else? Perhaps we can't have a generic "reserved"
> type here.
Thanks for spotting this bug. I was not testing this case yet and
effectively I need to replay the set_aperture.
Thanks for the review
Best Regards
Eric
>
>> while (iova < dma->iova + dma->size) {
>> phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>> size_t size;
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 4/7] vfio: allow reserved msi iova registration
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
The user is allowed to register a reserved MSI IOVA range by using the
DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
This region is stored in the vfio_dma rb tree. At that point the iova
range is not mapped to any target address yet. The host kernel will use
those iova when needed, typically when MSIs are allocated.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
---
v7 -> v8:
- use iommu_msi_set_aperture function. There is no notion of
unregistration anymore since the reserved msi slot remains
until the container gets closed.
v6 -> v7:
- use iommu_free_reserved_iova_domain
- convey prot attributes downto dma-reserved-iommu iova domain creation
- reserved bindings teardown now performed on iommu domain destruction
- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
- change title
- pass the protection attribute to dma-reserved-iommu API
v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release
v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs
RFC v1 -> v1:
- takes into account Alex comments, based on
[RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
a reserved IOVA range. A single reserved iova region is allowed.
---
drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 10 +++++-
2 files changed, 86 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 94a9916..4d3a6f1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/workqueue.h>
+#include <linux/msi-iommu.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
vfio_lock_acct(-unlocked);
}
+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
+ dma_addr_t iova, size_t size)
+{
+ struct vfio_domain *d;
+ int ret = 0;
+
+ list_for_each_entry(d, &iommu->domain_list, next) {
+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
{
vfio_unmap_unpin(iommu, dma);
@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
}
+static int vfio_register_msi_range(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_dma_map *map)
+{
+ dma_addr_t iova = map->iova;
+ size_t size = map->size;
+ int ret = 0;
+ struct vfio_dma *dma;
+ unsigned long order;
+ uint64_t mask;
+
+ /* Verify that none of our __u64 fields overflow */
+ if (map->size != size || map->iova != iova)
+ return -EINVAL;
+
+ order = __ffs(vfio_pgsize_bitmap(iommu));
+ mask = ((uint64_t)1 << order) - 1;
+
+ WARN_ON(mask & PAGE_MASK);
+
+ if (!size || (size | iova) & mask)
+ return -EINVAL;
+
+ /* Don't allow IOVA address wrap */
+ if (iova + size - 1 < iova)
+ return -EINVAL;
+
+ mutex_lock(&iommu->lock);
+
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
+ ret = -EEXIST;
+ goto unlock;
+ }
+
+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+ if (!dma) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ dma->iova = iova;
+ dma->size = size;
+ dma->type = VFIO_IOVA_RESERVED;
+
+ ret = vfio_set_msi_aperture(iommu, iova, size);
+ if (ret)
+ goto free_unlock;
+
+ vfio_link_dma(iommu, dma);
+ goto unlock;
+
+free_unlock:
+ kfree(dma);
+unlock:
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
static int vfio_bus_type(struct device *dev, void *data)
{
struct bus_type **bus = data;
@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
} else if (cmd == VFIO_IOMMU_MAP_DMA) {
struct vfio_iommu_type1_dma_map map;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
- VFIO_DMA_MAP_FLAG_WRITE;
+ VFIO_DMA_MAP_FLAG_WRITE |
+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
if (map.argsz < minsz || map.flags & ~mask)
return -EINVAL;
+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
+ return vfio_register_msi_range(iommu, &map);
+
return vfio_dma_do_map(iommu, &map);
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..4a9dbc2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
*
* Map process virtual addresses to IO virtual addresses using the
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
__u32 flags;
#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
__u64 vaddr; /* Process virtual address */
__u64 iova; /* IO virtual address */
__u64 size; /* Size of mapping (bytes) */
@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
* Caller sets argsz. The actual unmapped size is returned in the size
* field. No guarantee is made to the user that arbitrary unmaps of iova
* or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
*/
struct vfio_iommu_type1_dma_unmap {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
The user is allowed to register a reserved MSI IOVA range by using the
DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
This region is stored in the vfio_dma rb tree. At that point the iova
range is not mapped to any target address yet. The host kernel will use
those iova when needed, typically when MSIs are allocated.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
v7 -> v8:
- use iommu_msi_set_aperture function. There is no notion of
unregistration anymore since the reserved msi slot remains
until the container gets closed.
v6 -> v7:
- use iommu_free_reserved_iova_domain
- convey prot attributes downto dma-reserved-iommu iova domain creation
- reserved bindings teardown now performed on iommu domain destruction
- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
- change title
- pass the protection attribute to dma-reserved-iommu API
v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release
v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs
RFC v1 -> v1:
- takes into account Alex comments, based on
[RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
a reserved IOVA range. A single reserved iova region is allowed.
---
drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 10 +++++-
2 files changed, 86 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 94a9916..4d3a6f1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/workqueue.h>
+#include <linux/msi-iommu.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
vfio_lock_acct(-unlocked);
}
+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
+ dma_addr_t iova, size_t size)
+{
+ struct vfio_domain *d;
+ int ret = 0;
+
+ list_for_each_entry(d, &iommu->domain_list, next) {
+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
{
vfio_unmap_unpin(iommu, dma);
@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
}
+static int vfio_register_msi_range(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_dma_map *map)
+{
+ dma_addr_t iova = map->iova;
+ size_t size = map->size;
+ int ret = 0;
+ struct vfio_dma *dma;
+ unsigned long order;
+ uint64_t mask;
+
+ /* Verify that none of our __u64 fields overflow */
+ if (map->size != size || map->iova != iova)
+ return -EINVAL;
+
+ order = __ffs(vfio_pgsize_bitmap(iommu));
+ mask = ((uint64_t)1 << order) - 1;
+
+ WARN_ON(mask & PAGE_MASK);
+
+ if (!size || (size | iova) & mask)
+ return -EINVAL;
+
+ /* Don't allow IOVA address wrap */
+ if (iova + size - 1 < iova)
+ return -EINVAL;
+
+ mutex_lock(&iommu->lock);
+
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
+ ret = -EEXIST;
+ goto unlock;
+ }
+
+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+ if (!dma) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ dma->iova = iova;
+ dma->size = size;
+ dma->type = VFIO_IOVA_RESERVED;
+
+ ret = vfio_set_msi_aperture(iommu, iova, size);
+ if (ret)
+ goto free_unlock;
+
+ vfio_link_dma(iommu, dma);
+ goto unlock;
+
+free_unlock:
+ kfree(dma);
+unlock:
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
static int vfio_bus_type(struct device *dev, void *data)
{
struct bus_type **bus = data;
@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
} else if (cmd == VFIO_IOMMU_MAP_DMA) {
struct vfio_iommu_type1_dma_map map;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
- VFIO_DMA_MAP_FLAG_WRITE;
+ VFIO_DMA_MAP_FLAG_WRITE |
+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
if (map.argsz < minsz || map.flags & ~mask)
return -EINVAL;
+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
+ return vfio_register_msi_range(iommu, &map);
+
return vfio_dma_do_map(iommu, &map);
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..4a9dbc2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
*
* Map process virtual addresses to IO virtual addresses using the
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
__u32 flags;
#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
__u64 vaddr; /* Process virtual address */
__u64 iova; /* IO virtual address */
__u64 size; /* Size of mapping (bytes) */
@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
* Caller sets argsz. The actual unmapped size is returned in the size
* field. No guarantee is made to the user that arbitrary unmaps of iova
* or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
*/
struct vfio_iommu_type1_dma_unmap {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
The user is allowed to register a reserved MSI IOVA range by using the
DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
This region is stored in the vfio_dma rb tree. At that point the iova
range is not mapped to any target address yet. The host kernel will use
those iova when needed, typically when MSIs are allocated.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
v7 -> v8:
- use iommu_msi_set_aperture function. There is no notion of
unregistration anymore since the reserved msi slot remains
until the container gets closed.
v6 -> v7:
- use iommu_free_reserved_iova_domain
- convey prot attributes downto dma-reserved-iommu iova domain creation
- reserved bindings teardown now performed on iommu domain destruction
- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
- change title
- pass the protection attribute to dma-reserved-iommu API
v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release
v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs
RFC v1 -> v1:
- takes into account Alex comments, based on
[RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
a reserved IOVA range. A single reserved iova region is allowed.
---
drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 10 +++++-
2 files changed, 86 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 94a9916..4d3a6f1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/workqueue.h>
+#include <linux/msi-iommu.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
vfio_lock_acct(-unlocked);
}
+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
+ dma_addr_t iova, size_t size)
+{
+ struct vfio_domain *d;
+ int ret = 0;
+
+ list_for_each_entry(d, &iommu->domain_list, next) {
+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
{
vfio_unmap_unpin(iommu, dma);
@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
}
+static int vfio_register_msi_range(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_dma_map *map)
+{
+ dma_addr_t iova = map->iova;
+ size_t size = map->size;
+ int ret = 0;
+ struct vfio_dma *dma;
+ unsigned long order;
+ uint64_t mask;
+
+ /* Verify that none of our __u64 fields overflow */
+ if (map->size != size || map->iova != iova)
+ return -EINVAL;
+
+ order = __ffs(vfio_pgsize_bitmap(iommu));
+ mask = ((uint64_t)1 << order) - 1;
+
+ WARN_ON(mask & PAGE_MASK);
+
+ if (!size || (size | iova) & mask)
+ return -EINVAL;
+
+ /* Don't allow IOVA address wrap */
+ if (iova + size - 1 < iova)
+ return -EINVAL;
+
+ mutex_lock(&iommu->lock);
+
+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
+ ret = -EEXIST;
+ goto unlock;
+ }
+
+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+ if (!dma) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ dma->iova = iova;
+ dma->size = size;
+ dma->type = VFIO_IOVA_RESERVED;
+
+ ret = vfio_set_msi_aperture(iommu, iova, size);
+ if (ret)
+ goto free_unlock;
+
+ vfio_link_dma(iommu, dma);
+ goto unlock;
+
+free_unlock:
+ kfree(dma);
+unlock:
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
static int vfio_bus_type(struct device *dev, void *data)
{
struct bus_type **bus = data;
@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
} else if (cmd == VFIO_IOMMU_MAP_DMA) {
struct vfio_iommu_type1_dma_map map;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
- VFIO_DMA_MAP_FLAG_WRITE;
+ VFIO_DMA_MAP_FLAG_WRITE |
+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
if (map.argsz < minsz || map.flags & ~mask)
return -EINVAL;
+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
+ return vfio_register_msi_range(iommu, &map);
+
return vfio_dma_do_map(iommu, &map);
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..4a9dbc2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
*
* Map process virtual addresses to IO virtual addresses using the
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims@registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
__u32 flags;
#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
__u64 vaddr; /* Process virtual address */
__u64 iova; /* IO virtual address */
__u64 size; /* Size of mapping (bytes) */
@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
* Caller sets argsz. The actual unmapped size is returned in the size
* field. No guarantee is made to the user that arbitrary unmaps of iova
* or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
*/
struct vfio_iommu_type1_dma_unmap {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread[parent not found: <1462362858-2925-5-git-send-email-eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-05 19:22 ` Chalamarla, Tirumalesh
-1 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:22 UTC (permalink / raw)
To: Eric Auger, eric.auger-qxv4g6HH51o@public.gmane.org,
robin.murphy-5wv7dgnIgG8@public.gmane.org,
alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
will.deacon-5wv7dgnIgG8@public.gmane.org,
joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org,
tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org,
jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
marc.zyngier-5wv7dgnIgG8@public.gmane.org,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ@public.gmane.org,
patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
julien.grall-5wv7dgnIgG8@public.gmane.org,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org on behalf of eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those iova when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>Signed-off-by: Bharat Bhushan <Bharat.Bhushan-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
>+#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+ dma_addr_t iova, size_t size)
>+{
>+ struct vfio_domain *d;
>+ int ret = 0;
>+
>+ list_for_each_entry(d, &iommu->domain_list, next) {
>+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+ if (ret)
>+ break;
>+ }
>+ return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+ struct vfio_iommu_type1_dma_map *map)
>+{
>+ dma_addr_t iova = map->iova;
>+ size_t size = map->size;
>+ int ret = 0;
>+ struct vfio_dma *dma;
>+ unsigned long order;
>+ uint64_t mask;
>+
>+ /* Verify that none of our __u64 fields overflow */
>+ if (map->size != size || map->iova != iova)
>+ return -EINVAL;
>+
>+ order = __ffs(vfio_pgsize_bitmap(iommu));
>+ mask = ((uint64_t)1 << order) - 1;
>+
>+ WARN_ON(mask & PAGE_MASK);
>+
>+ if (!size || (size | iova) & mask)
>+ return -EINVAL;
>+
>+ /* Don't allow IOVA address wrap */
>+ if (iova + size - 1 < iova)
>+ return -EINVAL;
>+
>+ mutex_lock(&iommu->lock);
>+
>+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+ ret = -EEXIST;
>+ goto unlock;
>+ }
>+
>+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+ if (!dma) {
>+ ret = -ENOMEM;
>+ goto unlock;
>+ }
>+
>+ dma->iova = iova;
>+ dma->size = size;
>+ dma->type = VFIO_IOVA_RESERVED;
>+
>+ ret = vfio_set_msi_aperture(iommu, iova, size);
>+ if (ret)
>+ goto free_unlock;
>+
>+ vfio_link_dma(iommu, dma);
>+ goto unlock;
>+
>+free_unlock:
>+ kfree(dma);
>+unlock:
>+ mutex_unlock(&iommu->lock);
>+ return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>- VFIO_DMA_MAP_FLAG_WRITE;
>+ VFIO_DMA_MAP_FLAG_WRITE |
>+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
>@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
>+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>+ return vfio_register_msi_range(iommu, &map);
>+
How are we getting the PA and any validation needed?
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>index 255a211..4a9dbc2 100644
>--- a/include/uapi/linux/vfio.h
>+++ b/include/uapi/linux/vfio.h
>@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>+ *
>+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>+ * IOVA region that will be used on some platforms to map the host MSI frames.
>+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>+ * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>+/* reserved iova for MSI vectors*/
>+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
>@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
>- * succeed.
>+ * succeed. Once registered, an MSI region cannot be unmapped and stays
>+ * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-05 19:22 ` Chalamarla, Tirumalesh
0 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:22 UTC (permalink / raw)
To: Eric Auger, eric.auger@st.com, robin.murphy@arm.com,
alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org,
tglx@linutronix.de, jason@lakedaemon.net, marc.zyngier@arm.com,
christoffer.dall@linaro.org, linux-arm-kernel@lists.infradead.org
Cc: julien.grall@arm.com, patches@linaro.org,
Jean-Philippe.Brucker@arm.com, p.fedin@samsung.com,
linux-kernel@vger.kernel.org, Bharat.Bhushan@freescale.com,
iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
yehuday@marvell.com
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those iova when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
>+#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+ dma_addr_t iova, size_t size)
>+{
>+ struct vfio_domain *d;
>+ int ret = 0;
>+
>+ list_for_each_entry(d, &iommu->domain_list, next) {
>+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+ if (ret)
>+ break;
>+ }
>+ return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+ struct vfio_iommu_type1_dma_map *map)
>+{
>+ dma_addr_t iova = map->iova;
>+ size_t size = map->size;
>+ int ret = 0;
>+ struct vfio_dma *dma;
>+ unsigned long order;
>+ uint64_t mask;
>+
>+ /* Verify that none of our __u64 fields overflow */
>+ if (map->size != size || map->iova != iova)
>+ return -EINVAL;
>+
>+ order = __ffs(vfio_pgsize_bitmap(iommu));
>+ mask = ((uint64_t)1 << order) - 1;
>+
>+ WARN_ON(mask & PAGE_MASK);
>+
>+ if (!size || (size | iova) & mask)
>+ return -EINVAL;
>+
>+ /* Don't allow IOVA address wrap */
>+ if (iova + size - 1 < iova)
>+ return -EINVAL;
>+
>+ mutex_lock(&iommu->lock);
>+
>+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+ ret = -EEXIST;
>+ goto unlock;
>+ }
>+
>+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+ if (!dma) {
>+ ret = -ENOMEM;
>+ goto unlock;
>+ }
>+
>+ dma->iova = iova;
>+ dma->size = size;
>+ dma->type = VFIO_IOVA_RESERVED;
>+
>+ ret = vfio_set_msi_aperture(iommu, iova, size);
>+ if (ret)
>+ goto free_unlock;
>+
>+ vfio_link_dma(iommu, dma);
>+ goto unlock;
>+
>+free_unlock:
>+ kfree(dma);
>+unlock:
>+ mutex_unlock(&iommu->lock);
>+ return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>- VFIO_DMA_MAP_FLAG_WRITE;
>+ VFIO_DMA_MAP_FLAG_WRITE |
>+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
>@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
>+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>+ return vfio_register_msi_range(iommu, &map);
>+
How are we getting the PA and any validation needed?
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>index 255a211..4a9dbc2 100644
>--- a/include/uapi/linux/vfio.h
>+++ b/include/uapi/linux/vfio.h
>@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>+ *
>+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>+ * IOVA region that will be used on some platforms to map the host MSI frames.
>+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>+ * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>+/* reserved iova for MSI vectors*/
>+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
>@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
>- * succeed.
>+ * succeed. Once registered, an MSI region cannot be unmapped and stays
>+ * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-05 19:22 ` Chalamarla, Tirumalesh
0 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:22 UTC (permalink / raw)
To: linux-arm-kernel
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces at lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those iova when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
>+#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+ dma_addr_t iova, size_t size)
>+{
>+ struct vfio_domain *d;
>+ int ret = 0;
>+
>+ list_for_each_entry(d, &iommu->domain_list, next) {
>+ ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+ if (ret)
>+ break;
>+ }
>+ return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+ struct vfio_iommu_type1_dma_map *map)
>+{
>+ dma_addr_t iova = map->iova;
>+ size_t size = map->size;
>+ int ret = 0;
>+ struct vfio_dma *dma;
>+ unsigned long order;
>+ uint64_t mask;
>+
>+ /* Verify that none of our __u64 fields overflow */
>+ if (map->size != size || map->iova != iova)
>+ return -EINVAL;
>+
>+ order = __ffs(vfio_pgsize_bitmap(iommu));
>+ mask = ((uint64_t)1 << order) - 1;
>+
>+ WARN_ON(mask & PAGE_MASK);
>+
>+ if (!size || (size | iova) & mask)
>+ return -EINVAL;
>+
>+ /* Don't allow IOVA address wrap */
>+ if (iova + size - 1 < iova)
>+ return -EINVAL;
>+
>+ mutex_lock(&iommu->lock);
>+
>+ if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+ ret = -EEXIST;
>+ goto unlock;
>+ }
>+
>+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+ if (!dma) {
>+ ret = -ENOMEM;
>+ goto unlock;
>+ }
>+
>+ dma->iova = iova;
>+ dma->size = size;
>+ dma->type = VFIO_IOVA_RESERVED;
>+
>+ ret = vfio_set_msi_aperture(iommu, iova, size);
>+ if (ret)
>+ goto free_unlock;
>+
>+ vfio_link_dma(iommu, dma);
>+ goto unlock;
>+
>+free_unlock:
>+ kfree(dma);
>+unlock:
>+ mutex_unlock(&iommu->lock);
>+ return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>- VFIO_DMA_MAP_FLAG_WRITE;
>+ VFIO_DMA_MAP_FLAG_WRITE |
>+ VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
>@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
>+ if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>+ return vfio_register_msi_range(iommu, &map);
>+
How are we getting the PA and any validation needed?
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>index 255a211..4a9dbc2 100644
>--- a/include/uapi/linux/vfio.h
>+++ b/include/uapi/linux/vfio.h
>@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>+ *
>+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>+ * IOVA region that will be used on some platforms to map the host MSI frames.
>+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>+ * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>+/* reserved iova for MSI vectors*/
>+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
>@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
>- * succeed.
>+ * succeed. Once registered, an MSI region cannot be unmapped and stays
>+ * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel at lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <F50F9BB4-1D62-4CAF-BC0E-BE97D222E5A0-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
2016-05-05 19:22 ` Chalamarla, Tirumalesh
(?)
@ 2016-05-09 7:55 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 7:55 UTC (permalink / raw)
To: Chalamarla, Tirumalesh, eric.auger-qxv4g6HH51o@public.gmane.org,
robin.murphy-5wv7dgnIgG8@public.gmane.org,
alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
will.deacon-5wv7dgnIgG8@public.gmane.org,
joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org,
tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org,
jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
marc.zyngier-5wv7dgnIgG8@public.gmane.org,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ@public.gmane.org,
patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
julien.grall-5wv7dgnIgG8@public.gmane.org,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Hi Chalamarla,
On 05/05/2016 09:22 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org on behalf of eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>
> How are we getting the PA and any validation needed?
Sorry I am not sure I get your question? The principle is the userspace
registers a spare Guest PA range that can be safely used by the kernel
to map host MSI frames. In QEMU integration we use platform bus unmapped
GPA to make things transparent for the virt machine.
Any validation: we currently make sure the registered GPA was not
already registered as RAM regions. The kernel also checks it is properly
aligned according to the IOMMU page size.
Please let me know if I answered your question.
Best Regards
Eric
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-09 7:55 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 7:55 UTC (permalink / raw)
To: Chalamarla, Tirumalesh, eric.auger@st.com, robin.murphy@arm.com,
alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org,
tglx@linutronix.de, jason@lakedaemon.net, marc.zyngier@arm.com,
christoffer.dall@linaro.org, linux-arm-kernel@lists.infradead.org
Cc: julien.grall@arm.com, patches@linaro.org,
Jean-Philippe.Brucker@arm.com, p.fedin@samsung.com,
linux-kernel@vger.kernel.org, Bharat.Bhushan@freescale.com,
iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
yehuday@marvell.com
Hi Chalamarla,
On 05/05/2016 09:22 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>
> How are we getting the PA and any validation needed?
Sorry I am not sure I get your question? The principle is the userspace
registers a spare Guest PA range that can be safely used by the kernel
to map host MSI frames. In QEMU integration we use platform bus unmapped
GPA to make things transparent for the virt machine.
Any validation: we currently make sure the registered GPA was not
already registered as RAM regions. The kernel also checks it is properly
aligned according to the IOMMU page size.
Please let me know if I answered your question.
Best Regards
Eric
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-09 7:55 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 7:55 UTC (permalink / raw)
To: linux-arm-kernel
Hi Chalamarla,
On 05/05/2016 09:22 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces at lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>
> How are we getting the PA and any validation needed?
Sorry I am not sure I get your question? The principle is the userspace
registers a spare Guest PA range that can be safely used by the kernel
to map host MSI frames. In QEMU integration we use platform bus unmapped
GPA to make things transparent for the virt machine.
Any validation: we currently make sure the registered GPA was not
already registered as RAM regions. The kernel also checks it is properly
aligned according to the IOMMU page size.
Please let me know if I answered your question.
Best Regards
Eric
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-10 15:29 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-10 15:29 UTC (permalink / raw)
To: Eric Auger
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On Wed, 4 May 2016 11:54:15 +0000
Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the iova
> range is not mapped to any target address yet. The host kernel will use
> those iova when needed, typically when MSIs are allocated.
>
> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
>
> ---
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
>
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
>
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
>
> RFC v1 -> v1:
> - takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
> ---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 94a9916..4d3a6f1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> +#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> + dma_addr_t iova, size_t size)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_dma_map *map)
> +{
> + dma_addr_t iova = map->iova;
> + size_t size = map->size;
> + int ret = 0;
> + struct vfio_dma *dma;
> + unsigned long order;
> + uint64_t mask;
> +
> + /* Verify that none of our __u64 fields overflow */
> + if (map->size != size || map->iova != iova)
> + return -EINVAL;
> +
> + order = __ffs(vfio_pgsize_bitmap(iommu));
> + mask = ((uint64_t)1 << order) - 1;
> +
> + WARN_ON(mask & PAGE_MASK);
> +
> + if (!size || (size | iova) & mask)
> + return -EINVAL;
> +
> + /* Don't allow IOVA address wrap */
> + if (iova + size - 1 < iova)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> +
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> + ret = -EEXIST;
> + goto unlock;
> + }
> +
> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> + if (!dma) {
> + ret = -ENOMEM;
> + goto unlock;
> + }
> +
> + dma->iova = iova;
> + dma->size = size;
> + dma->type = VFIO_IOVA_RESERVED;
[oops, forgot to send this reply with the others]
I'm tempted to suggest we set type explicitly in the USER case too just
to make that abundantly clear rather than taking advantage of the
kzalloc struct.
> +
> + ret = vfio_set_msi_aperture(iommu, iova, size);
> + if (ret)
> + goto free_unlock;
> +
> + vfio_link_dma(iommu, dma);
> + goto unlock;
> +
> +free_unlock:
> + kfree(dma);
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> - VFIO_DMA_MAP_FLAG_WRITE;
> + VFIO_DMA_MAP_FLAG_WRITE |
> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> + return vfio_register_msi_range(iommu, &map);
> +
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-10 15:29 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-10 15:29 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 11:54:15 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the iova
> range is not mapped to any target address yet. The host kernel will use
> those iova when needed, typically when MSIs are allocated.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>
> ---
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
>
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
>
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
>
> RFC v1 -> v1:
> - takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
> ---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 94a9916..4d3a6f1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> +#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> + dma_addr_t iova, size_t size)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_dma_map *map)
> +{
> + dma_addr_t iova = map->iova;
> + size_t size = map->size;
> + int ret = 0;
> + struct vfio_dma *dma;
> + unsigned long order;
> + uint64_t mask;
> +
> + /* Verify that none of our __u64 fields overflow */
> + if (map->size != size || map->iova != iova)
> + return -EINVAL;
> +
> + order = __ffs(vfio_pgsize_bitmap(iommu));
> + mask = ((uint64_t)1 << order) - 1;
> +
> + WARN_ON(mask & PAGE_MASK);
> +
> + if (!size || (size | iova) & mask)
> + return -EINVAL;
> +
> + /* Don't allow IOVA address wrap */
> + if (iova + size - 1 < iova)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> +
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> + ret = -EEXIST;
> + goto unlock;
> + }
> +
> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> + if (!dma) {
> + ret = -ENOMEM;
> + goto unlock;
> + }
> +
> + dma->iova = iova;
> + dma->size = size;
> + dma->type = VFIO_IOVA_RESERVED;
[oops, forgot to send this reply with the others]
I'm tempted to suggest we set type explicitly in the USER case too just
to make that abundantly clear rather than taking advantage of the
kzalloc struct.
> +
> + ret = vfio_set_msi_aperture(iommu, iova, size);
> + if (ret)
> + goto free_unlock;
> +
> + vfio_link_dma(iommu, dma);
> + goto unlock;
> +
> +free_unlock:
> + kfree(dma);
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> - VFIO_DMA_MAP_FLAG_WRITE;
> + VFIO_DMA_MAP_FLAG_WRITE |
> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> + return vfio_register_msi_range(iommu, &map);
> +
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-10 15:29 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-10 15:29 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 11:54:15 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the iova
> range is not mapped to any target address yet. The host kernel will use
> those iova when needed, typically when MSIs are allocated.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>
> ---
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
> unregistration anymore since the reserved msi slot remains
> until the container gets closed.
>
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
>
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
> CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
>
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
>
> RFC v1 -> v1:
> - takes into account Alex comments, based on
> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
> a reserved IOVA range. A single reserved iova region is allowed.
> ---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 94a9916..4d3a6f1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> +#include <linux/msi-iommu.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> vfio_lock_acct(-unlocked);
> }
>
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> + dma_addr_t iova, size_t size)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> vfio_unmap_unpin(iommu, dma);
> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> return ret;
> }
>
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_dma_map *map)
> +{
> + dma_addr_t iova = map->iova;
> + size_t size = map->size;
> + int ret = 0;
> + struct vfio_dma *dma;
> + unsigned long order;
> + uint64_t mask;
> +
> + /* Verify that none of our __u64 fields overflow */
> + if (map->size != size || map->iova != iova)
> + return -EINVAL;
> +
> + order = __ffs(vfio_pgsize_bitmap(iommu));
> + mask = ((uint64_t)1 << order) - 1;
> +
> + WARN_ON(mask & PAGE_MASK);
> +
> + if (!size || (size | iova) & mask)
> + return -EINVAL;
> +
> + /* Don't allow IOVA address wrap */
> + if (iova + size - 1 < iova)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> +
> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> + ret = -EEXIST;
> + goto unlock;
> + }
> +
> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> + if (!dma) {
> + ret = -ENOMEM;
> + goto unlock;
> + }
> +
> + dma->iova = iova;
> + dma->size = size;
> + dma->type = VFIO_IOVA_RESERVED;
[oops, forgot to send this reply with the others]
I'm tempted to suggest we set type explicitly in the USER case too just
to make that abundantly clear rather than taking advantage of the
kzalloc struct.
> +
> + ret = vfio_set_msi_aperture(iommu, iova, size);
> + if (ret)
> + goto free_unlock;
> +
> + vfio_link_dma(iommu, dma);
> + goto unlock;
> +
> +free_unlock:
> + kfree(dma);
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> static int vfio_bus_type(struct device *dev, void *data)
> {
> struct bus_type **bus = data;
> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
> struct vfio_iommu_type1_dma_map map;
> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> - VFIO_DMA_MAP_FLAG_WRITE;
> + VFIO_DMA_MAP_FLAG_WRITE |
> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>
> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>
> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> if (map.argsz < minsz || map.flags & ~mask)
> return -EINVAL;
>
> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> + return vfio_register_msi_range(iommu, &map);
> +
> return vfio_dma_do_map(iommu, &map);
>
> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
> *
> * Map process virtual addresses to IO virtual addresses using the
> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
> * Caller sets argsz. The actual unmapped size is returned in the size
> * field. No guarantee is made to the user that arbitrary unmaps of iova
> * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
> */
> struct vfio_iommu_type1_dma_unmap {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <20160510092951.67c2173e-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
2016-05-10 15:29 ` Alex Williamson
(?)
@ 2016-05-10 15:34 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 15:34 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/10/2016 05:29 PM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:15 +0000
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>
> [oops, forgot to send this reply with the others]
>
> I'm tempted to suggest we set type explicitly in the USER case too just
> to make that abundantly clear rather than taking advantage of the
> kzalloc struct.
no worries. OK I will set the dma type explicitly in vfio_dma_do_map too.
Thanks
Eric
>
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-10 15:34 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 15:34 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/10/2016 05:29 PM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:15 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>
> [oops, forgot to send this reply with the others]
>
> I'm tempted to suggest we set type explicitly in the USER case too just
> to make that abundantly clear rather than taking advantage of the
> kzalloc struct.
no worries. OK I will set the dma type explicitly in vfio_dma_do_map too.
Thanks
Eric
>
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 4/7] vfio: allow reserved msi iova registration
@ 2016-05-10 15:34 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 15:34 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/10/2016 05:29 PM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:15 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>> unregistration anymore since the reserved msi slot remains
>> until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>> CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>> [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>> a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/vfio.h | 10 +++++-
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> vfio_lock_acct(-unlocked);
>> }
>>
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> + dma_addr_t iova, size_t size)
>> +{
>> + struct vfio_domain *d;
>> + int ret = 0;
>> +
>> + list_for_each_entry(d, &iommu->domain_list, next) {
>> + ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> + if (ret)
>> + break;
>> + }
>> + return ret;
>> +}
>> +
>> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> {
>> vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> return ret;
>> }
>>
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> + struct vfio_iommu_type1_dma_map *map)
>> +{
>> + dma_addr_t iova = map->iova;
>> + size_t size = map->size;
>> + int ret = 0;
>> + struct vfio_dma *dma;
>> + unsigned long order;
>> + uint64_t mask;
>> +
>> + /* Verify that none of our __u64 fields overflow */
>> + if (map->size != size || map->iova != iova)
>> + return -EINVAL;
>> +
>> + order = __ffs(vfio_pgsize_bitmap(iommu));
>> + mask = ((uint64_t)1 << order) - 1;
>> +
>> + WARN_ON(mask & PAGE_MASK);
>> +
>> + if (!size || (size | iova) & mask)
>> + return -EINVAL;
>> +
>> + /* Don't allow IOVA address wrap */
>> + if (iova + size - 1 < iova)
>> + return -EINVAL;
>> +
>> + mutex_lock(&iommu->lock);
>> +
>> + if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> + if (!dma) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + dma->iova = iova;
>> + dma->size = size;
>> + dma->type = VFIO_IOVA_RESERVED;
>
> [oops, forgot to send this reply with the others]
>
> I'm tempted to suggest we set type explicitly in the USER case too just
> to make that abundantly clear rather than taking advantage of the
> kzalloc struct.
no worries. OK I will set the dma type explicitly in vfio_dma_do_map too.
Thanks
Eric
>
>> +
>> + ret = vfio_set_msi_aperture(iommu, iova, size);
>> + if (ret)
>> + goto free_unlock;
>> +
>> + vfio_link_dma(iommu, dma);
>> + goto unlock;
>> +
>> +free_unlock:
>> + kfree(dma);
>> +unlock:
>> + mutex_unlock(&iommu->lock);
>> + return ret;
>> +}
>> +
>> static int vfio_bus_type(struct device *dev, void *data)
>> {
>> struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>> struct vfio_iommu_type1_dma_map map;
>> uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> - VFIO_DMA_MAP_FLAG_WRITE;
>> + VFIO_DMA_MAP_FLAG_WRITE |
>> + VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> if (map.argsz < minsz || map.flags & ~mask)
>> return -EINVAL;
>>
>> + if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> + return vfio_register_msi_range(iommu, &map);
>> +
>> return vfio_dma_do_map(iommu, &map);
>>
>> } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>> *
>> * Map process virtual addresses to IO virtual addresses using the
>> * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> __u64 vaddr; /* Process virtual address */
>> __u64 iova; /* IO virtual address */
>> __u64 size; /* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>> * Caller sets argsz. The actual unmapped size is returned in the size
>> * field. No guarantee is made to the user that arbitrary unmaps of iova
>> * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>> */
>> struct vfio_iommu_type1_dma_unmap {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
by the msi controller. vfio_safe_irq_domain allows to check whether
interrupts are "safe" for a given device. They are if the device does
not use MSI or if the device uses MSI and the msi-parent controller
supports IRQ remapping.
Then we check at group level if all devices have safe interrupts: if not,
we only allow the group to be attached if allow_unsafe_interrupts is set.
At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in next patch.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
and irq_remapping into safe_irq_domains
v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4d3a6f1..2fc8197 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
#include <linux/vfio.h>
#include <linux/workqueue.h>
#include <linux/msi-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
return 0;
}
+/**
+ * vfio_safe_irq_domain: returns whether the irq domain
+ * the device is attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, we return it is safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+ struct irq_domain *domain;
+ struct msi_domain_info *info;
+
+ domain = dev_get_msi_domain(dev);
+ if (!domain)
+ return 0;
+
+ info = msi_get_domain_info(domain);
+
+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+ return -1;
+#endif
+ return 0;
+}
+
static int vfio_iommu_replay(struct vfio_iommu *iommu,
struct vfio_domain *domain)
{
@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_group *group, *g;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
- int ret;
+ int ret, safe_irq_domains;
mutex_lock(&iommu->lock);
@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
group->iommu_group = iommu_group;
+ /*
+ * Determine if all the devices of the group have a safe irq domain
+ * with respect to MSI isolation
+ */
+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+ vfio_safe_irq_domain);
+
/* Determine bus_type in order to allocate a domain */
ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
if (ret)
@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);
+ /*
+ * to advertise safe interrupts either the IOMMU or the MSI controllers
+ * must support IRQ remapping/interrupt translation
+ */
if (!allow_unsafe_interrupts &&
- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
__func__);
ret = -EPERM;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
by the msi controller. vfio_safe_irq_domain allows to check whether
interrupts are "safe" for a given device. They are if the device does
not use MSI or if the device uses MSI and the msi-parent controller
supports IRQ remapping.
Then we check at group level if all devices have safe interrupts: if not,
we only allow the group to be attached if allow_unsafe_interrupts is set.
At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in next patch.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
and irq_remapping into safe_irq_domains
v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4d3a6f1..2fc8197 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
#include <linux/vfio.h>
#include <linux/workqueue.h>
#include <linux/msi-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
return 0;
}
+/**
+ * vfio_safe_irq_domain: returns whether the irq domain
+ * the device is attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, we return it is safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+ struct irq_domain *domain;
+ struct msi_domain_info *info;
+
+ domain = dev_get_msi_domain(dev);
+ if (!domain)
+ return 0;
+
+ info = msi_get_domain_info(domain);
+
+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+ return -1;
+#endif
+ return 0;
+}
+
static int vfio_iommu_replay(struct vfio_iommu *iommu,
struct vfio_domain *domain)
{
@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_group *group, *g;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
- int ret;
+ int ret, safe_irq_domains;
mutex_lock(&iommu->lock);
@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
group->iommu_group = iommu_group;
+ /*
+ * Determine if all the devices of the group have a safe irq domain
+ * with respect to MSI isolation
+ */
+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+ vfio_safe_irq_domain);
+
/* Determine bus_type in order to allocate a domain */
ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
if (ret)
@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);
+ /*
+ * to advertise safe interrupts either the IOMMU or the MSI controllers
+ * must support IRQ remapping/interrupt translation
+ */
if (!allow_unsafe_interrupts &&
- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
__func__);
ret = -EPERM;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
by the msi controller. vfio_safe_irq_domain allows to check whether
interrupts are "safe" for a given device. They are if the device does
not use MSI or if the device uses MSI and the msi-parent controller
supports IRQ remapping.
Then we check at group level if all devices have safe interrupts: if not,
we only allow the group to be attached if allow_unsafe_interrupts is set.
At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in next patch.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
and irq_remapping into safe_irq_domains
v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4d3a6f1..2fc8197 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
#include <linux/vfio.h>
#include <linux/workqueue.h>
#include <linux/msi-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
return 0;
}
+/**
+ * vfio_safe_irq_domain: returns whether the irq domain
+ * the device is attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, we return it is safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+ struct irq_domain *domain;
+ struct msi_domain_info *info;
+
+ domain = dev_get_msi_domain(dev);
+ if (!domain)
+ return 0;
+
+ info = msi_get_domain_info(domain);
+
+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+ return -1;
+#endif
+ return 0;
+}
+
static int vfio_iommu_replay(struct vfio_iommu *iommu,
struct vfio_domain *domain)
{
@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_group *group, *g;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
- int ret;
+ int ret, safe_irq_domains;
mutex_lock(&iommu->lock);
@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
group->iommu_group = iommu_group;
+ /*
+ * Determine if all the devices of the group have a safe irq domain
+ * with respect to MSI isolation
+ */
+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+ vfio_safe_irq_domain);
+
/* Determine bus_type in order to allocate a domain */
ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
if (ret)
@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);
+ /*
+ * to advertise safe interrupts either the IOMMU or the MSI controllers
+ * must support IRQ remapping/interrupt translation
+ */
if (!allow_unsafe_interrupts &&
- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
__func__);
ret = -EPERM;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread[parent not found: <1462362858-2925-6-git-send-email-eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-05 19:23 ` Chalamarla, Tirumalesh
-1 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:23 UTC (permalink / raw)
To: Eric Auger, eric.auger-qxv4g6HH51o@public.gmane.org,
robin.murphy-5wv7dgnIgG8@public.gmane.org,
alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
will.deacon-5wv7dgnIgG8@public.gmane.org,
joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org,
tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org,
jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
marc.zyngier-5wv7dgnIgG8@public.gmane.org,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ@public.gmane.org,
patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
julien.grall-5wv7dgnIgG8@public.gmane.org,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org on behalf of eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the msi controller. vfio_safe_irq_domain allows to check whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.
Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
>+#include <linux/irqdomain.h>
>+#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+ struct irq_domain *domain;
>+ struct msi_domain_info *info;
>+
>+ domain = dev_get_msi_domain(dev);
>+ if (!domain)
>+ return 0;
>+
>+ info = msi_get_domain_info(domain);
>+
>+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+ return -1;
>+#endif
>+ return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
>- int ret;
>+ int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
>+ /*
>+ * Determine if all the devices of the group have a safe irq domain
>+ * with respect to MSI isolation
>+ */
>+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>+ vfio_safe_irq_domain);
>+
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
>+ /*
>+ * to advertise safe interrupts either the IOMMU or the MSI controllers
>+ * must support IRQ remapping/interrupt translation
>+ */
> if (!allow_unsafe_interrupts &&
>- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-05 19:23 ` Chalamarla, Tirumalesh
0 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:23 UTC (permalink / raw)
To: Eric Auger, eric.auger@st.com, robin.murphy@arm.com,
alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org,
tglx@linutronix.de, jason@lakedaemon.net, marc.zyngier@arm.com,
christoffer.dall@linaro.org, linux-arm-kernel@lists.infradead.org
Cc: julien.grall@arm.com, patches@linaro.org,
Jean-Philippe.Brucker@arm.com, p.fedin@samsung.com,
linux-kernel@vger.kernel.org, Bharat.Bhushan@freescale.com,
iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
yehuday@marvell.com
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the msi controller. vfio_safe_irq_domain allows to check whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.
Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
>+#include <linux/irqdomain.h>
>+#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+ struct irq_domain *domain;
>+ struct msi_domain_info *info;
>+
>+ domain = dev_get_msi_domain(dev);
>+ if (!domain)
>+ return 0;
>+
>+ info = msi_get_domain_info(domain);
>+
>+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+ return -1;
>+#endif
>+ return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
>- int ret;
>+ int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
>+ /*
>+ * Determine if all the devices of the group have a safe irq domain
>+ * with respect to MSI isolation
>+ */
>+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>+ vfio_safe_irq_domain);
>+
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
>+ /*
>+ * to advertise safe interrupts either the IOMMU or the MSI controllers
>+ * must support IRQ remapping/interrupt translation
>+ */
> if (!allow_unsafe_interrupts &&
>- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-05 19:23 ` Chalamarla, Tirumalesh
0 siblings, 0 replies; 111+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:23 UTC (permalink / raw)
To: linux-arm-kernel
On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces at lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the msi controller. vfio_safe_irq_domain allows to check whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.
Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
>+#include <linux/irqdomain.h>
>+#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+ struct irq_domain *domain;
>+ struct msi_domain_info *info;
>+
>+ domain = dev_get_msi_domain(dev);
>+ if (!domain)
>+ return 0;
>+
>+ info = msi_get_domain_info(domain);
>+
>+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+ return -1;
>+#endif
>+ return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
>- int ret;
>+ int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
>+ /*
>+ * Determine if all the devices of the group have a safe irq domain
>+ * with respect to MSI isolation
>+ */
>+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>+ vfio_safe_irq_domain);
>+
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
>+ /*
>+ * to advertise safe interrupts either the IOMMU or the MSI controllers
>+ * must support IRQ remapping/interrupt translation
>+ */
> if (!allow_unsafe_interrupts &&
>- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
>--
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel at lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <FFD9D7C5-B6EF-4E16-A209-BED31DAB26AA-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-05 19:23 ` Chalamarla, Tirumalesh
(?)
@ 2016-05-09 8:05 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 8:05 UTC (permalink / raw)
To: Chalamarla, Tirumalesh, eric.auger-qxv4g6HH51o@public.gmane.org,
robin.murphy-5wv7dgnIgG8@public.gmane.org,
alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
will.deacon-5wv7dgnIgG8@public.gmane.org,
joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org,
tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org,
jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
marc.zyngier-5wv7dgnIgG8@public.gmane.org,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ@public.gmane.org,
patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
julien.grall-5wv7dgnIgG8@public.gmane.org,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Hi Chalamarla,
On 05/05/2016 09:23 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org on behalf of eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>
> Will this work in systems with multiple ITS?
Yes it should support multiple ITS.
Please note however that the series does not yet implement the
msi_doorbell_info callback in GICv3 ITS PCIe/platform irqchip. Without
that, the pci_enable is going to fail. I am going to submit a separate
RFC for that but I can't test it at the moment.
Best Regards
Eric
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-09 8:05 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 8:05 UTC (permalink / raw)
To: Chalamarla, Tirumalesh, eric.auger@st.com, robin.murphy@arm.com,
alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org,
tglx@linutronix.de, jason@lakedaemon.net, marc.zyngier@arm.com,
christoffer.dall@linaro.org, linux-arm-kernel@lists.infradead.org
Cc: julien.grall@arm.com, patches@linaro.org,
Jean-Philippe.Brucker@arm.com, p.fedin@samsung.com,
linux-kernel@vger.kernel.org, Bharat.Bhushan@freescale.com,
iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
yehuday@marvell.com
Hi Chalamarla,
On 05/05/2016 09:23 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>
> Will this work in systems with multiple ITS?
Yes it should support multiple ITS.
Please note however that the series does not yet implement the
msi_doorbell_info callback in GICv3 ITS PCIe/platform irqchip. Without
that, the pci_enable is going to fail. I am going to submit a separate
RFC for that but I can't test it at the moment.
Best Regards
Eric
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-09 8:05 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-09 8:05 UTC (permalink / raw)
To: linux-arm-kernel
Hi Chalamarla,
On 05/05/2016 09:23 PM, Chalamarla, Tirumalesh wrote:
>
>
>
>
>
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces at lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>
> Will this work in systems with multiple ITS?
Yes it should support multiple ITS.
Please note however that the series does not yet implement the
msi_doorbell_info callback in GICv3 ITS PCIe/platform irqchip. Without
that, the pci_enable is going to fail. I am going to submit a separate
RFC for that but I can't test it at the moment.
Best Regards
Eric
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-09 22:49 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On Wed, 4 May 2016 11:54:16 +0000
Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
> by the msi controller. vfio_safe_irq_domain allows to check whether
> interrupts are "safe" for a given device. They are if the device does
> not use MSI
Are we sure we're not opening a security hole here? An MSI is simply a
DMA write, so really whether or not a device uses MSI is irrelevant.
If it can generate a DMA to the MSI doorbell then we need to be
protected and I think we pretty much need to assume that devices are
DMA capable. Do the MSI domain checks cover this?
> or if the device uses MSI and the msi-parent controller
> supports IRQ remapping.
>
> Then we check at group level if all devices have safe interrupts: if not,
> we only allow the group to be attached if allow_unsafe_interrupts is set.
>
> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
> changed in next patch.
>
> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>
> ---
> v3 -> v4:
> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
> v2 -> v3:
> - protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
> ---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4d3a6f1..2fc8197 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
> +/**
> + * vfio_safe_irq_domain: returns whether the irq domain
> + * the device is attached to is safe with respect to MSI isolation.
> + * If the irq domain is not an MSI domain, we return it is safe.
> + *
> + * @dev: device handle
> + * @data: unused
> + * returns 0 if the irq domain is safe, -1 if not.
> + */
> +static int vfio_safe_irq_domain(struct device *dev, void *data)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> + struct irq_domain *domain;
> + struct msi_domain_info *info;
> +
> + domain = dev_get_msi_domain(dev);
> + if (!domain)
> + return 0;
> +
> + info = msi_get_domain_info(domain);
> +
> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
> + return -1;
> +#endif
> + return 0;
> +}
> +
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
> - int ret;
> + int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
> + /*
> + * Determine if all the devices of the group have a safe irq domain
> + * with respect to MSI isolation
> + */
> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
> + vfio_safe_irq_domain);
> +
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
> + /*
> + * to advertise safe interrupts either the IOMMU or the MSI controllers
> + * must support IRQ remapping/interrupt translation
> + */
> if (!allow_unsafe_interrupts &&
> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 11:54:16 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
> by the msi controller. vfio_safe_irq_domain allows to check whether
> interrupts are "safe" for a given device. They are if the device does
> not use MSI
Are we sure we're not opening a security hole here? An MSI is simply a
DMA write, so really whether or not a device uses MSI is irrelevant.
If it can generate a DMA to the MSI doorbell then we need to be
protected and I think we pretty much need to assume that devices are
DMA capable. Do the MSI domain checks cover this?
> or if the device uses MSI and the msi-parent controller
> supports IRQ remapping.
>
> Then we check at group level if all devices have safe interrupts: if not,
> we only allow the group to be attached if allow_unsafe_interrupts is set.
>
> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
> changed in next patch.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v3 -> v4:
> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
> v2 -> v3:
> - protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
> ---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4d3a6f1..2fc8197 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
> +/**
> + * vfio_safe_irq_domain: returns whether the irq domain
> + * the device is attached to is safe with respect to MSI isolation.
> + * If the irq domain is not an MSI domain, we return it is safe.
> + *
> + * @dev: device handle
> + * @data: unused
> + * returns 0 if the irq domain is safe, -1 if not.
> + */
> +static int vfio_safe_irq_domain(struct device *dev, void *data)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> + struct irq_domain *domain;
> + struct msi_domain_info *info;
> +
> + domain = dev_get_msi_domain(dev);
> + if (!domain)
> + return 0;
> +
> + info = msi_get_domain_info(domain);
> +
> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
> + return -1;
> +#endif
> + return 0;
> +}
> +
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
> - int ret;
> + int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
> + /*
> + * Determine if all the devices of the group have a safe irq domain
> + * with respect to MSI isolation
> + */
> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
> + vfio_safe_irq_domain);
> +
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
> + /*
> + * to advertise safe interrupts either the IOMMU or the MSI controllers
> + * must support IRQ remapping/interrupt translation
> + */
> if (!allow_unsafe_interrupts &&
> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 11:54:16 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
> by the msi controller. vfio_safe_irq_domain allows to check whether
> interrupts are "safe" for a given device. They are if the device does
> not use MSI
Are we sure we're not opening a security hole here? An MSI is simply a
DMA write, so really whether or not a device uses MSI is irrelevant.
If it can generate a DMA to the MSI doorbell then we need to be
protected and I think we pretty much need to assume that devices are
DMA capable. Do the MSI domain checks cover this?
> or if the device uses MSI and the msi-parent controller
> supports IRQ remapping.
>
> Then we check at group level if all devices have safe interrupts: if not,
> we only allow the group to be attached if allow_unsafe_interrupts is set.
>
> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
> changed in next patch.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v3 -> v4:
> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
> and irq_remapping into safe_irq_domains
>
> v2 -> v3:
> - protect vfio_msi_parent_irq_remapping_capable with
> CONFIG_GENERIC_MSI_IRQ_DOMAIN
> ---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4d3a6f1..2fc8197 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> return 0;
> }
>
> +/**
> + * vfio_safe_irq_domain: returns whether the irq domain
> + * the device is attached to is safe with respect to MSI isolation.
> + * If the irq domain is not an MSI domain, we return it is safe.
> + *
> + * @dev: device handle
> + * @data: unused
> + * returns 0 if the irq domain is safe, -1 if not.
> + */
> +static int vfio_safe_irq_domain(struct device *dev, void *data)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> + struct irq_domain *domain;
> + struct msi_domain_info *info;
> +
> + domain = dev_get_msi_domain(dev);
> + if (!domain)
> + return 0;
> +
> + info = msi_get_domain_info(domain);
> +
> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
> + return -1;
> +#endif
> + return 0;
> +}
> +
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> struct vfio_domain *domain)
> {
> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct vfio_group *group, *g;
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL;
> - int ret;
> + int ret, safe_irq_domains;
>
> mutex_lock(&iommu->lock);
>
> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>
> group->iommu_group = iommu_group;
>
> + /*
> + * Determine if all the devices of the group have a safe irq domain
> + * with respect to MSI isolation
> + */
> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
> + vfio_safe_irq_domain);
> +
> /* Determine bus_type in order to allocate a domain */
> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> if (ret)
> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> INIT_LIST_HEAD(&domain->group_list);
> list_add(&group->next, &domain->group_list);
>
> + /*
> + * to advertise safe interrupts either the IOMMU or the MSI controllers
> + * must support IRQ remapping/interrupt translation
> + */
> if (!allow_unsafe_interrupts &&
> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> __func__);
> ret = -EPERM;
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <20160509164942.5af8b30d-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-09 22:49 ` Alex Williamson
(?)
@ 2016-05-10 16:10 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:10 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:16 +0000
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI
>
> Are we sure we're not opening a security hole here? An MSI is simply a
> DMA write, so really whether or not a device uses MSI is irrelevant.
> If it can generate a DMA to the MSI doorbell then we need to be
> protected and I think we pretty much need to assume that devices are
> DMA capable. Do the MSI domain checks cover this?
Let me try to rephrase: we check the device is not attached to an MSI
controller (I think this is the semantic of dev_get_msi_domain(dev)).
If it is not, we don't have to care about MSI isolation: there will be
no IOMMU binding between the device and any MSI doorbell. If it is we
check the msi domain is backed by an MSI controller able to perform MSI
isolation.
So effectively "usage of MSIs" is improper - since it is decided after
the group attachment anyway - and the commit message should rather
state "if the device is linked to an MSI controller" (dt msi-parent
notion I think).
Does it sound better?
Regards
Eric
>
>> or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-10 16:10 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:10 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:16 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI
>
> Are we sure we're not opening a security hole here? An MSI is simply a
> DMA write, so really whether or not a device uses MSI is irrelevant.
> If it can generate a DMA to the MSI doorbell then we need to be
> protected and I think we pretty much need to assume that devices are
> DMA capable. Do the MSI domain checks cover this?
Let me try to rephrase: we check the device is not attached to an MSI
controller (I think this is the semantic of dev_get_msi_domain(dev)).
If it is not, we don't have to care about MSI isolation: there will be
no IOMMU binding between the device and any MSI doorbell. If it is we
check the msi domain is backed by an MSI controller able to perform MSI
isolation.
So effectively "usage of MSIs" is improper - since it is decided after
the group attachment anyway - and the commit message should rather
state "if the device is linked to an MSI controller" (dt msi-parent
notion I think).
Does it sound better?
Regards
Eric
>
>> or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-10 16:10 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:10 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:16 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI
>
> Are we sure we're not opening a security hole here? An MSI is simply a
> DMA write, so really whether or not a device uses MSI is irrelevant.
> If it can generate a DMA to the MSI doorbell then we need to be
> protected and I think we pretty much need to assume that devices are
> DMA capable. Do the MSI domain checks cover this?
Let me try to rephrase: we check the device is not attached to an MSI
controller (I think this is the semantic of dev_get_msi_domain(dev)).
If it is not, we don't have to care about MSI isolation: there will be
no IOMMU binding between the device and any MSI doorbell. If it is we
check the msi domain is backed by an MSI controller able to perform MSI
isolation.
So effectively "usage of MSIs" is improper - since it is decided after
the group attachment anyway - and the commit message should rather
state "if the device is linked to an MSI controller" (dt msi-parent
notion I think).
Does it sound better?
Regards
Eric
>
>> or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>> and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION "0.2"
>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> + struct irq_domain *domain;
>> + struct msi_domain_info *info;
>> +
>> + domain = dev_get_msi_domain(dev);
>> + if (!domain)
>> + return 0;
>> +
>> + info = msi_get_domain_info(domain);
>> +
>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> + return -1;
>> +#endif
>> + return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct vfio_group *group, *g;
>> struct vfio_domain *domain, *d;
>> struct bus_type *bus = NULL;
>> - int ret;
>> + int ret, safe_irq_domains;
>>
>> mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> group->iommu_group = iommu_group;
>>
>> + /*
>> + * Determine if all the devices of the group have a safe irq domain
>> + * with respect to MSI isolation
>> + */
>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> + vfio_safe_irq_domain);
>> +
>> /* Determine bus_type in order to allocate a domain */
>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> INIT_LIST_HEAD(&domain->group_list);
>> list_add(&group->next, &domain->group_list);
>>
>> + /*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping/interrupt translation
>> + */
>> if (!allow_unsafe_interrupts &&
>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> __func__);
>> ret = -EPERM;
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <57320803.8070506-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-10 16:10 ` Eric Auger
(?)
@ 2016-05-10 17:24 ` Robin Murphy
-1 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-10 17:24 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Eric,
On 10/05/16 17:10, Eric Auger wrote:
> Hi Alex,
> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>> On Wed, 4 May 2016 11:54:16 +0000
>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>
>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>> interrupts are "safe" for a given device. They are if the device does
>>> not use MSI
>>
>> Are we sure we're not opening a security hole here? An MSI is simply a
>> DMA write, so really whether or not a device uses MSI is irrelevant.
>> If it can generate a DMA to the MSI doorbell then we need to be
>> protected and I think we pretty much need to assume that devices are
>> DMA capable. Do the MSI domain checks cover this?
> Let me try to rephrase: we check the device is not attached to an MSI
> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>
> If it is not, we don't have to care about MSI isolation: there will be
> no IOMMU binding between the device and any MSI doorbell. If it is we
> check the msi domain is backed by an MSI controller able to perform MSI
> isolation.
>
> So effectively "usage of MSIs" is improper - since it is decided after
> the group attachment anyway - and the commit message should rather
> state "if the device is linked to an MSI controller" (dt msi-parent
> notion I think).
Hmm, I think Alex has a point here - on a GICv2m I can happily fire
arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
aren't in an MSI domain, so I don't think it's valid to assume that a
device using only wired interrupts, therefore with no connection to any
MSI controller, isn't still capable of maliciously spewing DMA all over
any and every doorbell region in the system.
Robin.
> Does it sound better?
>
> Regards
>
> Eric
>>
>>> or if the device uses MSI and the msi-parent controller
>>> supports IRQ remapping.
>>>
>>> Then we check at group level if all devices have safe interrupts: if not,
>>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>>
>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>> changed in next patch.
>>>
>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>
>>> ---
>>> v3 -> v4:
>>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>> and irq_remapping into safe_irq_domains
>>>
>>> v2 -> v3:
>>> - protect vfio_msi_parent_irq_remapping_capable with
>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> ---
>>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 4d3a6f1..2fc8197 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -37,6 +37,8 @@
>>> #include <linux/vfio.h>
>>> #include <linux/workqueue.h>
>>> #include <linux/msi-iommu.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>
>>> #define DRIVER_VERSION "0.2"
>>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>>> return 0;
>>> }
>>>
>>> +/**
>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>> + * the device is attached to is safe with respect to MSI isolation.
>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>> + *
>>> + * @dev: device handle
>>> + * @data: unused
>>> + * returns 0 if the irq domain is safe, -1 if not.
>>> + */
>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>> +{
>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> + struct irq_domain *domain;
>>> + struct msi_domain_info *info;
>>> +
>>> + domain = dev_get_msi_domain(dev);
>>> + if (!domain)
>>> + return 0;
>>> +
>>> + info = msi_get_domain_info(domain);
>>> +
>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>> + return -1;
>>> +#endif
>>> + return 0;
>>> +}
>>> +
>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>> struct vfio_domain *domain)
>>> {
>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> struct vfio_group *group, *g;
>>> struct vfio_domain *domain, *d;
>>> struct bus_type *bus = NULL;
>>> - int ret;
>>> + int ret, safe_irq_domains;
>>>
>>> mutex_lock(&iommu->lock);
>>>
>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>
>>> group->iommu_group = iommu_group;
>>>
>>> + /*
>>> + * Determine if all the devices of the group have a safe irq domain
>>> + * with respect to MSI isolation
>>> + */
>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>> + vfio_safe_irq_domain);
>>> +
>>> /* Determine bus_type in order to allocate a domain */
>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>> if (ret)
>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> INIT_LIST_HEAD(&domain->group_list);
>>> list_add(&group->next, &domain->group_list);
>>>
>>> + /*
>>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>>> + * must support IRQ remapping/interrupt translation
>>> + */
>>> if (!allow_unsafe_interrupts &&
>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>>> __func__);
>>> ret = -EPERM;
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-10 17:24 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-10 17:24 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Eric,
On 10/05/16 17:10, Eric Auger wrote:
> Hi Alex,
> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>> On Wed, 4 May 2016 11:54:16 +0000
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>> interrupts are "safe" for a given device. They are if the device does
>>> not use MSI
>>
>> Are we sure we're not opening a security hole here? An MSI is simply a
>> DMA write, so really whether or not a device uses MSI is irrelevant.
>> If it can generate a DMA to the MSI doorbell then we need to be
>> protected and I think we pretty much need to assume that devices are
>> DMA capable. Do the MSI domain checks cover this?
> Let me try to rephrase: we check the device is not attached to an MSI
> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>
> If it is not, we don't have to care about MSI isolation: there will be
> no IOMMU binding between the device and any MSI doorbell. If it is we
> check the msi domain is backed by an MSI controller able to perform MSI
> isolation.
>
> So effectively "usage of MSIs" is improper - since it is decided after
> the group attachment anyway - and the commit message should rather
> state "if the device is linked to an MSI controller" (dt msi-parent
> notion I think).
Hmm, I think Alex has a point here - on a GICv2m I can happily fire
arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
aren't in an MSI domain, so I don't think it's valid to assume that a
device using only wired interrupts, therefore with no connection to any
MSI controller, isn't still capable of maliciously spewing DMA all over
any and every doorbell region in the system.
Robin.
> Does it sound better?
>
> Regards
>
> Eric
>>
>>> or if the device uses MSI and the msi-parent controller
>>> supports IRQ remapping.
>>>
>>> Then we check at group level if all devices have safe interrupts: if not,
>>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>>
>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>> changed in next patch.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>> v3 -> v4:
>>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>> and irq_remapping into safe_irq_domains
>>>
>>> v2 -> v3:
>>> - protect vfio_msi_parent_irq_remapping_capable with
>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> ---
>>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 4d3a6f1..2fc8197 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -37,6 +37,8 @@
>>> #include <linux/vfio.h>
>>> #include <linux/workqueue.h>
>>> #include <linux/msi-iommu.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>
>>> #define DRIVER_VERSION "0.2"
>>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>>> return 0;
>>> }
>>>
>>> +/**
>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>> + * the device is attached to is safe with respect to MSI isolation.
>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>> + *
>>> + * @dev: device handle
>>> + * @data: unused
>>> + * returns 0 if the irq domain is safe, -1 if not.
>>> + */
>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>> +{
>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> + struct irq_domain *domain;
>>> + struct msi_domain_info *info;
>>> +
>>> + domain = dev_get_msi_domain(dev);
>>> + if (!domain)
>>> + return 0;
>>> +
>>> + info = msi_get_domain_info(domain);
>>> +
>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>> + return -1;
>>> +#endif
>>> + return 0;
>>> +}
>>> +
>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>> struct vfio_domain *domain)
>>> {
>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> struct vfio_group *group, *g;
>>> struct vfio_domain *domain, *d;
>>> struct bus_type *bus = NULL;
>>> - int ret;
>>> + int ret, safe_irq_domains;
>>>
>>> mutex_lock(&iommu->lock);
>>>
>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>
>>> group->iommu_group = iommu_group;
>>>
>>> + /*
>>> + * Determine if all the devices of the group have a safe irq domain
>>> + * with respect to MSI isolation
>>> + */
>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>> + vfio_safe_irq_domain);
>>> +
>>> /* Determine bus_type in order to allocate a domain */
>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>> if (ret)
>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> INIT_LIST_HEAD(&domain->group_list);
>>> list_add(&group->next, &domain->group_list);
>>>
>>> + /*
>>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>>> + * must support IRQ remapping/interrupt translation
>>> + */
>>> if (!allow_unsafe_interrupts &&
>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>>> __func__);
>>> ret = -EPERM;
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-10 17:24 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-10 17:24 UTC (permalink / raw)
To: linux-arm-kernel
Hi Eric,
On 10/05/16 17:10, Eric Auger wrote:
> Hi Alex,
> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>> On Wed, 4 May 2016 11:54:16 +0000
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>> interrupts are "safe" for a given device. They are if the device does
>>> not use MSI
>>
>> Are we sure we're not opening a security hole here? An MSI is simply a
>> DMA write, so really whether or not a device uses MSI is irrelevant.
>> If it can generate a DMA to the MSI doorbell then we need to be
>> protected and I think we pretty much need to assume that devices are
>> DMA capable. Do the MSI domain checks cover this?
> Let me try to rephrase: we check the device is not attached to an MSI
> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>
> If it is not, we don't have to care about MSI isolation: there will be
> no IOMMU binding between the device and any MSI doorbell. If it is we
> check the msi domain is backed by an MSI controller able to perform MSI
> isolation.
>
> So effectively "usage of MSIs" is improper - since it is decided after
> the group attachment anyway - and the commit message should rather
> state "if the device is linked to an MSI controller" (dt msi-parent
> notion I think).
Hmm, I think Alex has a point here - on a GICv2m I can happily fire
arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
aren't in an MSI domain, so I don't think it's valid to assume that a
device using only wired interrupts, therefore with no connection to any
MSI controller, isn't still capable of maliciously spewing DMA all over
any and every doorbell region in the system.
Robin.
> Does it sound better?
>
> Regards
>
> Eric
>>
>>> or if the device uses MSI and the msi-parent controller
>>> supports IRQ remapping.
>>>
>>> Then we check at group level if all devices have safe interrupts: if not,
>>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>>
>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>> changed in next patch.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>> v3 -> v4:
>>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>> and irq_remapping into safe_irq_domains
>>>
>>> v2 -> v3:
>>> - protect vfio_msi_parent_irq_remapping_capable with
>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> ---
>>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 4d3a6f1..2fc8197 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -37,6 +37,8 @@
>>> #include <linux/vfio.h>
>>> #include <linux/workqueue.h>
>>> #include <linux/msi-iommu.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>
>>> #define DRIVER_VERSION "0.2"
>>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>>> return 0;
>>> }
>>>
>>> +/**
>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>> + * the device is attached to is safe with respect to MSI isolation.
>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>> + *
>>> + * @dev: device handle
>>> + * @data: unused
>>> + * returns 0 if the irq domain is safe, -1 if not.
>>> + */
>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>> +{
>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> + struct irq_domain *domain;
>>> + struct msi_domain_info *info;
>>> +
>>> + domain = dev_get_msi_domain(dev);
>>> + if (!domain)
>>> + return 0;
>>> +
>>> + info = msi_get_domain_info(domain);
>>> +
>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>> + return -1;
>>> +#endif
>>> + return 0;
>>> +}
>>> +
>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>> struct vfio_domain *domain)
>>> {
>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> struct vfio_group *group, *g;
>>> struct vfio_domain *domain, *d;
>>> struct bus_type *bus = NULL;
>>> - int ret;
>>> + int ret, safe_irq_domains;
>>>
>>> mutex_lock(&iommu->lock);
>>>
>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>
>>> group->iommu_group = iommu_group;
>>>
>>> + /*
>>> + * Determine if all the devices of the group have a safe irq domain
>>> + * with respect to MSI isolation
>>> + */
>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>> + vfio_safe_irq_domain);
>>> +
>>> /* Determine bus_type in order to allocate a domain */
>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>> if (ret)
>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>> INIT_LIST_HEAD(&domain->group_list);
>>> list_add(&group->next, &domain->group_list);
>>>
>>> + /*
>>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>>> + * must support IRQ remapping/interrupt translation
>>> + */
>>> if (!allow_unsafe_interrupts &&
>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>>> pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>>> __func__);
>>> ret = -EPERM;
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <57321952.2020206-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-10 17:24 ` Robin Murphy
(?)
@ 2016-05-11 8:38 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 8:38 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Robin, Alex,
On 05/10/2016 07:24 PM, Robin Murphy wrote:
> Hi Eric,
>
> On 10/05/16 17:10, Eric Auger wrote:
>> Hi Alex,
>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>> On Wed, 4 May 2016 11:54:16 +0000
>>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>
>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>> abstracted
>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>> interrupts are "safe" for a given device. They are if the device does
>>>> not use MSI
>>>
>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>> If it can generate a DMA to the MSI doorbell then we need to be
>>> protected and I think we pretty much need to assume that devices are
>>> DMA capable. Do the MSI domain checks cover this?
>> Let me try to rephrase: we check the device is not attached to an MSI
>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>
>> If it is not, we don't have to care about MSI isolation: there will be
>> no IOMMU binding between the device and any MSI doorbell. If it is we
>> check the msi domain is backed by an MSI controller able to perform MSI
>> isolation.
>>
>> So effectively "usage of MSIs" is improper - since it is decided after
>> the group attachment anyway - and the commit message should rather
>> state "if the device is linked to an MSI controller" (dt msi-parent
>> notion I think).
>
> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
> aren't in an MSI domain, so I don't think it's valid to assume that a
> device using only wired interrupts, therefore with no connection to any
> MSI controller, isn't still capable of maliciously spewing DMA all over
> any and every doorbell region in the system.
Sorry but I still don't get the point. For the device to reach the
doorbell there must be an IOMMU mapping.
- if the device is not attached to an MSI domain, there won't be any
doorbell iommu mapping built by this series, so no risk, right?
The device will be allowed to reach only memory iommu mapped by
userspace with VFIO DMA MAP standard API. Of course if the userspace can
mmap all the host PA that's a more general issue, right?
- If the device is attached to an MSI domain (msi-parent link), 2 cases:
1) the MSI controller advertises MSI isolation (ITS cases), no risk
2) the MSI controller does not advertise MSI isolation (GICv2m), there
is a security hole.
a) by default we reject the device attachment
b) if the userspace overrides the safe interrupt option he accepts
the security hole
What am I missing?
Best Regards
Eric
>
> Robin.
>
>> Does it sound better?
>>
>> Regards
>>
>> Eric
>>>
>>>> or if the device uses MSI and the msi-parent controller
>>>> supports IRQ remapping.
>>>>
>>>> Then we check at group level if all devices have safe interrupts: if
>>>> not,
>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>> set.
>>>>
>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>> changed in next patch.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>>
>>>> ---
>>>> v3 -> v4:
>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>> vfio_safe_irq_domain
>>>> and irq_remapping into safe_irq_domains
>>>>
>>>> v2 -> v3:
>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> ---
>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>> +++++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index 4d3a6f1..2fc8197 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -37,6 +37,8 @@
>>>> #include <linux/vfio.h>
>>>> #include <linux/workqueue.h>
>>>> #include <linux/msi-iommu.h>
>>>> +#include <linux/irqdomain.h>
>>>> +#include <linux/msi.h>
>>>>
>>>> #define DRIVER_VERSION "0.2"
>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>> void *data)
>>>> return 0;
>>>> }
>>>>
>>>> +/**
>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>> + *
>>>> + * @dev: device handle
>>>> + * @data: unused
>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>> + */
>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>> +{
>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> + struct irq_domain *domain;
>>>> + struct msi_domain_info *info;
>>>> +
>>>> + domain = dev_get_msi_domain(dev);
>>>> + if (!domain)
>>>> + return 0;
>>>> +
>>>> + info = msi_get_domain_info(domain);
>>>> +
>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>> + return -1;
>>>> +#endif
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>> struct vfio_domain *domain)
>>>> {
>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> struct vfio_group *group, *g;
>>>> struct vfio_domain *domain, *d;
>>>> struct bus_type *bus = NULL;
>>>> - int ret;
>>>> + int ret, safe_irq_domains;
>>>>
>>>> mutex_lock(&iommu->lock);
>>>>
>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>>
>>>> group->iommu_group = iommu_group;
>>>>
>>>> + /*
>>>> + * Determine if all the devices of the group have a safe irq
>>>> domain
>>>> + * with respect to MSI isolation
>>>> + */
>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>> + vfio_safe_irq_domain);
>>>> +
>>>> /* Determine bus_type in order to allocate a domain */
>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>> if (ret)
>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> INIT_LIST_HEAD(&domain->group_list);
>>>> list_add(&group->next, &domain->group_list);
>>>>
>>>> + /*
>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>> controllers
>>>> + * must support IRQ remapping/interrupt translation
>>>> + */
>>>> if (!allow_unsafe_interrupts &&
>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>> !safe_irq_domains)) {
>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>> support on this platform\n",
>>>> __func__);
>>>> ret = -EPERM;
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 8:38 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 8:38 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Robin, Alex,
On 05/10/2016 07:24 PM, Robin Murphy wrote:
> Hi Eric,
>
> On 10/05/16 17:10, Eric Auger wrote:
>> Hi Alex,
>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>> On Wed, 4 May 2016 11:54:16 +0000
>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>
>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>> abstracted
>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>> interrupts are "safe" for a given device. They are if the device does
>>>> not use MSI
>>>
>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>> If it can generate a DMA to the MSI doorbell then we need to be
>>> protected and I think we pretty much need to assume that devices are
>>> DMA capable. Do the MSI domain checks cover this?
>> Let me try to rephrase: we check the device is not attached to an MSI
>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>
>> If it is not, we don't have to care about MSI isolation: there will be
>> no IOMMU binding between the device and any MSI doorbell. If it is we
>> check the msi domain is backed by an MSI controller able to perform MSI
>> isolation.
>>
>> So effectively "usage of MSIs" is improper - since it is decided after
>> the group attachment anyway - and the commit message should rather
>> state "if the device is linked to an MSI controller" (dt msi-parent
>> notion I think).
>
> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
> aren't in an MSI domain, so I don't think it's valid to assume that a
> device using only wired interrupts, therefore with no connection to any
> MSI controller, isn't still capable of maliciously spewing DMA all over
> any and every doorbell region in the system.
Sorry but I still don't get the point. For the device to reach the
doorbell there must be an IOMMU mapping.
- if the device is not attached to an MSI domain, there won't be any
doorbell iommu mapping built by this series, so no risk, right?
The device will be allowed to reach only memory iommu mapped by
userspace with VFIO DMA MAP standard API. Of course if the userspace can
mmap all the host PA that's a more general issue, right?
- If the device is attached to an MSI domain (msi-parent link), 2 cases:
1) the MSI controller advertises MSI isolation (ITS cases), no risk
2) the MSI controller does not advertise MSI isolation (GICv2m), there
is a security hole.
a) by default we reject the device attachment
b) if the userspace overrides the safe interrupt option he accepts
the security hole
What am I missing?
Best Regards
Eric
>
> Robin.
>
>> Does it sound better?
>>
>> Regards
>>
>> Eric
>>>
>>>> or if the device uses MSI and the msi-parent controller
>>>> supports IRQ remapping.
>>>>
>>>> Then we check at group level if all devices have safe interrupts: if
>>>> not,
>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>> set.
>>>>
>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>> changed in next patch.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> ---
>>>> v3 -> v4:
>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>> vfio_safe_irq_domain
>>>> and irq_remapping into safe_irq_domains
>>>>
>>>> v2 -> v3:
>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> ---
>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>> +++++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index 4d3a6f1..2fc8197 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -37,6 +37,8 @@
>>>> #include <linux/vfio.h>
>>>> #include <linux/workqueue.h>
>>>> #include <linux/msi-iommu.h>
>>>> +#include <linux/irqdomain.h>
>>>> +#include <linux/msi.h>
>>>>
>>>> #define DRIVER_VERSION "0.2"
>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>> <alex.williamson@redhat.com>"
>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>> void *data)
>>>> return 0;
>>>> }
>>>>
>>>> +/**
>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>> + *
>>>> + * @dev: device handle
>>>> + * @data: unused
>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>> + */
>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>> +{
>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> + struct irq_domain *domain;
>>>> + struct msi_domain_info *info;
>>>> +
>>>> + domain = dev_get_msi_domain(dev);
>>>> + if (!domain)
>>>> + return 0;
>>>> +
>>>> + info = msi_get_domain_info(domain);
>>>> +
>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>> + return -1;
>>>> +#endif
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>> struct vfio_domain *domain)
>>>> {
>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> struct vfio_group *group, *g;
>>>> struct vfio_domain *domain, *d;
>>>> struct bus_type *bus = NULL;
>>>> - int ret;
>>>> + int ret, safe_irq_domains;
>>>>
>>>> mutex_lock(&iommu->lock);
>>>>
>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>>
>>>> group->iommu_group = iommu_group;
>>>>
>>>> + /*
>>>> + * Determine if all the devices of the group have a safe irq
>>>> domain
>>>> + * with respect to MSI isolation
>>>> + */
>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>> + vfio_safe_irq_domain);
>>>> +
>>>> /* Determine bus_type in order to allocate a domain */
>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>> if (ret)
>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> INIT_LIST_HEAD(&domain->group_list);
>>>> list_add(&group->next, &domain->group_list);
>>>>
>>>> + /*
>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>> controllers
>>>> + * must support IRQ remapping/interrupt translation
>>>> + */
>>>> if (!allow_unsafe_interrupts &&
>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>> !safe_irq_domains)) {
>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>> support on this platform\n",
>>>> __func__);
>>>> ret = -EPERM;
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 8:38 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 8:38 UTC (permalink / raw)
To: linux-arm-kernel
Hi Robin, Alex,
On 05/10/2016 07:24 PM, Robin Murphy wrote:
> Hi Eric,
>
> On 10/05/16 17:10, Eric Auger wrote:
>> Hi Alex,
>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>> On Wed, 4 May 2016 11:54:16 +0000
>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>
>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>> abstracted
>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>> interrupts are "safe" for a given device. They are if the device does
>>>> not use MSI
>>>
>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>> If it can generate a DMA to the MSI doorbell then we need to be
>>> protected and I think we pretty much need to assume that devices are
>>> DMA capable. Do the MSI domain checks cover this?
>> Let me try to rephrase: we check the device is not attached to an MSI
>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>
>> If it is not, we don't have to care about MSI isolation: there will be
>> no IOMMU binding between the device and any MSI doorbell. If it is we
>> check the msi domain is backed by an MSI controller able to perform MSI
>> isolation.
>>
>> So effectively "usage of MSIs" is improper - since it is decided after
>> the group attachment anyway - and the commit message should rather
>> state "if the device is linked to an MSI controller" (dt msi-parent
>> notion I think).
>
> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
> aren't in an MSI domain, so I don't think it's valid to assume that a
> device using only wired interrupts, therefore with no connection to any
> MSI controller, isn't still capable of maliciously spewing DMA all over
> any and every doorbell region in the system.
Sorry but I still don't get the point. For the device to reach the
doorbell there must be an IOMMU mapping.
- if the device is not attached to an MSI domain, there won't be any
doorbell iommu mapping built by this series, so no risk, right?
The device will be allowed to reach only memory iommu mapped by
userspace with VFIO DMA MAP standard API. Of course if the userspace can
mmap all the host PA that's a more general issue, right?
- If the device is attached to an MSI domain (msi-parent link), 2 cases:
1) the MSI controller advertises MSI isolation (ITS cases), no risk
2) the MSI controller does not advertise MSI isolation (GICv2m), there
is a security hole.
a) by default we reject the device attachment
b) if the userspace overrides the safe interrupt option he accepts
the security hole
What am I missing?
Best Regards
Eric
>
> Robin.
>
>> Does it sound better?
>>
>> Regards
>>
>> Eric
>>>
>>>> or if the device uses MSI and the msi-parent controller
>>>> supports IRQ remapping.
>>>>
>>>> Then we check at group level if all devices have safe interrupts: if
>>>> not,
>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>> set.
>>>>
>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>> changed in next patch.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> ---
>>>> v3 -> v4:
>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>> vfio_safe_irq_domain
>>>> and irq_remapping into safe_irq_domains
>>>>
>>>> v2 -> v3:
>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> ---
>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>> +++++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index 4d3a6f1..2fc8197 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -37,6 +37,8 @@
>>>> #include <linux/vfio.h>
>>>> #include <linux/workqueue.h>
>>>> #include <linux/msi-iommu.h>
>>>> +#include <linux/irqdomain.h>
>>>> +#include <linux/msi.h>
>>>>
>>>> #define DRIVER_VERSION "0.2"
>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>> <alex.williamson@redhat.com>"
>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>> void *data)
>>>> return 0;
>>>> }
>>>>
>>>> +/**
>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>> + *
>>>> + * @dev: device handle
>>>> + * @data: unused
>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>> + */
>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>> +{
>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>> + struct irq_domain *domain;
>>>> + struct msi_domain_info *info;
>>>> +
>>>> + domain = dev_get_msi_domain(dev);
>>>> + if (!domain)
>>>> + return 0;
>>>> +
>>>> + info = msi_get_domain_info(domain);
>>>> +
>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>> + return -1;
>>>> +#endif
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>> struct vfio_domain *domain)
>>>> {
>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> struct vfio_group *group, *g;
>>>> struct vfio_domain *domain, *d;
>>>> struct bus_type *bus = NULL;
>>>> - int ret;
>>>> + int ret, safe_irq_domains;
>>>>
>>>> mutex_lock(&iommu->lock);
>>>>
>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>>
>>>> group->iommu_group = iommu_group;
>>>>
>>>> + /*
>>>> + * Determine if all the devices of the group have a safe irq
>>>> domain
>>>> + * with respect to MSI isolation
>>>> + */
>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>> + vfio_safe_irq_domain);
>>>> +
>>>> /* Determine bus_type in order to allocate a domain */
>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>> if (ret)
>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>> *iommu_data,
>>>> INIT_LIST_HEAD(&domain->group_list);
>>>> list_add(&group->next, &domain->group_list);
>>>>
>>>> + /*
>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>> controllers
>>>> + * must support IRQ remapping/interrupt translation
>>>> + */
>>>> if (!allow_unsafe_interrupts &&
>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>> !safe_irq_domains)) {
>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>> support on this platform\n",
>>>> __func__);
>>>> ret = -EPERM;
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <5732EF96.4070001-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-11 8:38 ` Eric Auger
(?)
@ 2016-05-11 9:31 ` Robin Murphy
-1 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 9:31 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On 11/05/16 09:38, Eric Auger wrote:
> Hi Robin, Alex,
> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 10/05/16 17:10, Eric Auger wrote:
>>> Hi Alex,
>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>>
>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>> abstracted
>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>> not use MSI
>>>>
>>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>> protected and I think we pretty much need to assume that devices are
>>>> DMA capable. Do the MSI domain checks cover this?
>>> Let me try to rephrase: we check the device is not attached to an MSI
>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>
>>> If it is not, we don't have to care about MSI isolation: there will be
>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>> check the msi domain is backed by an MSI controller able to perform MSI
>>> isolation.
>>>
>>> So effectively "usage of MSIs" is improper - since it is decided after
>>> the group attachment anyway - and the commit message should rather
>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>> notion I think).
>>
>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>> aren't in an MSI domain, so I don't think it's valid to assume that a
>> device using only wired interrupts, therefore with no connection to any
>> MSI controller, isn't still capable of maliciously spewing DMA all over
>> any and every doorbell region in the system.
>
> Sorry but I still don't get the point. For the device to reach the
> doorbell there must be an IOMMU mapping.
Only when the doorbell is _downstream_ of IOMMU translation.
> - if the device is not attached to an MSI domain, there won't be any
> doorbell iommu mapping built by this series, so no risk, right?
>
> The device will be allowed to reach only memory iommu mapped by
> userspace with VFIO DMA MAP standard API. Of course if the userspace can
> mmap all the host PA that's a more general issue, right?
>
> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
> is a security hole.
> a) by default we reject the device attachment
> b) if the userspace overrides the safe interrupt option he accepts
> the security hole
>
> What am I missing?
The x86-with-interrupt-remapping-disabled case. Currently, without
unsafe_interrupts, everything is rejected - with this patch, we'll still
reject anything with an MSI domain, but e.g. legacy PCI devices using
INTx would be free to scribble all over the un-translated interrupt
region willy-nilly. That's the subtle, but significant, change in behaviour.
Robin.
> Best Regards
>
> Eric
>>
>> Robin.
>>
>>> Does it sound better?
>>>
>>> Regards
>>>
>>> Eric
>>>>
>>>>> or if the device uses MSI and the msi-parent controller
>>>>> supports IRQ remapping.
>>>>>
>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>> not,
>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>> set.
>>>>>
>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>> changed in next patch.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>>>
>>>>> ---
>>>>> v3 -> v4:
>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>> vfio_safe_irq_domain
>>>>> and irq_remapping into safe_irq_domains
>>>>>
>>>>> v2 -> v3:
>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> ---
>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>> index 4d3a6f1..2fc8197 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -37,6 +37,8 @@
>>>>> #include <linux/vfio.h>
>>>>> #include <linux/workqueue.h>
>>>>> #include <linux/msi-iommu.h>
>>>>> +#include <linux/irqdomain.h>
>>>>> +#include <linux/msi.h>
>>>>>
>>>>> #define DRIVER_VERSION "0.2"
>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>> void *data)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>> + *
>>>>> + * @dev: device handle
>>>>> + * @data: unused
>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>> + */
>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>> +{
>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> + struct irq_domain *domain;
>>>>> + struct msi_domain_info *info;
>>>>> +
>>>>> + domain = dev_get_msi_domain(dev);
>>>>> + if (!domain)
>>>>> + return 0;
>>>>> +
>>>>> + info = msi_get_domain_info(domain);
>>>>> +
>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>> + return -1;
>>>>> +#endif
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>> struct vfio_domain *domain)
>>>>> {
>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> struct vfio_group *group, *g;
>>>>> struct vfio_domain *domain, *d;
>>>>> struct bus_type *bus = NULL;
>>>>> - int ret;
>>>>> + int ret, safe_irq_domains;
>>>>>
>>>>> mutex_lock(&iommu->lock);
>>>>>
>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>>
>>>>> group->iommu_group = iommu_group;
>>>>>
>>>>> + /*
>>>>> + * Determine if all the devices of the group have a safe irq
>>>>> domain
>>>>> + * with respect to MSI isolation
>>>>> + */
>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>> + vfio_safe_irq_domain);
>>>>> +
>>>>> /* Determine bus_type in order to allocate a domain */
>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>>> if (ret)
>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>> list_add(&group->next, &domain->group_list);
>>>>>
>>>>> + /*
>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>> controllers
>>>>> + * must support IRQ remapping/interrupt translation
>>>>> + */
>>>>> if (!allow_unsafe_interrupts &&
>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>> !safe_irq_domains)) {
>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>> support on this platform\n",
>>>>> __func__);
>>>>> ret = -EPERM;
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 9:31 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 9:31 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On 11/05/16 09:38, Eric Auger wrote:
> Hi Robin, Alex,
> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 10/05/16 17:10, Eric Auger wrote:
>>> Hi Alex,
>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>
>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>> abstracted
>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>> not use MSI
>>>>
>>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>> protected and I think we pretty much need to assume that devices are
>>>> DMA capable. Do the MSI domain checks cover this?
>>> Let me try to rephrase: we check the device is not attached to an MSI
>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>
>>> If it is not, we don't have to care about MSI isolation: there will be
>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>> check the msi domain is backed by an MSI controller able to perform MSI
>>> isolation.
>>>
>>> So effectively "usage of MSIs" is improper - since it is decided after
>>> the group attachment anyway - and the commit message should rather
>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>> notion I think).
>>
>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>> aren't in an MSI domain, so I don't think it's valid to assume that a
>> device using only wired interrupts, therefore with no connection to any
>> MSI controller, isn't still capable of maliciously spewing DMA all over
>> any and every doorbell region in the system.
>
> Sorry but I still don't get the point. For the device to reach the
> doorbell there must be an IOMMU mapping.
Only when the doorbell is _downstream_ of IOMMU translation.
> - if the device is not attached to an MSI domain, there won't be any
> doorbell iommu mapping built by this series, so no risk, right?
>
> The device will be allowed to reach only memory iommu mapped by
> userspace with VFIO DMA MAP standard API. Of course if the userspace can
> mmap all the host PA that's a more general issue, right?
>
> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
> is a security hole.
> a) by default we reject the device attachment
> b) if the userspace overrides the safe interrupt option he accepts
> the security hole
>
> What am I missing?
The x86-with-interrupt-remapping-disabled case. Currently, without
unsafe_interrupts, everything is rejected - with this patch, we'll still
reject anything with an MSI domain, but e.g. legacy PCI devices using
INTx would be free to scribble all over the un-translated interrupt
region willy-nilly. That's the subtle, but significant, change in behaviour.
Robin.
> Best Regards
>
> Eric
>>
>> Robin.
>>
>>> Does it sound better?
>>>
>>> Regards
>>>
>>> Eric
>>>>
>>>>> or if the device uses MSI and the msi-parent controller
>>>>> supports IRQ remapping.
>>>>>
>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>> not,
>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>> set.
>>>>>
>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>> changed in next patch.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>
>>>>> ---
>>>>> v3 -> v4:
>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>> vfio_safe_irq_domain
>>>>> and irq_remapping into safe_irq_domains
>>>>>
>>>>> v2 -> v3:
>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> ---
>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>> index 4d3a6f1..2fc8197 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -37,6 +37,8 @@
>>>>> #include <linux/vfio.h>
>>>>> #include <linux/workqueue.h>
>>>>> #include <linux/msi-iommu.h>
>>>>> +#include <linux/irqdomain.h>
>>>>> +#include <linux/msi.h>
>>>>>
>>>>> #define DRIVER_VERSION "0.2"
>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>> <alex.williamson@redhat.com>"
>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>> void *data)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>> + *
>>>>> + * @dev: device handle
>>>>> + * @data: unused
>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>> + */
>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>> +{
>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> + struct irq_domain *domain;
>>>>> + struct msi_domain_info *info;
>>>>> +
>>>>> + domain = dev_get_msi_domain(dev);
>>>>> + if (!domain)
>>>>> + return 0;
>>>>> +
>>>>> + info = msi_get_domain_info(domain);
>>>>> +
>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>> + return -1;
>>>>> +#endif
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>> struct vfio_domain *domain)
>>>>> {
>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> struct vfio_group *group, *g;
>>>>> struct vfio_domain *domain, *d;
>>>>> struct bus_type *bus = NULL;
>>>>> - int ret;
>>>>> + int ret, safe_irq_domains;
>>>>>
>>>>> mutex_lock(&iommu->lock);
>>>>>
>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>>
>>>>> group->iommu_group = iommu_group;
>>>>>
>>>>> + /*
>>>>> + * Determine if all the devices of the group have a safe irq
>>>>> domain
>>>>> + * with respect to MSI isolation
>>>>> + */
>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>> + vfio_safe_irq_domain);
>>>>> +
>>>>> /* Determine bus_type in order to allocate a domain */
>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>>> if (ret)
>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>> list_add(&group->next, &domain->group_list);
>>>>>
>>>>> + /*
>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>> controllers
>>>>> + * must support IRQ remapping/interrupt translation
>>>>> + */
>>>>> if (!allow_unsafe_interrupts &&
>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>> !safe_irq_domains)) {
>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>> support on this platform\n",
>>>>> __func__);
>>>>> ret = -EPERM;
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 9:31 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 9:31 UTC (permalink / raw)
To: linux-arm-kernel
On 11/05/16 09:38, Eric Auger wrote:
> Hi Robin, Alex,
> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 10/05/16 17:10, Eric Auger wrote:
>>> Hi Alex,
>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>
>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>> abstracted
>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>> not use MSI
>>>>
>>>> Are we sure we're not opening a security hole here? An MSI is simply a
>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>> protected and I think we pretty much need to assume that devices are
>>>> DMA capable. Do the MSI domain checks cover this?
>>> Let me try to rephrase: we check the device is not attached to an MSI
>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>
>>> If it is not, we don't have to care about MSI isolation: there will be
>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>> check the msi domain is backed by an MSI controller able to perform MSI
>>> isolation.
>>>
>>> So effectively "usage of MSIs" is improper - since it is decided after
>>> the group attachment anyway - and the commit message should rather
>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>> notion I think).
>>
>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>> aren't in an MSI domain, so I don't think it's valid to assume that a
>> device using only wired interrupts, therefore with no connection to any
>> MSI controller, isn't still capable of maliciously spewing DMA all over
>> any and every doorbell region in the system.
>
> Sorry but I still don't get the point. For the device to reach the
> doorbell there must be an IOMMU mapping.
Only when the doorbell is _downstream_ of IOMMU translation.
> - if the device is not attached to an MSI domain, there won't be any
> doorbell iommu mapping built by this series, so no risk, right?
>
> The device will be allowed to reach only memory iommu mapped by
> userspace with VFIO DMA MAP standard API. Of course if the userspace can
> mmap all the host PA that's a more general issue, right?
>
> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
> is a security hole.
> a) by default we reject the device attachment
> b) if the userspace overrides the safe interrupt option he accepts
> the security hole
>
> What am I missing?
The x86-with-interrupt-remapping-disabled case. Currently, without
unsafe_interrupts, everything is rejected - with this patch, we'll still
reject anything with an MSI domain, but e.g. legacy PCI devices using
INTx would be free to scribble all over the un-translated interrupt
region willy-nilly. That's the subtle, but significant, change in behaviour.
Robin.
> Best Regards
>
> Eric
>>
>> Robin.
>>
>>> Does it sound better?
>>>
>>> Regards
>>>
>>> Eric
>>>>
>>>>> or if the device uses MSI and the msi-parent controller
>>>>> supports IRQ remapping.
>>>>>
>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>> not,
>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>> set.
>>>>>
>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>> changed in next patch.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>
>>>>> ---
>>>>> v3 -> v4:
>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>> vfio_safe_irq_domain
>>>>> and irq_remapping into safe_irq_domains
>>>>>
>>>>> v2 -> v3:
>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> ---
>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>> index 4d3a6f1..2fc8197 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -37,6 +37,8 @@
>>>>> #include <linux/vfio.h>
>>>>> #include <linux/workqueue.h>
>>>>> #include <linux/msi-iommu.h>
>>>>> +#include <linux/irqdomain.h>
>>>>> +#include <linux/msi.h>
>>>>>
>>>>> #define DRIVER_VERSION "0.2"
>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>> <alex.williamson@redhat.com>"
>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>> void *data)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>> + *
>>>>> + * @dev: device handle
>>>>> + * @data: unused
>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>> + */
>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>> +{
>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>> + struct irq_domain *domain;
>>>>> + struct msi_domain_info *info;
>>>>> +
>>>>> + domain = dev_get_msi_domain(dev);
>>>>> + if (!domain)
>>>>> + return 0;
>>>>> +
>>>>> + info = msi_get_domain_info(domain);
>>>>> +
>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>> + return -1;
>>>>> +#endif
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>> struct vfio_domain *domain)
>>>>> {
>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> struct vfio_group *group, *g;
>>>>> struct vfio_domain *domain, *d;
>>>>> struct bus_type *bus = NULL;
>>>>> - int ret;
>>>>> + int ret, safe_irq_domains;
>>>>>
>>>>> mutex_lock(&iommu->lock);
>>>>>
>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>>
>>>>> group->iommu_group = iommu_group;
>>>>>
>>>>> + /*
>>>>> + * Determine if all the devices of the group have a safe irq
>>>>> domain
>>>>> + * with respect to MSI isolation
>>>>> + */
>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>> + vfio_safe_irq_domain);
>>>>> +
>>>>> /* Determine bus_type in order to allocate a domain */
>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>>> if (ret)
>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>> *iommu_data,
>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>> list_add(&group->next, &domain->group_list);
>>>>>
>>>>> + /*
>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>> controllers
>>>>> + * must support IRQ remapping/interrupt translation
>>>>> + */
>>>>> if (!allow_unsafe_interrupts &&
>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>> !safe_irq_domains)) {
>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>> support on this platform\n",
>>>>> __func__);
>>>>> ret = -EPERM;
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <5732FBF0.4040105-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-11 9:31 ` Robin Murphy
(?)
@ 2016-05-11 9:44 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 9:44 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Robin,
On 05/11/2016 11:31 AM, Robin Murphy wrote:
> On 11/05/16 09:38, Eric Auger wrote:
>> Hi Robin, Alex,
>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 10/05/16 17:10, Eric Auger wrote:
>>>> Hi Alex,
>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>>>
>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>> abstracted
>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>> not use MSI
>>>>>
>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>> simply a
>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>> protected and I think we pretty much need to assume that devices are
>>>>> DMA capable. Do the MSI domain checks cover this?
>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>
>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>> isolation.
>>>>
>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>> the group attachment anyway - and the commit message should rather
>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>> notion I think).
>>>
>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>> device using only wired interrupts, therefore with no connection to any
>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>> any and every doorbell region in the system.
>>
>> Sorry but I still don't get the point. For the device to reach the
>> doorbell there must be an IOMMU mapping.
>
> Only when the doorbell is _downstream_ of IOMMU translation.
Yes but that's the root hypothesis with VFIO assignment, right?
Except if the no-iommu option is explicitly set by the userspace, there
must be an IOMMU downstream to the device that protects all the DMA
transactions.
>
>> - if the device is not attached to an MSI domain, there won't be any
>> doorbell iommu mapping built by this series, so no risk, right?
>>
>> The device will be allowed to reach only memory iommu mapped by
>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>> mmap all the host PA that's a more general issue, right?
>>
>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>> is a security hole.
>> a) by default we reject the device attachment
>> b) if the userspace overrides the safe interrupt option he accepts
>> the security hole
>>
>> What am I missing?
>
> The x86-with-interrupt-remapping-disabled case. Currently, without
> unsafe_interrupts, everything is rejected - with this patch, we'll still
> reject anything with an MSI domain, but e.g. legacy PCI devices using
> INTx would be free to scribble all over the un-translated interrupt
> region willy-nilly.
Assuming we have an IOMMU, either we have translated transactions or
faults? So if legacy PCI devices try to write into a doorbell, there
must be faults because there is no IOMMU mapping for doorbells?
Best Regards
Eric
That's the subtle, but significant, change in
> behaviour.
>
> Robin.
>
>> Best Regards
>>
>> Eric
>>>
>>> Robin.
>>>
>>>> Does it sound better?
>>>>
>>>> Regards
>>>>
>>>> Eric
>>>>>
>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>> supports IRQ remapping.
>>>>>>
>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>> not,
>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>> set.
>>>>>>
>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>> changed in next patch.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>>>>
>>>>>> ---
>>>>>> v3 -> v4:
>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>> vfio_safe_irq_domain
>>>>>> and irq_remapping into safe_irq_domains
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> ---
>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -37,6 +37,8 @@
>>>>>> #include <linux/vfio.h>
>>>>>> #include <linux/workqueue.h>
>>>>>> #include <linux/msi-iommu.h>
>>>>>> +#include <linux/irqdomain.h>
>>>>>> +#include <linux/msi.h>
>>>>>>
>>>>>> #define DRIVER_VERSION "0.2"
>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>> void *data)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>> + *
>>>>>> + * @dev: device handle
>>>>>> + * @data: unused
>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>> + */
>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>> +{
>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> + struct irq_domain *domain;
>>>>>> + struct msi_domain_info *info;
>>>>>> +
>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>> + if (!domain)
>>>>>> + return 0;
>>>>>> +
>>>>>> + info = msi_get_domain_info(domain);
>>>>>> +
>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>> + return -1;
>>>>>> +#endif
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>> struct vfio_domain *domain)
>>>>>> {
>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> struct vfio_group *group, *g;
>>>>>> struct vfio_domain *domain, *d;
>>>>>> struct bus_type *bus = NULL;
>>>>>> - int ret;
>>>>>> + int ret, safe_irq_domains;
>>>>>>
>>>>>> mutex_lock(&iommu->lock);
>>>>>>
>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>
>>>>>> group->iommu_group = iommu_group;
>>>>>>
>>>>>> + /*
>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>> domain
>>>>>> + * with respect to MSI isolation
>>>>>> + */
>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> + vfio_safe_irq_domain);
>>>>>> +
>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> vfio_bus_type);
>>>>>> if (ret)
>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>
>>>>>> + /*
>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>> controllers
>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>> + */
>>>>>> if (!allow_unsafe_interrupts &&
>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>> !safe_irq_domains)) {
>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>> support on this platform\n",
>>>>>> __func__);
>>>>>> ret = -EPERM;
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 9:44 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 9:44 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Robin,
On 05/11/2016 11:31 AM, Robin Murphy wrote:
> On 11/05/16 09:38, Eric Auger wrote:
>> Hi Robin, Alex,
>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 10/05/16 17:10, Eric Auger wrote:
>>>> Hi Alex,
>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>
>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>> abstracted
>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>> not use MSI
>>>>>
>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>> simply a
>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>> protected and I think we pretty much need to assume that devices are
>>>>> DMA capable. Do the MSI domain checks cover this?
>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>
>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>> isolation.
>>>>
>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>> the group attachment anyway - and the commit message should rather
>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>> notion I think).
>>>
>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>> device using only wired interrupts, therefore with no connection to any
>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>> any and every doorbell region in the system.
>>
>> Sorry but I still don't get the point. For the device to reach the
>> doorbell there must be an IOMMU mapping.
>
> Only when the doorbell is _downstream_ of IOMMU translation.
Yes but that's the root hypothesis with VFIO assignment, right?
Except if the no-iommu option is explicitly set by the userspace, there
must be an IOMMU downstream to the device that protects all the DMA
transactions.
>
>> - if the device is not attached to an MSI domain, there won't be any
>> doorbell iommu mapping built by this series, so no risk, right?
>>
>> The device will be allowed to reach only memory iommu mapped by
>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>> mmap all the host PA that's a more general issue, right?
>>
>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>> is a security hole.
>> a) by default we reject the device attachment
>> b) if the userspace overrides the safe interrupt option he accepts
>> the security hole
>>
>> What am I missing?
>
> The x86-with-interrupt-remapping-disabled case. Currently, without
> unsafe_interrupts, everything is rejected - with this patch, we'll still
> reject anything with an MSI domain, but e.g. legacy PCI devices using
> INTx would be free to scribble all over the un-translated interrupt
> region willy-nilly.
Assuming we have an IOMMU, either we have translated transactions or
faults? So if legacy PCI devices try to write into a doorbell, there
must be faults because there is no IOMMU mapping for doorbells?
Best Regards
Eric
That's the subtle, but significant, change in
> behaviour.
>
> Robin.
>
>> Best Regards
>>
>> Eric
>>>
>>> Robin.
>>>
>>>> Does it sound better?
>>>>
>>>> Regards
>>>>
>>>> Eric
>>>>>
>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>> supports IRQ remapping.
>>>>>>
>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>> not,
>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>> set.
>>>>>>
>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>> changed in next patch.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>
>>>>>> ---
>>>>>> v3 -> v4:
>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>> vfio_safe_irq_domain
>>>>>> and irq_remapping into safe_irq_domains
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> ---
>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -37,6 +37,8 @@
>>>>>> #include <linux/vfio.h>
>>>>>> #include <linux/workqueue.h>
>>>>>> #include <linux/msi-iommu.h>
>>>>>> +#include <linux/irqdomain.h>
>>>>>> +#include <linux/msi.h>
>>>>>>
>>>>>> #define DRIVER_VERSION "0.2"
>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>> <alex.williamson@redhat.com>"
>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>> void *data)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>> + *
>>>>>> + * @dev: device handle
>>>>>> + * @data: unused
>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>> + */
>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>> +{
>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> + struct irq_domain *domain;
>>>>>> + struct msi_domain_info *info;
>>>>>> +
>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>> + if (!domain)
>>>>>> + return 0;
>>>>>> +
>>>>>> + info = msi_get_domain_info(domain);
>>>>>> +
>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>> + return -1;
>>>>>> +#endif
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>> struct vfio_domain *domain)
>>>>>> {
>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> struct vfio_group *group, *g;
>>>>>> struct vfio_domain *domain, *d;
>>>>>> struct bus_type *bus = NULL;
>>>>>> - int ret;
>>>>>> + int ret, safe_irq_domains;
>>>>>>
>>>>>> mutex_lock(&iommu->lock);
>>>>>>
>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>
>>>>>> group->iommu_group = iommu_group;
>>>>>>
>>>>>> + /*
>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>> domain
>>>>>> + * with respect to MSI isolation
>>>>>> + */
>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> + vfio_safe_irq_domain);
>>>>>> +
>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> vfio_bus_type);
>>>>>> if (ret)
>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>
>>>>>> + /*
>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>> controllers
>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>> + */
>>>>>> if (!allow_unsafe_interrupts &&
>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>> !safe_irq_domains)) {
>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>> support on this platform\n",
>>>>>> __func__);
>>>>>> ret = -EPERM;
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 9:44 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 9:44 UTC (permalink / raw)
To: linux-arm-kernel
Hi Robin,
On 05/11/2016 11:31 AM, Robin Murphy wrote:
> On 11/05/16 09:38, Eric Auger wrote:
>> Hi Robin, Alex,
>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 10/05/16 17:10, Eric Auger wrote:
>>>> Hi Alex,
>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>
>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>> abstracted
>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>> not use MSI
>>>>>
>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>> simply a
>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>> protected and I think we pretty much need to assume that devices are
>>>>> DMA capable. Do the MSI domain checks cover this?
>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>
>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>> isolation.
>>>>
>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>> the group attachment anyway - and the commit message should rather
>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>> notion I think).
>>>
>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>> device using only wired interrupts, therefore with no connection to any
>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>> any and every doorbell region in the system.
>>
>> Sorry but I still don't get the point. For the device to reach the
>> doorbell there must be an IOMMU mapping.
>
> Only when the doorbell is _downstream_ of IOMMU translation.
Yes but that's the root hypothesis with VFIO assignment, right?
Except if the no-iommu option is explicitly set by the userspace, there
must be an IOMMU downstream to the device that protects all the DMA
transactions.
>
>> - if the device is not attached to an MSI domain, there won't be any
>> doorbell iommu mapping built by this series, so no risk, right?
>>
>> The device will be allowed to reach only memory iommu mapped by
>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>> mmap all the host PA that's a more general issue, right?
>>
>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>> is a security hole.
>> a) by default we reject the device attachment
>> b) if the userspace overrides the safe interrupt option he accepts
>> the security hole
>>
>> What am I missing?
>
> The x86-with-interrupt-remapping-disabled case. Currently, without
> unsafe_interrupts, everything is rejected - with this patch, we'll still
> reject anything with an MSI domain, but e.g. legacy PCI devices using
> INTx would be free to scribble all over the un-translated interrupt
> region willy-nilly.
Assuming we have an IOMMU, either we have translated transactions or
faults? So if legacy PCI devices try to write into a doorbell, there
must be faults because there is no IOMMU mapping for doorbells?
Best Regards
Eric
That's the subtle, but significant, change in
> behaviour.
>
> Robin.
>
>> Best Regards
>>
>> Eric
>>>
>>> Robin.
>>>
>>>> Does it sound better?
>>>>
>>>> Regards
>>>>
>>>> Eric
>>>>>
>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>> supports IRQ remapping.
>>>>>>
>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>> not,
>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>> set.
>>>>>>
>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>> changed in next patch.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>
>>>>>> ---
>>>>>> v3 -> v4:
>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>> vfio_safe_irq_domain
>>>>>> and irq_remapping into safe_irq_domains
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> ---
>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -37,6 +37,8 @@
>>>>>> #include <linux/vfio.h>
>>>>>> #include <linux/workqueue.h>
>>>>>> #include <linux/msi-iommu.h>
>>>>>> +#include <linux/irqdomain.h>
>>>>>> +#include <linux/msi.h>
>>>>>>
>>>>>> #define DRIVER_VERSION "0.2"
>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>> <alex.williamson@redhat.com>"
>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>> void *data)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>> + *
>>>>>> + * @dev: device handle
>>>>>> + * @data: unused
>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>> + */
>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>> +{
>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> + struct irq_domain *domain;
>>>>>> + struct msi_domain_info *info;
>>>>>> +
>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>> + if (!domain)
>>>>>> + return 0;
>>>>>> +
>>>>>> + info = msi_get_domain_info(domain);
>>>>>> +
>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>> + return -1;
>>>>>> +#endif
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>> struct vfio_domain *domain)
>>>>>> {
>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> struct vfio_group *group, *g;
>>>>>> struct vfio_domain *domain, *d;
>>>>>> struct bus_type *bus = NULL;
>>>>>> - int ret;
>>>>>> + int ret, safe_irq_domains;
>>>>>>
>>>>>> mutex_lock(&iommu->lock);
>>>>>>
>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>
>>>>>> group->iommu_group = iommu_group;
>>>>>>
>>>>>> + /*
>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>> domain
>>>>>> + * with respect to MSI isolation
>>>>>> + */
>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> + vfio_safe_irq_domain);
>>>>>> +
>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> vfio_bus_type);
>>>>>> if (ret)
>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>
>>>>>> + /*
>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>> controllers
>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>> + */
>>>>>> if (!allow_unsafe_interrupts &&
>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>> !safe_irq_domains)) {
>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>> support on this platform\n",
>>>>>> __func__);
>>>>>> ret = -EPERM;
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <5732FF04.6060202-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-11 9:44 ` Eric Auger
(?)
@ 2016-05-11 13:48 ` Robin Murphy
-1 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 13:48 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On 11/05/16 10:44, Eric Auger wrote:
> Hi Robin,
> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>> On 11/05/16 09:38, Eric Auger wrote:
>>> Hi Robin, Alex,
>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>> Hi Eric,
>>>>
>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>> Hi Alex,
>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>>>>
>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>> abstracted
>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>>> not use MSI
>>>>>>
>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>> simply a
>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>
>>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>>> isolation.
>>>>>
>>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>>> the group attachment anyway - and the commit message should rather
>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>> notion I think).
>>>>
>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>> device using only wired interrupts, therefore with no connection to any
>>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>>> any and every doorbell region in the system.
>>>
>>> Sorry but I still don't get the point. For the device to reach the
>>> doorbell there must be an IOMMU mapping.
>>
>> Only when the doorbell is _downstream_ of IOMMU translation.
> Yes but that's the root hypothesis with VFIO assignment, right?
>
> Except if the no-iommu option is explicitly set by the userspace, there
> must be an IOMMU downstream to the device that protects all the DMA
> transactions.
The IOMMU is downstream of the device, clearly, but the MSI machinery
itself could be _in between_ the two - say, integrated into a PCIe root
complex served by an external SMMU. The doorbell is carved out of the
upstream address space so that a write targeting the appropriate address
hits it directly and never even goes out to the SMMU - e.g.
http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
>>> - if the device is not attached to an MSI domain, there won't be any
>>> doorbell iommu mapping built by this series, so no risk, right?
>>>
>>> The device will be allowed to reach only memory iommu mapped by
>>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>>> mmap all the host PA that's a more general issue, right?
>>>
>>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>> is a security hole.
>>> a) by default we reject the device attachment
>>> b) if the userspace overrides the safe interrupt option he accepts
>>> the security hole
>>>
>>> What am I missing?
>>
>> The x86-with-interrupt-remapping-disabled case. Currently, without
>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>> INTx would be free to scribble all over the un-translated interrupt
>> region willy-nilly.
> Assuming we have an IOMMU, either we have translated transactions or
> faults? So if legacy PCI devices try to write into a doorbell, there
> must be faults because there is no IOMMU mapping for doorbells?
To the best of my understanding, having been wading through the VT-d
spec, the case on x86 is as above (OK, so strictly it's more "in
parallel with" than "in front of") - writes outside the 0xFEExxxxx
interrupt region go to the DMA remapping units, whereas writes within
that region go to the interrupt remapping unit, and if that's disabled
or not implemented, simply sail straight through to the APIC beyond.
Robin.
> Best Regards
>
> Eric
> That's the subtle, but significant, change in
>> behaviour.
>>
>> Robin.
>>
>>> Best Regards
>>>
>>> Eric
>>>>
>>>> Robin.
>>>>
>>>>> Does it sound better?
>>>>>
>>>>> Regards
>>>>>
>>>>> Eric
>>>>>>
>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>> supports IRQ remapping.
>>>>>>>
>>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>>> not,
>>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>>> set.
>>>>>>>
>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>>> changed in next patch.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>>>>>
>>>>>>> ---
>>>>>>> v3 -> v4:
>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>> vfio_safe_irq_domain
>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> ---
>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> @@ -37,6 +37,8 @@
>>>>>>> #include <linux/vfio.h>
>>>>>>> #include <linux/workqueue.h>
>>>>>>> #include <linux/msi-iommu.h>
>>>>>>> +#include <linux/irqdomain.h>
>>>>>>> +#include <linux/msi.h>
>>>>>>>
>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>> void *data)
>>>>>>> return 0;
>>>>>>> }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>> + *
>>>>>>> + * @dev: device handle
>>>>>>> + * @data: unused
>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>> + */
>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>> +{
>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> + struct irq_domain *domain;
>>>>>>> + struct msi_domain_info *info;
>>>>>>> +
>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>> + if (!domain)
>>>>>>> + return 0;
>>>>>>> +
>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>> +
>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>> + return -1;
>>>>>>> +#endif
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>> struct vfio_domain *domain)
>>>>>>> {
>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> struct vfio_group *group, *g;
>>>>>>> struct vfio_domain *domain, *d;
>>>>>>> struct bus_type *bus = NULL;
>>>>>>> - int ret;
>>>>>>> + int ret, safe_irq_domains;
>>>>>>>
>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>
>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>
>>>>>>> group->iommu_group = iommu_group;
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>> domain
>>>>>>> + * with respect to MSI isolation
>>>>>>> + */
>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> + vfio_safe_irq_domain);
>>>>>>> +
>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> vfio_bus_type);
>>>>>>> if (ret)
>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>
>>>>>>> + /*
>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>> controllers
>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>> + */
>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>> !safe_irq_domains)) {
>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>> support on this platform\n",
>>>>>>> __func__);
>>>>>>> ret = -EPERM;
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 13:48 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 13:48 UTC (permalink / raw)
To: Eric Auger, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On 11/05/16 10:44, Eric Auger wrote:
> Hi Robin,
> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>> On 11/05/16 09:38, Eric Auger wrote:
>>> Hi Robin, Alex,
>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>> Hi Eric,
>>>>
>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>> Hi Alex,
>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>
>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>> abstracted
>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>>> not use MSI
>>>>>>
>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>> simply a
>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>
>>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>>> isolation.
>>>>>
>>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>>> the group attachment anyway - and the commit message should rather
>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>> notion I think).
>>>>
>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>> device using only wired interrupts, therefore with no connection to any
>>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>>> any and every doorbell region in the system.
>>>
>>> Sorry but I still don't get the point. For the device to reach the
>>> doorbell there must be an IOMMU mapping.
>>
>> Only when the doorbell is _downstream_ of IOMMU translation.
> Yes but that's the root hypothesis with VFIO assignment, right?
>
> Except if the no-iommu option is explicitly set by the userspace, there
> must be an IOMMU downstream to the device that protects all the DMA
> transactions.
The IOMMU is downstream of the device, clearly, but the MSI machinery
itself could be _in between_ the two - say, integrated into a PCIe root
complex served by an external SMMU. The doorbell is carved out of the
upstream address space so that a write targeting the appropriate address
hits it directly and never even goes out to the SMMU - e.g.
http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
>>> - if the device is not attached to an MSI domain, there won't be any
>>> doorbell iommu mapping built by this series, so no risk, right?
>>>
>>> The device will be allowed to reach only memory iommu mapped by
>>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>>> mmap all the host PA that's a more general issue, right?
>>>
>>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>> is a security hole.
>>> a) by default we reject the device attachment
>>> b) if the userspace overrides the safe interrupt option he accepts
>>> the security hole
>>>
>>> What am I missing?
>>
>> The x86-with-interrupt-remapping-disabled case. Currently, without
>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>> INTx would be free to scribble all over the un-translated interrupt
>> region willy-nilly.
> Assuming we have an IOMMU, either we have translated transactions or
> faults? So if legacy PCI devices try to write into a doorbell, there
> must be faults because there is no IOMMU mapping for doorbells?
To the best of my understanding, having been wading through the VT-d
spec, the case on x86 is as above (OK, so strictly it's more "in
parallel with" than "in front of") - writes outside the 0xFEExxxxx
interrupt region go to the DMA remapping units, whereas writes within
that region go to the interrupt remapping unit, and if that's disabled
or not implemented, simply sail straight through to the APIC beyond.
Robin.
> Best Regards
>
> Eric
> That's the subtle, but significant, change in
>> behaviour.
>>
>> Robin.
>>
>>> Best Regards
>>>
>>> Eric
>>>>
>>>> Robin.
>>>>
>>>>> Does it sound better?
>>>>>
>>>>> Regards
>>>>>
>>>>> Eric
>>>>>>
>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>> supports IRQ remapping.
>>>>>>>
>>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>>> not,
>>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>>> set.
>>>>>>>
>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>>> changed in next patch.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>
>>>>>>> ---
>>>>>>> v3 -> v4:
>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>> vfio_safe_irq_domain
>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> ---
>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> @@ -37,6 +37,8 @@
>>>>>>> #include <linux/vfio.h>
>>>>>>> #include <linux/workqueue.h>
>>>>>>> #include <linux/msi-iommu.h>
>>>>>>> +#include <linux/irqdomain.h>
>>>>>>> +#include <linux/msi.h>
>>>>>>>
>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>> <alex.williamson@redhat.com>"
>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>> void *data)
>>>>>>> return 0;
>>>>>>> }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>> + *
>>>>>>> + * @dev: device handle
>>>>>>> + * @data: unused
>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>> + */
>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>> +{
>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> + struct irq_domain *domain;
>>>>>>> + struct msi_domain_info *info;
>>>>>>> +
>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>> + if (!domain)
>>>>>>> + return 0;
>>>>>>> +
>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>> +
>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>> + return -1;
>>>>>>> +#endif
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>> struct vfio_domain *domain)
>>>>>>> {
>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> struct vfio_group *group, *g;
>>>>>>> struct vfio_domain *domain, *d;
>>>>>>> struct bus_type *bus = NULL;
>>>>>>> - int ret;
>>>>>>> + int ret, safe_irq_domains;
>>>>>>>
>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>
>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>
>>>>>>> group->iommu_group = iommu_group;
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>> domain
>>>>>>> + * with respect to MSI isolation
>>>>>>> + */
>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> + vfio_safe_irq_domain);
>>>>>>> +
>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> vfio_bus_type);
>>>>>>> if (ret)
>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>
>>>>>>> + /*
>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>> controllers
>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>> + */
>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>> !safe_irq_domains)) {
>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>> support on this platform\n",
>>>>>>> __func__);
>>>>>>> ret = -EPERM;
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 13:48 ` Robin Murphy
0 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2016-05-11 13:48 UTC (permalink / raw)
To: linux-arm-kernel
On 11/05/16 10:44, Eric Auger wrote:
> Hi Robin,
> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>> On 11/05/16 09:38, Eric Auger wrote:
>>> Hi Robin, Alex,
>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>> Hi Eric,
>>>>
>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>> Hi Alex,
>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>
>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>> abstracted
>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>>> not use MSI
>>>>>>
>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>> simply a
>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>
>>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>>> isolation.
>>>>>
>>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>>> the group attachment anyway - and the commit message should rather
>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>> notion I think).
>>>>
>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>> device using only wired interrupts, therefore with no connection to any
>>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>>> any and every doorbell region in the system.
>>>
>>> Sorry but I still don't get the point. For the device to reach the
>>> doorbell there must be an IOMMU mapping.
>>
>> Only when the doorbell is _downstream_ of IOMMU translation.
> Yes but that's the root hypothesis with VFIO assignment, right?
>
> Except if the no-iommu option is explicitly set by the userspace, there
> must be an IOMMU downstream to the device that protects all the DMA
> transactions.
The IOMMU is downstream of the device, clearly, but the MSI machinery
itself could be _in between_ the two - say, integrated into a PCIe root
complex served by an external SMMU. The doorbell is carved out of the
upstream address space so that a write targeting the appropriate address
hits it directly and never even goes out to the SMMU - e.g.
http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
>>> - if the device is not attached to an MSI domain, there won't be any
>>> doorbell iommu mapping built by this series, so no risk, right?
>>>
>>> The device will be allowed to reach only memory iommu mapped by
>>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>>> mmap all the host PA that's a more general issue, right?
>>>
>>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>> is a security hole.
>>> a) by default we reject the device attachment
>>> b) if the userspace overrides the safe interrupt option he accepts
>>> the security hole
>>>
>>> What am I missing?
>>
>> The x86-with-interrupt-remapping-disabled case. Currently, without
>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>> INTx would be free to scribble all over the un-translated interrupt
>> region willy-nilly.
> Assuming we have an IOMMU, either we have translated transactions or
> faults? So if legacy PCI devices try to write into a doorbell, there
> must be faults because there is no IOMMU mapping for doorbells?
To the best of my understanding, having been wading through the VT-d
spec, the case on x86 is as above (OK, so strictly it's more "in
parallel with" than "in front of") - writes outside the 0xFEExxxxx
interrupt region go to the DMA remapping units, whereas writes within
that region go to the interrupt remapping unit, and if that's disabled
or not implemented, simply sail straight through to the APIC beyond.
Robin.
> Best Regards
>
> Eric
> That's the subtle, but significant, change in
>> behaviour.
>>
>> Robin.
>>
>>> Best Regards
>>>
>>> Eric
>>>>
>>>> Robin.
>>>>
>>>>> Does it sound better?
>>>>>
>>>>> Regards
>>>>>
>>>>> Eric
>>>>>>
>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>> supports IRQ remapping.
>>>>>>>
>>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>>> not,
>>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>>> set.
>>>>>>>
>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>>> changed in next patch.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>
>>>>>>> ---
>>>>>>> v3 -> v4:
>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>> vfio_safe_irq_domain
>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> ---
>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> @@ -37,6 +37,8 @@
>>>>>>> #include <linux/vfio.h>
>>>>>>> #include <linux/workqueue.h>
>>>>>>> #include <linux/msi-iommu.h>
>>>>>>> +#include <linux/irqdomain.h>
>>>>>>> +#include <linux/msi.h>
>>>>>>>
>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>> <alex.williamson@redhat.com>"
>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>> void *data)
>>>>>>> return 0;
>>>>>>> }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>> + *
>>>>>>> + * @dev: device handle
>>>>>>> + * @data: unused
>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>> + */
>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>> +{
>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> + struct irq_domain *domain;
>>>>>>> + struct msi_domain_info *info;
>>>>>>> +
>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>> + if (!domain)
>>>>>>> + return 0;
>>>>>>> +
>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>> +
>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>> + return -1;
>>>>>>> +#endif
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>> struct vfio_domain *domain)
>>>>>>> {
>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> struct vfio_group *group, *g;
>>>>>>> struct vfio_domain *domain, *d;
>>>>>>> struct bus_type *bus = NULL;
>>>>>>> - int ret;
>>>>>>> + int ret, safe_irq_domains;
>>>>>>>
>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>
>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>
>>>>>>> group->iommu_group = iommu_group;
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>> domain
>>>>>>> + * with respect to MSI isolation
>>>>>>> + */
>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> + vfio_safe_irq_domain);
>>>>>>> +
>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> vfio_bus_type);
>>>>>>> if (ret)
>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>
>>>>>>> + /*
>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>> controllers
>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>> + */
>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>> !safe_irq_domains)) {
>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>> support on this platform\n",
>>>>>>> __func__);
>>>>>>> ret = -EPERM;
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <5733383F.9060202-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
2016-05-11 13:48 ` Robin Murphy
(?)
@ 2016-05-11 14:37 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 14:37 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Robin,
On 05/11/2016 03:48 PM, Robin Murphy wrote:
> On 11/05/16 10:44, Eric Auger wrote:
>> Hi Robin,
>> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>>> On 11/05/16 09:38, Eric Auger wrote:
>>>> Hi Robin, Alex,
>>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>>> Hi Alex,
>>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>>> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>>>>>
>>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>>> abstracted
>>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>>> interrupts are "safe" for a given device. They are if the device
>>>>>>>> does
>>>>>>>> not use MSI
>>>>>>>
>>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>>> simply a
>>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>>
>>>>>> If it is not, we don't have to care about MSI isolation: there
>>>>>> will be
>>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>>> check the msi domain is backed by an MSI controller able to
>>>>>> perform MSI
>>>>>> isolation.
>>>>>>
>>>>>> So effectively "usage of MSIs" is improper - since it is decided
>>>>>> after
>>>>>> the group attachment anyway - and the commit message should rather
>>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>>> notion I think).
>>>>>
>>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs
>>>>> definitely
>>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>>> device using only wired interrupts, therefore with no connection to
>>>>> any
>>>>> MSI controller, isn't still capable of maliciously spewing DMA all
>>>>> over
>>>>> any and every doorbell region in the system.
>>>>
>>>> Sorry but I still don't get the point. For the device to reach the
>>>> doorbell there must be an IOMMU mapping.
>>>
>>> Only when the doorbell is _downstream_ of IOMMU translation.
>> Yes but that's the root hypothesis with VFIO assignment, right?
>>
>> Except if the no-iommu option is explicitly set by the userspace, there
>> must be an IOMMU downstream to the device that protects all the DMA
>> transactions.
>
> The IOMMU is downstream of the device, clearly, but the MSI machinery
> itself could be _in between_ the two - say, integrated into a PCIe root
> complex served by an external SMMU. The doorbell is carved out of the
> upstream address space so that a write targeting the appropriate address
> hits it directly and never even goes out to the SMMU - e.g.
> http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
Ah OK, I now better understand the point. So in that case not all the
DMA accesses of the devices are properly protected by the sMMU. Doesn't
it look as a kind of PCIe Access Control Service issue here where
transactions are able to bypass the translation service (downstream to
the root complex)?
I clearly did not address that kind of topology in this series. In such
a case the MSIs may even not be mapped at all which was the primary
purpose of the series.
Do you have any clue about how to detect that situation? If we cannot
rely on any iommu service I don't find any other solution but to
recognize this situation and reject the assignment.
>
>>>> - if the device is not attached to an MSI domain, there won't be any
>>>> doorbell iommu mapping built by this series, so no risk, right?
>>>>
>>>> The device will be allowed to reach only memory iommu mapped by
>>>> userspace with VFIO DMA MAP standard API. Of course if the userspace
>>>> can
>>>> mmap all the host PA that's a more general issue, right?
>>>>
>>>> - If the device is attached to an MSI domain (msi-parent link), 2
>>>> cases:
>>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>>> is a security hole.
>>>> a) by default we reject the device attachment
>>>> b) if the userspace overrides the safe interrupt option he accepts
>>>> the security hole
>>>>
>>>> What am I missing?
>>>
>>> The x86-with-interrupt-remapping-disabled case. Currently, without
>>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>>> INTx would be free to scribble all over the un-translated interrupt
>>> region willy-nilly.
>> Assuming we have an IOMMU, either we have translated transactions or
>> faults? So if legacy PCI devices try to write into a doorbell, there
>> must be faults because there is no IOMMU mapping for doorbells?
>
> To the best of my understanding, having been wading through the VT-d
> spec, the case on x86 is as above (OK, so strictly it's more "in
> parallel with" than "in front of") - writes outside the 0xFEExxxxx
> interrupt region go to the DMA remapping units, whereas writes within
> that region go to the interrupt remapping unit, and if that's disabled
> or not implemented, simply sail straight through to the APIC beyond.
that's my understanding too
Best Regards
Eric
>
> Robin.
>
>> Best Regards
>>
>> Eric
>> That's the subtle, but significant, change in
>>> behaviour.
>>>
>>> Robin.
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>>
>>>>> Robin.
>>>>>
>>>>>> Does it sound better?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>>> supports IRQ remapping.
>>>>>>>>
>>>>>>>> Then we check at group level if all devices have safe
>>>>>>>> interrupts: if
>>>>>>>> not,
>>>>>>>> we only allow the group to be attached if
>>>>>>>> allow_unsafe_interrupts is
>>>>>>>> set.
>>>>>>>>
>>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP.
>>>>>>>> This is
>>>>>>>> changed in next patch.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> v3 -> v4:
>>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>>> vfio_safe_irq_domain
>>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>>
>>>>>>>> v2 -> v3:
>>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> ---
>>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> @@ -37,6 +37,8 @@
>>>>>>>> #include <linux/vfio.h>
>>>>>>>> #include <linux/workqueue.h>
>>>>>>>> #include <linux/msi-iommu.h>
>>>>>>>> +#include <linux/irqdomain.h>
>>>>>>>> +#include <linux/msi.h>
>>>>>>>>
>>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>>> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
>>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>>> void *data)
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>>> + * the device is attached to is safe with respect to MSI
>>>>>>>> isolation.
>>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>>> + *
>>>>>>>> + * @dev: device handle
>>>>>>>> + * @data: unused
>>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>>> + */
>>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>>> +{
>>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> + struct irq_domain *domain;
>>>>>>>> + struct msi_domain_info *info;
>>>>>>>> +
>>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>>> + if (!domain)
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>>> +
>>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>>> + return -1;
>>>>>>>> +#endif
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>>> struct vfio_domain *domain)
>>>>>>>> {
>>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> struct vfio_group *group, *g;
>>>>>>>> struct vfio_domain *domain, *d;
>>>>>>>> struct bus_type *bus = NULL;
>>>>>>>> - int ret;
>>>>>>>> + int ret, safe_irq_domains;
>>>>>>>>
>>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>>
>>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>
>>>>>>>> group->iommu_group = iommu_group;
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>>> domain
>>>>>>>> + * with respect to MSI isolation
>>>>>>>> + */
>>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group,
>>>>>>>> &bus,
>>>>>>>> + vfio_safe_irq_domain);
>>>>>>>> +
>>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>>> vfio_bus_type);
>>>>>>>> if (ret)
>>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>>> controllers
>>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>>> + */
>>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>>> !safe_irq_domains)) {
>>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>>> support on this platform\n",
>>>>>>>> __func__);
>>>>>>>> ret = -EPERM;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 14:37 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 14:37 UTC (permalink / raw)
To: Robin Murphy, Alex Williamson
Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
christoffer.dall, linux-arm-kernel, patches, linux-kernel,
Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Robin,
On 05/11/2016 03:48 PM, Robin Murphy wrote:
> On 11/05/16 10:44, Eric Auger wrote:
>> Hi Robin,
>> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>>> On 11/05/16 09:38, Eric Auger wrote:
>>>> Hi Robin, Alex,
>>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>>> Hi Alex,
>>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>>
>>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>>> abstracted
>>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>>> interrupts are "safe" for a given device. They are if the device
>>>>>>>> does
>>>>>>>> not use MSI
>>>>>>>
>>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>>> simply a
>>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>>
>>>>>> If it is not, we don't have to care about MSI isolation: there
>>>>>> will be
>>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>>> check the msi domain is backed by an MSI controller able to
>>>>>> perform MSI
>>>>>> isolation.
>>>>>>
>>>>>> So effectively "usage of MSIs" is improper - since it is decided
>>>>>> after
>>>>>> the group attachment anyway - and the commit message should rather
>>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>>> notion I think).
>>>>>
>>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs
>>>>> definitely
>>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>>> device using only wired interrupts, therefore with no connection to
>>>>> any
>>>>> MSI controller, isn't still capable of maliciously spewing DMA all
>>>>> over
>>>>> any and every doorbell region in the system.
>>>>
>>>> Sorry but I still don't get the point. For the device to reach the
>>>> doorbell there must be an IOMMU mapping.
>>>
>>> Only when the doorbell is _downstream_ of IOMMU translation.
>> Yes but that's the root hypothesis with VFIO assignment, right?
>>
>> Except if the no-iommu option is explicitly set by the userspace, there
>> must be an IOMMU downstream to the device that protects all the DMA
>> transactions.
>
> The IOMMU is downstream of the device, clearly, but the MSI machinery
> itself could be _in between_ the two - say, integrated into a PCIe root
> complex served by an external SMMU. The doorbell is carved out of the
> upstream address space so that a write targeting the appropriate address
> hits it directly and never even goes out to the SMMU - e.g.
> http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
Ah OK, I now better understand the point. So in that case not all the
DMA accesses of the devices are properly protected by the sMMU. Doesn't
it look as a kind of PCIe Access Control Service issue here where
transactions are able to bypass the translation service (downstream to
the root complex)?
I clearly did not address that kind of topology in this series. In such
a case the MSIs may even not be mapped at all which was the primary
purpose of the series.
Do you have any clue about how to detect that situation? If we cannot
rely on any iommu service I don't find any other solution but to
recognize this situation and reject the assignment.
>
>>>> - if the device is not attached to an MSI domain, there won't be any
>>>> doorbell iommu mapping built by this series, so no risk, right?
>>>>
>>>> The device will be allowed to reach only memory iommu mapped by
>>>> userspace with VFIO DMA MAP standard API. Of course if the userspace
>>>> can
>>>> mmap all the host PA that's a more general issue, right?
>>>>
>>>> - If the device is attached to an MSI domain (msi-parent link), 2
>>>> cases:
>>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>>> is a security hole.
>>>> a) by default we reject the device attachment
>>>> b) if the userspace overrides the safe interrupt option he accepts
>>>> the security hole
>>>>
>>>> What am I missing?
>>>
>>> The x86-with-interrupt-remapping-disabled case. Currently, without
>>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>>> INTx would be free to scribble all over the un-translated interrupt
>>> region willy-nilly.
>> Assuming we have an IOMMU, either we have translated transactions or
>> faults? So if legacy PCI devices try to write into a doorbell, there
>> must be faults because there is no IOMMU mapping for doorbells?
>
> To the best of my understanding, having been wading through the VT-d
> spec, the case on x86 is as above (OK, so strictly it's more "in
> parallel with" than "in front of") - writes outside the 0xFEExxxxx
> interrupt region go to the DMA remapping units, whereas writes within
> that region go to the interrupt remapping unit, and if that's disabled
> or not implemented, simply sail straight through to the APIC beyond.
that's my understanding too
Best Regards
Eric
>
> Robin.
>
>> Best Regards
>>
>> Eric
>> That's the subtle, but significant, change in
>>> behaviour.
>>>
>>> Robin.
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>>
>>>>> Robin.
>>>>>
>>>>>> Does it sound better?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>>> supports IRQ remapping.
>>>>>>>>
>>>>>>>> Then we check at group level if all devices have safe
>>>>>>>> interrupts: if
>>>>>>>> not,
>>>>>>>> we only allow the group to be attached if
>>>>>>>> allow_unsafe_interrupts is
>>>>>>>> set.
>>>>>>>>
>>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP.
>>>>>>>> This is
>>>>>>>> changed in next patch.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> v3 -> v4:
>>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>>> vfio_safe_irq_domain
>>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>>
>>>>>>>> v2 -> v3:
>>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> ---
>>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> @@ -37,6 +37,8 @@
>>>>>>>> #include <linux/vfio.h>
>>>>>>>> #include <linux/workqueue.h>
>>>>>>>> #include <linux/msi-iommu.h>
>>>>>>>> +#include <linux/irqdomain.h>
>>>>>>>> +#include <linux/msi.h>
>>>>>>>>
>>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>>> <alex.williamson@redhat.com>"
>>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>>> void *data)
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>>> + * the device is attached to is safe with respect to MSI
>>>>>>>> isolation.
>>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>>> + *
>>>>>>>> + * @dev: device handle
>>>>>>>> + * @data: unused
>>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>>> + */
>>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>>> +{
>>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> + struct irq_domain *domain;
>>>>>>>> + struct msi_domain_info *info;
>>>>>>>> +
>>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>>> + if (!domain)
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>>> +
>>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>>> + return -1;
>>>>>>>> +#endif
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>>> struct vfio_domain *domain)
>>>>>>>> {
>>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> struct vfio_group *group, *g;
>>>>>>>> struct vfio_domain *domain, *d;
>>>>>>>> struct bus_type *bus = NULL;
>>>>>>>> - int ret;
>>>>>>>> + int ret, safe_irq_domains;
>>>>>>>>
>>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>>
>>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>
>>>>>>>> group->iommu_group = iommu_group;
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>>> domain
>>>>>>>> + * with respect to MSI isolation
>>>>>>>> + */
>>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group,
>>>>>>>> &bus,
>>>>>>>> + vfio_safe_irq_domain);
>>>>>>>> +
>>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>>> vfio_bus_type);
>>>>>>>> if (ret)
>>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>>> controllers
>>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>>> + */
>>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>>> !safe_irq_domains)) {
>>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>>> support on this platform\n",
>>>>>>>> __func__);
>>>>>>>> ret = -EPERM;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
@ 2016-05-11 14:37 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-11 14:37 UTC (permalink / raw)
To: linux-arm-kernel
Hi Robin,
On 05/11/2016 03:48 PM, Robin Murphy wrote:
> On 11/05/16 10:44, Eric Auger wrote:
>> Hi Robin,
>> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>>> On 11/05/16 09:38, Eric Auger wrote:
>>>> Hi Robin, Alex,
>>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>>> Hi Alex,
>>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>>> On Wed, 4 May 2016 11:54:16 +0000
>>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>>
>>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>>> abstracted
>>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>>> interrupts are "safe" for a given device. They are if the device
>>>>>>>> does
>>>>>>>> not use MSI
>>>>>>>
>>>>>>> Are we sure we're not opening a security hole here? An MSI is
>>>>>>> simply a
>>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>>> DMA capable. Do the MSI domain checks cover this?
>>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>>
>>>>>> If it is not, we don't have to care about MSI isolation: there
>>>>>> will be
>>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>>> check the msi domain is backed by an MSI controller able to
>>>>>> perform MSI
>>>>>> isolation.
>>>>>>
>>>>>> So effectively "usage of MSIs" is improper - since it is decided
>>>>>> after
>>>>>> the group attachment anyway - and the commit message should rather
>>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>>> notion I think).
>>>>>
>>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs
>>>>> definitely
>>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>>> device using only wired interrupts, therefore with no connection to
>>>>> any
>>>>> MSI controller, isn't still capable of maliciously spewing DMA all
>>>>> over
>>>>> any and every doorbell region in the system.
>>>>
>>>> Sorry but I still don't get the point. For the device to reach the
>>>> doorbell there must be an IOMMU mapping.
>>>
>>> Only when the doorbell is _downstream_ of IOMMU translation.
>> Yes but that's the root hypothesis with VFIO assignment, right?
>>
>> Except if the no-iommu option is explicitly set by the userspace, there
>> must be an IOMMU downstream to the device that protects all the DMA
>> transactions.
>
> The IOMMU is downstream of the device, clearly, but the MSI machinery
> itself could be _in between_ the two - say, integrated into a PCIe root
> complex served by an external SMMU. The doorbell is carved out of the
> upstream address space so that a write targeting the appropriate address
> hits it directly and never even goes out to the SMMU - e.g.
> http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268
Ah OK, I now better understand the point. So in that case not all the
DMA accesses of the devices are properly protected by the sMMU. Doesn't
it look as a kind of PCIe Access Control Service issue here where
transactions are able to bypass the translation service (downstream to
the root complex)?
I clearly did not address that kind of topology in this series. In such
a case the MSIs may even not be mapped at all which was the primary
purpose of the series.
Do you have any clue about how to detect that situation? If we cannot
rely on any iommu service I don't find any other solution but to
recognize this situation and reject the assignment.
>
>>>> - if the device is not attached to an MSI domain, there won't be any
>>>> doorbell iommu mapping built by this series, so no risk, right?
>>>>
>>>> The device will be allowed to reach only memory iommu mapped by
>>>> userspace with VFIO DMA MAP standard API. Of course if the userspace
>>>> can
>>>> mmap all the host PA that's a more general issue, right?
>>>>
>>>> - If the device is attached to an MSI domain (msi-parent link), 2
>>>> cases:
>>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>>> is a security hole.
>>>> a) by default we reject the device attachment
>>>> b) if the userspace overrides the safe interrupt option he accepts
>>>> the security hole
>>>>
>>>> What am I missing?
>>>
>>> The x86-with-interrupt-remapping-disabled case. Currently, without
>>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>>> INTx would be free to scribble all over the un-translated interrupt
>>> region willy-nilly.
>> Assuming we have an IOMMU, either we have translated transactions or
>> faults? So if legacy PCI devices try to write into a doorbell, there
>> must be faults because there is no IOMMU mapping for doorbells?
>
> To the best of my understanding, having been wading through the VT-d
> spec, the case on x86 is as above (OK, so strictly it's more "in
> parallel with" than "in front of") - writes outside the 0xFEExxxxx
> interrupt region go to the DMA remapping units, whereas writes within
> that region go to the interrupt remapping unit, and if that's disabled
> or not implemented, simply sail straight through to the APIC beyond.
that's my understanding too
Best Regards
Eric
>
> Robin.
>
>> Best Regards
>>
>> Eric
>> That's the subtle, but significant, change in
>>> behaviour.
>>>
>>> Robin.
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>>
>>>>> Robin.
>>>>>
>>>>>> Does it sound better?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>>> supports IRQ remapping.
>>>>>>>>
>>>>>>>> Then we check at group level if all devices have safe
>>>>>>>> interrupts: if
>>>>>>>> not,
>>>>>>>> we only allow the group to be attached if
>>>>>>>> allow_unsafe_interrupts is
>>>>>>>> set.
>>>>>>>>
>>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP.
>>>>>>>> This is
>>>>>>>> changed in next patch.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> v3 -> v4:
>>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>>> vfio_safe_irq_domain
>>>>>>>> and irq_remapping into safe_irq_domains
>>>>>>>>
>>>>>>>> v2 -> v3:
>>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>>> CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> ---
>>>>>>>> drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> @@ -37,6 +37,8 @@
>>>>>>>> #include <linux/vfio.h>
>>>>>>>> #include <linux/workqueue.h>
>>>>>>>> #include <linux/msi-iommu.h>
>>>>>>>> +#include <linux/irqdomain.h>
>>>>>>>> +#include <linux/msi.h>
>>>>>>>>
>>>>>>>> #define DRIVER_VERSION "0.2"
>>>>>>>> #define DRIVER_AUTHOR "Alex Williamson
>>>>>>>> <alex.williamson@redhat.com>"
>>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>>> void *data)
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>>> + * the device is attached to is safe with respect to MSI
>>>>>>>> isolation.
>>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>>> + *
>>>>>>>> + * @dev: device handle
>>>>>>>> + * @data: unused
>>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>>> + */
>>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>>> +{
>>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> + struct irq_domain *domain;
>>>>>>>> + struct msi_domain_info *info;
>>>>>>>> +
>>>>>>>> + domain = dev_get_msi_domain(dev);
>>>>>>>> + if (!domain)
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + info = msi_get_domain_info(domain);
>>>>>>>> +
>>>>>>>> + if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>>> + return -1;
>>>>>>>> +#endif
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>>> struct vfio_domain *domain)
>>>>>>>> {
>>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> struct vfio_group *group, *g;
>>>>>>>> struct vfio_domain *domain, *d;
>>>>>>>> struct bus_type *bus = NULL;
>>>>>>>> - int ret;
>>>>>>>> + int ret, safe_irq_domains;
>>>>>>>>
>>>>>>>> mutex_lock(&iommu->lock);
>>>>>>>>
>>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>
>>>>>>>> group->iommu_group = iommu_group;
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * Determine if all the devices of the group have a safe irq
>>>>>>>> domain
>>>>>>>> + * with respect to MSI isolation
>>>>>>>> + */
>>>>>>>> + safe_irq_domains = !iommu_group_for_each_dev(iommu_group,
>>>>>>>> &bus,
>>>>>>>> + vfio_safe_irq_domain);
>>>>>>>> +
>>>>>>>> /* Determine bus_type in order to allocate a domain */
>>>>>>>> ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>>> vfio_bus_type);
>>>>>>>> if (ret)
>>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>> INIT_LIST_HEAD(&domain->group_list);
>>>>>>>> list_add(&group->next, &domain->group_list);
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>>> controllers
>>>>>>>> + * must support IRQ remapping/interrupt translation
>>>>>>>> + */
>>>>>>>> if (!allow_unsafe_interrupts &&
>>>>>>>> - !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>>> + (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>>> !safe_irq_domains)) {
>>>>>>>> pr_warn("%s: No interrupt remapping support. Use the
>>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>>> support on this platform\n",
>>>>>>>> __func__);
>>>>>>>> ret = -EPERM;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed the
irq_remapping capability is abstracted on irqchip side for ARM as
opposed to Intel IOMMU featuring IRQ remapping HW.
So to check IRQ remapping capability, the msi domain needs to be
checked instead.
This commit needs to be applied after "vfio/type1: also check IRQ
remapping capability at msi domain" else the legacy interrupt
assignment gets broken with arm-smmu.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
drivers/iommu/arm-smmu-v3.c | 3 ++-
drivers/iommu/arm-smmu.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b5d9826..778212c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1386,7 +1386,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
case IOMMU_CAP_CACHE_COHERENCY:
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0908985..e7c9e89 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
*/
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed the
irq_remapping capability is abstracted on irqchip side for ARM as
opposed to Intel IOMMU featuring IRQ remapping HW.
So to check IRQ remapping capability, the msi domain needs to be
checked instead.
This commit needs to be applied after "vfio/type1: also check IRQ
remapping capability at msi domain" else the legacy interrupt
assignment gets broken with arm-smmu.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
drivers/iommu/arm-smmu-v3.c | 3 ++-
drivers/iommu/arm-smmu.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b5d9826..778212c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1386,7 +1386,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
case IOMMU_CAP_CACHE_COHERENCY:
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0908985..e7c9e89 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
*/
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed the
irq_remapping capability is abstracted on irqchip side for ARM as
opposed to Intel IOMMU featuring IRQ remapping HW.
So to check IRQ remapping capability, the msi domain needs to be
checked instead.
This commit needs to be applied after "vfio/type1: also check IRQ
remapping capability at msi domain" else the legacy interrupt
assignment gets broken with arm-smmu.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
drivers/iommu/arm-smmu-v3.c | 3 ++-
drivers/iommu/arm-smmu.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b5d9826..778212c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1386,7 +1386,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
case IOMMU_CAP_CACHE_COHERENCY:
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0908985..e7c9e89 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
*/
return true;
case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
+ /* interrupt translation handled at MSI controller level */
+ return false;
case IOMMU_CAP_NOEXEC:
return true;
default:
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 11:54 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, eric.auger-QSEj5FYQhm4dnm+yROfE0A,
robin.murphy-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
marc.zyngier-5wv7dgnIgG8, christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
This patch allows the user-space to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.
The returned info comprise:
- whether the MSI IOVA are constrained to a reserved range (x86 case) and
in the positive, the start/end of the aperture,
- or whether the IOVA aperture need to be set by the userspace. In that
case, the size and alignment of the IOVA region to be provided are
returned.
In case the userspace must provide the IOVA range, we currently return
an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
current ARM platforms. This may be deprecated by a more sophisticated
computation later on.
Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
capability chain, reporting the MSI geometry
v7 -> v8:
- use iommu_domain_msi_geometry
v6 -> v7:
- remove the computation of the number of IOVA pages to be provisionned.
This number depends on the domain/group/device topology which can
dynamically change. Let's rely instead rely on an arbitrary max depending
on the system
v4 -> v5:
- move msi_info and ret declaration within the conditional code
v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
more complex computation of MSI mapping requirements, especially the
number of pages to be provided by the user-space.
- reword patch title
RFC v1 -> v1:
- derived from
[RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 30 +++++++++++++++++-
2 files changed, 98 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2fc8197..841360b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
}
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+ struct vfio_info_cap *caps)
+{
+ struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+ struct iommu_domain_msi_geometry msi_geometry;
+ struct vfio_info_cap_header *header;
+ struct vfio_domain *d;
+ bool mapping_required;
+ size_t size;
+
+ mutex_lock(&iommu->lock);
+ /* All domains have same require_msi_map property, pick first */
+ d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+ iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+ &msi_geometry);
+ mapping_required = msi_geometry.iommu_msi_supported;
+
+ mutex_unlock(&iommu->lock);
+
+ size = sizeof(*vfio_msi_geometry);
+ header = vfio_info_cap_add(caps, size,
+ VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ vfio_msi_geometry = container_of(header,
+ struct vfio_iommu_type1_info_cap_msi_geometry,
+ header);
+
+ vfio_msi_geometry->reserved = !mapping_required;
+ if (vfio_msi_geometry->reserved) {
+ vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+ vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+ return 0;
+ }
+
+ vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
+ /* we currently report the need for an arbitray number of 16 pages */
+ vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
+
+ return 0;
+}
+
static long vfio_iommu_type1_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+ struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ int ret;
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
@@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
+ ret = compute_msi_geometry_caps(iommu, &caps);
+ if (ret)
+ return ret;
+
+ if (caps.size) {
+ info.flags |= VFIO_IOMMU_INFO_CAPS;
+ if (info.argsz < sizeof(info) + caps.size) {
+ info.argsz = sizeof(info) + caps.size;
+ info.cap_offset = 0;
+ } else {
+ vfio_info_cap_shift(&caps, sizeof(info));
+ if (copy_to_user((void __user *)arg +
+ sizeof(info), caps.buf,
+ caps.size)) {
+ kfree(caps.buf);
+ return -EFAULT;
+ }
+ info.cap_offset = sizeof(info);
+ }
+
+ kfree(caps.buf);
+ }
+
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..0ff6a8d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
__u32 argsz;
__u32 flags;
#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
- __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
+ __u32 cap_offset; /* Offset within info struct of first cap */
+ __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ * whose boundaries are given by [@aperture_start, @aperture_end].
+ * this is typically the case on x86 host. The userspace is not allowed
+ * to map userspace memory at IOVAs intersecting this range using
+ * VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ * in that case the userspace must provide an IOVA window characterized by
+ * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+ struct vfio_info_cap_header header;
+ bool reserved; /* Are MSI IOVAs within a reserved aperture? */
+ /* reserved */
+ __u64 aperture_start;
+ __u64 aperture_end;
+ /* not reserved */
+ __u64 size; /* IOVA aperture size in bytes the userspace must provide */
+ __u64 alignment; /* alignment of the window, in bytes */
};
#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
* IOVA region that will be used on some platforms to map the host MSI frames.
* In that specific case, vaddr is ignored. Once registered, an MSI reserved
* IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: eric.auger, eric.auger, robin.murphy, alex.williamson,
will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
This patch allows the user-space to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.
The returned info comprise:
- whether the MSI IOVA are constrained to a reserved range (x86 case) and
in the positive, the start/end of the aperture,
- or whether the IOVA aperture need to be set by the userspace. In that
case, the size and alignment of the IOVA region to be provided are
returned.
In case the userspace must provide the IOVA range, we currently return
an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
current ARM platforms. This may be deprecated by a more sophisticated
computation later on.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
capability chain, reporting the MSI geometry
v7 -> v8:
- use iommu_domain_msi_geometry
v6 -> v7:
- remove the computation of the number of IOVA pages to be provisionned.
This number depends on the domain/group/device topology which can
dynamically change. Let's rely instead rely on an arbitrary max depending
on the system
v4 -> v5:
- move msi_info and ret declaration within the conditional code
v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
more complex computation of MSI mapping requirements, especially the
number of pages to be provided by the user-space.
- reword patch title
RFC v1 -> v1:
- derived from
[RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 30 +++++++++++++++++-
2 files changed, 98 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2fc8197..841360b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
}
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+ struct vfio_info_cap *caps)
+{
+ struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+ struct iommu_domain_msi_geometry msi_geometry;
+ struct vfio_info_cap_header *header;
+ struct vfio_domain *d;
+ bool mapping_required;
+ size_t size;
+
+ mutex_lock(&iommu->lock);
+ /* All domains have same require_msi_map property, pick first */
+ d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+ iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+ &msi_geometry);
+ mapping_required = msi_geometry.iommu_msi_supported;
+
+ mutex_unlock(&iommu->lock);
+
+ size = sizeof(*vfio_msi_geometry);
+ header = vfio_info_cap_add(caps, size,
+ VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ vfio_msi_geometry = container_of(header,
+ struct vfio_iommu_type1_info_cap_msi_geometry,
+ header);
+
+ vfio_msi_geometry->reserved = !mapping_required;
+ if (vfio_msi_geometry->reserved) {
+ vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+ vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+ return 0;
+ }
+
+ vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
+ /* we currently report the need for an arbitray number of 16 pages */
+ vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
+
+ return 0;
+}
+
static long vfio_iommu_type1_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+ struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ int ret;
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
@@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
+ ret = compute_msi_geometry_caps(iommu, &caps);
+ if (ret)
+ return ret;
+
+ if (caps.size) {
+ info.flags |= VFIO_IOMMU_INFO_CAPS;
+ if (info.argsz < sizeof(info) + caps.size) {
+ info.argsz = sizeof(info) + caps.size;
+ info.cap_offset = 0;
+ } else {
+ vfio_info_cap_shift(&caps, sizeof(info));
+ if (copy_to_user((void __user *)arg +
+ sizeof(info), caps.buf,
+ caps.size)) {
+ kfree(caps.buf);
+ return -EFAULT;
+ }
+ info.cap_offset = sizeof(info);
+ }
+
+ kfree(caps.buf);
+ }
+
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..0ff6a8d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
__u32 argsz;
__u32 flags;
#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
- __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
+ __u32 cap_offset; /* Offset within info struct of first cap */
+ __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ * whose boundaries are given by [@aperture_start, @aperture_end].
+ * this is typically the case on x86 host. The userspace is not allowed
+ * to map userspace memory at IOVAs intersecting this range using
+ * VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ * in that case the userspace must provide an IOVA window characterized by
+ * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+ struct vfio_info_cap_header header;
+ bool reserved; /* Are MSI IOVAs within a reserved aperture? */
+ /* reserved */
+ __u64 aperture_start;
+ __u64 aperture_end;
+ /* not reserved */
+ __u64 size; /* IOVA aperture size in bytes the userspace must provide */
+ __u64 alignment; /* alignment of the window, in bytes */
};
#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
* IOVA region that will be used on some platforms to map the host MSI frames.
* In that specific case, vaddr is ignored. Once registered, an MSI reserved
* IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-04 11:54 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
To: linux-arm-kernel
This patch allows the user-space to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.
The returned info comprise:
- whether the MSI IOVA are constrained to a reserved range (x86 case) and
in the positive, the start/end of the aperture,
- or whether the IOVA aperture need to be set by the userspace. In that
case, the size and alignment of the IOVA region to be provided are
returned.
In case the userspace must provide the IOVA range, we currently return
an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
current ARM platforms. This may be deprecated by a more sophisticated
computation later on.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
capability chain, reporting the MSI geometry
v7 -> v8:
- use iommu_domain_msi_geometry
v6 -> v7:
- remove the computation of the number of IOVA pages to be provisionned.
This number depends on the domain/group/device topology which can
dynamically change. Let's rely instead rely on an arbitrary max depending
on the system
v4 -> v5:
- move msi_info and ret declaration within the conditional code
v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
more complex computation of MSI mapping requirements, especially the
number of pages to be provided by the user-space.
- reword patch title
RFC v1 -> v1:
- derived from
[RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 30 +++++++++++++++++-
2 files changed, 98 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2fc8197..841360b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
}
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+ struct vfio_info_cap *caps)
+{
+ struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+ struct iommu_domain_msi_geometry msi_geometry;
+ struct vfio_info_cap_header *header;
+ struct vfio_domain *d;
+ bool mapping_required;
+ size_t size;
+
+ mutex_lock(&iommu->lock);
+ /* All domains have same require_msi_map property, pick first */
+ d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+ iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+ &msi_geometry);
+ mapping_required = msi_geometry.iommu_msi_supported;
+
+ mutex_unlock(&iommu->lock);
+
+ size = sizeof(*vfio_msi_geometry);
+ header = vfio_info_cap_add(caps, size,
+ VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ vfio_msi_geometry = container_of(header,
+ struct vfio_iommu_type1_info_cap_msi_geometry,
+ header);
+
+ vfio_msi_geometry->reserved = !mapping_required;
+ if (vfio_msi_geometry->reserved) {
+ vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+ vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+ return 0;
+ }
+
+ vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
+ /* we currently report the need for an arbitray number of 16 pages */
+ vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
+
+ return 0;
+}
+
static long vfio_iommu_type1_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+ struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ int ret;
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
@@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
+ ret = compute_msi_geometry_caps(iommu, &caps);
+ if (ret)
+ return ret;
+
+ if (caps.size) {
+ info.flags |= VFIO_IOMMU_INFO_CAPS;
+ if (info.argsz < sizeof(info) + caps.size) {
+ info.argsz = sizeof(info) + caps.size;
+ info.cap_offset = 0;
+ } else {
+ vfio_info_cap_shift(&caps, sizeof(info));
+ if (copy_to_user((void __user *)arg +
+ sizeof(info), caps.buf,
+ caps.size)) {
+ kfree(caps.buf);
+ return -EFAULT;
+ }
+ info.cap_offset = sizeof(info);
+ }
+
+ kfree(caps.buf);
+ }
+
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..0ff6a8d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
__u32 argsz;
__u32 flags;
#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
- __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
+ __u32 cap_offset; /* Offset within info struct of first cap */
+ __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ * whose boundaries are given by [@aperture_start, @aperture_end].
+ * this is typically the case on x86 host. The userspace is not allowed
+ * to map userspace memory@IOVAs intersecting this range using
+ * VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ * in that case the userspace must provide an IOVA window characterized by
+ * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+ struct vfio_info_cap_header header;
+ bool reserved; /* Are MSI IOVAs within a reserved aperture? */
+ /* reserved */
+ __u64 aperture_start;
+ __u64 aperture_end;
+ /* not reserved */
+ __u64 size; /* IOVA aperture size in bytes the userspace must provide */
+ __u64 alignment; /* alignment of the window, in bytes */
};
#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
* IOVA region that will be used on some platforms to map the host MSI frames.
* In that specific case, vaddr is ignored. Once registered, an MSI reserved
* IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
*/
struct vfio_iommu_type1_dma_map {
__u32 argsz;
--
1.9.1
^ permalink raw reply related [flat|nested] 111+ messages in thread[parent not found: <1462362858-2925-8-git-send-email-eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-04 12:06 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 12:06 UTC (permalink / raw)
To: eric.auger-qxv4g6HH51o, robin.murphy-5wv7dgnIgG8,
alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
jason-NLaQJdtUoK4Be96aLqz0jA, marc.zyngier-5wv7dgnIgG8,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: julien.grall-5wv7dgnIgG8, patches-QSEj5FYQhm4dnm+yROfE0A,
p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
If you prefer we could consider this patch outside of the main series
since it brings extra functionalities (MSI geometry reporting). In a
first QEMU integration we would live without knowing the MSI geometry I
think, all the more so I currently report an arbitrary number of
requested IOVA pages. The computation of the exact number of doorbells
to map brings extra complexity and I did not address this issue yet.
It sketches a possible user API to report the MSI geometry based on the
capability chains, as you suggested some time ago. I am currently busy
drafting a QEMU integration.
Best Regards
Eric
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-04 12:06 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 12:06 UTC (permalink / raw)
To: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel
Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
If you prefer we could consider this patch outside of the main series
since it brings extra functionalities (MSI geometry reporting). In a
first QEMU integration we would live without knowing the MSI geometry I
think, all the more so I currently report an arbitrary number of
requested IOVA pages. The computation of the exact number of doorbells
to map brings extra complexity and I did not address this issue yet.
It sketches a possible user API to report the MSI geometry based on the
capability chains, as you suggested some time ago. I am currently busy
drafting a QEMU integration.
Best Regards
Eric
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-04 12:06 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-04 12:06 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
If you prefer we could consider this patch outside of the main series
since it brings extra functionalities (MSI geometry reporting). In a
first QEMU integration we would live without knowing the MSI geometry I
think, all the more so I currently report an arbitrary number of
requested IOVA pages. The computation of the exact number of doorbells
to map brings extra complexity and I did not address this issue yet.
It sketches a possible user API to report the MSI geometry based on the
capability chains, as you suggested some time ago. I am currently busy
drafting a QEMU integration.
Best Regards
Eric
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <5729E5BB.1030101-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-04 12:06 ` Eric Auger
(?)
@ 2016-05-09 23:03 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 23:03 UTC (permalink / raw)
To: Eric Auger
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On Wed, 4 May 2016 14:06:19 +0200
Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Hi Alex,
> On 05/04/2016 01:54 PM, Eric Auger wrote:
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.
>
> If you prefer we could consider this patch outside of the main series
> since it brings extra functionalities (MSI geometry reporting). In a
> first QEMU integration we would live without knowing the MSI geometry I
> think, all the more so I currently report an arbitrary number of
> requested IOVA pages. The computation of the exact number of doorbells
> to map brings extra complexity and I did not address this issue yet.
>
> It sketches a possible user API to report the MSI geometry based on the
> capability chains, as you suggested some time ago. I am currently busy
> drafting a QEMU integration.
How would the user know that reserved MSI mappings are requires or
available without this? Wouldn't the only option be for userspace to
try to map something with the reserved MSI flag set and see if the
kernel accepts it? That's not a very desirable programming model. The
arbitrary size is pretty ugly, but it at least makes for a consistent
user interface. Is it a functional issue if we overestimate the size
or is it just a matter of wasting IOVA space? Is there significant
harm in making it obscenely large, like 1MB? The reference counting and
re-use of IOVA pages seems like we may often only be using a single
IOVA page for multiple doorbells. I guess I'm leaning towards defining
the API even if the value is somewhat arbitrary because we'd rather have
control of this rather than having the user guess and try to rope them
back in later to use a kernel recommended value. Thanks,
Alex
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-09 23:03 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 23:03 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 14:06:19 +0200
Eric Auger <eric.auger@linaro.org> wrote:
> Hi Alex,
> On 05/04/2016 01:54 PM, Eric Auger wrote:
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.
>
> If you prefer we could consider this patch outside of the main series
> since it brings extra functionalities (MSI geometry reporting). In a
> first QEMU integration we would live without knowing the MSI geometry I
> think, all the more so I currently report an arbitrary number of
> requested IOVA pages. The computation of the exact number of doorbells
> to map brings extra complexity and I did not address this issue yet.
>
> It sketches a possible user API to report the MSI geometry based on the
> capability chains, as you suggested some time ago. I am currently busy
> drafting a QEMU integration.
How would the user know that reserved MSI mappings are requires or
available without this? Wouldn't the only option be for userspace to
try to map something with the reserved MSI flag set and see if the
kernel accepts it? That's not a very desirable programming model. The
arbitrary size is pretty ugly, but it at least makes for a consistent
user interface. Is it a functional issue if we overestimate the size
or is it just a matter of wasting IOVA space? Is there significant
harm in making it obscenely large, like 1MB? The reference counting and
re-use of IOVA pages seems like we may often only be using a single
IOVA page for multiple doorbells. I guess I'm leaning towards defining
the API even if the value is somewhat arbitrary because we'd rather have
control of this rather than having the user guess and try to rope them
back in later to use a kernel recommended value. Thanks,
Alex
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-09 23:03 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 23:03 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 14:06:19 +0200
Eric Auger <eric.auger@linaro.org> wrote:
> Hi Alex,
> On 05/04/2016 01:54 PM, Eric Auger wrote:
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.
>
> If you prefer we could consider this patch outside of the main series
> since it brings extra functionalities (MSI geometry reporting). In a
> first QEMU integration we would live without knowing the MSI geometry I
> think, all the more so I currently report an arbitrary number of
> requested IOVA pages. The computation of the exact number of doorbells
> to map brings extra complexity and I did not address this issue yet.
>
> It sketches a possible user API to report the MSI geometry based on the
> capability chains, as you suggested some time ago. I am currently busy
> drafting a QEMU integration.
How would the user know that reserved MSI mappings are requires or
available without this? Wouldn't the only option be for userspace to
try to map something with the reserved MSI flag set and see if the
kernel accepts it? That's not a very desirable programming model. The
arbitrary size is pretty ugly, but it at least makes for a consistent
user interface. Is it a functional issue if we overestimate the size
or is it just a matter of wasting IOVA space? Is there significant
harm in making it obscenely large, like 1MB? The reference counting and
re-use of IOVA pages seems like we may often only be using a single
IOVA page for multiple doorbells. I guess I'm leaning towards defining
the API even if the value is somewhat arbitrary because we'd rather have
control of this rather than having the user guess and try to rope them
back in later to use a kernel recommended value. Thanks,
Alex
^ permalink raw reply [flat|nested] 111+ messages in thread
[parent not found: <20160509170312.06e9c4fe-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-09 23:03 ` Alex Williamson
(?)
@ 2016-05-10 16:50 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:50 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On 05/10/2016 01:03 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 14:06:19 +0200
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> Hi Alex,
>> On 05/04/2016 01:54 PM, Eric Auger wrote:
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.
>>
>> If you prefer we could consider this patch outside of the main series
>> since it brings extra functionalities (MSI geometry reporting). In a
>> first QEMU integration we would live without knowing the MSI geometry I
>> think, all the more so I currently report an arbitrary number of
>> requested IOVA pages. The computation of the exact number of doorbells
>> to map brings extra complexity and I did not address this issue yet.
>>
>> It sketches a possible user API to report the MSI geometry based on the
>> capability chains, as you suggested some time ago. I am currently busy
>> drafting a QEMU integration.
>
> How would the user know that reserved MSI mappings are requires or
> available without this? Wouldn't the only option be for userspace to
> try to map something with the reserved MSI flag set and see if the
> kernel accepts it?
Well my first guess was that the (QEMU) virt machine using KVM/PCIe-MSI
passthrough could hardcode an arbitrary "large" iova size (currently 16
64kB pages in my QEMU integration). In case this is not sufficient for
mapping all host doorbells, we would see MSI allocation failing. In case
the need shows up, we could increase the value later on.
That's not a very desirable programming model. The
> arbitrary size is pretty ugly, but it at least makes for a consistent
> user interface. Is it a functional issue if we overestimate the size
> or is it just a matter of wasting IOVA space? Is there significant
> harm in making it obscenely large, like 1MB?
no just waste of IOVA space. To make it as transparent as possible for
virt machine I wanted to hide in the platform bus reserved IOVA.
The reference counting and
> re-use of IOVA pages seems like we may often only be using a single
> IOVA page for multiple doorbells. I guess I'm leaning towards defining
> the API even if the value is somewhat arbitrary because we'd rather have
> control of this rather than having the user guess and try to rope them
> back in later to use a kernel recommended value. Thanks,
OK
Thanks
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-10 16:50 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:50 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On 05/10/2016 01:03 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 14:06:19 +0200
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> Hi Alex,
>> On 05/04/2016 01:54 PM, Eric Auger wrote:
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.
>>
>> If you prefer we could consider this patch outside of the main series
>> since it brings extra functionalities (MSI geometry reporting). In a
>> first QEMU integration we would live without knowing the MSI geometry I
>> think, all the more so I currently report an arbitrary number of
>> requested IOVA pages. The computation of the exact number of doorbells
>> to map brings extra complexity and I did not address this issue yet.
>>
>> It sketches a possible user API to report the MSI geometry based on the
>> capability chains, as you suggested some time ago. I am currently busy
>> drafting a QEMU integration.
>
> How would the user know that reserved MSI mappings are requires or
> available without this? Wouldn't the only option be for userspace to
> try to map something with the reserved MSI flag set and see if the
> kernel accepts it?
Well my first guess was that the (QEMU) virt machine using KVM/PCIe-MSI
passthrough could hardcode an arbitrary "large" iova size (currently 16
64kB pages in my QEMU integration). In case this is not sufficient for
mapping all host doorbells, we would see MSI allocation failing. In case
the need shows up, we could increase the value later on.
That's not a very desirable programming model. The
> arbitrary size is pretty ugly, but it at least makes for a consistent
> user interface. Is it a functional issue if we overestimate the size
> or is it just a matter of wasting IOVA space? Is there significant
> harm in making it obscenely large, like 1MB?
no just waste of IOVA space. To make it as transparent as possible for
virt machine I wanted to hide in the platform bus reserved IOVA.
The reference counting and
> re-use of IOVA pages seems like we may often only be using a single
> IOVA page for multiple doorbells. I guess I'm leaning towards defining
> the API even if the value is somewhat arbitrary because we'd rather have
> control of this rather than having the user guess and try to rope them
> back in later to use a kernel recommended value. Thanks,
OK
Thanks
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-10 16:50 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:50 UTC (permalink / raw)
To: linux-arm-kernel
On 05/10/2016 01:03 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 14:06:19 +0200
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> Hi Alex,
>> On 05/04/2016 01:54 PM, Eric Auger wrote:
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.
>>
>> If you prefer we could consider this patch outside of the main series
>> since it brings extra functionalities (MSI geometry reporting). In a
>> first QEMU integration we would live without knowing the MSI geometry I
>> think, all the more so I currently report an arbitrary number of
>> requested IOVA pages. The computation of the exact number of doorbells
>> to map brings extra complexity and I did not address this issue yet.
>>
>> It sketches a possible user API to report the MSI geometry based on the
>> capability chains, as you suggested some time ago. I am currently busy
>> drafting a QEMU integration.
>
> How would the user know that reserved MSI mappings are requires or
> available without this? Wouldn't the only option be for userspace to
> try to map something with the reserved MSI flag set and see if the
> kernel accepts it?
Well my first guess was that the (QEMU) virt machine using KVM/PCIe-MSI
passthrough could hardcode an arbitrary "large" iova size (currently 16
64kB pages in my QEMU integration). In case this is not sufficient for
mapping all host doorbells, we would see MSI allocation failing. In case
the need shows up, we could increase the value later on.
That's not a very desirable programming model. The
> arbitrary size is pretty ugly, but it at least makes for a consistent
> user interface. Is it a functional issue if we overestimate the size
> or is it just a matter of wasting IOVA space? Is there significant
> harm in making it obscenely large, like 1MB?
no just waste of IOVA space. To make it as transparent as possible for
virt machine I wanted to hide in the platform bus reserved IOVA.
The reference counting and
> re-use of IOVA pages seems like we may often only be using a single
> IOVA page for multiple doorbells. I guess I'm leaning towards defining
> the API even if the value is somewhat arbitrary because we'd rather have
> control of this rather than having the user guess and try to rope them
> back in later to use a kernel recommended value. Thanks,
OK
Thanks
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-04 11:54 ` Eric Auger
(?)
@ 2016-05-09 22:49 ` Alex Williamson
-1 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
On Wed, 4 May 2016 11:54:18 +0000
Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
Hmm, that really is arbitrary. How could we know a real value here?
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
This would break existing users, we can't arbitrarily change the offset
of iova_pgsizes. We can add cap_offset to the end and I think
everything would work about above if we do that.
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
Do bools have a guaranteed user size? Let's make this a __u32 and call
it flags with bit 0 defined as reserved. I'm tempted to suggest we
could figure out how to make alignment fit in another __u32 so we have a
properly packed structure, otherwise we should make a reserved __u32.
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
On Wed, 4 May 2016 11:54:18 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
Hmm, that really is arbitrary. How could we know a real value here?
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
This would break existing users, we can't arbitrarily change the offset
of iova_pgsizes. We can add cap_offset to the end and I think
everything would work about above if we do that.
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
Do bools have a guaranteed user size? Let's make this a __u32 and call
it flags with bit 0 defined as reserved. I'm tempted to suggest we
could figure out how to make alignment fit in another __u32 so we have a
properly packed structure, otherwise we should make a reserved __u32.
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-09 22:49 ` Alex Williamson
0 siblings, 0 replies; 111+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 4 May 2016 11:54:18 +0000
Eric Auger <eric.auger@linaro.org> wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
>
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
> case, the size and alignment of the IOVA region to be provided are
> returned.
>
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> capability chain, reporting the MSI geometry
>
> v7 -> v8:
> - use iommu_domain_msi_geometry
>
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
> This number depends on the domain/group/device topology which can
> dynamically change. Let's rely instead rely on an arbitrary max depending
> on the system
>
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
>
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
> more complex computation of MSI mapping requirements, especially the
> number of pages to be provided by the user-space.
> - reword patch title
>
> RFC v1 -> v1:
> - derived from
> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> return ret;
> }
>
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> + struct iommu_domain_msi_geometry msi_geometry;
> + struct vfio_info_cap_header *header;
> + struct vfio_domain *d;
> + bool mapping_required;
> + size_t size;
> +
> + mutex_lock(&iommu->lock);
> + /* All domains have same require_msi_map property, pick first */
> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> + &msi_geometry);
> + mapping_required = msi_geometry.iommu_msi_supported;
> +
> + mutex_unlock(&iommu->lock);
> +
> + size = sizeof(*vfio_msi_geometry);
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + vfio_msi_geometry = container_of(header,
> + struct vfio_iommu_type1_info_cap_msi_geometry,
> + header);
> +
> + vfio_msi_geometry->reserved = !mapping_required;
> + if (vfio_msi_geometry->reserved) {
> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> + return 0;
> + }
> +
> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> + /* we currently report the need for an arbitray number of 16 pages */
> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
Hmm, that really is arbitrary. How could we know a real value here?
> +
> + return 0;
> +}
> +
> static long vfio_iommu_type1_ioctl(void *iommu_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> }
> } else if (cmd == VFIO_IOMMU_GET_INFO) {
> struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>
> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>
> + ret = compute_msi_geometry_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = sizeof(info);
> + }
> +
> + kfree(caps.buf);
> + }
> +
> return copy_to_user((void __user *)arg, &info, minsz) ?
> -EFAULT : 0;
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
> __u32 argsz;
> __u32 flags;
> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
> + __u32 cap_offset; /* Offset within info struct of first cap */
> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
This would break existing users, we can't arbitrarily change the offset
of iova_pgsizes. We can add cap_offset to the end and I think
everything would work about above if we do that.
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + * whose boundaries are given by [@aperture_start, @aperture_end].
> + * this is typically the case on x86 host. The userspace is not allowed
> + * to map userspace memory at IOVAs intersecting this range using
> + * VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + * in that case the userspace must provide an IOVA window characterized by
> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> + struct vfio_info_cap_header header;
> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
Do bools have a guaranteed user size? Let's make this a __u32 and call
it flags with bit 0 defined as reserved. I'm tempted to suggest we
could figure out how to make alignment fit in another __u32 so we have a
properly packed structure, otherwise we should make a reserved __u32.
> + /* reserved */
> + __u64 aperture_start;
> + __u64 aperture_end;
> + /* not reserved */
> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
> + __u64 alignment; /* alignment of the window, in bytes */
> };
>
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
> * IOVA region that will be used on some platforms to map the host MSI frames.
> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> */
> struct vfio_iommu_type1_dma_map {
> __u32 argsz;
^ permalink raw reply [flat|nested] 111+ messages in thread[parent not found: <20160509164950.0b1cf9c1-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>]
* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
2016-05-09 22:49 ` Alex Williamson
(?)
@ 2016-05-10 16:36 ` Eric Auger
-1 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:36 UTC (permalink / raw)
To: Alex Williamson
Cc: julien.grall-5wv7dgnIgG8, eric.auger-qxv4g6HH51o,
jason-NLaQJdtUoK4Be96aLqz0jA, patches-QSEj5FYQhm4dnm+yROfE0A,
marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
tglx-hfZtesqFncYOwBW4kG4KsQ,
christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
yehuday-eYqpPyKDWXRBDgjK7y7TUQ
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:18 +0000
> Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> This patch allows the user-space to retrieve the MSI geometry. The
>> implementation is based on capability chains, now also added to
>> VFIO_IOMMU_GET_INFO.
>>
>> The returned info comprise:
>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>> in the positive, the start/end of the aperture,
>> - or whether the IOVA aperture need to be set by the userspace. In that
>> case, the size and alignment of the IOVA region to be provided are
>> returned.
>>
>> In case the userspace must provide the IOVA range, we currently return
>> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
>> current ARM platforms. This may be deprecated by a more sophisticated
>> computation later on.
>>
>> Signed-off-by: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>
>> ---
>> v8 -> v9:
>> - use iommu_msi_supported flag instead of programmable
>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>> capability chain, reporting the MSI geometry
>>
>> v7 -> v8:
>> - use iommu_domain_msi_geometry
>>
>> v6 -> v7:
>> - remove the computation of the number of IOVA pages to be provisionned.
>> This number depends on the domain/group/device topology which can
>> dynamically change. Let's rely instead rely on an arbitrary max depending
>> on the system
>>
>> v4 -> v5:
>> - move msi_info and ret declaration within the conditional code
>>
>> v3 -> v4:
>> - replace former vfio_domains_require_msi_mapping by
>> more complex computation of MSI mapping requirements, especially the
>> number of pages to be provided by the user-space.
>> - reword patch title
>>
>> RFC v1 -> v1:
>> - derived from
>> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
>> 2 files changed, 98 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2fc8197..841360b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>> return ret;
>> }
>>
>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>> + struct vfio_info_cap *caps)
>> +{
>> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>> + struct iommu_domain_msi_geometry msi_geometry;
>> + struct vfio_info_cap_header *header;
>> + struct vfio_domain *d;
>> + bool mapping_required;
>> + size_t size;
>> +
>> + mutex_lock(&iommu->lock);
>> + /* All domains have same require_msi_map property, pick first */
>> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>> + &msi_geometry);
>> + mapping_required = msi_geometry.iommu_msi_supported;
>> +
>> + mutex_unlock(&iommu->lock);
>> +
>> + size = sizeof(*vfio_msi_geometry);
>> + header = vfio_info_cap_add(caps, size,
>> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>> +
>> + if (IS_ERR(header))
>> + return PTR_ERR(header);
>> +
>> + vfio_msi_geometry = container_of(header,
>> + struct vfio_iommu_type1_info_cap_msi_geometry,
>> + header);
>> +
>> + vfio_msi_geometry->reserved = !mapping_required;
>> + if (vfio_msi_geometry->reserved) {
>> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
>> + return 0;
>> + }
>> +
>> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>> + /* we currently report the need for an arbitray number of 16 pages */
>> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
>
> Hmm, that really is arbitrary. How could we know a real value here?
Yes I fully agree and this is aknowledged in the cover/commit msg. I
dared to do that because this has the benefits to allow introducing the
userspace API while refining this computation later on.
I did not find yet an elegant solution to compute the platform max
number/size of doorbells (besides what I did in the past which was
dependent on the group/device current topology).
Maybe an option would be to have the relevant MSI controllers
registering their doorbells in a global list at probe time and then we
would enumerate all of them. I was reluctant to add this new
functionality in the series at this stage, hence the current simplification.
>
>> +
>> + return 0;
>> +}
>> +
>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>> {
>> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> }
>> } else if (cmd == VFIO_IOMMU_GET_INFO) {
>> struct vfio_iommu_type1_info info;
>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> + int ret;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>
>> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>
>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>
>> + ret = compute_msi_geometry_caps(iommu, &caps);
>> + if (ret)
>> + return ret;
>> +
>> + if (caps.size) {
>> + info.flags |= VFIO_IOMMU_INFO_CAPS;
>> + if (info.argsz < sizeof(info) + caps.size) {
>> + info.argsz = sizeof(info) + caps.size;
>> + info.cap_offset = 0;
>> + } else {
>> + vfio_info_cap_shift(&caps, sizeof(info));
>> + if (copy_to_user((void __user *)arg +
>> + sizeof(info), caps.buf,
>> + caps.size)) {
>> + kfree(caps.buf);
>> + return -EFAULT;
>> + }
>> + info.cap_offset = sizeof(info);
>> + }
>> +
>> + kfree(caps.buf);
>> + }
>> +
>> return copy_to_user((void __user *)arg, &info, minsz) ?
>> -EFAULT : 0;
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 4a9dbc2..0ff6a8d 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
>> + __u32 cap_offset; /* Offset within info struct of first cap */
>> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>
> This would break existing users, we can't arbitrarily change the offset
> of iova_pgsizes. We can add cap_offset to the end and I think
> everything would work about above if we do that.
Hum yes, sorry for the lack of care.
>
>> +};
>> +
>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
>> +
>> +/*
>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>> + * whose boundaries are given by [@aperture_start, @aperture_end].
>> + * this is typically the case on x86 host. The userspace is not allowed
>> + * to map userspace memory at IOVAs intersecting this range using
>> + * VFIO_IOMMU_MAP_DMA.
>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>> + * in that case the userspace must provide an IOVA window characterized by
>> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>> + */
>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>> + struct vfio_info_cap_header header;
>> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
>
> Do bools have a guaranteed user size? Let's make this a __u32 and call
> it flags with bit 0 defined as reserved. I'm tempted to suggest we
> could figure out how to make alignment fit in another __u32 so we have a
> properly packed structure, otherwise we should make a reserved __u32.
OK will rewrite & check that.
Thanks
Eric
>
>> + /* reserved */
>> + __u64 aperture_start;
>> + __u64 aperture_end;
>> + /* not reserved */
>> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
>> + __u64 alignment; /* alignment of the window, in bytes */
>> };
>>
>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>> * IOVA region that will be used on some platforms to map the host MSI frames.
>> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> * IOVA region stays until the container is closed.
>> + * The requirement for provisioning such reserved IOVA range can be checked by
>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-10 16:36 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:36 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
Jean-Philippe.Brucker, julien.grall, yehuday
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:18 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> This patch allows the user-space to retrieve the MSI geometry. The
>> implementation is based on capability chains, now also added to
>> VFIO_IOMMU_GET_INFO.
>>
>> The returned info comprise:
>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>> in the positive, the start/end of the aperture,
>> - or whether the IOVA aperture need to be set by the userspace. In that
>> case, the size and alignment of the IOVA region to be provided are
>> returned.
>>
>> In case the userspace must provide the IOVA range, we currently return
>> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
>> current ARM platforms. This may be deprecated by a more sophisticated
>> computation later on.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v8 -> v9:
>> - use iommu_msi_supported flag instead of programmable
>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>> capability chain, reporting the MSI geometry
>>
>> v7 -> v8:
>> - use iommu_domain_msi_geometry
>>
>> v6 -> v7:
>> - remove the computation of the number of IOVA pages to be provisionned.
>> This number depends on the domain/group/device topology which can
>> dynamically change. Let's rely instead rely on an arbitrary max depending
>> on the system
>>
>> v4 -> v5:
>> - move msi_info and ret declaration within the conditional code
>>
>> v3 -> v4:
>> - replace former vfio_domains_require_msi_mapping by
>> more complex computation of MSI mapping requirements, especially the
>> number of pages to be provided by the user-space.
>> - reword patch title
>>
>> RFC v1 -> v1:
>> - derived from
>> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
>> 2 files changed, 98 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2fc8197..841360b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>> return ret;
>> }
>>
>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>> + struct vfio_info_cap *caps)
>> +{
>> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>> + struct iommu_domain_msi_geometry msi_geometry;
>> + struct vfio_info_cap_header *header;
>> + struct vfio_domain *d;
>> + bool mapping_required;
>> + size_t size;
>> +
>> + mutex_lock(&iommu->lock);
>> + /* All domains have same require_msi_map property, pick first */
>> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>> + &msi_geometry);
>> + mapping_required = msi_geometry.iommu_msi_supported;
>> +
>> + mutex_unlock(&iommu->lock);
>> +
>> + size = sizeof(*vfio_msi_geometry);
>> + header = vfio_info_cap_add(caps, size,
>> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>> +
>> + if (IS_ERR(header))
>> + return PTR_ERR(header);
>> +
>> + vfio_msi_geometry = container_of(header,
>> + struct vfio_iommu_type1_info_cap_msi_geometry,
>> + header);
>> +
>> + vfio_msi_geometry->reserved = !mapping_required;
>> + if (vfio_msi_geometry->reserved) {
>> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
>> + return 0;
>> + }
>> +
>> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>> + /* we currently report the need for an arbitray number of 16 pages */
>> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
>
> Hmm, that really is arbitrary. How could we know a real value here?
Yes I fully agree and this is aknowledged in the cover/commit msg. I
dared to do that because this has the benefits to allow introducing the
userspace API while refining this computation later on.
I did not find yet an elegant solution to compute the platform max
number/size of doorbells (besides what I did in the past which was
dependent on the group/device current topology).
Maybe an option would be to have the relevant MSI controllers
registering their doorbells in a global list at probe time and then we
would enumerate all of them. I was reluctant to add this new
functionality in the series at this stage, hence the current simplification.
>
>> +
>> + return 0;
>> +}
>> +
>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>> {
>> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> }
>> } else if (cmd == VFIO_IOMMU_GET_INFO) {
>> struct vfio_iommu_type1_info info;
>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> + int ret;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>
>> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>
>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>
>> + ret = compute_msi_geometry_caps(iommu, &caps);
>> + if (ret)
>> + return ret;
>> +
>> + if (caps.size) {
>> + info.flags |= VFIO_IOMMU_INFO_CAPS;
>> + if (info.argsz < sizeof(info) + caps.size) {
>> + info.argsz = sizeof(info) + caps.size;
>> + info.cap_offset = 0;
>> + } else {
>> + vfio_info_cap_shift(&caps, sizeof(info));
>> + if (copy_to_user((void __user *)arg +
>> + sizeof(info), caps.buf,
>> + caps.size)) {
>> + kfree(caps.buf);
>> + return -EFAULT;
>> + }
>> + info.cap_offset = sizeof(info);
>> + }
>> +
>> + kfree(caps.buf);
>> + }
>> +
>> return copy_to_user((void __user *)arg, &info, minsz) ?
>> -EFAULT : 0;
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 4a9dbc2..0ff6a8d 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
>> + __u32 cap_offset; /* Offset within info struct of first cap */
>> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>
> This would break existing users, we can't arbitrarily change the offset
> of iova_pgsizes. We can add cap_offset to the end and I think
> everything would work about above if we do that.
Hum yes, sorry for the lack of care.
>
>> +};
>> +
>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
>> +
>> +/*
>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>> + * whose boundaries are given by [@aperture_start, @aperture_end].
>> + * this is typically the case on x86 host. The userspace is not allowed
>> + * to map userspace memory at IOVAs intersecting this range using
>> + * VFIO_IOMMU_MAP_DMA.
>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>> + * in that case the userspace must provide an IOVA window characterized by
>> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>> + */
>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>> + struct vfio_info_cap_header header;
>> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
>
> Do bools have a guaranteed user size? Let's make this a __u32 and call
> it flags with bit 0 defined as reserved. I'm tempted to suggest we
> could figure out how to make alignment fit in another __u32 so we have a
> properly packed structure, otherwise we should make a reserved __u32.
OK will rewrite & check that.
Thanks
Eric
>
>> + /* reserved */
>> + __u64 aperture_start;
>> + __u64 aperture_end;
>> + /* not reserved */
>> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
>> + __u64 alignment; /* alignment of the window, in bytes */
>> };
>>
>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>> * IOVA region that will be used on some platforms to map the host MSI frames.
>> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> * IOVA region stays until the container is closed.
>> + * The requirement for provisioning such reserved IOVA range can be checked by
>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-05-10 16:36 ` Eric Auger
0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2016-05-10 16:36 UTC (permalink / raw)
To: linux-arm-kernel
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 11:54:18 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
>
>> This patch allows the user-space to retrieve the MSI geometry. The
>> implementation is based on capability chains, now also added to
>> VFIO_IOMMU_GET_INFO.
>>
>> The returned info comprise:
>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>> in the positive, the start/end of the aperture,
>> - or whether the IOVA aperture need to be set by the userspace. In that
>> case, the size and alignment of the IOVA region to be provided are
>> returned.
>>
>> In case the userspace must provide the IOVA range, we currently return
>> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
>> current ARM platforms. This may be deprecated by a more sophisticated
>> computation later on.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v8 -> v9:
>> - use iommu_msi_supported flag instead of programmable
>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>> capability chain, reporting the MSI geometry
>>
>> v7 -> v8:
>> - use iommu_domain_msi_geometry
>>
>> v6 -> v7:
>> - remove the computation of the number of IOVA pages to be provisionned.
>> This number depends on the domain/group/device topology which can
>> dynamically change. Let's rely instead rely on an arbitrary max depending
>> on the system
>>
>> v4 -> v5:
>> - move msi_info and ret declaration within the conditional code
>>
>> v3 -> v4:
>> - replace former vfio_domains_require_msi_mapping by
>> more complex computation of MSI mapping requirements, especially the
>> number of pages to be provided by the user-space.
>> - reword patch title
>>
>> RFC v1 -> v1:
>> - derived from
>> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>> include/uapi/linux/vfio.h | 30 +++++++++++++++++-
>> 2 files changed, 98 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2fc8197..841360b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>> return ret;
>> }
>>
>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>> + struct vfio_info_cap *caps)
>> +{
>> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>> + struct iommu_domain_msi_geometry msi_geometry;
>> + struct vfio_info_cap_header *header;
>> + struct vfio_domain *d;
>> + bool mapping_required;
>> + size_t size;
>> +
>> + mutex_lock(&iommu->lock);
>> + /* All domains have same require_msi_map property, pick first */
>> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>> + &msi_geometry);
>> + mapping_required = msi_geometry.iommu_msi_supported;
>> +
>> + mutex_unlock(&iommu->lock);
>> +
>> + size = sizeof(*vfio_msi_geometry);
>> + header = vfio_info_cap_add(caps, size,
>> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>> +
>> + if (IS_ERR(header))
>> + return PTR_ERR(header);
>> +
>> + vfio_msi_geometry = container_of(header,
>> + struct vfio_iommu_type1_info_cap_msi_geometry,
>> + header);
>> +
>> + vfio_msi_geometry->reserved = !mapping_required;
>> + if (vfio_msi_geometry->reserved) {
>> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
>> + return 0;
>> + }
>> +
>> + vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>> + /* we currently report the need for an arbitray number of 16 pages */
>> + vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
>
> Hmm, that really is arbitrary. How could we know a real value here?
Yes I fully agree and this is aknowledged in the cover/commit msg. I
dared to do that because this has the benefits to allow introducing the
userspace API while refining this computation later on.
I did not find yet an elegant solution to compute the platform max
number/size of doorbells (besides what I did in the past which was
dependent on the group/device current topology).
Maybe an option would be to have the relevant MSI controllers
registering their doorbells in a global list at probe time and then we
would enumerate all of them. I was reluctant to add this new
functionality in the series at this stage, hence the current simplification.
>
>> +
>> + return 0;
>> +}
>> +
>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>> {
>> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> }
>> } else if (cmd == VFIO_IOMMU_GET_INFO) {
>> struct vfio_iommu_type1_info info;
>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> + int ret;
>>
>> minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>
>> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>
>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>
>> + ret = compute_msi_geometry_caps(iommu, &caps);
>> + if (ret)
>> + return ret;
>> +
>> + if (caps.size) {
>> + info.flags |= VFIO_IOMMU_INFO_CAPS;
>> + if (info.argsz < sizeof(info) + caps.size) {
>> + info.argsz = sizeof(info) + caps.size;
>> + info.cap_offset = 0;
>> + } else {
>> + vfio_info_cap_shift(&caps, sizeof(info));
>> + if (copy_to_user((void __user *)arg +
>> + sizeof(info), caps.buf,
>> + caps.size)) {
>> + kfree(caps.buf);
>> + return -EFAULT;
>> + }
>> + info.cap_offset = sizeof(info);
>> + }
>> +
>> + kfree(caps.buf);
>> + }
>> +
>> return copy_to_user((void __user *)arg, &info, minsz) ?
>> -EFAULT : 0;
>>
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 4a9dbc2..0ff6a8d 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>> __u32 argsz;
>> __u32 flags;
>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
>> + __u32 cap_offset; /* Offset within info struct of first cap */
>> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>
> This would break existing users, we can't arbitrarily change the offset
> of iova_pgsizes. We can add cap_offset to the end and I think
> everything would work about above if we do that.
Hum yes, sorry for the lack of care.
>
>> +};
>> +
>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1
>> +
>> +/*
>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>> + * whose boundaries are given by [@aperture_start, @aperture_end].
>> + * this is typically the case on x86 host. The userspace is not allowed
>> + * to map userspace memory at IOVAs intersecting this range using
>> + * VFIO_IOMMU_MAP_DMA.
>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>> + * in that case the userspace must provide an IOVA window characterized by
>> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>> + */
>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>> + struct vfio_info_cap_header header;
>> + bool reserved; /* Are MSI IOVAs within a reserved aperture? */
>
> Do bools have a guaranteed user size? Let's make this a __u32 and call
> it flags with bit 0 defined as reserved. I'm tempted to suggest we
> could figure out how to make alignment fit in another __u32 so we have a
> properly packed structure, otherwise we should make a reserved __u32.
OK will rewrite & check that.
Thanks
Eric
>
>> + /* reserved */
>> + __u64 aperture_start;
>> + __u64 aperture_end;
>> + /* not reserved */
>> + __u64 size; /* IOVA aperture size in bytes the userspace must provide */
>> + __u64 alignment; /* alignment of the window, in bytes */
>> };
>>
>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>> * IOVA region that will be used on some platforms to map the host MSI frames.
>> * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> * IOVA region stays until the container is closed.
>> + * The requirement for provisioning such reserved IOVA range can be checked by
>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>> */
>> struct vfio_iommu_type1_dma_map {
>> __u32 argsz;
>
^ permalink raw reply [flat|nested] 111+ messages in thread