* [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough
@ 2025-01-24 20:17 Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
This series introduces the ability for certain devices on s390 to bypass
a layer of IOMMU via the iommu.passthrough=1 option. In order to enable
this, the concept of an identity domain is added to s390-iommu. On s390,
IOMMU passthrough is only allowed if indicated via a special bit in s390
CLP data for the associated device group, otherwise we must fall back to
dma-iommu.
Changes for v3:
- Rebase onto 6.13
- fixed bus_dma_region size (Niklas)
Changes for v2:
- Remove ARCH_HAS_PHYS_TO_DMA, use bus_dma_region
- Remove use of def_domain_type, use 1 of 2 ops chosen at init
Matthew Rosato (3):
s390/pci: check for relaxed translation capability
s390/pci: store DMA offset in bus_dma_region
iommu/s390: implement iommu passthrough via identity domain
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +-
arch/s390/pci/pci.c | 6 ++-
arch/s390/pci/pci_bus.c | 18 +++++++
arch/s390/pci/pci_clp.c | 1 +
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++--------
6 files changed, 99 insertions(+), 27 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/3] s390/pci: check for relaxed translation capability
2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
@ 2025-01-24 20:17 ` Matthew Rosato
2025-01-29 10:07 ` Niklas Schnelle
2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 1 reply; 10+ messages in thread
From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
For each zdev, record whether or not CLP indicates relaxed translation
capability for the associated device group.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +++-
arch/s390/pci/pci_clp.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 474e1f8d1d3c..8fe4c7a72c0b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -144,7 +144,7 @@ struct zpci_dev {
u8 util_str_avail : 1;
u8 irqs_registered : 1;
u8 tid_avail : 1;
- u8 reserved : 1;
+ u8 rtr_avail : 1; /* Relaxed translation allowed */
unsigned int devfn; /* DEVFN part of the RID*/
u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index 3fff2f7095c8..7ebff39c84b3 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp {
u16 : 4;
u16 noi : 12; /* number of interrupts */
u8 version;
- u8 : 6;
+ u8 : 2;
+ u8 rtr : 1; /* Relaxed translation requirement */
+ u8 : 3;
u8 frame : 1;
u8 refresh : 1; /* TLB refresh mode */
u16 : 3;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 14bf7e8d06b7..27248686e588 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
zdev->version = response->version;
zdev->maxstbl = response->maxstbl;
zdev->dtsm = response->dtsm;
+ zdev->rtr_avail = response->rtr;
switch (response->version) {
case 1:
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region
2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
@ 2025-01-24 20:17 ` Matthew Rosato
2025-01-29 10:20 ` Niklas Schnelle
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 1 reply; 10+ messages in thread
From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
PCI devices on s390 have a DMA offset that is reported via CLP. In
preparation for allowing identity domains, setup the bus_dma_region
for all PCI devices using the reported CLP value.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index d5ace00d10f0..51fa993b64fc 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -19,6 +19,7 @@
#include <linux/jump_label.h>
#include <linux/pci.h>
#include <linux/printk.h>
+#include <linux/dma-direct.h>
#include <asm/pci_clp.h>
#include <asm/pci_dma.h>
@@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
return zbus;
}
+static void pci_dma_range_setup(struct pci_dev *pdev)
+{
+ struct zpci_dev *zdev = to_zpci(pdev);
+ struct bus_dma_region *map;
+
+ map = kzalloc(sizeof(*map), GFP_KERNEL);
+ if (!map)
+ return;
+
+ map->cpu_start = 0;
+ map->dma_start = PAGE_ALIGN(zdev->start_dma);
+ map->size = zdev->end_dma - zdev->start_dma + 1;
+ pdev->dev.dma_range_map = map;
+}
+
void pcibios_bus_add_device(struct pci_dev *pdev)
{
struct zpci_dev *zdev = to_zpci(pdev);
+ pci_dma_range_setup(pdev);
+
/*
* With pdev->no_vf_scan the common PCI probing code does not
* perform PF/VF linking.
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain
2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2025-01-24 20:17 ` Matthew Rosato
2025-01-28 17:30 ` Jason Gunthorpe
` (2 more replies)
2 siblings, 3 replies; 10+ messages in thread
From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
Enabled via the kernel command-line 'iommu.passthrough=1' option.
Introduce the concept of identity domains to s390-iommu, which relies on
the bus_dma_region to offset identity mappings to the start of the DMA
aperture advertized by CLP.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/pci/pci.c | 6 ++-
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
2 files changed, 76 insertions(+), 25 deletions(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 88f72745fa59..758b23331754 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
struct zpci_fib fib = {0};
u8 cc;
- WARN_ON_ONCE(iota & 0x3fff);
fib.pba = base;
/* Work around off by one in ISM virt device */
if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
fib.pal = limit + (1 << 12);
else
fib.pal = limit;
- fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
+ if (iota == 0)
+ fib.iota = iota;
+ else
+ fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, status);
if (cc)
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fbdeded3d48b..3d93a9644fca 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -16,7 +16,7 @@
#include "dma-iommu.h"
-static const struct iommu_ops s390_iommu_ops;
+static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops;
static struct kmem_cache *dma_region_table_cache;
static struct kmem_cache *dma_page_table_cache;
@@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain,
return 0;
s390_domain = to_s390_domain(zdev->s390_domain);
- spin_lock_irqsave(&s390_domain->list_lock, flags);
- list_del_rcu(&zdev->iommu_list);
- spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ if (zdev->dma_table) {
+ spin_lock_irqsave(&s390_domain->list_lock, flags);
+ list_del_rcu(&zdev->iommu_list);
+ spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ }
zpci_unregister_ioat(zdev, 0);
zdev->dma_table = NULL;
@@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev)
if (rc)
goto out_err;
- rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL);
+ if (zdev->rtr_avail) {
+ rc = iommu_device_register(&zdev->iommu_dev,
+ &s390_iommu_rtr_ops, NULL);
+ } else {
+ rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops,
+ NULL);
+ }
if (rc)
goto out_sysfs;
@@ -787,6 +795,39 @@ static int __init s390_iommu_init(void)
}
subsys_initcall(s390_iommu_init);
+static int s390_attach_dev_identity(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct zpci_dev *zdev = to_zpci_dev(dev);
+ u8 status;
+ int cc;
+
+ blocking_domain_attach_device(&blocking_domain, dev);
+
+ /* If we fail now DMA remains blocked via blocking domain */
+ cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+ 0, &status);
+ /*
+ * If the device is undergoing error recovery the reset code
+ * will re-establish the new domain.
+ */
+ if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+ return -EIO;
+
+ zdev_s390_domain_update(zdev, domain);
+
+ return 0;
+}
+
+static const struct iommu_domain_ops s390_identity_ops = {
+ .attach_dev = s390_attach_dev_identity,
+};
+
+static struct iommu_domain s390_identity_domain = {
+ .type = IOMMU_DOMAIN_IDENTITY,
+ .ops = &s390_identity_ops,
+};
+
static struct iommu_domain blocking_domain = {
.type = IOMMU_DOMAIN_BLOCKED,
.ops = &(const struct iommu_domain_ops) {
@@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = {
}
};
-static const struct iommu_ops s390_iommu_ops = {
- .blocked_domain = &blocking_domain,
- .release_domain = &blocking_domain,
- .capable = s390_iommu_capable,
- .domain_alloc_paging = s390_domain_alloc_paging,
- .probe_device = s390_iommu_probe_device,
- .device_group = generic_device_group,
- .pgsize_bitmap = SZ_4K,
- .get_resv_regions = s390_iommu_get_resv_regions,
- .default_domain_ops = &(const struct iommu_domain_ops) {
- .attach_dev = s390_iommu_attach_device,
- .map_pages = s390_iommu_map_pages,
- .unmap_pages = s390_iommu_unmap_pages,
- .flush_iotlb_all = s390_iommu_flush_iotlb_all,
- .iotlb_sync = s390_iommu_iotlb_sync,
- .iotlb_sync_map = s390_iommu_iotlb_sync_map,
- .iova_to_phys = s390_iommu_iova_to_phys,
- .free = s390_domain_free,
+#define S390_IOMMU_COMMON_OPS() \
+ .blocked_domain = &blocking_domain, \
+ .release_domain = &blocking_domain, \
+ .capable = s390_iommu_capable, \
+ .domain_alloc_paging = s390_domain_alloc_paging, \
+ .probe_device = s390_iommu_probe_device, \
+ .device_group = generic_device_group, \
+ .pgsize_bitmap = SZ_4K, \
+ .get_resv_regions = s390_iommu_get_resv_regions, \
+ .default_domain_ops = &(const struct iommu_domain_ops) { \
+ .attach_dev = s390_iommu_attach_device, \
+ .map_pages = s390_iommu_map_pages, \
+ .unmap_pages = s390_iommu_unmap_pages, \
+ .flush_iotlb_all = s390_iommu_flush_iotlb_all, \
+ .iotlb_sync = s390_iommu_iotlb_sync, \
+ .iotlb_sync_map = s390_iommu_iotlb_sync_map, \
+ .iova_to_phys = s390_iommu_iova_to_phys, \
+ .free = s390_domain_free, \
}
+
+static const struct iommu_ops s390_iommu_ops = {
+ S390_IOMMU_COMMON_OPS()
+};
+
+static const struct iommu_ops s390_iommu_rtr_ops = {
+ .identity_domain = &s390_identity_domain,
+ S390_IOMMU_COMMON_OPS()
};
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
@ 2025-01-28 17:30 ` Jason Gunthorpe
2025-01-29 10:29 ` Niklas Schnelle
2025-01-30 7:43 ` Niklas Schnelle
2 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2025-01-28 17:30 UTC (permalink / raw)
To: Matthew Rosato
Cc: joro, will, robin.murphy, gerald.schaefer, schnelle, hca, gor,
agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, Jan 24, 2025 at 03:17:17PM -0500, Matthew Rosato wrote:
> Enabled via the kernel command-line 'iommu.passthrough=1' option.
>
> Introduce the concept of identity domains to s390-iommu, which relies on
> the bus_dma_region to offset identity mappings to the start of the DMA
> aperture advertized by CLP.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci.c | 6 ++-
> drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
> 2 files changed, 76 insertions(+), 25 deletions(-)
Seems Ok
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> -static const struct iommu_ops s390_iommu_ops = {
> - .blocked_domain = &blocking_domain,
> - .release_domain = &blocking_domain,
> - .capable = s390_iommu_capable,
> - .domain_alloc_paging = s390_domain_alloc_paging,
> - .probe_device = s390_iommu_probe_device,
> - .device_group = generic_device_group,
> - .pgsize_bitmap = SZ_4K,
> - .get_resv_regions = s390_iommu_get_resv_regions,
> - .default_domain_ops = &(const struct iommu_domain_ops) {
> - .attach_dev = s390_iommu_attach_device,
> - .map_pages = s390_iommu_map_pages,
> - .unmap_pages = s390_iommu_unmap_pages,
> - .flush_iotlb_all = s390_iommu_flush_iotlb_all,
> - .iotlb_sync = s390_iommu_iotlb_sync,
> - .iotlb_sync_map = s390_iommu_iotlb_sync_map,
> - .iova_to_phys = s390_iommu_iova_to_phys,
> - .free = s390_domain_free,
> +#define S390_IOMMU_COMMON_OPS() \
> + .blocked_domain = &blocking_domain, \
> + .release_domain = &blocking_domain, \
> + .capable = s390_iommu_capable, \
> + .domain_alloc_paging = s390_domain_alloc_paging, \
> + .probe_device = s390_iommu_probe_device, \
> + .device_group = generic_device_group, \
> + .pgsize_bitmap = SZ_4K, \
> + .get_resv_regions = s390_iommu_get_resv_regions, \
> + .default_domain_ops = &(const struct iommu_domain_ops) { \
> + .attach_dev = s390_iommu_attach_device, \
> + .map_pages = s390_iommu_map_pages, \
> + .unmap_pages = s390_iommu_unmap_pages, \
> + .flush_iotlb_all = s390_iommu_flush_iotlb_all, \
> + .iotlb_sync = s390_iommu_iotlb_sync, \
> + .iotlb_sync_map = s390_iommu_iotlb_sync_map, \
> + .iova_to_phys = s390_iommu_iova_to_phys, \
> + .free = s390_domain_free, \
> }
> +
> +static const struct iommu_ops s390_iommu_ops = {
> + S390_IOMMU_COMMON_OPS()
> +};
> +
> +static const struct iommu_ops s390_iommu_rtr_ops = {
> + .identity_domain = &s390_identity_domain,
> + S390_IOMMU_COMMON_OPS()
> };
Though it is a pattern in iommu drivers to use a non-cost ops and
mutate them during probe. For instance you could NULL
s390_iommu_rtr_ops->identity_domain if the platform does not support
it.
On the other hand your version with a const ops is security nicer.
Alternatively, I have a patch series that adds a
domain_alloc_identity() function to get virtio-iommu moved to the new
APIs, that could work well here too and keep the const ops.
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] s390/pci: check for relaxed translation capability
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
@ 2025-01-29 10:07 ` Niklas Schnelle
0 siblings, 0 replies; 10+ messages in thread
From: Niklas Schnelle @ 2025-01-29 10:07 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote:
> For each zdev, record whether or not CLP indicates relaxed translation
> capability for the associated device group.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/include/asm/pci.h | 2 +-
> arch/s390/include/asm/pci_clp.h | 4 +++-
> arch/s390/pci/pci_clp.c | 1 +
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 474e1f8d1d3c..8fe4c7a72c0b 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -144,7 +144,7 @@ struct zpci_dev {
> u8 util_str_avail : 1;
> u8 irqs_registered : 1;
> u8 tid_avail : 1;
> - u8 reserved : 1;
> + u8 rtr_avail : 1; /* Relaxed translation allowed */
> unsigned int devfn; /* DEVFN part of the RID*/
>
> u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */
> diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
> index 3fff2f7095c8..7ebff39c84b3 100644
> --- a/arch/s390/include/asm/pci_clp.h
> +++ b/arch/s390/include/asm/pci_clp.h
> @@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp {
> u16 : 4;
> u16 noi : 12; /* number of interrupts */
> u8 version;
> - u8 : 6;
> + u8 : 2;
> + u8 rtr : 1; /* Relaxed translation requirement */
> + u8 : 3;
> u8 frame : 1;
> u8 refresh : 1; /* TLB refresh mode */
> u16 : 3;
> diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
> index 14bf7e8d06b7..27248686e588 100644
> --- a/arch/s390/pci/pci_clp.c
> +++ b/arch/s390/pci/pci_clp.c
> @@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
> zdev->version = response->version;
> zdev->maxstbl = response->maxstbl;
> zdev->dtsm = response->dtsm;
> + zdev->rtr_avail = response->rtr;
>
> switch (response->version) {
> case 1:
Checked the bit positions and as discussed out of band I gave the whole
series a try too. So feel free to add:
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region
2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2025-01-29 10:20 ` Niklas Schnelle
0 siblings, 0 replies; 10+ messages in thread
From: Niklas Schnelle @ 2025-01-29 10:20 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote:
> PCI devices on s390 have a DMA offset that is reported via CLP. In
> preparation for allowing identity domains, setup the bus_dma_region
> for all PCI devices using the reported CLP value.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> index d5ace00d10f0..51fa993b64fc 100644
> --- a/arch/s390/pci/pci_bus.c
> +++ b/arch/s390/pci/pci_bus.c
> @@ -19,6 +19,7 @@
> #include <linux/jump_label.h>
> #include <linux/pci.h>
> #include <linux/printk.h>
> +#include <linux/dma-direct.h>
>
> #include <asm/pci_clp.h>
> #include <asm/pci_dma.h>
> @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
> return zbus;
> }
>
> +static void pci_dma_range_setup(struct pci_dev *pdev)
> +{
> + struct zpci_dev *zdev = to_zpci(pdev);
> + struct bus_dma_region *map;
> +
> + map = kzalloc(sizeof(*map), GFP_KERNEL);
> + if (!map)
> + return;
> +
> + map->cpu_start = 0;
> + map->dma_start = PAGE_ALIGN(zdev->start_dma);
> + map->size = zdev->end_dma - zdev->start_dma + 1;
Shouldn't the size calculation also use PAGE_ALIGN(zdev->start_dma) in
case that actually aligns? And for zdev->end_dma I think we want
PAGE_ALIGN_DOWN(zdev->end_dma + 1). There is also the, hopefully
theoretical, case that zdev->end_dma could be less than PAGE_SIZE
larger than zdev->start_dma. I guess in that case we should just ensure
that size is 0, maybe using max(…, 0) around the expression.
> + pdev->dev.dma_range_map = map;
> +}
> +
> void pcibios_bus_add_device(struct pci_dev *pdev)
> {
> struct zpci_dev *zdev = to_zpci(pdev);
>
> + pci_dma_range_setup(pdev);
> +
> /*
> * With pdev->no_vf_scan the common PCI probing code does not
> * perform PF/VF linking.
I was initially wondering where the map is freed, then found the
kfree()s in common code in device_unbind_cleanup() and
device_release().
Despite the thoughts above I already tested this on my setup and it
worked great, so you can already add my:
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2025-01-28 17:30 ` Jason Gunthorpe
@ 2025-01-29 10:29 ` Niklas Schnelle
2025-01-30 7:43 ` Niklas Schnelle
2 siblings, 0 replies; 10+ messages in thread
From: Niklas Schnelle @ 2025-01-29 10:29 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote:
> Enabled via the kernel command-line 'iommu.passthrough=1' option.
>
> Introduce the concept of identity domains to s390-iommu, which relies on
> the bus_dma_region to offset identity mappings to the start of the DMA
> aperture advertized by CLP.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci.c | 6 ++-
> drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
> 2 files changed, 76 insertions(+), 25 deletions(-)
>
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 88f72745fa59..758b23331754 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
> struct zpci_fib fib = {0};
> u8 cc;
>
> - WARN_ON_ONCE(iota & 0x3fff);
> fib.pba = base;
> /* Work around off by one in ISM virt device */
> if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
> fib.pal = limit + (1 << 12);
> else
> fib.pal = limit;
> - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
> + if (iota == 0)
> + fib.iota = iota;
> + else
> + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
The logic here did confuse me resolving the merge conflict with your
other series. Maybe a ternary conves better that this is really about
not applying the flags if the I/O address translation anchor isn't
used. Something like:
fib.iota = iota | ((iota) ? ZPCI_IOTA_RTTO_FLASG : 0);
> fib.gd = zdev->gisa;
> cc = zpci_mod_fc(req, &fib, status);
> if (cc)
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index fbdeded3d48b..3d93a9644fca 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -16,7 +16,7 @@
>
> #include "dma-iommu.h"
>
> -static const struct iommu_ops s390_iommu_ops;
> +static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops;
>
> static struct kmem_cache *dma_region_table_cache;
> static struct kmem_cache *dma_page_table_cache;
> @@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain,
> return 0;
>
> s390_domain = to_s390_domain(zdev->s390_domain);
> - spin_lock_irqsave(&s390_domain->list_lock, flags);
> - list_del_rcu(&zdev->iommu_list);
> - spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> + if (zdev->dma_table) {
> + spin_lock_irqsave(&s390_domain->list_lock, flags);
> + list_del_rcu(&zdev->iommu_list);
> + spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> + }
>
> zpci_unregister_ioat(zdev, 0);
> zdev->dma_table = NULL;
> @@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev)
> if (rc)
> goto out_err;
>
> - rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL);
> + if (zdev->rtr_avail) {
> + rc = iommu_device_register(&zdev->iommu_dev,
> + &s390_iommu_rtr_ops, NULL);
> + } else {
> + rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops,
> + NULL);
> + }
> if (rc)
> goto out_sysfs;
>
> @@ -787,6 +795,39 @@ static int __init s390_iommu_init(void)
> }
> subsys_initcall(s390_iommu_init);
>
> +static int s390_attach_dev_identity(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + struct zpci_dev *zdev = to_zpci_dev(dev);
> + u8 status;
> + int cc;
> +
> + blocking_domain_attach_device(&blocking_domain, dev);
> +
> + /* If we fail now DMA remains blocked via blocking domain */
> + cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> + 0, &status);
> + /*
> + * If the device is undergoing error recovery the reset code
> + * will re-establish the new domain.
> + */
> + if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
> + return -EIO;
> +
> + zdev_s390_domain_update(zdev, domain);
> +
> + return 0;
> +}
> +
> +static const struct iommu_domain_ops s390_identity_ops = {
> + .attach_dev = s390_attach_dev_identity,
> +};
> +
> +static struct iommu_domain s390_identity_domain = {
> + .type = IOMMU_DOMAIN_IDENTITY,
> + .ops = &s390_identity_ops,
> +};
> +
> static struct iommu_domain blocking_domain = {
> .type = IOMMU_DOMAIN_BLOCKED,
> .ops = &(const struct iommu_domain_ops) {
> @@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = {
> }
> };
>
> -static const struct iommu_ops s390_iommu_ops = {
> - .blocked_domain = &blocking_domain,
> - .release_domain = &blocking_domain,
> - .capable = s390_iommu_capable,
> - .domain_alloc_paging = s390_domain_alloc_paging,
> - .probe_device = s390_iommu_probe_device,
> - .device_group = generic_device_group,
> - .pgsize_bitmap = SZ_4K,
> - .get_resv_regions = s390_iommu_get_resv_regions,
> - .default_domain_ops = &(const struct iommu_domain_ops) {
> - .attach_dev = s390_iommu_attach_device,
> - .map_pages = s390_iommu_map_pages,
> - .unmap_pages = s390_iommu_unmap_pages,
> - .flush_iotlb_all = s390_iommu_flush_iotlb_all,
> - .iotlb_sync = s390_iommu_iotlb_sync,
> - .iotlb_sync_map = s390_iommu_iotlb_sync_map,
> - .iova_to_phys = s390_iommu_iova_to_phys,
> - .free = s390_domain_free,
> +#define S390_IOMMU_COMMON_OPS() \
> + .blocked_domain = &blocking_domain, \
> + .release_domain = &blocking_domain, \
> + .capable = s390_iommu_capable, \
> + .domain_alloc_paging = s390_domain_alloc_paging, \
> + .probe_device = s390_iommu_probe_device, \
> + .device_group = generic_device_group, \
> + .pgsize_bitmap = SZ_4K, \
> + .get_resv_regions = s390_iommu_get_resv_regions, \
> + .default_domain_ops = &(const struct iommu_domain_ops) { \
> + .attach_dev = s390_iommu_attach_device, \
> + .map_pages = s390_iommu_map_pages, \
> + .unmap_pages = s390_iommu_unmap_pages, \
> + .flush_iotlb_all = s390_iommu_flush_iotlb_all, \
> + .iotlb_sync = s390_iommu_iotlb_sync, \
> + .iotlb_sync_map = s390_iommu_iotlb_sync_map, \
> + .iova_to_phys = s390_iommu_iova_to_phys, \
> + .free = s390_domain_free, \
> }
> +
> +static const struct iommu_ops s390_iommu_ops = {
> + S390_IOMMU_COMMON_OPS()
> +};
> +
> +static const struct iommu_ops s390_iommu_rtr_ops = {
> + .identity_domain = &s390_identity_domain,
> + S390_IOMMU_COMMON_OPS()
> };
Other than the nit above this all looks good to me and as stated on
other patches I did give this a try in my setup.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2025-01-28 17:30 ` Jason Gunthorpe
2025-01-29 10:29 ` Niklas Schnelle
@ 2025-01-30 7:43 ` Niklas Schnelle
2025-02-05 19:55 ` Matthew Rosato
2 siblings, 1 reply; 10+ messages in thread
From: Niklas Schnelle @ 2025-01-30 7:43 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote:
> Enabled via the kernel command-line 'iommu.passthrough=1' option.
>
> Introduce the concept of identity domains to s390-iommu, which relies on
> the bus_dma_region to offset identity mappings to the start of the DMA
> aperture advertized by CLP.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci.c | 6 ++-
> drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
> 2 files changed, 76 insertions(+), 25 deletions(-)
>
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 88f72745fa59..758b23331754 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
> struct zpci_fib fib = {0};
> u8 cc;
>
> - WARN_ON_ONCE(iota & 0x3fff);
> fib.pba = base;
> /* Work around off by one in ISM virt device */
> if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
> fib.pal = limit + (1 << 12);
> else
> fib.pal = limit;
> - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
> + if (iota == 0)
> + fib.iota = iota;
Taking another look, I think there is a small problem with the logic of
passing iota == 0 to indicate direct mapping. In
zpci_hot_reset_device() we call zpci_register_ioat() with iota set to
virt_to_phys(zdev->dma_table) and while zdev->dma_table is NULL for the
identity domain we can't rely on virt_to_phys(NULL) == NULL.
> + else
> + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
> fib.gd = zdev->gisa;
> cc = zpci_mod_fc(req, &fib, status);
> if (cc)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain
2025-01-30 7:43 ` Niklas Schnelle
@ 2025-02-05 19:55 ` Matthew Rosato
0 siblings, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2025-02-05 19:55 UTC (permalink / raw)
To: Niklas Schnelle, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On 1/30/25 2:43 AM, Niklas Schnelle wrote:
> On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote:
>> Enabled via the kernel command-line 'iommu.passthrough=1' option.
>>
>> Introduce the concept of identity domains to s390-iommu, which relies on
>> the bus_dma_region to offset identity mappings to the start of the DMA
>> aperture advertized by CLP.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>> arch/s390/pci/pci.c | 6 ++-
>> drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
>> 2 files changed, 76 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
>> index 88f72745fa59..758b23331754 100644
>> --- a/arch/s390/pci/pci.c
>> +++ b/arch/s390/pci/pci.c
>> @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
>> struct zpci_fib fib = {0};
>> u8 cc;
>>
>> - WARN_ON_ONCE(iota & 0x3fff);
>> fib.pba = base;
>> /* Work around off by one in ISM virt device */
>> if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
>> fib.pal = limit + (1 << 12);
>> else
>> fib.pal = limit;
>> - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
>> + if (iota == 0)
>> + fib.iota = iota;
>
> Taking another look, I think there is a small problem with the logic of
> passing iota == 0 to indicate direct mapping. In
> zpci_hot_reset_device() we call zpci_register_ioat() with iota set to
> virt_to_phys(zdev->dma_table) and while zdev->dma_table is NULL for the
> identity domain we can't rely on virt_to_phys(NULL) == NULL.
Thanks for pointing this out. We discussed this a fair bit off-list, but to summarize here: I'll include an additional patch in the next version that will drive IOAT registration code through s390-iommu, since really this owns the dma_table and knows when it does (not) make sense to register it. Places in s390 code that need to perform IOAT re-registration (such as zpci_hot_reset_device) will call into this routine and let s390-iommu determine the correct action based upon the type of domain in use.
I'll set that patch before this one, so then this patch remains unchanged except to replace the zpci_register_ioat call added by this patch in s390_attach_dev_identity with the new routine.
>
>> + else
>> + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
>> fib.gd = zdev->gisa;
>> cc = zpci_mod_fc(req, &fib, status);
>> if (cc)
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-02-05 19:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
2025-01-29 10:07 ` Niklas Schnelle
2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
2025-01-29 10:20 ` Niklas Schnelle
2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2025-01-28 17:30 ` Jason Gunthorpe
2025-01-29 10:29 ` Niklas Schnelle
2025-01-30 7:43 ` Niklas Schnelle
2025-02-05 19:55 ` Matthew Rosato
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox