* [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough
@ 2025-01-24 20:17 Matthew Rosato
2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
This series introduces the ability for certain devices on s390 to bypass
a layer of IOMMU via the iommu.passthrough=1 option. In order to enable
this, the concept of an identity domain is added to s390-iommu. On s390,
IOMMU passthrough is only allowed if indicated via a special bit in s390
CLP data for the associated device group, otherwise we must fall back to
dma-iommu.
Changes for v3:
- Rebase onto 6.13
- fixed bus_dma_region size (Niklas)
Changes for v2:
- Remove ARCH_HAS_PHYS_TO_DMA, use bus_dma_region
- Remove use of def_domain_type, use 1 of 2 ops chosen at init
Matthew Rosato (3):
s390/pci: check for relaxed translation capability
s390/pci: store DMA offset in bus_dma_region
iommu/s390: implement iommu passthrough via identity domain
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +-
arch/s390/pci/pci.c | 6 ++-
arch/s390/pci/pci_bus.c | 18 +++++++
arch/s390/pci/pci_clp.c | 1 +
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++--------
6 files changed, 99 insertions(+), 27 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v3 1/3] s390/pci: check for relaxed translation capability 2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato @ 2025-01-24 20:17 ` Matthew Rosato 2025-01-29 10:07 ` Niklas Schnelle 2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato 2 siblings, 1 reply; 10+ messages in thread From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw) To: joro, will, robin.murphy, gerald.schaefer, schnelle Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 For each zdev, record whether or not CLP indicates relaxed translation capability for the associated device group. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> --- arch/s390/include/asm/pci.h | 2 +- arch/s390/include/asm/pci_clp.h | 4 +++- arch/s390/pci/pci_clp.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 474e1f8d1d3c..8fe4c7a72c0b 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -144,7 +144,7 @@ struct zpci_dev { u8 util_str_avail : 1; u8 irqs_registered : 1; u8 tid_avail : 1; - u8 reserved : 1; + u8 rtr_avail : 1; /* Relaxed translation allowed */ unsigned int devfn; /* DEVFN part of the RID*/ u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */ diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h index 3fff2f7095c8..7ebff39c84b3 100644 --- a/arch/s390/include/asm/pci_clp.h +++ b/arch/s390/include/asm/pci_clp.h @@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp { u16 : 4; u16 noi : 12; /* number of interrupts */ u8 version; - u8 : 6; + u8 : 2; + u8 rtr : 1; /* Relaxed translation requirement */ + u8 : 3; u8 frame : 1; u8 refresh : 1; /* TLB refresh mode */ u16 : 3; diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c index 14bf7e8d06b7..27248686e588 100644 --- a/arch/s390/pci/pci_clp.c +++ b/arch/s390/pci/pci_clp.c @@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev, zdev->version = response->version; zdev->maxstbl = response->maxstbl; zdev->dtsm = response->dtsm; + zdev->rtr_avail = response->rtr; switch (response->version) { case 1: -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] s390/pci: check for relaxed translation capability 2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato @ 2025-01-29 10:07 ` Niklas Schnelle 0 siblings, 0 replies; 10+ messages in thread From: Niklas Schnelle @ 2025-01-29 10:07 UTC (permalink / raw) To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote: > For each zdev, record whether or not CLP indicates relaxed translation > capability for the associated device group. > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/include/asm/pci.h | 2 +- > arch/s390/include/asm/pci_clp.h | 4 +++- > arch/s390/pci/pci_clp.c | 1 + > 3 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h > index 474e1f8d1d3c..8fe4c7a72c0b 100644 > --- a/arch/s390/include/asm/pci.h > +++ b/arch/s390/include/asm/pci.h > @@ -144,7 +144,7 @@ struct zpci_dev { > u8 util_str_avail : 1; > u8 irqs_registered : 1; > u8 tid_avail : 1; > - u8 reserved : 1; > + u8 rtr_avail : 1; /* Relaxed translation allowed */ > unsigned int devfn; /* DEVFN part of the RID*/ > > u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */ > diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h > index 3fff2f7095c8..7ebff39c84b3 100644 > --- a/arch/s390/include/asm/pci_clp.h > +++ b/arch/s390/include/asm/pci_clp.h > @@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp { > u16 : 4; > u16 noi : 12; /* number of interrupts */ > u8 version; > - u8 : 6; > + u8 : 2; > + u8 rtr : 1; /* Relaxed translation requirement */ > + u8 : 3; > u8 frame : 1; > u8 refresh : 1; /* TLB refresh mode */ > u16 : 3; > diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c > index 14bf7e8d06b7..27248686e588 100644 > --- a/arch/s390/pci/pci_clp.c > +++ b/arch/s390/pci/pci_clp.c > @@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev, > zdev->version = response->version; > zdev->maxstbl = response->maxstbl; > zdev->dtsm = response->dtsm; > + zdev->rtr_avail = response->rtr; > > switch (response->version) { > case 1: Checked the bit positions and as discussed out of band I gave the whole series a try too. So feel free to add: Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region 2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato 2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato @ 2025-01-24 20:17 ` Matthew Rosato 2025-01-29 10:20 ` Niklas Schnelle 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato 2 siblings, 1 reply; 10+ messages in thread From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw) To: joro, will, robin.murphy, gerald.schaefer, schnelle Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 PCI devices on s390 have a DMA offset that is reported via CLP. In preparation for allowing identity domains, setup the bus_dma_region for all PCI devices using the reported CLP value. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> --- arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c index d5ace00d10f0..51fa993b64fc 100644 --- a/arch/s390/pci/pci_bus.c +++ b/arch/s390/pci/pci_bus.c @@ -19,6 +19,7 @@ #include <linux/jump_label.h> #include <linux/pci.h> #include <linux/printk.h> +#include <linux/dma-direct.h> #include <asm/pci_clp.h> #include <asm/pci_dma.h> @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid) return zbus; } +static void pci_dma_range_setup(struct pci_dev *pdev) +{ + struct zpci_dev *zdev = to_zpci(pdev); + struct bus_dma_region *map; + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (!map) + return; + + map->cpu_start = 0; + map->dma_start = PAGE_ALIGN(zdev->start_dma); + map->size = zdev->end_dma - zdev->start_dma + 1; + pdev->dev.dma_range_map = map; +} + void pcibios_bus_add_device(struct pci_dev *pdev) { struct zpci_dev *zdev = to_zpci(pdev); + pci_dma_range_setup(pdev); + /* * With pdev->no_vf_scan the common PCI probing code does not * perform PF/VF linking. -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region 2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato @ 2025-01-29 10:20 ` Niklas Schnelle 0 siblings, 0 replies; 10+ messages in thread From: Niklas Schnelle @ 2025-01-29 10:20 UTC (permalink / raw) To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote: > PCI devices on s390 have a DMA offset that is reported via CLP. In > preparation for allowing identity domains, setup the bus_dma_region > for all PCI devices using the reported CLP value. > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > index d5ace00d10f0..51fa993b64fc 100644 > --- a/arch/s390/pci/pci_bus.c > +++ b/arch/s390/pci/pci_bus.c > @@ -19,6 +19,7 @@ > #include <linux/jump_label.h> > #include <linux/pci.h> > #include <linux/printk.h> > +#include <linux/dma-direct.h> > > #include <asm/pci_clp.h> > #include <asm/pci_dma.h> > @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid) > return zbus; > } > > +static void pci_dma_range_setup(struct pci_dev *pdev) > +{ > + struct zpci_dev *zdev = to_zpci(pdev); > + struct bus_dma_region *map; > + > + map = kzalloc(sizeof(*map), GFP_KERNEL); > + if (!map) > + return; > + > + map->cpu_start = 0; > + map->dma_start = PAGE_ALIGN(zdev->start_dma); > + map->size = zdev->end_dma - zdev->start_dma + 1; Shouldn't the size calculation also use PAGE_ALIGN(zdev->start_dma) in case that actually aligns? And for zdev->end_dma I think we want PAGE_ALIGN_DOWN(zdev->end_dma + 1). There is also the, hopefully theoretical, case that zdev->end_dma could be less than PAGE_SIZE larger than zdev->start_dma. I guess in that case we should just ensure that size is 0, maybe using max(…, 0) around the expression. > + pdev->dev.dma_range_map = map; > +} > + > void pcibios_bus_add_device(struct pci_dev *pdev) > { > struct zpci_dev *zdev = to_zpci(pdev); > > + pci_dma_range_setup(pdev); > + > /* > * With pdev->no_vf_scan the common PCI probing code does not > * perform PF/VF linking. I was initially wondering where the map is freed, then found the kfree()s in common code in device_unbind_cleanup() and device_release(). Despite the thoughts above I already tested this on my setup and it worked great, so you can already add my: Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain 2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato 2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato 2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato @ 2025-01-24 20:17 ` Matthew Rosato 2025-01-28 17:30 ` Jason Gunthorpe ` (2 more replies) 2 siblings, 3 replies; 10+ messages in thread From: Matthew Rosato @ 2025-01-24 20:17 UTC (permalink / raw) To: joro, will, robin.murphy, gerald.schaefer, schnelle Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 Enabled via the kernel command-line 'iommu.passthrough=1' option. Introduce the concept of identity domains to s390-iommu, which relies on the bus_dma_region to offset identity mappings to the start of the DMA aperture advertized by CLP. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> --- arch/s390/pci/pci.c | 6 ++- drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++--------- 2 files changed, 76 insertions(+), 25 deletions(-) diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 88f72745fa59..758b23331754 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas, struct zpci_fib fib = {0}; u8 cc; - WARN_ON_ONCE(iota & 0x3fff); fib.pba = base; /* Work around off by one in ISM virt device */ if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base) fib.pal = limit + (1 << 12); else fib.pal = limit; - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; + if (iota == 0) + fib.iota = iota; + else + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; fib.gd = zdev->gisa; cc = zpci_mod_fc(req, &fib, status); if (cc) diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c index fbdeded3d48b..3d93a9644fca 100644 --- a/drivers/iommu/s390-iommu.c +++ b/drivers/iommu/s390-iommu.c @@ -16,7 +16,7 @@ #include "dma-iommu.h" -static const struct iommu_ops s390_iommu_ops; +static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops; static struct kmem_cache *dma_region_table_cache; static struct kmem_cache *dma_page_table_cache; @@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain, return 0; s390_domain = to_s390_domain(zdev->s390_domain); - spin_lock_irqsave(&s390_domain->list_lock, flags); - list_del_rcu(&zdev->iommu_list); - spin_unlock_irqrestore(&s390_domain->list_lock, flags); + if (zdev->dma_table) { + spin_lock_irqsave(&s390_domain->list_lock, flags); + list_del_rcu(&zdev->iommu_list); + spin_unlock_irqrestore(&s390_domain->list_lock, flags); + } zpci_unregister_ioat(zdev, 0); zdev->dma_table = NULL; @@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev) if (rc) goto out_err; - rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL); + if (zdev->rtr_avail) { + rc = iommu_device_register(&zdev->iommu_dev, + &s390_iommu_rtr_ops, NULL); + } else { + rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, + NULL); + } if (rc) goto out_sysfs; @@ -787,6 +795,39 @@ static int __init s390_iommu_init(void) } subsys_initcall(s390_iommu_init); +static int s390_attach_dev_identity(struct iommu_domain *domain, + struct device *dev) +{ + struct zpci_dev *zdev = to_zpci_dev(dev); + u8 status; + int cc; + + blocking_domain_attach_device(&blocking_domain, dev); + + /* If we fail now DMA remains blocked via blocking domain */ + cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma, + 0, &status); + /* + * If the device is undergoing error recovery the reset code + * will re-establish the new domain. + */ + if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL) + return -EIO; + + zdev_s390_domain_update(zdev, domain); + + return 0; +} + +static const struct iommu_domain_ops s390_identity_ops = { + .attach_dev = s390_attach_dev_identity, +}; + +static struct iommu_domain s390_identity_domain = { + .type = IOMMU_DOMAIN_IDENTITY, + .ops = &s390_identity_ops, +}; + static struct iommu_domain blocking_domain = { .type = IOMMU_DOMAIN_BLOCKED, .ops = &(const struct iommu_domain_ops) { @@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = { } }; -static const struct iommu_ops s390_iommu_ops = { - .blocked_domain = &blocking_domain, - .release_domain = &blocking_domain, - .capable = s390_iommu_capable, - .domain_alloc_paging = s390_domain_alloc_paging, - .probe_device = s390_iommu_probe_device, - .device_group = generic_device_group, - .pgsize_bitmap = SZ_4K, - .get_resv_regions = s390_iommu_get_resv_regions, - .default_domain_ops = &(const struct iommu_domain_ops) { - .attach_dev = s390_iommu_attach_device, - .map_pages = s390_iommu_map_pages, - .unmap_pages = s390_iommu_unmap_pages, - .flush_iotlb_all = s390_iommu_flush_iotlb_all, - .iotlb_sync = s390_iommu_iotlb_sync, - .iotlb_sync_map = s390_iommu_iotlb_sync_map, - .iova_to_phys = s390_iommu_iova_to_phys, - .free = s390_domain_free, +#define S390_IOMMU_COMMON_OPS() \ + .blocked_domain = &blocking_domain, \ + .release_domain = &blocking_domain, \ + .capable = s390_iommu_capable, \ + .domain_alloc_paging = s390_domain_alloc_paging, \ + .probe_device = s390_iommu_probe_device, \ + .device_group = generic_device_group, \ + .pgsize_bitmap = SZ_4K, \ + .get_resv_regions = s390_iommu_get_resv_regions, \ + .default_domain_ops = &(const struct iommu_domain_ops) { \ + .attach_dev = s390_iommu_attach_device, \ + .map_pages = s390_iommu_map_pages, \ + .unmap_pages = s390_iommu_unmap_pages, \ + .flush_iotlb_all = s390_iommu_flush_iotlb_all, \ + .iotlb_sync = s390_iommu_iotlb_sync, \ + .iotlb_sync_map = s390_iommu_iotlb_sync_map, \ + .iova_to_phys = s390_iommu_iova_to_phys, \ + .free = s390_domain_free, \ } + +static const struct iommu_ops s390_iommu_ops = { + S390_IOMMU_COMMON_OPS() +}; + +static const struct iommu_ops s390_iommu_rtr_ops = { + .identity_domain = &s390_identity_domain, + S390_IOMMU_COMMON_OPS() }; -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato @ 2025-01-28 17:30 ` Jason Gunthorpe 2025-01-29 10:29 ` Niklas Schnelle 2025-01-30 7:43 ` Niklas Schnelle 2 siblings, 0 replies; 10+ messages in thread From: Jason Gunthorpe @ 2025-01-28 17:30 UTC (permalink / raw) To: Matthew Rosato Cc: joro, will, robin.murphy, gerald.schaefer, schnelle, hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On Fri, Jan 24, 2025 at 03:17:17PM -0500, Matthew Rosato wrote: > Enabled via the kernel command-line 'iommu.passthrough=1' option. > > Introduce the concept of identity domains to s390-iommu, which relies on > the bus_dma_region to offset identity mappings to the start of the DMA > aperture advertized by CLP. > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/pci/pci.c | 6 ++- > drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++--------- > 2 files changed, 76 insertions(+), 25 deletions(-) Seems Ok Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> > -static const struct iommu_ops s390_iommu_ops = { > - .blocked_domain = &blocking_domain, > - .release_domain = &blocking_domain, > - .capable = s390_iommu_capable, > - .domain_alloc_paging = s390_domain_alloc_paging, > - .probe_device = s390_iommu_probe_device, > - .device_group = generic_device_group, > - .pgsize_bitmap = SZ_4K, > - .get_resv_regions = s390_iommu_get_resv_regions, > - .default_domain_ops = &(const struct iommu_domain_ops) { > - .attach_dev = s390_iommu_attach_device, > - .map_pages = s390_iommu_map_pages, > - .unmap_pages = s390_iommu_unmap_pages, > - .flush_iotlb_all = s390_iommu_flush_iotlb_all, > - .iotlb_sync = s390_iommu_iotlb_sync, > - .iotlb_sync_map = s390_iommu_iotlb_sync_map, > - .iova_to_phys = s390_iommu_iova_to_phys, > - .free = s390_domain_free, > +#define S390_IOMMU_COMMON_OPS() \ > + .blocked_domain = &blocking_domain, \ > + .release_domain = &blocking_domain, \ > + .capable = s390_iommu_capable, \ > + .domain_alloc_paging = s390_domain_alloc_paging, \ > + .probe_device = s390_iommu_probe_device, \ > + .device_group = generic_device_group, \ > + .pgsize_bitmap = SZ_4K, \ > + .get_resv_regions = s390_iommu_get_resv_regions, \ > + .default_domain_ops = &(const struct iommu_domain_ops) { \ > + .attach_dev = s390_iommu_attach_device, \ > + .map_pages = s390_iommu_map_pages, \ > + .unmap_pages = s390_iommu_unmap_pages, \ > + .flush_iotlb_all = s390_iommu_flush_iotlb_all, \ > + .iotlb_sync = s390_iommu_iotlb_sync, \ > + .iotlb_sync_map = s390_iommu_iotlb_sync_map, \ > + .iova_to_phys = s390_iommu_iova_to_phys, \ > + .free = s390_domain_free, \ > } > + > +static const struct iommu_ops s390_iommu_ops = { > + S390_IOMMU_COMMON_OPS() > +}; > + > +static const struct iommu_ops s390_iommu_rtr_ops = { > + .identity_domain = &s390_identity_domain, > + S390_IOMMU_COMMON_OPS() > }; Though it is a pattern in iommu drivers to use a non-cost ops and mutate them during probe. For instance you could NULL s390_iommu_rtr_ops->identity_domain if the platform does not support it. On the other hand your version with a const ops is security nicer. Alternatively, I have a patch series that adds a domain_alloc_identity() function to get virtio-iommu moved to the new APIs, that could work well here too and keep the const ops. Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato 2025-01-28 17:30 ` Jason Gunthorpe @ 2025-01-29 10:29 ` Niklas Schnelle 2025-01-30 7:43 ` Niklas Schnelle 2 siblings, 0 replies; 10+ messages in thread From: Niklas Schnelle @ 2025-01-29 10:29 UTC (permalink / raw) To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote: > Enabled via the kernel command-line 'iommu.passthrough=1' option. > > Introduce the concept of identity domains to s390-iommu, which relies on > the bus_dma_region to offset identity mappings to the start of the DMA > aperture advertized by CLP. > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/pci/pci.c | 6 ++- > drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++--------- > 2 files changed, 76 insertions(+), 25 deletions(-) > > diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c > index 88f72745fa59..758b23331754 100644 > --- a/arch/s390/pci/pci.c > +++ b/arch/s390/pci/pci.c > @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas, > struct zpci_fib fib = {0}; > u8 cc; > > - WARN_ON_ONCE(iota & 0x3fff); > fib.pba = base; > /* Work around off by one in ISM virt device */ > if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base) > fib.pal = limit + (1 << 12); > else > fib.pal = limit; > - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; > + if (iota == 0) > + fib.iota = iota; > + else > + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; The logic here did confuse me resolving the merge conflict with your other series. Maybe a ternary conves better that this is really about not applying the flags if the I/O address translation anchor isn't used. Something like: fib.iota = iota | ((iota) ? ZPCI_IOTA_RTTO_FLASG : 0); > fib.gd = zdev->gisa; > cc = zpci_mod_fc(req, &fib, status); > if (cc) > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c > index fbdeded3d48b..3d93a9644fca 100644 > --- a/drivers/iommu/s390-iommu.c > +++ b/drivers/iommu/s390-iommu.c > @@ -16,7 +16,7 @@ > > #include "dma-iommu.h" > > -static const struct iommu_ops s390_iommu_ops; > +static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops; > > static struct kmem_cache *dma_region_table_cache; > static struct kmem_cache *dma_page_table_cache; > @@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain, > return 0; > > s390_domain = to_s390_domain(zdev->s390_domain); > - spin_lock_irqsave(&s390_domain->list_lock, flags); > - list_del_rcu(&zdev->iommu_list); > - spin_unlock_irqrestore(&s390_domain->list_lock, flags); > + if (zdev->dma_table) { > + spin_lock_irqsave(&s390_domain->list_lock, flags); > + list_del_rcu(&zdev->iommu_list); > + spin_unlock_irqrestore(&s390_domain->list_lock, flags); > + } > > zpci_unregister_ioat(zdev, 0); > zdev->dma_table = NULL; > @@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev) > if (rc) > goto out_err; > > - rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL); > + if (zdev->rtr_avail) { > + rc = iommu_device_register(&zdev->iommu_dev, > + &s390_iommu_rtr_ops, NULL); > + } else { > + rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, > + NULL); > + } > if (rc) > goto out_sysfs; > > @@ -787,6 +795,39 @@ static int __init s390_iommu_init(void) > } > subsys_initcall(s390_iommu_init); > > +static int s390_attach_dev_identity(struct iommu_domain *domain, > + struct device *dev) > +{ > + struct zpci_dev *zdev = to_zpci_dev(dev); > + u8 status; > + int cc; > + > + blocking_domain_attach_device(&blocking_domain, dev); > + > + /* If we fail now DMA remains blocked via blocking domain */ > + cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma, > + 0, &status); > + /* > + * If the device is undergoing error recovery the reset code > + * will re-establish the new domain. > + */ > + if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL) > + return -EIO; > + > + zdev_s390_domain_update(zdev, domain); > + > + return 0; > +} > + > +static const struct iommu_domain_ops s390_identity_ops = { > + .attach_dev = s390_attach_dev_identity, > +}; > + > +static struct iommu_domain s390_identity_domain = { > + .type = IOMMU_DOMAIN_IDENTITY, > + .ops = &s390_identity_ops, > +}; > + > static struct iommu_domain blocking_domain = { > .type = IOMMU_DOMAIN_BLOCKED, > .ops = &(const struct iommu_domain_ops) { > @@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = { > } > }; > > -static const struct iommu_ops s390_iommu_ops = { > - .blocked_domain = &blocking_domain, > - .release_domain = &blocking_domain, > - .capable = s390_iommu_capable, > - .domain_alloc_paging = s390_domain_alloc_paging, > - .probe_device = s390_iommu_probe_device, > - .device_group = generic_device_group, > - .pgsize_bitmap = SZ_4K, > - .get_resv_regions = s390_iommu_get_resv_regions, > - .default_domain_ops = &(const struct iommu_domain_ops) { > - .attach_dev = s390_iommu_attach_device, > - .map_pages = s390_iommu_map_pages, > - .unmap_pages = s390_iommu_unmap_pages, > - .flush_iotlb_all = s390_iommu_flush_iotlb_all, > - .iotlb_sync = s390_iommu_iotlb_sync, > - .iotlb_sync_map = s390_iommu_iotlb_sync_map, > - .iova_to_phys = s390_iommu_iova_to_phys, > - .free = s390_domain_free, > +#define S390_IOMMU_COMMON_OPS() \ > + .blocked_domain = &blocking_domain, \ > + .release_domain = &blocking_domain, \ > + .capable = s390_iommu_capable, \ > + .domain_alloc_paging = s390_domain_alloc_paging, \ > + .probe_device = s390_iommu_probe_device, \ > + .device_group = generic_device_group, \ > + .pgsize_bitmap = SZ_4K, \ > + .get_resv_regions = s390_iommu_get_resv_regions, \ > + .default_domain_ops = &(const struct iommu_domain_ops) { \ > + .attach_dev = s390_iommu_attach_device, \ > + .map_pages = s390_iommu_map_pages, \ > + .unmap_pages = s390_iommu_unmap_pages, \ > + .flush_iotlb_all = s390_iommu_flush_iotlb_all, \ > + .iotlb_sync = s390_iommu_iotlb_sync, \ > + .iotlb_sync_map = s390_iommu_iotlb_sync_map, \ > + .iova_to_phys = s390_iommu_iova_to_phys, \ > + .free = s390_domain_free, \ > } > + > +static const struct iommu_ops s390_iommu_ops = { > + S390_IOMMU_COMMON_OPS() > +}; > + > +static const struct iommu_ops s390_iommu_rtr_ops = { > + .identity_domain = &s390_identity_domain, > + S390_IOMMU_COMMON_OPS() > }; Other than the nit above this all looks good to me and as stated on other patches I did give this a try in my setup. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato 2025-01-28 17:30 ` Jason Gunthorpe 2025-01-29 10:29 ` Niklas Schnelle @ 2025-01-30 7:43 ` Niklas Schnelle 2025-02-05 19:55 ` Matthew Rosato 2 siblings, 1 reply; 10+ messages in thread From: Niklas Schnelle @ 2025-01-30 7:43 UTC (permalink / raw) To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote: > Enabled via the kernel command-line 'iommu.passthrough=1' option. > > Introduce the concept of identity domains to s390-iommu, which relies on > the bus_dma_region to offset identity mappings to the start of the DMA > aperture advertized by CLP. > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/pci/pci.c | 6 ++- > drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++--------- > 2 files changed, 76 insertions(+), 25 deletions(-) > > diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c > index 88f72745fa59..758b23331754 100644 > --- a/arch/s390/pci/pci.c > +++ b/arch/s390/pci/pci.c > @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas, > struct zpci_fib fib = {0}; > u8 cc; > > - WARN_ON_ONCE(iota & 0x3fff); > fib.pba = base; > /* Work around off by one in ISM virt device */ > if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base) > fib.pal = limit + (1 << 12); > else > fib.pal = limit; > - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; > + if (iota == 0) > + fib.iota = iota; Taking another look, I think there is a small problem with the logic of passing iota == 0 to indicate direct mapping. In zpci_hot_reset_device() we call zpci_register_ioat() with iota set to virt_to_phys(zdev->dma_table) and while zdev->dma_table is NULL for the identity domain we can't rely on virt_to_phys(NULL) == NULL. > + else > + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; > fib.gd = zdev->gisa; > cc = zpci_mod_fc(req, &fib, status); > if (cc) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain 2025-01-30 7:43 ` Niklas Schnelle @ 2025-02-05 19:55 ` Matthew Rosato 0 siblings, 0 replies; 10+ messages in thread From: Matthew Rosato @ 2025-02-05 19:55 UTC (permalink / raw) To: Niklas Schnelle, joro, will, robin.murphy, gerald.schaefer Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu, linux-kernel, linux-s390 On 1/30/25 2:43 AM, Niklas Schnelle wrote: > On Fri, 2025-01-24 at 15:17 -0500, Matthew Rosato wrote: >> Enabled via the kernel command-line 'iommu.passthrough=1' option. >> >> Introduce the concept of identity domains to s390-iommu, which relies on >> the bus_dma_region to offset identity mappings to the start of the DMA >> aperture advertized by CLP. >> >> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> >> --- >> arch/s390/pci/pci.c | 6 ++- >> drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++--------- >> 2 files changed, 76 insertions(+), 25 deletions(-) >> >> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c >> index 88f72745fa59..758b23331754 100644 >> --- a/arch/s390/pci/pci.c >> +++ b/arch/s390/pci/pci.c >> @@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas, >> struct zpci_fib fib = {0}; >> u8 cc; >> >> - WARN_ON_ONCE(iota & 0x3fff); >> fib.pba = base; >> /* Work around off by one in ISM virt device */ >> if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base) >> fib.pal = limit + (1 << 12); >> else >> fib.pal = limit; >> - fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; >> + if (iota == 0) >> + fib.iota = iota; > > Taking another look, I think there is a small problem with the logic of > passing iota == 0 to indicate direct mapping. In > zpci_hot_reset_device() we call zpci_register_ioat() with iota set to > virt_to_phys(zdev->dma_table) and while zdev->dma_table is NULL for the > identity domain we can't rely on virt_to_phys(NULL) == NULL. Thanks for pointing this out. We discussed this a fair bit off-list, but to summarize here: I'll include an additional patch in the next version that will drive IOAT registration code through s390-iommu, since really this owns the dma_table and knows when it does (not) make sense to register it. Places in s390 code that need to perform IOAT re-registration (such as zpci_hot_reset_device) will call into this routine and let s390-iommu determine the correct action based upon the type of domain in use. I'll set that patch before this one, so then this patch remains unchanged except to replace the zpci_register_ioat call added by this patch in s390_attach_dev_identity with the new routine. > >> + else >> + fib.iota = iota | ZPCI_IOTA_RTTO_FLAG; >> fib.gd = zdev->gisa; >> cc = zpci_mod_fc(req, &fib, status); >> if (cc) ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-02-05 19:56 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-24 20:17 [PATCH v3 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato 2025-01-24 20:17 ` [PATCH v3 1/3] s390/pci: check for relaxed translation capability Matthew Rosato 2025-01-29 10:07 ` Niklas Schnelle 2025-01-24 20:17 ` [PATCH v3 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato 2025-01-29 10:20 ` Niklas Schnelle 2025-01-24 20:17 ` [PATCH v3 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato 2025-01-28 17:30 ` Jason Gunthorpe 2025-01-29 10:29 ` Niklas Schnelle 2025-01-30 7:43 ` Niklas Schnelle 2025-02-05 19:55 ` Matthew Rosato
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox