* [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough
@ 2024-12-13 22:49 Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
This series introduces the ability for certain devices on s390 to bypass
a layer of IOMMU via the iommu.passthrough=1 option. In order to enable
this, the concept of an identity domain is added to s390-iommu. On s390,
IOMMU passthrough is only allowed if indicated via a special bit in s390
CLP data for the associated device group; otherwise we must fall back to
dma-iommu.
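The fallback rule described above can be sketched as a small decision function. This is only an illustrative model, not code from the series: `pick_default_domain`, `enum domain_choice`, and both parameters are hypothetical names, with `rtr_avail` standing in for the CLP bit and `passthrough_requested` for iommu.passthrough=1.

```c
#include <stdbool.h>

/* Hypothetical model of the two outcomes described in the cover letter. */
enum domain_choice {
	DOMAIN_DMA,      /* translated DMA through dma-iommu */
	DOMAIN_IDENTITY  /* passthrough via an identity domain */
};

/*
 * Passthrough is honoured only when it was requested AND the device
 * group reports relaxed translation; every other case falls back to
 * dma-iommu.
 */
static enum domain_choice pick_default_domain(bool passthrough_requested,
					      bool rtr_avail)
{
	if (passthrough_requested && rtr_avail)
		return DOMAIN_IDENTITY;
	return DOMAIN_DMA;
}
```

In the series itself the choice is not made per call like this; the driver registers one of two iommu_ops structures at init, and only one of them offers an identity domain.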
Changes for v2:
- Remove ARCH_HAS_PHYS_TO_DMA, use bus_dma_region
- Remove use of def_domain_type, use 1 of 2 ops chosen at init
Matthew Rosato (3):
s390/pci: check for relaxed translation capability
s390/pci: store DMA offset in bus_dma_region
iommu/s390: implement iommu passthrough via identity domain
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +-
arch/s390/pci/pci.c | 6 ++-
arch/s390/pci/pci_bus.c | 18 +++++++
arch/s390/pci/pci_clp.c | 1 +
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++--------
6 files changed, 99 insertions(+), 27 deletions(-)
--
2.47.1
* [PATCH v2 1/3] s390/pci: check for relaxed translation capability
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
For each zdev, record whether or not CLP indicates relaxed translation
capability for the associated device group.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +++-
arch/s390/pci/pci_clp.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 474e1f8d1d3c..8fe4c7a72c0b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -144,7 +144,7 @@ struct zpci_dev {
u8 util_str_avail : 1;
u8 irqs_registered : 1;
u8 tid_avail : 1;
- u8 reserved : 1;
+ u8 rtr_avail : 1; /* Relaxed translation allowed */
unsigned int devfn; /* DEVFN part of the RID*/
u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index 3fff2f7095c8..7ebff39c84b3 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp {
u16 : 4;
u16 noi : 12; /* number of interrupts */
u8 version;
- u8 : 6;
+ u8 : 2;
+ u8 rtr : 1; /* Relaxed translation requirement */
+ u8 : 3;
u8 frame : 1;
u8 refresh : 1; /* TLB refresh mode */
u16 : 3;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 14bf7e8d06b7..27248686e588 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
zdev->version = response->version;
zdev->maxstbl = response->maxstbl;
zdev->dtsm = response->dtsm;
+ zdev->rtr_avail = response->rtr;
switch (response->version) {
case 1:
--
2.47.1
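For readers who want to see where the new rtr bit sits in the CLP group response byte, here is a minimal user-space sketch. It assumes bit-fields are allocated starting from the most significant bit, as is the convention on big-endian s390; under that assumption the layout "2 reserved, rtr, 3 reserved, frame, refresh" gives the masks below. The macro and function names are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of the flag byte, most significant bit first:
 *   2 reserved | rtr | 3 reserved | frame | refresh
 * These masks hold only if bit-fields are allocated MSB-first.
 */
#define GRP_FLAG_RTR     0x20u
#define GRP_FLAG_FRAME   0x02u
#define GRP_FLAG_REFRESH 0x01u

/* Report whether the relaxed translation bit is set in the flag byte. */
static bool grp_flags_rtr(uint8_t flags)
{
	return (flags & GRP_FLAG_RTR) != 0;
}
```

The patch itself does not use masks; it relies on the compiler's bit-field layout and copies response->rtr into zdev->rtr_avail.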
* [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2024-12-16 9:29 ` Niklas Schnelle
2024-12-13 22:49 ` [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
PCI devices on s390 have a DMA offset that is reported via CLP. In
preparation for allowing identity domains, set up the bus_dma_region
for all PCI devices using the reported CLP value.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index d5ace00d10f0..14527687d0f2 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -19,6 +19,7 @@
#include <linux/jump_label.h>
#include <linux/pci.h>
#include <linux/printk.h>
+#include <linux/dma-direct.h>
#include <asm/pci_clp.h>
#include <asm/pci_dma.h>
@@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
return zbus;
}
+static void pci_dma_range_setup(struct pci_dev *pdev)
+{
+ struct zpci_dev *zdev = to_zpci(pdev);
+ struct bus_dma_region *map;
+
+ map = kzalloc(sizeof(*map), GFP_KERNEL);
+ if (!map)
+ return;
+
+ map->cpu_start = 0;
+ map->dma_start = PAGE_ALIGN(zdev->start_dma);
+ map->size = (u64)virt_to_phys(high_memory);
+ pdev->dev.dma_range_map = map;
+}
+
void pcibios_bus_add_device(struct pci_dev *pdev)
{
struct zpci_dev *zdev = to_zpci(pdev);
+ pci_dma_range_setup(pdev);
+
/*
* With pdev->no_vf_scan the common PCI probing code does not
* perform PF/VF linking.
--
2.47.1
* [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC (permalink / raw)
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
Introduce the concept of identity domains to s390-iommu, enabled via the
kernel command-line option 'iommu.passthrough=1'. Identity domains rely
on the bus_dma_region to offset identity mappings to the start of the DMA
aperture advertised by CLP.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/pci/pci.c | 6 ++-
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
2 files changed, 76 insertions(+), 25 deletions(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 88f72745fa59..758b23331754 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
struct zpci_fib fib = {0};
u8 cc;
- WARN_ON_ONCE(iota & 0x3fff);
fib.pba = base;
/* Work around off by one in ISM virt device */
if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
fib.pal = limit + (1 << 12);
else
fib.pal = limit;
- fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
+ if (iota == 0)
+ fib.iota = iota;
+ else
+ fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, status);
if (cc)
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fbdeded3d48b..3d93a9644fca 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -16,7 +16,7 @@
#include "dma-iommu.h"
-static const struct iommu_ops s390_iommu_ops;
+static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops;
static struct kmem_cache *dma_region_table_cache;
static struct kmem_cache *dma_page_table_cache;
@@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain,
return 0;
s390_domain = to_s390_domain(zdev->s390_domain);
- spin_lock_irqsave(&s390_domain->list_lock, flags);
- list_del_rcu(&zdev->iommu_list);
- spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ if (zdev->dma_table) {
+ spin_lock_irqsave(&s390_domain->list_lock, flags);
+ list_del_rcu(&zdev->iommu_list);
+ spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ }
zpci_unregister_ioat(zdev, 0);
zdev->dma_table = NULL;
@@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev)
if (rc)
goto out_err;
- rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL);
+ if (zdev->rtr_avail) {
+ rc = iommu_device_register(&zdev->iommu_dev,
+ &s390_iommu_rtr_ops, NULL);
+ } else {
+ rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops,
+ NULL);
+ }
if (rc)
goto out_sysfs;
@@ -787,6 +795,39 @@ static int __init s390_iommu_init(void)
}
subsys_initcall(s390_iommu_init);
+static int s390_attach_dev_identity(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct zpci_dev *zdev = to_zpci_dev(dev);
+ u8 status;
+ int cc;
+
+ blocking_domain_attach_device(&blocking_domain, dev);
+
+ /* If we fail now DMA remains blocked via blocking domain */
+ cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+ 0, &status);
+ /*
+ * If the device is undergoing error recovery the reset code
+ * will re-establish the new domain.
+ */
+ if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+ return -EIO;
+
+ zdev_s390_domain_update(zdev, domain);
+
+ return 0;
+}
+
+static const struct iommu_domain_ops s390_identity_ops = {
+ .attach_dev = s390_attach_dev_identity,
+};
+
+static struct iommu_domain s390_identity_domain = {
+ .type = IOMMU_DOMAIN_IDENTITY,
+ .ops = &s390_identity_ops,
+};
+
static struct iommu_domain blocking_domain = {
.type = IOMMU_DOMAIN_BLOCKED,
.ops = &(const struct iommu_domain_ops) {
@@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = {
}
};
-static const struct iommu_ops s390_iommu_ops = {
- .blocked_domain = &blocking_domain,
- .release_domain = &blocking_domain,
- .capable = s390_iommu_capable,
- .domain_alloc_paging = s390_domain_alloc_paging,
- .probe_device = s390_iommu_probe_device,
- .device_group = generic_device_group,
- .pgsize_bitmap = SZ_4K,
- .get_resv_regions = s390_iommu_get_resv_regions,
- .default_domain_ops = &(const struct iommu_domain_ops) {
- .attach_dev = s390_iommu_attach_device,
- .map_pages = s390_iommu_map_pages,
- .unmap_pages = s390_iommu_unmap_pages,
- .flush_iotlb_all = s390_iommu_flush_iotlb_all,
- .iotlb_sync = s390_iommu_iotlb_sync,
- .iotlb_sync_map = s390_iommu_iotlb_sync_map,
- .iova_to_phys = s390_iommu_iova_to_phys,
- .free = s390_domain_free,
+#define S390_IOMMU_COMMON_OPS() \
+ .blocked_domain = &blocking_domain, \
+ .release_domain = &blocking_domain, \
+ .capable = s390_iommu_capable, \
+ .domain_alloc_paging = s390_domain_alloc_paging, \
+ .probe_device = s390_iommu_probe_device, \
+ .device_group = generic_device_group, \
+ .pgsize_bitmap = SZ_4K, \
+ .get_resv_regions = s390_iommu_get_resv_regions, \
+ .default_domain_ops = &(const struct iommu_domain_ops) { \
+ .attach_dev = s390_iommu_attach_device, \
+ .map_pages = s390_iommu_map_pages, \
+ .unmap_pages = s390_iommu_unmap_pages, \
+ .flush_iotlb_all = s390_iommu_flush_iotlb_all, \
+ .iotlb_sync = s390_iommu_iotlb_sync, \
+ .iotlb_sync_map = s390_iommu_iotlb_sync_map, \
+ .iova_to_phys = s390_iommu_iova_to_phys, \
+ .free = s390_domain_free, \
}
+
+static const struct iommu_ops s390_iommu_ops = {
+ S390_IOMMU_COMMON_OPS()
+};
+
+static const struct iommu_ops s390_iommu_rtr_ops = {
+ .identity_domain = &s390_identity_domain,
+ S390_IOMMU_COMMON_OPS()
};
--
2.47.1
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2024-12-16 9:29 ` Niklas Schnelle
2024-12-16 10:18 ` Niklas Schnelle
0 siblings, 1 reply; 7+ messages in thread
From: Niklas Schnelle @ 2024-12-16 9:29 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2024-12-13 at 17:49 -0500, Matthew Rosato wrote:
> PCI devices on s390 have a DMA offset that is reported via CLP. In
> preparation for allowing identity domains, setup the bus_dma_region
> for all PCI devices using the reported CLP value.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> index d5ace00d10f0..14527687d0f2 100644
> --- a/arch/s390/pci/pci_bus.c
> +++ b/arch/s390/pci/pci_bus.c
> @@ -19,6 +19,7 @@
> #include <linux/jump_label.h>
> #include <linux/pci.h>
> #include <linux/printk.h>
> +#include <linux/dma-direct.h>
>
> #include <asm/pci_clp.h>
> #include <asm/pci_dma.h>
> @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
> return zbus;
> }
>
> +static void pci_dma_range_setup(struct pci_dev *pdev)
> +{
> + struct zpci_dev *zdev = to_zpci(pdev);
> + struct bus_dma_region *map;
> +
> + map = kzalloc(sizeof(*map), GFP_KERNEL);
> + if (!map)
> + return;
> +
> + map->cpu_start = 0;
> + map->dma_start = PAGE_ALIGN(zdev->start_dma);
> + map->size = (u64)virt_to_phys(high_memory);
I don't think we should restrict the size here to the size of memory.
Instead I think it should be zdev->end_dma - zdev->start_dma.
Since we handle the restriction to memory size as reserved regions I
think that should be compatible. Also I think otherwise this might
break the admittedly odd s390_iommu_aperture=X kernel parameter on
LPARs.
> + pdev->dev.dma_range_map = map;
> +}
> +
> void pcibios_bus_add_device(struct pci_dev *pdev)
> {
> struct zpci_dev *zdev = to_zpci(pdev);
>
> + pci_dma_range_setup(pdev);
> +
> /*
> * With pdev->no_vf_scan the common PCI probing code does not
> * perform PF/VF linking.
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-16 9:29 ` Niklas Schnelle
@ 2024-12-16 10:18 ` Niklas Schnelle
2024-12-16 16:51 ` Matthew Rosato
0 siblings, 1 reply; 7+ messages in thread
From: Niklas Schnelle @ 2024-12-16 10:18 UTC (permalink / raw)
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Mon, 2024-12-16 at 10:29 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-13 at 17:49 -0500, Matthew Rosato wrote:
> > PCI devices on s390 have a DMA offset that is reported via CLP. In
> > preparation for allowing identity domains, setup the bus_dma_region
> > for all PCI devices using the reported CLP value.
> >
> > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > ---
> > arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> > index d5ace00d10f0..14527687d0f2 100644
> > --- a/arch/s390/pci/pci_bus.c
> > +++ b/arch/s390/pci/pci_bus.c
> > @@ -19,6 +19,7 @@
> > #include <linux/jump_label.h>
> > #include <linux/pci.h>
> > #include <linux/printk.h>
> > +#include <linux/dma-direct.h>
> >
> > #include <asm/pci_clp.h>
> > #include <asm/pci_dma.h>
> > @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
> > return zbus;
> > }
> >
> > +static void pci_dma_range_setup(struct pci_dev *pdev)
> > +{
> > + struct zpci_dev *zdev = to_zpci(pdev);
> > + struct bus_dma_region *map;
> > +
> > + map = kzalloc(sizeof(*map), GFP_KERNEL);
> > + if (!map)
> > + return;
> > +
> > + map->cpu_start = 0;
> > + map->dma_start = PAGE_ALIGN(zdev->start_dma);
> > + map->size = (u64)virt_to_phys(high_memory);
>
> I don't think we should restrict the size here to the size of memory.
> Instead I think it should be zdev->end_dma - zdev->start_dma.
>
> Since we handle the restriction to memory size as reserved regions I
> think that should be compatible. Also I think otherwise this might
> break the admittedly odd s390_iommu_aperture=X kernel parameter on
> LPARs.
Correction, zdev->end_dma - zdev->start_dma + 1 because zdev->end_dma
is inclusive ;-)
>
>
> > + pdev->dev.dma_range_map = map;
> > +}
> > +
> > void pcibios_bus_add_device(struct pci_dev *pdev)
> > {
> > struct zpci_dev *zdev = to_zpci(pdev);
> >
> > + pci_dma_range_setup(pdev);
> > +
> > /*
> > * With pdev->no_vf_scan the common PCI probing code does not
> > * perform PF/VF linking.
>
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-16 10:18 ` Niklas Schnelle
@ 2024-12-16 16:51 ` Matthew Rosato
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-16 16:51 UTC (permalink / raw)
To: Niklas Schnelle, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
>>> + map->cpu_start = 0;
>>> + map->dma_start = PAGE_ALIGN(zdev->start_dma);
>>> + map->size = (u64)virt_to_phys(high_memory);
>>
>> I don't think we should restrict the size here to the size of memory.
>> Instead I think it should be zdev->end_dma - zdev->start_dma.
>>
>> Since we handle the restriction to memory size as reserved regions I
>> think that should be compatible. Also I think otherwise this might
>> break the admittedly odd s390_iommu_aperture=X kernel parameter on
>> LPARs.
>
> Correction, zdev->end_dma - zdev->start_dma + 1 because zdev->end_dma
> is inclusive ;-)
>
Forgot about that parameter, thanks... OK, will change to:
map->size = zdev->end_dma - zdev->start_dma + 1;
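The "+ 1" follows from end_dma being an inclusive address. A minimal stand-alone sketch of the arithmetic, using a hypothetical helper name (`aperture_size`) and made-up example addresses rather than real CLP values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of an aperture [start_dma, end_dma] where end_dma is
 * the last valid address (inclusive), hence the "+ 1".
 * Note: for the full 64-bit range (0 .. UINT64_MAX) the result wraps
 * to 0, which a caller would need to handle separately.
 */
static uint64_t aperture_size(uint64_t start_dma, uint64_t end_dma)
{
	assert(end_dma >= start_dma);
	return end_dma - start_dma + 1;
}
```

For example, an aperture from 0x100000000 to 0x1ffffffff inclusive covers exactly 0x100000000 bytes (4 GiB); without the "+ 1" the size would be one byte short.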