* [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough
@ 2024-12-13 22:49 Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
This series introduces the ability for certain devices on s390 to bypass
a layer of IOMMU via the iommu.passthrough=1 option. To enable this, the
concept of an identity domain is added to s390-iommu. On s390, IOMMU
passthrough is only allowed if indicated via a special bit in the s390
CLP data for the associated device group; otherwise we must fall back to
dma-iommu.
Changes for v2:
- Remove ARCH_HAS_PHYS_TO_DMA, use bus_dma_region
- Remove use of def_domain_type, use 1 of 2 ops chosen at init
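As a rough illustration of the "1 of 2 ops chosen at init" approach, the following userspace sketch models picking an ops table from a per-device capability bit and falling back to DMA translation when no identity domain is offered. All demo_* names are hypothetical stand-ins and not kernel APIs; the real selection happens in zpci_init_iommu() in patch 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's domain types and iommu_ops. */
enum demo_domain_type { DEMO_DOMAIN_DMA, DEMO_DOMAIN_IDENTITY };

struct demo_ops {
	bool has_identity_domain;	/* false: passthrough not offered */
};

/* Two tables that differ only in whether an identity domain is offered. */
static const struct demo_ops demo_ops_default = { .has_identity_domain = false };
static const struct demo_ops demo_ops_rtr = { .has_identity_domain = true };

/* Pick the ops table once, at init, from the per-device capability bit. */
static const struct demo_ops *demo_pick_ops(bool rtr_avail)
{
	return rtr_avail ? &demo_ops_rtr : &demo_ops_default;
}

/*
 * Passthrough is honoured only if it was requested and the chosen ops
 * table offers an identity domain; otherwise fall back to DMA translation.
 */
static enum demo_domain_type demo_default_domain(const struct demo_ops *ops,
						 bool passthrough)
{
	assert(ops != NULL);
	if (passthrough && ops->has_identity_domain)
		return DEMO_DOMAIN_IDENTITY;
	return DEMO_DOMAIN_DMA;
}
```

With this model, a device without the capability bit ends up with DMA translation even when passthrough is requested, which mirrors the fallback described in the cover letter.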
Matthew Rosato (3):
s390/pci: check for relaxed translation capability
s390/pci: store DMA offset in bus_dma_region
iommu/s390: implement iommu passthrough via identity domain
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +-
arch/s390/pci/pci.c | 6 ++-
arch/s390/pci/pci_bus.c | 18 +++++++
arch/s390/pci/pci_clp.c | 1 +
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++--------
6 files changed, 99 insertions(+), 27 deletions(-)
--
2.47.1
* [PATCH v2 1/3] s390/pci: check for relaxed translation capability
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
For each zdev, record whether or not CLP indicates relaxed translation
capability for the associated device group.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
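For readers less familiar with s390 bitfield layout, here is a portable sketch of the same flag byte expressed with masks. It assumes bitfields are allocated starting from the most significant bit, as on s390, so the byte reads: 2 reserved bits, rtr, 3 reserved bits, frame, refresh. The DEMO_* names and the helper are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed mask values for the flag byte in clp_rsp_query_pci_grp,
 * derived from the bitfield order under an MSB-first layout.
 */
#define DEMO_GRP_RTR		0x20u	/* relaxed translation */
#define DEMO_GRP_FRAME		0x02u
#define DEMO_GRP_REFRESH	0x01u	/* TLB refresh mode */

/* The three named flags must occupy distinct bits. */
static_assert((DEMO_GRP_RTR & (DEMO_GRP_FRAME | DEMO_GRP_REFRESH)) == 0,
	      "flag masks overlap");

/* Report whether the group advertises relaxed translation. */
static bool demo_grp_has_rtr(uint8_t flags)
{
	return (flags & DEMO_GRP_RTR) != 0;
}
```

The patch itself keeps using a bitfield, which matches the surrounding structure definition; the mask form is shown only to make the bit position explicit.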
arch/s390/include/asm/pci.h | 2 +-
arch/s390/include/asm/pci_clp.h | 4 +++-
arch/s390/pci/pci_clp.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 474e1f8d1d3c..8fe4c7a72c0b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -144,7 +144,7 @@ struct zpci_dev {
u8 util_str_avail : 1;
u8 irqs_registered : 1;
u8 tid_avail : 1;
- u8 reserved : 1;
+ u8 rtr_avail : 1; /* Relaxed translation allowed */
unsigned int devfn; /* DEVFN part of the RID*/
u8 pfip[CLP_PFIP_NR_SEGMENTS]; /* pci function internal path */
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index 3fff2f7095c8..7ebff39c84b3 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -156,7 +156,9 @@ struct clp_rsp_query_pci_grp {
u16 : 4;
u16 noi : 12; /* number of interrupts */
u8 version;
- u8 : 6;
+ u8 : 2;
+ u8 rtr : 1; /* Relaxed translation requirement */
+ u8 : 3;
u8 frame : 1;
u8 refresh : 1; /* TLB refresh mode */
u16 : 3;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 14bf7e8d06b7..27248686e588 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -112,6 +112,7 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
zdev->version = response->version;
zdev->maxstbl = response->maxstbl;
zdev->dtsm = response->dtsm;
+ zdev->rtr_avail = response->rtr;
switch (response->version) {
case 1:
--
2.47.1
* [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2024-12-16 9:29 ` Niklas Schnelle
2024-12-13 22:49 ` [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain Matthew Rosato
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
PCI devices on s390 have a DMA offset that is reported via CLP. In
preparation for allowing identity domains, set up the bus_dma_region
for all PCI devices using the reported CLP value.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
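To make the role of the three fields concrete, the following userspace sketch models the offset arithmetic a single region implies: a CPU physical address inside the region is shifted so that cpu_start lands on dma_start. The demo_* struct and helper are hypothetical mirrors for illustration, not the kernel's own translation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields the patch fills in. */
struct demo_dma_region {
	uint64_t cpu_start;	/* first CPU physical address covered */
	uint64_t dma_start;	/* bus address that cpu_start maps to */
	uint64_t size;		/* number of bytes covered */
};

/*
 * Translate a CPU physical address to a bus address through one region.
 * Returns false and leaves *dma untouched if paddr is outside the region.
 */
static bool demo_phys_to_dma(const struct demo_dma_region *r, uint64_t paddr,
			     uint64_t *dma)
{
	assert(r != NULL && dma != NULL);
	if (paddr < r->cpu_start || paddr - r->cpu_start >= r->size)
		return false;
	*dma = paddr - r->cpu_start + r->dma_start;
	return true;
}
```

With cpu_start set to 0, as in the patch, the translation reduces to adding dma_start to the physical address.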
arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index d5ace00d10f0..14527687d0f2 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -19,6 +19,7 @@
#include <linux/jump_label.h>
#include <linux/pci.h>
#include <linux/printk.h>
+#include <linux/dma-direct.h>
#include <asm/pci_clp.h>
#include <asm/pci_dma.h>
@@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
return zbus;
}
+static void pci_dma_range_setup(struct pci_dev *pdev)
+{
+ struct zpci_dev *zdev = to_zpci(pdev);
+ struct bus_dma_region *map;
+
+ map = kzalloc(sizeof(*map), GFP_KERNEL);
+ if (!map)
+ return;
+
+ map->cpu_start = 0;
+ map->dma_start = PAGE_ALIGN(zdev->start_dma);
+ map->size = (u64)virt_to_phys(high_memory);
+ pdev->dev.dma_range_map = map;
+}
+
void pcibios_bus_add_device(struct pci_dev *pdev)
{
struct zpci_dev *zdev = to_zpci(pdev);
+ pci_dma_range_setup(pdev);
+
/*
* With pdev->no_vf_scan the common PCI probing code does not
* perform PF/VF linking.
--
2.47.1
* [PATCH v2 3/3] iommu/s390: implement iommu passthrough via identity domain
2024-12-13 22:49 [PATCH v2 0/3] iommu/s390: add support for IOMMU passthrough Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 1/3] s390/pci: check for relaxed translation capability Matthew Rosato
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2024-12-13 22:49 ` Matthew Rosato
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-13 22:49 UTC
To: joro, will, robin.murphy, gerald.schaefer, schnelle
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
Introduce the concept of identity domains to s390-iommu, enabled via the
kernel command-line option 'iommu.passthrough=1'. Identity domains rely
on the bus_dma_region to offset identity mappings to the start of the DMA
aperture advertised by CLP.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
arch/s390/pci/pci.c | 6 ++-
drivers/iommu/s390-iommu.c | 95 +++++++++++++++++++++++++++++---------
2 files changed, 76 insertions(+), 25 deletions(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 88f72745fa59..758b23331754 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -124,14 +124,16 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
struct zpci_fib fib = {0};
u8 cc;
- WARN_ON_ONCE(iota & 0x3fff);
fib.pba = base;
/* Work around off by one in ISM virt device */
if (zdev->pft == PCI_FUNC_TYPE_ISM && limit > base)
fib.pal = limit + (1 << 12);
else
fib.pal = limit;
- fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
+ if (iota == 0)
+ fib.iota = iota;
+ else
+ fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, status);
if (cc)
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fbdeded3d48b..3d93a9644fca 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -16,7 +16,7 @@
#include "dma-iommu.h"
-static const struct iommu_ops s390_iommu_ops;
+static const struct iommu_ops s390_iommu_ops, s390_iommu_rtr_ops;
static struct kmem_cache *dma_region_table_cache;
static struct kmem_cache *dma_page_table_cache;
@@ -392,9 +392,11 @@ static int blocking_domain_attach_device(struct iommu_domain *domain,
return 0;
s390_domain = to_s390_domain(zdev->s390_domain);
- spin_lock_irqsave(&s390_domain->list_lock, flags);
- list_del_rcu(&zdev->iommu_list);
- spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ if (zdev->dma_table) {
+ spin_lock_irqsave(&s390_domain->list_lock, flags);
+ list_del_rcu(&zdev->iommu_list);
+ spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+ }
zpci_unregister_ioat(zdev, 0);
zdev->dma_table = NULL;
@@ -723,7 +725,13 @@ int zpci_init_iommu(struct zpci_dev *zdev)
if (rc)
goto out_err;
- rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops, NULL);
+ if (zdev->rtr_avail) {
+ rc = iommu_device_register(&zdev->iommu_dev,
+ &s390_iommu_rtr_ops, NULL);
+ } else {
+ rc = iommu_device_register(&zdev->iommu_dev, &s390_iommu_ops,
+ NULL);
+ }
if (rc)
goto out_sysfs;
@@ -787,6 +795,39 @@ static int __init s390_iommu_init(void)
}
subsys_initcall(s390_iommu_init);
+static int s390_attach_dev_identity(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct zpci_dev *zdev = to_zpci_dev(dev);
+ u8 status;
+ int cc;
+
+ blocking_domain_attach_device(&blocking_domain, dev);
+
+ /* If we fail now DMA remains blocked via blocking domain */
+ cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+ 0, &status);
+ /*
+ * If the device is undergoing error recovery the reset code
+ * will re-establish the new domain.
+ */
+ if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+ return -EIO;
+
+ zdev_s390_domain_update(zdev, domain);
+
+ return 0;
+}
+
+static const struct iommu_domain_ops s390_identity_ops = {
+ .attach_dev = s390_attach_dev_identity,
+};
+
+static struct iommu_domain s390_identity_domain = {
+ .type = IOMMU_DOMAIN_IDENTITY,
+ .ops = &s390_identity_ops,
+};
+
static struct iommu_domain blocking_domain = {
.type = IOMMU_DOMAIN_BLOCKED,
.ops = &(const struct iommu_domain_ops) {
@@ -794,23 +835,31 @@ static struct iommu_domain blocking_domain = {
}
};
-static const struct iommu_ops s390_iommu_ops = {
- .blocked_domain = &blocking_domain,
- .release_domain = &blocking_domain,
- .capable = s390_iommu_capable,
- .domain_alloc_paging = s390_domain_alloc_paging,
- .probe_device = s390_iommu_probe_device,
- .device_group = generic_device_group,
- .pgsize_bitmap = SZ_4K,
- .get_resv_regions = s390_iommu_get_resv_regions,
- .default_domain_ops = &(const struct iommu_domain_ops) {
- .attach_dev = s390_iommu_attach_device,
- .map_pages = s390_iommu_map_pages,
- .unmap_pages = s390_iommu_unmap_pages,
- .flush_iotlb_all = s390_iommu_flush_iotlb_all,
- .iotlb_sync = s390_iommu_iotlb_sync,
- .iotlb_sync_map = s390_iommu_iotlb_sync_map,
- .iova_to_phys = s390_iommu_iova_to_phys,
- .free = s390_domain_free,
+#define S390_IOMMU_COMMON_OPS() \
+ .blocked_domain = &blocking_domain, \
+ .release_domain = &blocking_domain, \
+ .capable = s390_iommu_capable, \
+ .domain_alloc_paging = s390_domain_alloc_paging, \
+ .probe_device = s390_iommu_probe_device, \
+ .device_group = generic_device_group, \
+ .pgsize_bitmap = SZ_4K, \
+ .get_resv_regions = s390_iommu_get_resv_regions, \
+ .default_domain_ops = &(const struct iommu_domain_ops) { \
+ .attach_dev = s390_iommu_attach_device, \
+ .map_pages = s390_iommu_map_pages, \
+ .unmap_pages = s390_iommu_unmap_pages, \
+ .flush_iotlb_all = s390_iommu_flush_iotlb_all, \
+ .iotlb_sync = s390_iommu_iotlb_sync, \
+ .iotlb_sync_map = s390_iommu_iotlb_sync_map, \
+ .iova_to_phys = s390_iommu_iova_to_phys, \
+ .free = s390_domain_free, \
}
+
+static const struct iommu_ops s390_iommu_ops = {
+ S390_IOMMU_COMMON_OPS()
+};
+
+static const struct iommu_ops s390_iommu_rtr_ops = {
+ .identity_domain = &s390_identity_domain,
+ S390_IOMMU_COMMON_OPS()
};
--
2.47.1
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-13 22:49 ` [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region Matthew Rosato
@ 2024-12-16 9:29 ` Niklas Schnelle
2024-12-16 10:18 ` Niklas Schnelle
0 siblings, 1 reply; 7+ messages in thread
From: Niklas Schnelle @ 2024-12-16 9:29 UTC
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Fri, 2024-12-13 at 17:49 -0500, Matthew Rosato wrote:
> PCI devices on s390 have a DMA offset that is reported via CLP. In
> preparation for allowing identity domains, setup the bus_dma_region
> for all PCI devices using the reported CLP value.
>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> index d5ace00d10f0..14527687d0f2 100644
> --- a/arch/s390/pci/pci_bus.c
> +++ b/arch/s390/pci/pci_bus.c
> @@ -19,6 +19,7 @@
> #include <linux/jump_label.h>
> #include <linux/pci.h>
> #include <linux/printk.h>
> +#include <linux/dma-direct.h>
>
> #include <asm/pci_clp.h>
> #include <asm/pci_dma.h>
> @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
> return zbus;
> }
>
> +static void pci_dma_range_setup(struct pci_dev *pdev)
> +{
> + struct zpci_dev *zdev = to_zpci(pdev);
> + struct bus_dma_region *map;
> +
> + map = kzalloc(sizeof(*map), GFP_KERNEL);
> + if (!map)
> + return;
> +
> + map->cpu_start = 0;
> + map->dma_start = PAGE_ALIGN(zdev->start_dma);
> + map->size = (u64)virt_to_phys(high_memory);
I don't think we should restrict the size here to the size of memory.
Instead I think it should be zdev->end_dma - zdev->start_dma.
Since we handle the restriction to memory size as reserved regions I
think that should be compatible. Also I think otherwise this might
break the admittedly odd s390_iommu_aperture=X kernel parameter on
LPARs.
> + pdev->dev.dma_range_map = map;
> +}
> +
> void pcibios_bus_add_device(struct pci_dev *pdev)
> {
> struct zpci_dev *zdev = to_zpci(pdev);
>
> + pci_dma_range_setup(pdev);
> +
> /*
> * With pdev->no_vf_scan the common PCI probing code does not
> * perform PF/VF linking.
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-16 9:29 ` Niklas Schnelle
@ 2024-12-16 10:18 ` Niklas Schnelle
2024-12-16 16:51 ` Matthew Rosato
0 siblings, 1 reply; 7+ messages in thread
From: Niklas Schnelle @ 2024-12-16 10:18 UTC
To: Matthew Rosato, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
On Mon, 2024-12-16 at 10:29 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-13 at 17:49 -0500, Matthew Rosato wrote:
> > PCI devices on s390 have a DMA offset that is reported via CLP. In
> > preparation for allowing identity domains, setup the bus_dma_region
> > for all PCI devices using the reported CLP value.
> >
> > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > ---
> > arch/s390/pci/pci_bus.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> > index d5ace00d10f0..14527687d0f2 100644
> > --- a/arch/s390/pci/pci_bus.c
> > +++ b/arch/s390/pci/pci_bus.c
> > @@ -19,6 +19,7 @@
> > #include <linux/jump_label.h>
> > #include <linux/pci.h>
> > #include <linux/printk.h>
> > +#include <linux/dma-direct.h>
> >
> > #include <asm/pci_clp.h>
> > #include <asm/pci_dma.h>
> > @@ -284,10 +285,27 @@ static struct zpci_bus *zpci_bus_alloc(int topo, bool topo_is_tid)
> > return zbus;
> > }
> >
> > +static void pci_dma_range_setup(struct pci_dev *pdev)
> > +{
> > + struct zpci_dev *zdev = to_zpci(pdev);
> > + struct bus_dma_region *map;
> > +
> > + map = kzalloc(sizeof(*map), GFP_KERNEL);
> > + if (!map)
> > + return;
> > +
> > + map->cpu_start = 0;
> > + map->dma_start = PAGE_ALIGN(zdev->start_dma);
> > + map->size = (u64)virt_to_phys(high_memory);
>
> I don't think we should restrict the size here to the size of memory.
> Instead I think it should be zdev->end_dma - zdev->start_dma.
>
> Since we handle the restriction to memory size as reserved regions I
> think that should be compatible. Also I think otherwise this might
> break the admittedly odd s390_iommu_aperture=X kernel parameter on
> LPARs.
Correction, zdev->end_dma - zdev->start_dma + 1 because zdev->end_dma
is inclusive ;-)
>
>
> > + pdev->dev.dma_range_map = map;
> > +}
> > +
> > void pcibios_bus_add_device(struct pci_dev *pdev)
> > {
> > struct zpci_dev *zdev = to_zpci(pdev);
> >
> > + pci_dma_range_setup(pdev);
> > +
> > /*
> > * With pdev->no_vf_scan the common PCI probing code does not
> > * perform PF/VF linking.
>
* Re: [PATCH v2 2/3] s390/pci: store DMA offset in bus_dma_region
2024-12-16 10:18 ` Niklas Schnelle
@ 2024-12-16 16:51 ` Matthew Rosato
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Rosato @ 2024-12-16 16:51 UTC
To: Niklas Schnelle, joro, will, robin.murphy, gerald.schaefer
Cc: hca, gor, agordeev, svens, borntraeger, farman, clegoate, iommu,
linux-kernel, linux-s390
>>> + map->cpu_start = 0;
>>> + map->dma_start = PAGE_ALIGN(zdev->start_dma);
>>> + map->size = (u64)virt_to_phys(high_memory);
>>
>> I don't think we should restrict the size here to the size of memory.
>> Instead I think it should be zdev->end_dma - zdev->start_dma.
>>
>> Since we handle the restriction to memory size as reserved regions I
>> think that should be compatible. Also I think otherwise this might
>> break the admittedly odd s390_iommu_aperture=X kernel parameter on
>> LPARs.
>
> Correction, zdev->end_dma - zdev->start_dma + 1 because zdev->end_dma
> is inclusive ;-)
>
Forgot about that parameter, thanks... OK, will change to:
map->size = zdev->end_dma - zdev->start_dma + 1;
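The "+ 1" matters because end_dma is the last valid address rather than one past it. A minimal standalone sketch of the computation, using a hypothetical helper name and plain integer arguments in place of the zdev fields:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of an aperture whose end address is inclusive.
 * Note: a range spanning the entire 64-bit space would wrap to 0.
 */
static uint64_t demo_aperture_size(uint64_t start_dma, uint64_t end_dma)
{
	assert(end_dma >= start_dma);
	return end_dma - start_dma + 1;
}
```

For example, an aperture from 0x1000 through 0x1fff covers exactly one 4K page, which the formula without the "+ 1" would report as 0xfff bytes.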