* [PATCH v4 0/2] s390x/pci: relax I/O address translation requirement
@ 2025-02-07 20:56 Matthew Rosato
  2025-02-07 20:56 ` [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping Matthew Rosato
  2025-02-07 20:56 ` [PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough Matthew Rosato
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Rosato @ 2025-02-07 20:56 UTC
  To: qemu-s390x
  Cc: farman, schnelle, thuth, pasic, borntraeger, richard.henderson, david, iii, clegoate, qemu-devel

This series introduces the concept of the relaxed translation requirement
for s390x guests in order to allow bypass of the guest IOMMU for more
efficient PCI passthrough.

With this series, QEMU can indicate to the guest that an IOMMU is not
strictly required for a zPCI device. This subsequently allows a Linux guest
to use iommu.passthrough=1 and bypass its guest IOMMU for PCI devices. When
this occurs, QEMU notes the behavior via an intercepted MPCIFC instruction
and, in response, fills the host IOMMU with mappings of the entire guest
address space.

There is a kernel series that adds the behavior needed to exploit this new
feature from within an s390x Linux guest. The most recent version of that
series is at [1].

[1]: https://lore.kernel.org/linux-s390/20250207205335.473946-1-mjrosato@linux.ibm.com/

Changes for v4:
- use get_system_memory() instead of ms->ram
- rename rtr_allowed to rtr_avail
- turn off rtr_avail for emulated devices so the MPCIFC fence properly
  rejects an attempt at direct mapping (we only advertise via CLP for
  passthrough devices)
- turn off rtr_avail for passthrough ISM devices
- various minor changes

Changes for v3:
- use s390_get_memory_limit
- advertise full aperture for relaxed-translation-capable devices

Changes for v2:
- Add relax-translation property, fence for older machines
- Add a new MPCIFC failure case when direct-mapping requested but
  the relax-translation property is set to off.
- For direct mapping, use a memory alias to handle the SDMA offset and
  then just let vfio handle the pinning of memory.

Matthew Rosato (2):
  s390x/pci: add support for guests that request direct mapping
  s390x/pci: indicate QEMU supports relaxed translation for passthrough

 hw/s390x/s390-pci-bus.c         | 38 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-pci-inst.c        | 13 +++++++++--
 hw/s390x/s390-pci-vfio.c        | 28 +++++++++++++++++++-----
 hw/s390x/s390-virtio-ccw.c      |  5 +++++
 include/hw/s390x/s390-pci-bus.h |  4 ++++
 include/hw/s390x/s390-pci-clp.h |  1 +
 6 files changed, 80 insertions(+), 9 deletions(-)

-- 
2.48.1
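The "memory alias to handle the SDMA offset" item in the changelog boils down to simple address arithmetic, which can be sketched in isolation. The following is only an illustration under stated assumptions: `DirectMap`, `dm_gpa_to_iova()` and `dm_iova_to_gpa()` are hypothetical names that do not exist in QEMU, and the real series relies on a MemoryRegion alias plus the vfio memory listener rather than on explicit helper functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a direct-mapped DMA window; not a QEMU type. */
typedef struct {
    uint64_t sdma;      /* start of the DMA aperture (the alias offset) */
    uint64_t mem_limit; /* amount of guest memory covered by the alias */
} DirectMap;

/* Guest view: a DMA address is the guest physical address plus SDMA. */
static uint64_t dm_gpa_to_iova(const DirectMap *dm, uint64_t gpa)
{
    assert(gpa < dm->mem_limit);
    return gpa + dm->sdma;
}

/* Reverse lookup: undo the offset, rejecting addresses outside the window. */
static bool dm_iova_to_gpa(const DirectMap *dm, uint64_t iova, uint64_t *gpa)
{
    if (iova < dm->sdma || iova - dm->sdma >= dm->mem_limit) {
        return false;
    }
    *gpa = iova - dm->sdma;
    return true;
}
```

Because the offset is constant, no per-mapping state is needed once the whole range is in place, which is presumably why a plain alias suffices here.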
* [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping
  2025-02-07 20:56 [PATCH v4 0/2] s390x/pci: relax I/O address translation requirement Matthew Rosato
@ 2025-02-07 20:56 ` Matthew Rosato
  2025-02-10 13:12   ` Niklas Schnelle
  2025-02-10 14:52   ` David Hildenbrand
  2025-02-07 20:56 ` [PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough Matthew Rosato
  1 sibling, 2 replies; 7+ messages in thread
From: Matthew Rosato @ 2025-02-07 20:56 UTC
  To: qemu-s390x
  Cc: farman, schnelle, thuth, pasic, borntraeger, richard.henderson, david, iii, clegoate, qemu-devel

When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
bit set, treat this as a request to perform direct mapping instead of
address translation. In order to facilitate this, pin the entirety of
guest memory into the host iommu.

Pinning for the direct mapping case is handled via vfio and its memory
listener. Additionally, ram discard settings are inherited from vfio:
coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
discards (e.g. virtio-balloon) are disabled.

Subsequent guest DMA operations are all expected to be of the format
guest_phys+sdma, allowing them to be used as lookup into the host
iommu table.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 38 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-pci-inst.c        | 13 +++++++++--
 hw/s390x/s390-pci-vfio.c        | 23 ++++++++++++++++----
 hw/s390x/s390-virtio-ccw.c      |  5 +++++
 include/hw/s390x/s390-pci-bus.h |  4 ++++
 5 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index eead269cc2..81e5843c81 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -18,6 +18,8 @@
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+#include "hw/boards.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
@@ -720,16 +722,45 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
                              TYPE_S390_IOMMU_MEMORY_REGION, OBJECT(&iommu->mr),
                              name, iommu->pal + 1);
     iommu->enabled = true;
+    iommu->direct_map = false;
     memory_region_add_subregion(&iommu->mr, 0, MEMORY_REGION(&iommu->iommu_mr));
     g_free(name);
 }
 
+void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    S390CcwMachineState *s390ms = S390_CCW_MACHINE(ms);
+
+    /*
+     * For direct-mapping we must map the entire guest address space. Rather
+     * than using an iommu, create a memory region alias that maps GPA X to
+     * IOVA X + SDMA. VFIO will handle pinning via its memory listener.
+     */
+    g_autofree char *name = g_strdup_printf("iommu-dm-s390-%04x",
+                                            iommu->pbdev->uid);
+    memory_region_init_alias(&iommu->dm_mr, OBJECT(&iommu->mr), name,
+                             get_system_memory(), 0,
+                             s390_get_memory_limit(s390ms));
+    iommu->enabled = true;
+    iommu->direct_map = true;
+    memory_region_add_subregion(&iommu->mr, iommu->pbdev->zpci_fn.sdma,
+                                &iommu->dm_mr);
+}
+
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
 {
     iommu->enabled = false;
     g_hash_table_remove_all(iommu->iotlb);
-    memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
-    object_unparent(OBJECT(&iommu->iommu_mr));
+    if (iommu->direct_map) {
+        memory_region_del_subregion(&iommu->mr, &iommu->dm_mr);
+        iommu->direct_map = false;
+        object_unparent(OBJECT(&iommu->dm_mr));
+    } else {
+        memory_region_del_subregion(&iommu->mr,
+                                    MEMORY_REGION(&iommu->iommu_mr));
+        object_unparent(OBJECT(&iommu->iommu_mr));
+    }
 }
 
 static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
@@ -1130,6 +1161,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             /* Always intercept emulated devices */
             pbdev->interp = false;
             pbdev->forwarding_assist = false;
+            pbdev->rtr_avail = false;
         }
 
         if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1488,6 +1520,8 @@ static const Property s390_pci_device_properties[] = {
     DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
     DEFINE_PROP_BOOL("forwarding-assist", S390PCIBusDevice, forwarding_assist,
                      true),
+    DEFINE_PROP_BOOL("relaxed-translation", S390PCIBusDevice, rtr_avail,
+                     true),
 };
 
 static const VMStateDescription s390_pci_device_vmstate = {
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index e386d75d58..8cdeb6cb7f 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -16,6 +16,7 @@
 #include "exec/memory.h"
 #include "qemu/error-report.h"
 #include "system/hw_accel.h"
+#include "hw/boards.h"
 #include "hw/pci/pci_device.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
@@ -1008,17 +1009,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
     }
 
     /* currently we only support designation type 1 with translation */
-    if (!(dt == ZPCI_IOTA_RTTO && t)) {
+    if (t && dt != ZPCI_IOTA_RTTO) {
         error_report("unsupported ioat dt %d t %d", dt, t);
         s390_program_interrupt(env, PGM_OPERAND, ra);
         return -EINVAL;
+    } else if (!t && !pbdev->rtr_avail) {
+        error_report("relaxed translation not allowed");
+        s390_program_interrupt(env, PGM_OPERAND, ra);
+        return -EINVAL;
     }
 
     iommu->pba = pba;
     iommu->pal = pal;
     iommu->g_iota = g_iota;
 
-    s390_pci_iommu_enable(iommu);
+    if (t) {
+        s390_pci_iommu_enable(iommu);
+    } else {
+        s390_pci_iommu_direct_map_enable(iommu);
+    }
 
     return 0;
 }
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 7dbbc76823..443e222912 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -131,13 +131,28 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     /* Store function type separately for type-specific behavior */
     pbdev->pft = cap->pft;
 
+    /*
+     * If the device is a passthrough ISM device, disallow relaxed
+     * translation.
+     */
+    if (pbdev->pft == ZPCI_PFT_ISM) {
+        pbdev->rtr_avail = false;
+    }
+
     /*
      * If appropriate, reduce the size of the supported DMA aperture reported
-     * to the guest based upon the vfio DMA limit.
+     * to the guest based upon the vfio DMA limit. This is applicable for
+     * devices that are guaranteed to not use relaxed translation. If the
+     * device is capable of relaxed translation then we must advertise the
+     * full aperture. In this case, if translation is used then we will
+     * rely on the vfio DMA limit counting and use RPCIT CC1 / status 16
+     * to request that the guest free DMA mappings as necessary.
      */
-    vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
-    if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
-        pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
+    if (!pbdev->rtr_avail) {
+        vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
+        if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
+            pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
+        }
     }
 }
 
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index d9e683c5b4..6a6cb39808 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -937,8 +937,13 @@ static void ccw_machine_9_2_instance_options(MachineState *machine)
 
 static void ccw_machine_9_2_class_options(MachineClass *mc)
 {
+    static GlobalProperty compat[] = {
+        { TYPE_S390_PCI_DEVICE, "relaxed-translation", "off", },
+    };
+
     ccw_machine_10_0_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_9_2, hw_compat_9_2_len);
+    compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
 }
 DEFINE_CCW_MACHINE(9, 2);
 
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 2c43ea123f..ea9e04ec49 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -277,7 +277,9 @@ struct S390PCIIOMMU {
     AddressSpace as;
     MemoryRegion mr;
     IOMMUMemoryRegion iommu_mr;
+    MemoryRegion dm_mr;
     bool enabled;
+    bool direct_map;
     uint64_t g_iota;
     uint64_t pba;
     uint64_t pal;
@@ -362,6 +364,7 @@ struct S390PCIBusDevice {
     bool interp;
     bool forwarding_assist;
     bool aif;
+    bool rtr_avail;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
@@ -389,6 +392,7 @@ int pci_chsc_sei_nt2_have_event(void);
 void s390_pci_sclp_configure(SCCB *sccb);
 void s390_pci_sclp_deconfigure(SCCB *sccb);
 void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
+void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu);
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu);
 void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
                                    uint64_t faddr, uint32_t e);
-- 
2.48.1
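The reworked check in reg_ioat() is easier to review as a small decision table. The sketch below models only that logic outside of QEMU: `IoatMode`, `ioat_mode()` and `MODEL_DT_RTTO` are hypothetical names introduced for illustration, and the value 1 is taken from the patch comment that calls RTTO "designation type 1".

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for ZPCI_IOTA_RTTO; the patch comment calls it designation type 1. */
#define MODEL_DT_RTTO 1

typedef enum {
    IOAT_REJECT,      /* operand program interrupt and -EINVAL in the patch */
    IOAT_TRANSLATE,   /* the s390_pci_iommu_enable() path */
    IOAT_DIRECT_MAP,  /* the s390_pci_iommu_direct_map_enable() path */
} IoatMode;

/* Mirror of the patched condition: t is the T bit, dt the designation type. */
static IoatMode ioat_mode(bool t, int dt, bool rtr_avail)
{
    if (t && dt != MODEL_DT_RTTO) {
        return IOAT_REJECT;
    } else if (!t && !rtr_avail) {
        return IOAT_REJECT;
    }
    return t ? IOAT_TRANSLATE : IOAT_DIRECT_MAP;
}

/* Walk the interesting corners of the decision table. */
static void ioat_mode_selfcheck(void)
{
    assert(ioat_mode(true, MODEL_DT_RTTO, false) == IOAT_TRANSLATE);
    assert(ioat_mode(true, 0, true) == IOAT_REJECT);
    assert(ioat_mode(false, 0, true) == IOAT_DIRECT_MAP);
    assert(ioat_mode(false, MODEL_DT_RTTO, false) == IOAT_REJECT);
}
```

Note that, as in the patch, the designation type is only validated when the T bit is set; a request without T depends solely on rtr_avail.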
* Re: [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping
  2025-02-07 20:56 ` [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping Matthew Rosato
@ 2025-02-10 13:12   ` Niklas Schnelle
  2025-02-10 13:26     ` Cédric Le Goater
  2025-02-10 14:52   ` David Hildenbrand
  1 sibling, 1 reply; 7+ messages in thread
From: Niklas Schnelle @ 2025-02-10 13:12 UTC
  To: Matthew Rosato, qemu-s390x
  Cc: farman, thuth, pasic, borntraeger, richard.henderson, david, iii, clegoate, qemu-devel

On Fri, 2025-02-07 at 15:56 -0500, Matthew Rosato wrote:
> When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
> bit set, treat this as a request to perform direct mapping instead of
> address translation. In order to facilitate this, pin the entirety of
> guest memory into the host iommu.
> 
> Pinning for the direct mapping case is handled via vfio and its memory
> listener. Additionally, ram discard settings are inherited from vfio:
> coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
> discards (e.g. virtio-balloon) are disabled.
> 
> Subsequent guest DMA operations are all expected to be of the format
> guest_phys+sdma, allowing them to be used as lookup into the host
> iommu table.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  hw/s390x/s390-pci-bus.c         | 38 +++++++++++++++++++++++++++++++--
>  hw/s390x/s390-pci-inst.c        | 13 +++++++++--
>  hw/s390x/s390-pci-vfio.c        | 23 ++++++++++++++++----
>  hw/s390x/s390-virtio-ccw.c      |  5 +++++
>  include/hw/s390x/s390-pci-bus.h |  4 ++++
>  5 files changed, 75 insertions(+), 8 deletions(-)
> 
> 
---8<---
> 
>  static const VMStateDescription s390_pci_device_vmstate = {
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index e386d75d58..8cdeb6cb7f 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -16,6 +16,7 @@
>  #include "exec/memory.h"
>  #include "qemu/error-report.h"
>  #include "system/hw_accel.h"
> +#include "hw/boards.h"
>  #include "hw/pci/pci_device.h"
>  #include "hw/s390x/s390-pci-inst.h"
>  #include "hw/s390x/s390-pci-bus.h"
> @@ -1008,17 +1009,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
>      }
>  
>      /* currently we only support designation type 1 with translation */
> -    if (!(dt == ZPCI_IOTA_RTTO && t)) {
> +    if (t && dt != ZPCI_IOTA_RTTO) {
>          error_report("unsupported ioat dt %d t %d", dt, t);
>          s390_program_interrupt(env, PGM_OPERAND, ra);
>          return -EINVAL;
> +    } else if (!t && !pbdev->rtr_avail) {
> +        error_report("relaxed translation not allowed");
> +        s390_program_interrupt(env, PGM_OPERAND, ra);
> +        return -EINVAL;
>      }
>  
>      iommu->pba = pba;
>      iommu->pal = pal;
>      iommu->g_iota = g_iota;
>  
> -    s390_pci_iommu_enable(iommu);
> +    if (t) {
> +        s390_pci_iommu_enable(iommu);
> +    } else {
> +        s390_pci_iommu_direct_map_enable(iommu);
> +    }
>  
>      return 0;
>  }
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 7dbbc76823..443e222912 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -131,13 +131,28 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>      /* Store function type separately for type-specific behavior */
>      pbdev->pft = cap->pft;
>  
> +    /*
> +     * If the device is a passthrough ISM device, disallow relaxed
> +     * translation.
> +     */
> +    if (pbdev->pft == ZPCI_PFT_ISM) {
> +        pbdev->rtr_avail = false;
> +    }

Just a note for external readers: the ISM device does work without the
above, but having explicit guest IOMMU mappings plus the respective
shadow mappings in the host would catch driver misbehavior more easily.
At the same time, ISM uses long-lived IOMMU mappings, so the cost of
shadowing its mappings is low.

> +
>      /*
>       * If appropriate, reduce the size of the supported DMA aperture reported
> -     * to the guest based upon the vfio DMA limit.
> +     * to the guest based upon the vfio DMA limit. This is applicable for
> +     * devices that are guaranteed to not use relaxed translation. If the
> +     * device is capable of relaxed translation then we must advertise the
> +     * full aperture. In this case, if translation is used then we will
> +     * rely on the vfio DMA limit counting and use RPCIT CC1 / status 16
> +     * to request that the guest free DMA mappings as necessary.
>       */
> -    vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
> -    if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
> -        pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
> +    if (!pbdev->rtr_avail) {
> +        vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
> +        if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
> +            pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
> +        }
>      }
>  }
>  
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index d9e683c5b4..6a6cb39808 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -937,8 +937,13 @@ static void ccw_machine_9_2_instance_options(MachineState *machine)
>  
>  static void ccw_machine_9_2_class_options(MachineClass *mc)
>  {
> +    static GlobalProperty compat[] = {
> +        { TYPE_S390_PCI_DEVICE, "relaxed-translation", "off", },
> +    };
> +
>      ccw_machine_10_0_class_options(mc);
>      compat_props_add(mc->compat_props, hw_compat_9_2, hw_compat_9_2_len);
> +    compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
>  }
>  DEFINE_CCW_MACHINE(9, 2);
>  
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index 2c43ea123f..ea9e04ec49 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -277,7 +277,9 @@ struct S390PCIIOMMU {
>      AddressSpace as;
>      MemoryRegion mr;
>      IOMMUMemoryRegion iommu_mr;
> +    MemoryRegion dm_mr;
>      bool enabled;
> +    bool direct_map;
>      uint64_t g_iota;
>      uint64_t pba;
>      uint64_t pal;
> @@ -362,6 +364,7 @@ struct S390PCIBusDevice {
>      bool interp;
>      bool forwarding_assist;
>      bool aif;
> +    bool rtr_avail;
>      QTAILQ_ENTRY(S390PCIBusDevice) link;
>  };
>  
> @@ -389,6 +392,7 @@ int pci_chsc_sei_nt2_have_event(void);
>  void s390_pci_sclp_configure(SCCB *sccb);
>  void s390_pci_sclp_deconfigure(SCCB *sccb);
>  void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
> +void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu);
>  void s390_pci_iommu_disable(S390PCIIOMMU *iommu);
>  void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
>                                     uint64_t faddr, uint32_t e);

I'm not too familiar with the existing code or QEMU in general, but the
changes make sense to me. I'm assuming the braces around single
statement bodies are accepted style in QEMU?

I also retested this version together with v4 of the kernel series.
So feel free to add:

Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
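The aperture handling quoted above can be restated as a pure function, which makes the effect of rtr_avail on the advertised end address explicit. This is a sketch, not QEMU code: `advertised_edma()` and `MODEL_PAGE_BITS` are hypothetical, a 4 KiB page size is assumed in place of TARGET_PAGE_BITS, and it is assumed that the end address otherwise stays at the device-reported end_dma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_BITS 12 /* assumption: 4 KiB pages */

/*
 * Return the end of the DMA aperture that would be advertised to the guest.
 * max_dma_limit is the vfio mapping limit counted in pages; 0 means no limit.
 */
static uint64_t advertised_edma(uint64_t start_dma, uint64_t end_dma,
                                uint64_t max_dma_limit, bool rtr_avail)
{
    uint64_t vfio_size;

    assert(end_dma >= start_dma);

    /* Devices that may use relaxed translation get the full aperture. */
    if (rtr_avail) {
        return end_dma;
    }

    /* Otherwise shrink the aperture to what vfio is able to map. */
    vfio_size = max_dma_limit << MODEL_PAGE_BITS;
    if (vfio_size > 0 && vfio_size < end_dma - start_dma + 1) {
        return start_dma + vfio_size - 1;
    }
    return end_dma;
}
```

With a 4 GiB aperture and a limit of 0x10000 pages, the clamped aperture covers 256 MiB, while the same device with relaxed translation available keeps the full range.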
* Re: [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping
  2025-02-10 13:12   ` Niklas Schnelle
@ 2025-02-10 13:26     ` Cédric Le Goater
  0 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2025-02-10 13:26 UTC
  To: Niklas Schnelle, Matthew Rosato, qemu-s390x
  Cc: farman, thuth, pasic, borntraeger, richard.henderson, david, iii, qemu-devel

On 2/10/25 14:12, Niklas Schnelle wrote:
> On Fri, 2025-02-07 at 15:56 -0500, Matthew Rosato wrote:
>> When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
>> bit set, treat this as a request to perform direct mapping instead of
>> address translation. In order to facilitate this, pin the entirety of
>> guest memory into the host iommu.
>>
>> Pinning for the direct mapping case is handled via vfio and its memory
>> listener. Additionally, ram discard settings are inherited from vfio:
>> coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
>> discards (e.g. virtio-balloon) are disabled.
>>
>> Subsequent guest DMA operations are all expected to be of the format
>> guest_phys+sdma, allowing them to be used as lookup into the host
>> iommu table.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>  hw/s390x/s390-pci-bus.c         | 38 +++++++++++++++++++++++++++++++--
>>  hw/s390x/s390-pci-inst.c        | 13 +++++++++--
>>  hw/s390x/s390-pci-vfio.c        | 23 ++++++++++++++++----
>>  hw/s390x/s390-virtio-ccw.c      |  5 +++++
>>  include/hw/s390x/s390-pci-bus.h |  4 ++++
>>  5 files changed, 75 insertions(+), 8 deletions(-)
>>
>>
> ---8<---
>>
>>  static const VMStateDescription s390_pci_device_vmstate = {
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index e386d75d58..8cdeb6cb7f 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -16,6 +16,7 @@
>>  #include "exec/memory.h"
>>  #include "qemu/error-report.h"
>>  #include "system/hw_accel.h"
>> +#include "hw/boards.h"
>>  #include "hw/pci/pci_device.h"
>>  #include "hw/s390x/s390-pci-inst.h"
>>  #include "hw/s390x/s390-pci-bus.h"
>> @@ -1008,17 +1009,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
>>      }
>>  
>>      /* currently we only support designation type 1 with translation */
>> -    if (!(dt == ZPCI_IOTA_RTTO && t)) {
>> +    if (t && dt != ZPCI_IOTA_RTTO) {
>>          error_report("unsupported ioat dt %d t %d", dt, t);
>>          s390_program_interrupt(env, PGM_OPERAND, ra);
>>          return -EINVAL;
>> +    } else if (!t && !pbdev->rtr_avail) {
>> +        error_report("relaxed translation not allowed");
>> +        s390_program_interrupt(env, PGM_OPERAND, ra);
>> +        return -EINVAL;
>>      }
>>  
>>      iommu->pba = pba;
>>      iommu->pal = pal;
>>      iommu->g_iota = g_iota;
>>  
>> -    s390_pci_iommu_enable(iommu);
>> +    if (t) {
>> +        s390_pci_iommu_enable(iommu);
>> +    } else {
>> +        s390_pci_iommu_direct_map_enable(iommu);
>> +    }
>>  
>>      return 0;
>>  }
>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>> index 7dbbc76823..443e222912 100644
>> --- a/hw/s390x/s390-pci-vfio.c
>> +++ b/hw/s390x/s390-pci-vfio.c
>> @@ -131,13 +131,28 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>>      /* Store function type separately for type-specific behavior */
>>      pbdev->pft = cap->pft;
>>  
>> +    /*
>> +     * If the device is a passthrough ISM device, disallow relaxed
>> +     * translation.
>> +     */
>> +    if (pbdev->pft == ZPCI_PFT_ISM) {
>> +        pbdev->rtr_avail = false;
>> +    }
> 
> Just a note for external readers: the ISM device does work without the
> above, but having explicit guest IOMMU mappings plus the respective
> shadow mappings in the host would catch driver misbehavior more easily.
> At the same time, ISM uses long-lived IOMMU mappings, so the cost of
> shadowing its mappings is low.
> 
>> +
>>      /*
>>       * If appropriate, reduce the size of the supported DMA aperture reported
>> -     * to the guest based upon the vfio DMA limit.
>> +     * to the guest based upon the vfio DMA limit. This is applicable for
>> +     * devices that are guaranteed to not use relaxed translation. If the
>> +     * device is capable of relaxed translation then we must advertise the
>> +     * full aperture. In this case, if translation is used then we will
>> +     * rely on the vfio DMA limit counting and use RPCIT CC1 / status 16
>> +     * to request that the guest free DMA mappings as necessary.
>>       */
>> -    vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
>> -    if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
>> -        pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
>> +    if (!pbdev->rtr_avail) {
>> +        vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
>> +        if (vfio_size > 0 && vfio_size < cap->end_dma - cap->start_dma + 1) {
>> +            pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
>> +        }
>>      }
>>  }
>>  
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index d9e683c5b4..6a6cb39808 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -937,8 +937,13 @@ static void ccw_machine_9_2_instance_options(MachineState *machine)
>>  
>>  static void ccw_machine_9_2_class_options(MachineClass *mc)
>>  {
>> +    static GlobalProperty compat[] = {
>> +        { TYPE_S390_PCI_DEVICE, "relaxed-translation", "off", },
>> +    };
>> +
>>      ccw_machine_10_0_class_options(mc);
>>      compat_props_add(mc->compat_props, hw_compat_9_2, hw_compat_9_2_len);
>> +    compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
>>  }
>>  DEFINE_CCW_MACHINE(9, 2);
>>  
>> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
>> index 2c43ea123f..ea9e04ec49 100644
>> --- a/include/hw/s390x/s390-pci-bus.h
>> +++ b/include/hw/s390x/s390-pci-bus.h
>> @@ -277,7 +277,9 @@ struct S390PCIIOMMU {
>>      AddressSpace as;
>>      MemoryRegion mr;
>>      IOMMUMemoryRegion iommu_mr;
>> +    MemoryRegion dm_mr;
>>      bool enabled;
>> +    bool direct_map;
>>      uint64_t g_iota;
>>      uint64_t pba;
>>      uint64_t pal;
>> @@ -362,6 +364,7 @@ struct S390PCIBusDevice {
>>      bool interp;
>>      bool forwarding_assist;
>>      bool aif;
>> +    bool rtr_avail;
>>      QTAILQ_ENTRY(S390PCIBusDevice) link;
>>  };
>>  
>> @@ -389,6 +392,7 @@ int pci_chsc_sei_nt2_have_event(void);
>>  void s390_pci_sclp_configure(SCCB *sccb);
>>  void s390_pci_sclp_deconfigure(SCCB *sccb);
>>  void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
>> +void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu);
>>  void s390_pci_iommu_disable(S390PCIIOMMU *iommu);
>>  void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
>>                                     uint64_t faddr, uint32_t e);
> 
> I'm not too familiar with the existing code or QEMU in general, but the
> changes make sense to me. I'm assuming the braces around single
> statement bodies are accepted style in QEMU?

They are required:

  https://qemu.readthedocs.io/en/v9.2.0/devel/style.html#block-structure

> 
> I also retested this version together with v4 of the kernel series.
> So feel free to add:
> 
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 

Thanks,

C.
* Re: [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping
  2025-02-07 20:56 ` [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping Matthew Rosato
  2025-02-10 13:12   ` Niklas Schnelle
@ 2025-02-10 14:52   ` David Hildenbrand
  1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-02-10 14:52 UTC
  To: Matthew Rosato, qemu-s390x
  Cc: farman, schnelle, thuth, pasic, borntraeger, richard.henderson, iii, clegoate, qemu-devel

On 07.02.25 21:56, Matthew Rosato wrote:
> When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
> bit set, treat this as a request to perform direct mapping instead of
> address translation. In order to facilitate this, pin the entirety of
> guest memory into the host iommu.
> 
> Pinning for the direct mapping case is handled via vfio and its memory
> listener. Additionally, ram discard settings are inherited from vfio:
> coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
> discards (e.g. virtio-balloon) are disabled.
> 
> Subsequent guest DMA operations are all expected to be of the format
> guest_phys+sdma, allowing them to be used as lookup into the host
> iommu table.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  hw/s390x/s390-pci-bus.c         | 38 +++++++++++++++++++++++++++++++--
>  hw/s390x/s390-pci-inst.c        | 13 +++++++++--
>  hw/s390x/s390-pci-vfio.c        | 23 ++++++++++++++++----
>  hw/s390x/s390-virtio-ccw.c      |  5 +++++
>  include/hw/s390x/s390-pci-bus.h |  4 ++++
>  5 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index eead269cc2..81e5843c81 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -18,6 +18,8 @@
>  #include "hw/s390x/s390-pci-inst.h"
>  #include "hw/s390x/s390-pci-kvm.h"
>  #include "hw/s390x/s390-pci-vfio.h"
> +#include "hw/s390x/s390-virtio-ccw.h"
> +#include "hw/boards.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/pci/pci_bridge.h"
> @@ -720,16 +722,45 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
>                               TYPE_S390_IOMMU_MEMORY_REGION, OBJECT(&iommu->mr),
>                               name, iommu->pal + 1);
>      iommu->enabled = true;
> +    iommu->direct_map = false;
>      memory_region_add_subregion(&iommu->mr, 0, MEMORY_REGION(&iommu->iommu_mr));
>      g_free(name);
>  }
>  
> +void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    S390CcwMachineState *s390ms = S390_CCW_MACHINE(ms);
> +
> +    /*
> +     * For direct-mapping we must map the entire guest address space. Rather
> +     * than using an iommu, create a memory region alias that maps GPA X to
> +     * IOVA X + SDMA. VFIO will handle pinning via its memory listener.
> +     */
> +    g_autofree char *name = g_strdup_printf("iommu-dm-s390-%04x",
> +                                            iommu->pbdev->uid);

Empty line.

> +    memory_region_init_alias(&iommu->dm_mr, OBJECT(&iommu->mr), name,
> +                             get_system_memory(), 0,
> +                             s390_get_memory_limit(s390ms));
> +    iommu->enabled = true;
> +    iommu->direct_map = true;

You could dynamically allocate the dm_mr instead, and use that as an
indication of whether the direct mapping is active. Whatever you prefer.

Nothing else jumped out at me, thanks!

-- 
Cheers,

David / dhildenb
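David's alternative (allocate dm_mr on demand and let the pointer double as the "direct mapping active" indicator) could look roughly like the standalone model below. Everything here is hypothetical: `ModelRegion`, `ModelIommu` and the three functions are illustration-only names, plain calloc()/free() stand in for whatever allocation and teardown QEMU would actually use, and the real code would still have to add and remove the subregion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MemoryRegion and S390PCIIOMMU. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} ModelRegion;

typedef struct {
    bool enabled;
    ModelRegion *dm_mr; /* non-NULL only while direct mapping is active */
} ModelIommu;

static bool model_is_direct_mapped(const ModelIommu *iommu)
{
    return iommu->dm_mr != NULL;
}

static void model_direct_map_enable(ModelIommu *iommu, uint64_t sdma,
                                    uint64_t mem_limit)
{
    assert(!iommu->dm_mr);
    iommu->dm_mr = calloc(1, sizeof(*iommu->dm_mr));
    if (!iommu->dm_mr) {
        abort();
    }
    iommu->dm_mr->offset = sdma;
    iommu->dm_mr->size = mem_limit;
    iommu->enabled = true;
}

static void model_disable(ModelIommu *iommu)
{
    iommu->enabled = false;
    /* free(NULL) is a no-op, so this also covers the translated case. */
    free(iommu->dm_mr);
    iommu->dm_mr = NULL;
}
```

The trade-off is one fewer boolean to keep in sync at the cost of an allocation on every enable, which matches David's "whatever you prefer".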
* [PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough
  2025-02-07 20:56 [PATCH v4 0/2] s390x/pci: relax I/O address translation requirement Matthew Rosato
  2025-02-07 20:56 ` [PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping Matthew Rosato
@ 2025-02-07 20:56 ` Matthew Rosato
  2025-02-10 13:29   ` Niklas Schnelle
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Rosato @ 2025-02-07 20:56 UTC
  To: qemu-s390x
  Cc: farman, schnelle, thuth, pasic, borntraeger, richard.henderson, david, iii, clegoate, qemu-devel

Specifying this bit in the guest CLP response indicates that the guest
can optionally choose to skip translation and instead use
identity-mapped operations.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-vfio.c        | 5 ++++-
 include/hw/s390x/s390-pci-clp.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 443e222912..6236ac7f1e 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -238,8 +238,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid, start_gid);
 
         resgrp = &pbdev->pci_group->zpci_group;
+        if (pbdev->rtr_avail) {
+            resgrp->fr |= CLP_RSP_QPCIG_MASK_RTR;
+        }
         if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
-            resgrp->fr = 1;
+            resgrp->fr |= CLP_RSP_QPCIG_MASK_REFRESH;
         }
         resgrp->dasm = cap->dasm;
         resgrp->msia = cap->msi_addr;
diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
index 03b7f9ba5f..6a635d693b 100644
--- a/include/hw/s390x/s390-pci-clp.h
+++ b/include/hw/s390x/s390-pci-clp.h
@@ -158,6 +158,7 @@ typedef struct ClpRspQueryPciGrp {
 #define CLP_RSP_QPCIG_MASK_NOI 0xfff
     uint16_t i;
     uint8_t version;
+#define CLP_RSP_QPCIG_MASK_RTR     0x20
 #define CLP_RSP_QPCIG_MASK_FRAME   0x2
 #define CLP_RSP_QPCIG_MASK_REFRESH 0x1
     uint8_t fr;
-- 
2.48.1
* Re: [PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough
  2025-02-07 20:56 ` [PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough Matthew Rosato
@ 2025-02-10 13:29   ` Niklas Schnelle
  0 siblings, 0 replies; 7+ messages in thread
From: Niklas Schnelle @ 2025-02-10 13:29 UTC
  To: Matthew Rosato, qemu-s390x
  Cc: farman, thuth, pasic, borntraeger, richard.henderson, david, iii, clegoate, qemu-devel

On Fri, 2025-02-07 at 15:56 -0500, Matthew Rosato wrote:
> Specifying this bit in the guest CLP response indicates that the guest
> can optionally choose to skip translation and instead use
> identity-mapped operations.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  hw/s390x/s390-pci-vfio.c        | 5 ++++-
>  include/hw/s390x/s390-pci-clp.h | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 443e222912..6236ac7f1e 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -238,8 +238,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>          pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid, start_gid);
>  
>          resgrp = &pbdev->pci_group->zpci_group;
> +        if (pbdev->rtr_avail) {
> +            resgrp->fr |= CLP_RSP_QPCIG_MASK_RTR;
> +        }
>          if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
> -            resgrp->fr = 1;
> +            resgrp->fr |= CLP_RSP_QPCIG_MASK_REFRESH;
>          }
>          resgrp->dasm = cap->dasm;
>          resgrp->msia = cap->msi_addr;
> diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
> index 03b7f9ba5f..6a635d693b 100644
> --- a/include/hw/s390x/s390-pci-clp.h
> +++ b/include/hw/s390x/s390-pci-clp.h
> @@ -158,6 +158,7 @@ typedef struct ClpRspQueryPciGrp {
>  #define CLP_RSP_QPCIG_MASK_NOI 0xfff
>      uint16_t i;
>      uint8_t version;
> +#define CLP_RSP_QPCIG_MASK_RTR     0x20
>  #define CLP_RSP_QPCIG_MASK_FRAME   0x2
>  #define CLP_RSP_QPCIG_MASK_REFRESH 0x1
>      uint8_t fr;

Looks good to me!

Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
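The switch from `fr = 1` to `fr |= ...` in this patch matters because the RTR bit is now set just before the refresh check. A small standalone model shows the difference. The two mask values are copied from the patch; `build_fr()` and `build_fr_assign()` are hypothetical helpers, and the second one deliberately models what would happen if the old assignment had been kept after adding the RTR bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask values as defined in the patch. */
#define CLP_RSP_QPCIG_MASK_RTR     0x20
#define CLP_RSP_QPCIG_MASK_REFRESH 0x1

/* Patched behavior: both flags are OR-ed into the fr byte. */
static uint8_t build_fr(bool rtr_avail, bool refresh)
{
    uint8_t fr = 0;

    if (rtr_avail) {
        fr |= CLP_RSP_QPCIG_MASK_RTR;
    }
    if (refresh) {
        fr |= CLP_RSP_QPCIG_MASK_REFRESH;
    }
    return fr;
}

/* Hypothetical variant that keeps the old "fr = 1" assignment. */
static uint8_t build_fr_assign(bool rtr_avail, bool refresh)
{
    uint8_t fr = 0;

    if (rtr_avail) {
        fr |= CLP_RSP_QPCIG_MASK_RTR;
    }
    if (refresh) {
        fr = 1; /* overwrites the RTR bit set above */
    }
    return fr;
}

/* Compare the two variants when both capabilities are present. */
static void fr_selfcheck(void)
{
    assert(build_fr(true, true) == 0x21);
    assert(build_fr_assign(true, true) == 0x01);
}
```

Using the named REFRESH mask instead of the literal 1 also makes it clear that fr is a bit field rather than a boolean.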