* [PATCH v5 0/2] vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it
@ 2025-02-12 15:28 Niklas Schnelle
2025-02-12 15:28 ` [PATCH v5 1/2] s390/pci: Fix s390_mmio_read/write syscall page fault handling Niklas Schnelle
2025-02-12 15:28 ` [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag Niklas Schnelle
0 siblings, 2 replies; 6+ messages in thread
From: Niklas Schnelle @ 2025-02-12 15:28 UTC (permalink / raw)
To: Bjorn Helgaas, Christoph Hellwig, Alexandra Winter,
Alex Williamson, Gerd Bayer, Matthew Rosato, Jason Gunthorpe,
Thorsten Winkler, Bjorn Helgaas
Cc: Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci, Niklas Schnelle
With the introduction of memory I/O (MIO) instructions enabled in commit
71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
gained support for direct user-space access to mapped PCI resources.
Even without those however user-space can access mapped PCI resources
via the s390 specific MMIO syscalls. There is thus nothing fundamentally
preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
drivers to access PCI resources without going through the pread()
interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
however.
Firstly the s390 MMIO syscalls do not cause a page fault when
follow_pte() fails due to the page not being present. This breaks
vfio-pci's mmap() handling which lazily maps on first access.
Secondly on s390 there is a virtual PCI device called ISM which has
a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
which causes any attempt to mmap() it to fail with the following message:
vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
Even if one tried to map this BAR only partially the mapping would not
be usable on systems with MIO support enabled. So just block mapping
BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
by keeping the vfio-pci mmap() blocking behavior around for this
specific device via a PCI quirk and new pdev->non_mappable_bars
flag.
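As a quick sanity check of the size in the message above, the following sketch compares the reported byte count with exactly 256 TiB. The names (`TIB`, `reported_size`, `bytes_beyond_256_tib`) are made up for illustration. The remainder of 4096 bytes would be consistent with one extra 4 KiB page being added on top of the BAR size during the vmap allocation, but that interpretation is an assumption and is not stated in the thread.

```c
#include <stdint.h>

/* 1 TiB = 2^40 bytes */
#define TIB (UINT64_C(1) << 40)

/* Size printed in the vmap failure message quoted above. */
static const uint64_t reported_size = UINT64_C(281474976714752);

/* Bytes left over after subtracting a 256 TiB BAR from the reported size. */
static uint64_t bytes_beyond_256_tib(void)
{
	return reported_size - 256 * TIB;
}
```

Calling `bytes_beyond_256_tib()` yields 4096, i.e. the reported size is 2^48 bytes plus 4 KiB.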
Note:
For your convenience the code is also available in the tagged
b4/vfio_pci_mmap branch on my git.kernel.org site below:
https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
Thanks,
Niklas
Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Changes in v5:
- Instead of relying on the existing pdev->non_compliant_bars introduce
a new pdev->non_mappable_bars flag. This replaces the VFIO_PCI_MMAP
Kconfig option and makes it per-device. This is necessary to not break
upcoming vfio-pci use of ISM devices (Julian Ruess)
- Squash the removal of VFIO_PCI_MMAP into the second commit as this
is now where its only use goes away.
- Switch to using follow_pfnmap_start() in MMIO syscall page fault
handling to match upstream changes
- Dropped R-b's because the changes are significant
- Link to v4: https://lore.kernel.org/r/20240626-vfio_pci_mmap-v4-0-7f038870f022@linux.ibm.com
Changes in v4:
- Overhauled and split up patch 2 which caused errors on ppc due to
unexported __kernel_io_end. Replaced it with a minimal s390 PCI fixup
harness to set pdev->non_compliant_bars for ISM plus ignoring devices
with this flag in vfio-pci. Idea for using PCI quirks came from
Christoph Hellwig, thanks. Dropped R-bs for patch 2 accordingly.
- Rebased on v6.10-rc5 which includes the vfio-pci mmap fault handler
fix to the issue I stumbled over independently in v3
- Link to v3: https://lore.kernel.org/r/20240529-vfio_pci_mmap-v3-0-cd217d019218@linux.ibm.com
Changes in v3:
- Rebased on v6.10-rc1 requiring change to follow_pte() call
- Use current->mm for fixup_user_fault() as seems more common
- Collected new trailers
- Link to v2: https://lore.kernel.org/r/20240523-vfio_pci_mmap-v2-0-0dc6c139a4f1@linux.ibm.com
Changes in v2:
- Changed last patch to remove VFIO_PCI_MMAP instead of just enabling it
for s390 as it is unconditionally true with s390 supporting PCI resource mmap() (Jason)
- Collected R-bs from Jason
- Link to v1: https://lore.kernel.org/r/20240521-vfio_pci_mmap-v1-0-2f6315e0054e@linux.ibm.com
---
Niklas Schnelle (2):
s390/pci: Fix s390_mmio_read/write syscall page fault handling
PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
arch/s390/Kconfig | 4 +---
arch/s390/pci/Makefile | 2 +-
arch/s390/pci/pci_fixup.c | 23 +++++++++++++++++++++++
arch/s390/pci/pci_mmio.c | 18 +++++++++++++-----
drivers/s390/net/ism_drv.c | 1 -
drivers/vfio/pci/Kconfig | 4 ----
drivers/vfio/pci/vfio_pci_core.c | 2 +-
include/linux/pci.h | 1 +
include/linux/pci_ids.h | 1 +
9 files changed, 41 insertions(+), 15 deletions(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20240503-vfio_pci_mmap-1549e3d02ca7
Best regards,
--
Niklas Schnelle
* [PATCH v5 1/2] s390/pci: Fix s390_mmio_read/write syscall page fault handling
2025-02-12 15:28 [PATCH v5 0/2] vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it Niklas Schnelle
@ 2025-02-12 15:28 ` Niklas Schnelle
2025-02-12 15:28 ` [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag Niklas Schnelle
1 sibling, 0 replies; 6+ messages in thread
From: Niklas Schnelle @ 2025-02-12 15:28 UTC (permalink / raw)
To: Bjorn Helgaas, Christoph Hellwig, Alexandra Winter,
Alex Williamson, Gerd Bayer, Matthew Rosato, Jason Gunthorpe,
Thorsten Winkler, Bjorn Helgaas
Cc: Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci, Niklas Schnelle
The s390 MMIO syscalls when using the classic PCI instructions do not
cause a page fault when follow_pfnmap_start() fails due to the page not
being present. Besides being a general deficiency this breaks vfio-pci's
mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on
first access. Fix this by following a failed follow_pfnmap_start() with
fixup_user_fault() and retrying the follow_pfnmap_start(). Also fix
a VM_READ vs VM_WRITE mixup in the read syscall.
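The lookup, fault-in, retry sequence described above can be illustrated with a small user-space model. This is only a sketch: `model_lookup()` stands in for follow_pfnmap_start() and `model_fault_in()` for fixup_user_fault(); all `model_*` names, the page count and the PFN values are hypothetical and none of this is kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 4

/* Hypothetical stand-in for a lazily populated mapping. */
struct model_mapping {
	bool present[MODEL_PAGES];
	unsigned long pfn[MODEL_PAGES];
	unsigned int faults;	/* how often the fault path ran */
};

/* Models the lookup step: fails while the page is not present. */
static int model_lookup(const struct model_mapping *m, size_t page,
			unsigned long *pfn)
{
	if (page >= MODEL_PAGES || !m->present[page])
		return -EINVAL;
	*pfn = m->pfn[page];
	return 0;
}

/* Models the fault step: populates the page on demand. */
static void model_fault_in(struct model_mapping *m, size_t page)
{
	if (page >= MODEL_PAGES)
		return;
	m->present[page] = true;
	m->pfn[page] = 0x1000 + page;
	m->faults++;
}

/* Look up; on failure fault the page in and retry exactly once. */
static int model_resolve(struct model_mapping *m, size_t page,
			 unsigned long *pfn)
{
	int ret = model_lookup(m, page, pfn);

	if (ret) {
		model_fault_in(m, page);
		ret = model_lookup(m, page, pfn);
	}
	return ret;
}
```

In this model the first `model_resolve()` on an untouched page triggers one fault and then succeeds, later calls succeed without faulting, and an out-of-range page still fails after the retry.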
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
arch/s390/pci/pci_mmio.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 46f99dc164ade4ca10f170cd66bdb648f92aa904..1997d9b7965df3b9b6019c7537028cd29d52fc13 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -175,8 +175,12 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
args.address = mmio_addr;
args.vma = vma;
ret = follow_pfnmap_start(&args);
- if (ret)
- goto out_unlock_mmap;
+ if (ret) {
+ fixup_user_fault(current->mm, mmio_addr, FAULT_FLAG_WRITE, NULL);
+ ret = follow_pfnmap_start(&args);
+ if (ret)
+ goto out_unlock_mmap;
+ }
io_addr = (void __iomem *)((args.pfn << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK));
@@ -315,14 +319,18 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
goto out_unlock_mmap;
ret = -EACCES;
- if (!(vma->vm_flags & VM_WRITE))
+ if (!(vma->vm_flags & VM_READ))
goto out_unlock_mmap;
args.vma = vma;
args.address = mmio_addr;
ret = follow_pfnmap_start(&args);
- if (ret)
- goto out_unlock_mmap;
+ if (ret) {
+ fixup_user_fault(current->mm, mmio_addr, 0, NULL);
+ ret = follow_pfnmap_start(&args);
+ if (ret)
+ goto out_unlock_mmap;
+ }
io_addr = (void __iomem *)((args.pfn << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK));
--
2.45.2
* [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
2025-02-12 15:28 [PATCH v5 0/2] vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it Niklas Schnelle
2025-02-12 15:28 ` [PATCH v5 1/2] s390/pci: Fix s390_mmio_read/write syscall page fault handling Niklas Schnelle
@ 2025-02-12 15:28 ` Niklas Schnelle
2025-02-12 17:51 ` Bjorn Helgaas
2025-02-12 20:28 ` Alex Williamson
1 sibling, 2 replies; 6+ messages in thread
From: Niklas Schnelle @ 2025-02-12 15:28 UTC (permalink / raw)
To: Bjorn Helgaas, Christoph Hellwig, Alexandra Winter,
Alex Williamson, Gerd Bayer, Matthew Rosato, Jason Gunthorpe,
Thorsten Winkler, Bjorn Helgaas
Cc: Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci, Niklas Schnelle
On s390 there is a virtual PCI device called ISM which has a few
peculiarities. For one, it presents a 256 TiB PCI BAR whose size leads
to any attempt to ioremap() the whole BAR failing. This is problematic
since mapping the whole BAR is the default behavior of, for example,
vfio-pci in combination with QEMU when VFIO_PCI_MMAP is enabled.
Even if one tried to map this BAR only partially, the mapping would not
be usable without extra precautions on systems with MIO support enabled.
This is because of another oddity, in that this virtual PCI device does
not support the newer memory I/O (MIO) PCI instructions and legacy PCI
instructions are not accessible through writeq()/readq() when MIO is in
use.
In short the ISM device's BAR is not accessible through memory mappings.
Indicate this by introducing a new non_mappable_bars flag for the ISM
device and set it using a PCI quirk. Use this flag instead of the
VFIO_PCI_MMAP Kconfig option to block mapping with vfio-pci. This was
the only use of the Kconfig option so remove it. Note that there are no
PCI resource sysfs files on s390x already as HAVE_PCI_MMAP is currently
not set. If this were to be set in the future pdev->non_mappable_bars
can be used to prevent unusable resource files for ISM from being
created.
As s390x has no PCI quirk handling add basic support modeled after x86's
arch/x86/pci/fixup.c and move the ISM device's PCI ID to the common
header to make it accessible. Also enable CONFIG_PCI_QUIRKS whenever
CONFIG_PCI is enabled.
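To illustrate the change from a build-time switch to a per-device decision, here is a user-space model of the idea. It is a sketch under stated assumptions: the `model_*` types and functions are hypothetical and only mimic the shape of a PCI fixup table and of the check in vfio-pci; the vendor and device values are included purely as sample data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sample IDs for the model; 0x04ED is the ISM device ID from the patch. */
#define MODEL_VENDOR_ID_IBM	0x1014
#define MODEL_DEVICE_ID_IBM_ISM	0x04ed

/* Hypothetical, heavily reduced stand-in for struct pci_dev. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int non_mappable_bars:1;	/* BARs can't be mapped */
};

/* One entry of a (modelled) fixup table: match IDs, then run a hook. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *dev);
};

static void model_ism_bar_no_mmap(struct model_pci_dev *dev)
{
	dev->non_mappable_bars = 1;
}

static const struct model_fixup model_fixups[] = {
	{ MODEL_VENDOR_ID_IBM, MODEL_DEVICE_ID_IBM_ISM, model_ism_bar_no_mmap },
};

/* Run every fixup whose vendor/device pair matches the device. */
static void model_apply_fixups(struct model_pci_dev *dev)
{
	size_t n = sizeof(model_fixups) / sizeof(model_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_fixups[i].vendor == dev->vendor &&
		    model_fixups[i].device == dev->device)
			model_fixups[i].hook(dev);
	}
}

/* Per-device decision replacing a global compile-time switch. */
static bool model_bar_mmap_allowed(const struct model_pci_dev *dev)
{
	return !dev->non_mappable_bars;
}
```

With this model, a device carrying the ISM IDs ends up with mmap blocked after `model_apply_fixups()`, while any other device on the same system remains mappable.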
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
arch/s390/Kconfig | 4 +---
arch/s390/pci/Makefile | 2 +-
arch/s390/pci/pci_fixup.c | 23 +++++++++++++++++++++++
drivers/s390/net/ism_drv.c | 1 -
drivers/vfio/pci/Kconfig | 4 ----
drivers/vfio/pci/vfio_pci_core.c | 2 +-
include/linux/pci.h | 1 +
include/linux/pci_ids.h | 1 +
8 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 9c9ec08d78c71b4d227beeafab1b82d6434cb5c7..e48741e001476f765e8aba0037a1b386df393683 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -41,9 +41,6 @@ config AUDIT_ARCH
config NO_IOPORT_MAP
def_bool y
-config PCI_QUIRKS
- def_bool n
-
config ARCH_SUPPORTS_UPROBES
def_bool y
@@ -258,6 +255,7 @@ config S390
select PCI_DOMAINS if PCI
select PCI_MSI if PCI
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
+ select PCI_QUIRKS if PCI
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index df73c5182990ad3ae4ed5a785953011feb9a093c..1810e0944a4ed9d31261788f0f6eb341e5316546 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -5,6 +5,6 @@
obj-$(CONFIG_PCI) += pci.o pci_irq.o pci_clp.o \
pci_event.o pci_debug.o pci_insn.o pci_mmio.o \
- pci_bus.o pci_kvm_hook.o pci_report.o
+ pci_bus.o pci_kvm_hook.o pci_report.o pci_fixup.o
obj-$(CONFIG_PCI_IOV) += pci_iov.o
obj-$(CONFIG_SYSFS) += pci_sysfs.o
diff --git a/arch/s390/pci/pci_fixup.c b/arch/s390/pci/pci_fixup.c
new file mode 100644
index 0000000000000000000000000000000000000000..35688b645098329f082d0c40cc8c59231c390eaa
--- /dev/null
+++ b/arch/s390/pci/pci_fixup.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Exceptions for specific devices,
+ *
+ * Copyright IBM Corp. 2025
+ *
+ * Author(s):
+ * Niklas Schnelle <schnelle@linux.ibm.com>
+ */
+#include <linux/pci.h>
+
+static void zpci_ism_bar_no_mmap(struct pci_dev *pdev)
+{
+ /*
+ * ISM's BAR is special. Drivers written for ISM know
+ * how to handle this but others need to be aware of their
+ * special nature e.g. to prevent attempts to mmap() it.
+ */
+ pdev->non_mappable_bars = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM,
+ PCI_DEVICE_ID_IBM_ISM,
+ zpci_ism_bar_no_mmap);
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index e36e3ea165d3b2b01d68e53634676cb8c2c40220..d32633ed9fa80c1764724f493b363bfd6cb4f9cf 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -20,7 +20,6 @@
MODULE_DESCRIPTION("ISM driver for s390");
MODULE_LICENSE("GPL");
-#define PCI_DEVICE_ID_IBM_ISM 0x04ED
#define DRV_NAME "ism"
static const struct pci_device_id ism_device_table[] = {
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index bf50ffa10bdea9e52a9d01cc3d6ee4cade39a08c..c3bcb6911c538286f7985f9c5e938d587fc04b56 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -7,10 +7,6 @@ config VFIO_PCI_CORE
select VFIO_VIRQFD
select IRQ_BYPASS_MANAGER
-config VFIO_PCI_MMAP
- def_bool y if !S390
- depends on VFIO_PCI_CORE
-
config VFIO_PCI_INTX
def_bool y if !S390
depends on VFIO_PCI_CORE
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 586e49efb81be32ccb50ca554a60cec684c37402..c8586d47704c74cf9a5256d65bbf888db72b2f91 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -116,7 +116,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
res = &vdev->pdev->resource[bar];
- if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
+ if (vdev->pdev->non_mappable_bars)
goto no_mmap;
if (!(res->flags & IORESOURCE_MEM))
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 47b31ad724fa5bf7abd7c3dc572947551b0f2148..7192b9d78d7e337ce6144190325458fe3c0f1696 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -476,6 +476,7 @@ struct pci_dev {
unsigned int no_command_memory:1; /* No PCI_COMMAND_MEMORY */
unsigned int rom_bar_overlap:1; /* ROM BAR disable broken */
unsigned int rom_attr_enabled:1; /* Display of ROM attribute enabled? */
+ unsigned int non_mappable_bars:1; /* BARs can't be mapped to user-space */
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index de5deb1a0118fcf56570d461cbe7a501d4bd0da3..ec6d311ed12e174dc0bad2ce8c92454bed668fee 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -518,6 +518,7 @@
#define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM 0x0251
#define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM_PCIE 0x0361
#define PCI_DEVICE_ID_IBM_ICOM_FOUR_PORT_MODEL 0x252
+#define PCI_DEVICE_ID_IBM_ISM 0x04ED
#define PCI_SUBVENDOR_ID_IBM 0x1014
#define PCI_SUBDEVICE_ID_IBM_SATURN_SERIAL_ONE_PORT 0x03d4
--
2.45.2
* Re: [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
2025-02-12 15:28 ` [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag Niklas Schnelle
@ 2025-02-12 17:51 ` Bjorn Helgaas
2025-02-12 20:28 ` Alex Williamson
1 sibling, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2025-02-12 17:51 UTC (permalink / raw)
To: Niklas Schnelle
Cc: Christoph Hellwig, Alexandra Winter, Alex Williamson, Gerd Bayer,
Matthew Rosato, Jason Gunthorpe, Thorsten Winkler, Bjorn Helgaas,
Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci
On Wed, Feb 12, 2025 at 04:28:32PM +0100, Niklas Schnelle wrote:
> On s390 there is a virtual PCI device called ISM which has a few
> peculiarities. For one, it presents a 256 TiB PCI BAR whose size leads
> to any attempt to ioremap() the whole BAR failing. This is problematic
> since mapping the whole BAR is the default behavior of for example
> vfio-pci in combination with QEMU and VFIO_PCI_MMAP enabled.
>
> Even if one tried to map this BAR only partially, the mapping would not
> be usable without extra precautions on systems with MIO support enabled.
> This is because of another oddity, in that this virtual PCI device does
> not support the newer memory I/O (MIO) PCI instructions and legacy PCI
> instructions are not accessible through writeq()/readq() when MIO is in
> use.
>
> In short the ISM device's BAR is not accessible through memory mappings.
> Indicate this by introducing a new non_mappable_bars flag for the ISM
> device and set it using a PCI quirk. Use this flag instead of the
> VFIO_PCI_MMAP Kconfig option to block mapping with vfio-pci. This was
> the only use of the Kconfig option so remove it. Note that there are no
> PCI resource sysfs files on s390x already as HAVE_PCI_MMAP is currently
> not set. If this were to be set in the future pdev->non_mappable_bars
> can be used to prevent unusable resource files for ISM from being
> created.
>
> As s390x has no PCI quirk handling add basic support modeled after x86's
> arch/x86/pci/fixup.c and move the ISM device's PCI ID to the common
> header to make it accessible. Also enable CONFIG_PCI_QUIRKS whenever
> CONFIG_PCI is enabled.
>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> +++ b/include/linux/pci_ids.h
> @@ -518,6 +518,7 @@
> #define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM 0x0251
> #define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM_PCIE 0x0361
> #define PCI_DEVICE_ID_IBM_ICOM_FOUR_PORT_MODEL 0x252
> +#define PCI_DEVICE_ID_IBM_ISM 0x04ED
Use lower-case hex to match other entries.
> #define PCI_SUBVENDOR_ID_IBM 0x1014
> #define PCI_SUBDEVICE_ID_IBM_SATURN_SERIAL_ONE_PORT 0x03d4
>
> --
> 2.45.2
>
* Re: [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
2025-02-12 15:28 ` [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag Niklas Schnelle
2025-02-12 17:51 ` Bjorn Helgaas
@ 2025-02-12 20:28 ` Alex Williamson
2025-02-13 10:06 ` Niklas Schnelle
1 sibling, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2025-02-12 20:28 UTC (permalink / raw)
To: Niklas Schnelle
Cc: Bjorn Helgaas, Christoph Hellwig, Alexandra Winter, Gerd Bayer,
Matthew Rosato, Jason Gunthorpe, Thorsten Winkler, Bjorn Helgaas,
Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci
On Wed, 12 Feb 2025 16:28:32 +0100
Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> On s390 there is a virtual PCI device called ISM which has a few
> peculiarities. For one, it presents a 256 TiB PCI BAR whose size leads
> to any attempt to ioremap() the whole BAR failing. This is problematic
> since mapping the whole BAR is the default behavior of for example
> vfio-pci in combination with QEMU and VFIO_PCI_MMAP enabled.
>
> Even if one tried to map this BAR only partially, the mapping would not
> be usable without extra precautions on systems with MIO support enabled.
> This is because of another oddity, in that this virtual PCI device does
> not support the newer memory I/O (MIO) PCI instructions and legacy PCI
> instructions are not accessible through writeq()/readq() when MIO is in
> use.
>
> In short the ISM device's BAR is not accessible through memory mappings.
> Indicate this by introducing a new non_mappable_bars flag for the ISM
> device and set it using a PCI quirk. Use this flag instead of the
> VFIO_PCI_MMAP Kconfig option to block mapping with vfio-pci. This was
> the only use of the Kconfig option so remove it. Note that there are no
> PCI resource sysfs files on s390x already as HAVE_PCI_MMAP is currently
> not set. If this were to be set in the future pdev->non_mappable_bars
> can be used to prevent unusable resource files for ISM from being
> created.
I think we should also look at it from the opposite side, not just
s390x maybe adding HAVE_PCI_MMAP in the future, but the fact that we're
currently adding a generic PCI device flag which isn't honored by the
one mechanism that PCI core provides to mmap MMIO BARs to userspace.
It seems easier to implement it in pci_mmap_resource() now rather than
someone later discovering there's no enforcement outside of the very
narrow s390x use case. Thanks,
Alex
> As s390x has no PCI quirk handling add basic support modeled after x86's
> arch/x86/pci/fixup.c and move the ISM device's PCI ID to the common
> header to make it accessible. Also enable CONFIG_PCI_QUIRKS whenever
> CONFIG_PCI is enabled.
>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> arch/s390/Kconfig | 4 +---
> arch/s390/pci/Makefile | 2 +-
> arch/s390/pci/pci_fixup.c | 23 +++++++++++++++++++++++
> drivers/s390/net/ism_drv.c | 1 -
> drivers/vfio/pci/Kconfig | 4 ----
> drivers/vfio/pci/vfio_pci_core.c | 2 +-
> include/linux/pci.h | 1 +
> include/linux/pci_ids.h | 1 +
> 8 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 9c9ec08d78c71b4d227beeafab1b82d6434cb5c7..e48741e001476f765e8aba0037a1b386df393683 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -41,9 +41,6 @@ config AUDIT_ARCH
> config NO_IOPORT_MAP
> def_bool y
>
> -config PCI_QUIRKS
> - def_bool n
> -
> config ARCH_SUPPORTS_UPROBES
> def_bool y
>
> @@ -258,6 +255,7 @@ config S390
> select PCI_DOMAINS if PCI
> select PCI_MSI if PCI
> select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
> + select PCI_QUIRKS if PCI
> select SPARSE_IRQ
> select SWIOTLB
> select SYSCTL_EXCEPTION_TRACE
> diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
> index df73c5182990ad3ae4ed5a785953011feb9a093c..1810e0944a4ed9d31261788f0f6eb341e5316546 100644
> --- a/arch/s390/pci/Makefile
> +++ b/arch/s390/pci/Makefile
> @@ -5,6 +5,6 @@
>
> obj-$(CONFIG_PCI) += pci.o pci_irq.o pci_clp.o \
> pci_event.o pci_debug.o pci_insn.o pci_mmio.o \
> - pci_bus.o pci_kvm_hook.o pci_report.o
> + pci_bus.o pci_kvm_hook.o pci_report.o pci_fixup.o
> obj-$(CONFIG_PCI_IOV) += pci_iov.o
> obj-$(CONFIG_SYSFS) += pci_sysfs.o
> diff --git a/arch/s390/pci/pci_fixup.c b/arch/s390/pci/pci_fixup.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..35688b645098329f082d0c40cc8c59231c390eaa
> --- /dev/null
> +++ b/arch/s390/pci/pci_fixup.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Exceptions for specific devices,
> + *
> + * Copyright IBM Corp. 2025
> + *
> + * Author(s):
> + * Niklas Schnelle <schnelle@linux.ibm.com>
> + */
> +#include <linux/pci.h>
> +
> +static void zpci_ism_bar_no_mmap(struct pci_dev *pdev)
> +{
> + /*
> + * ISM's BAR is special. Drivers written for ISM know
> + * how to handle this but others need to be aware of their
> + * special nature e.g. to prevent attempts to mmap() it.
> + */
> + pdev->non_mappable_bars = 1;
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM,
> + PCI_DEVICE_ID_IBM_ISM,
> + zpci_ism_bar_no_mmap);
> diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
> index e36e3ea165d3b2b01d68e53634676cb8c2c40220..d32633ed9fa80c1764724f493b363bfd6cb4f9cf 100644
> --- a/drivers/s390/net/ism_drv.c
> +++ b/drivers/s390/net/ism_drv.c
> @@ -20,7 +20,6 @@
> MODULE_DESCRIPTION("ISM driver for s390");
> MODULE_LICENSE("GPL");
>
> -#define PCI_DEVICE_ID_IBM_ISM 0x04ED
> #define DRV_NAME "ism"
>
> static const struct pci_device_id ism_device_table[] = {
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index bf50ffa10bdea9e52a9d01cc3d6ee4cade39a08c..c3bcb6911c538286f7985f9c5e938d587fc04b56 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -7,10 +7,6 @@ config VFIO_PCI_CORE
> select VFIO_VIRQFD
> select IRQ_BYPASS_MANAGER
>
> -config VFIO_PCI_MMAP
> - def_bool y if !S390
> - depends on VFIO_PCI_CORE
> -
> config VFIO_PCI_INTX
> def_bool y if !S390
> depends on VFIO_PCI_CORE
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 586e49efb81be32ccb50ca554a60cec684c37402..c8586d47704c74cf9a5256d65bbf888db72b2f91 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -116,7 +116,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
>
> res = &vdev->pdev->resource[bar];
>
> - if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> + if (vdev->pdev->non_mappable_bars)
> goto no_mmap;
>
> if (!(res->flags & IORESOURCE_MEM))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 47b31ad724fa5bf7abd7c3dc572947551b0f2148..7192b9d78d7e337ce6144190325458fe3c0f1696 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -476,6 +476,7 @@ struct pci_dev {
> unsigned int no_command_memory:1; /* No PCI_COMMAND_MEMORY */
> unsigned int rom_bar_overlap:1; /* ROM BAR disable broken */
> unsigned int rom_attr_enabled:1; /* Display of ROM attribute enabled? */
> + unsigned int non_mappable_bars:1; /* BARs can't be mapped to user-space */
> pci_dev_flags_t dev_flags;
> atomic_t enable_cnt; /* pci_enable_device has been called */
>
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index de5deb1a0118fcf56570d461cbe7a501d4bd0da3..ec6d311ed12e174dc0bad2ce8c92454bed668fee 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -518,6 +518,7 @@
> #define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM 0x0251
> #define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM_PCIE 0x0361
> #define PCI_DEVICE_ID_IBM_ICOM_FOUR_PORT_MODEL 0x252
> +#define PCI_DEVICE_ID_IBM_ISM 0x04ED
>
> #define PCI_SUBVENDOR_ID_IBM 0x1014
> #define PCI_SUBDEVICE_ID_IBM_SATURN_SERIAL_ONE_PORT 0x03d4
>
* Re: [PATCH v5 2/2] PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
2025-02-12 20:28 ` Alex Williamson
@ 2025-02-13 10:06 ` Niklas Schnelle
0 siblings, 0 replies; 6+ messages in thread
From: Niklas Schnelle @ 2025-02-13 10:06 UTC (permalink / raw)
To: Alex Williamson
Cc: Bjorn Helgaas, Christoph Hellwig, Alexandra Winter, Gerd Bayer,
Matthew Rosato, Jason Gunthorpe, Thorsten Winkler, Bjorn Helgaas,
Julian Ruess, Halil Pasic, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
linux-s390, linux-kernel, kvm, linux-pci
On Wed, 2025-02-12 at 13:28 -0700, Alex Williamson wrote:
> On Wed, 12 Feb 2025 16:28:32 +0100
> Niklas Schnelle <schnelle@linux.ibm.com> wrote:
>
> > On s390 there is a virtual PCI device called ISM which has a few
> > peculiarities. For one, it presents a 256 TiB PCI BAR whose size leads
> > to any attempt to ioremap() the whole BAR failing. This is problematic
> > since mapping the whole BAR is the default behavior of for example
> > vfio-pci in combination with QEMU and VFIO_PCI_MMAP enabled.
> >
> > Even if one tried to map this BAR only partially, the mapping would not
> > be usable without extra precautions on systems with MIO support enabled.
> > This is because of another oddity, in that this virtual PCI device does
> > not support the newer memory I/O (MIO) PCI instructions and legacy PCI
> > instructions are not accessible through writeq()/readq() when MIO is in
> > use.
> >
> > In short the ISM device's BAR is not accessible through memory mappings.
> > Indicate this by introducing a new non_mappable_bars flag for the ISM
> > device and set it using a PCI quirk. Use this flag instead of the
> > VFIO_PCI_MMAP Kconfig option to block mapping with vfio-pci. This was
> > the only use of the Kconfig option so remove it. Note that there are no
> > PCI resource sysfs files on s390x already as HAVE_PCI_MMAP is currently
> > not set. If this were to be set in the future pdev->non_mappable_bars
> > can be used to prevent unusable resource files for ISM from being
> > created.
>
> I think we should also look at it from the opposite side, not just
> s390x maybe adding HAVE_PCI_MMAP in the future, but the fact that we're
> currently adding a generic PCI device flag which isn't honored by the
> one mechanism that PCI core provides to mmap MMIO BARs to userspace.
> It seems easier to implement it in pci_mmap_resource() now rather than
> someone later discovering there's no enforcement outside of the very
> narrow s390x use case. Thanks,
>
> Alex
That is a very good point! I did try enabling HAVE_PCI_MMAP for s390 a
while back and I believe that ran into trouble with ISM devices too. So
I just did a quick test of enabling HAVE_PCI_MMAP with
ARCH_GENERIC_PCI_MMAP_RESOURCE for s390. Then added a check for
pdev->non_mappable_bars to pci_create_resource_files() and
proc_bus_pci_mmap(). I pondered adding it to pci_mmap_resource() too
but felt that not showing the resource at all, as we do now with
!HAVE_PCI_MMAP, is cleaner.
Using a little test program that just mmap()s BAR 0 of an NVMe and
reads the NVMe version at offset 8 using our PCI MIO load instruction
works. Also, as expected I don't get resourceX files for ISM devices
with the added check.
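For readers who want a picture of such a test, the following is a generic POSIX sketch and not the program used above. It maps the first page of a file and reads a 32-bit value at a given offset with a plain load, then splits an NVMe version register value into its major, minor and tertiary fields. The function and type names are hypothetical. Two caveats are assumptions worth stating loudly: on s390 the access to a mapped BAR has to go through the PCI MIO load instruction rather than a plain dereference, as noted above, and the value is returned in host byte order without any conversion.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Decoded NVMe version: bits 31:16 major, 15:8 minor, 7:0 tertiary. */
struct nvme_version {
	uint16_t major;
	uint8_t minor;
	uint8_t tertiary;
};

static struct nvme_version nvme_decode_vs(uint32_t vs)
{
	struct nvme_version v = {
		.major = (uint16_t)(vs >> 16),
		.minor = (uint8_t)(vs >> 8),
		.tertiary = (uint8_t)vs,
	};
	return v;
}

/*
 * Map the first page of `path` read-only and read 4 bytes at `offset`
 * with a plain load (NOT valid for BAR access on s390, see lead-in).
 * Returns 0 on success, -1 on failure.
 */
static int read_u32_via_mmap(const char *path, size_t offset, uint32_t *out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *map;
	int fd;

	if (page <= 0 || offset % sizeof(*out) != 0 ||
	    offset + sizeof(*out) > (size_t)page)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	*out = *(const volatile uint32_t *)((const char *)map + offset);
	munmap(map, (size_t)page);
	return 0;
}
```

On a platform with PCI resource files, one would pass the path of a device's resource0 file and offset 8, then feed the result to `nvme_decode_vs()`; a register value of 0x00010400, for instance, decodes to version 1.4.0.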
I still have to test the /proc/bus/pci based mmap() but would expect
that to work too. So I'd be open to adding another patch which adds
HAVE_PCI_MMAP for s390. If we see too much risk with that, we could
alternatively add just the pdev->non_mappable_bars checks, but then
they would be untested. That is still better than hoping that someone
remembers to add them in the future.
Thanks,
Niklas