[PATCH v15 0/5] PCI devices passthrough on Arm, part 3

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v15 0/5] PCI devices passthrough on Arm, part 3
@ 2024-05-17 17:06 Stewart Hildebrand
  2024-05-17 17:06 ` [PATCH v15 1/5] arm/vpci: honor access size when returning an error Stewart Hildebrand
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Roger Pau Monné, Andrew Cooper, George Dunlap, Jan Beulich

This is next version of vPCI rework. Aim of this series is to prepare
ground for introducing PCI support on ARM platform.

in v15:
 - reorder so ("arm/vpci: honor access size when returning an error")
   comes first

in v14:
 - drop first 9 patches as they were committed
 - updated ("vpci/header: emulate PCI_COMMAND register for guests")

in v13:
 - drop ("xen/arm: vpci: permit access to guest vpci space") as it was
   unnecessary

in v12:
 - I (Stewart) coordinated with Volodomyr to send this whole series. So,
   add my (Stewart) Signed-off-by to all patches.
 - The biggest change is to re-work the PCI_COMMAND register patch.
   Additional feedback has also been addressed - see individual patches.
 - Drop ("pci: msi: pass pdev to pci_enable_msi() function") and
   ("pci: introduce per-domain PCI rwlock") as they were committed
 - Rename ("rangeset: add rangeset_empty() function")
       to ("rangeset: add rangeset_purge() function")
 - Rename ("vpci/header: rework exit path in init_bars")
       to ("vpci/header: rework exit path in init_header()")

in v11:
 - Added my (Volodymyr) Signed-off-by tag to all patches
 - Patch "vpci/header: emulate PCI_COMMAND register for guests" is in
   intermediate state, because it was agreed to rework it once Stewart's
   series on register handling are in.
 - Addressed comments, please see patch descriptions for details.

in v10:

 - Removed patch ("xen/arm: vpci: check guest range"), proper fix
   for the issue is part of ("vpci/header: emulate PCI_COMMAND
   register for guests")
 - Removed patch ("pci/header: reset the command register when adding
   devices")
 - Added patch ("rangeset: add rangeset_empty() function") because
   this function is needed in ("vpci/header: handle p2m range sets
   per BAR")
 - Added ("vpci/header: handle p2m range sets per BAR") which addressed
   an issue discovered by Andrii Chepurnyi during virtio integration
 - Added ("pci: msi: pass pdev to pci_enable_msi() function"), which is
   prereq for ("pci: introduce per-domain PCI rwlock")
 - Fixed "Since v9/v8/... " comments in changelogs to reduce confusion.
   I left "Since" entries for older versions, because they were added
   by original author of the patches.

in v9:

v9 includes addressed commentes from a previous one. Also it
introduces a couple patches from Stewart. This patches are related to
vPCI use on ARM. Patch "vpci/header: rework exit path in init_bars"
was factored-out from "vpci/header: handle p2m range sets per BAR".

in v8:

The biggest change from previous, mistakenly named, v7 series is how
locking is implemented. Instead of d->vpci_rwlock we introduce
d->pci_lock which has broader scope, as it protects not only domain's
vpci state, but domain's list of PCI devices as well.

As we discussed in IRC with Roger, it is not feasible to rework all
the existing code to use the new lock right away. It was agreed that
any write access to d->pdev_list will be protected by **both**
d->pci_lock in write mode and pcidevs_lock(). Read access on other
hand should be protected by either d->pci_lock in read mode or
pcidevs_lock(). It is expected that existing code will use
pcidevs_lock() and new users will use new rw lock. Of course, this
does not mean that new users shall not use pcidevs_lock() when it is
appropriate.

Changes from previous versions are described in each separate patch.

Oleksandr Andrushchenko (4):
  vpci/header: emulate PCI_COMMAND register for guests
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X

Volodymyr Babchuk (1):
  arm/vpci: honor access size when returning an error

 xen/arch/arm/vpci.c        | 63 +++++++++++++++++++++++------
 xen/drivers/Kconfig        |  4 ++
 xen/drivers/vpci/header.c  | 60 +++++++++++++++++++++++++---
 xen/drivers/vpci/msi.c     |  9 +++++
 xen/drivers/vpci/msix.c    |  7 ++++
 xen/drivers/vpci/vpci.c    | 81 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pci_regs.h |  1 +
 xen/include/xen/sched.h    | 10 ++++-
 xen/include/xen/vpci.h     | 27 +++++++++++++
 9 files changed, 243 insertions(+), 19 deletions(-)


base-commit: 21611c68702dae2e18cb519a6e166cdceeaff4ca
-- 
2.45.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v15 1/5] arm/vpci: honor access size when returning an error
  2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
@ 2024-05-17 17:06 ` Stewart Hildebrand
  2024-05-21 13:45   ` Julien Grall
  2024-05-17 17:06 ` [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests Stewart Hildebrand
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Volodymyr Babchuk, Stewart Hildebrand, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk

From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>

Guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when we are
returning an error back to MMIO handler, otherwise it is possible to
provide more data than requested: i.e. guest issues LDRB instruction
to read one byte, but we are writing 0xFFFFFFFFFFFFFFFF in the target
register.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v14->v15:
* re-order so this patch comes before ("xen/arm: translate virtual PCI
  bus topology for guests")
* s/access_mask/invalid/
* add U suffix to 1
* s/uint8_t/unsigned int/
* s/uint64_t/register_t/
* although Julien gave an Acked-by on v14, I omitted it due to the
  changes made in v15

v9->10:
* New patch in v10.
---
 xen/arch/arm/vpci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb55082a..b63a356bb4a8 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -29,6 +29,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 {
     struct pci_host_bridge *bridge = p;
     pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+    const unsigned int access_size = (1U << info->dabt.size) * 8;
+    const register_t invalid = GENMASK_ULL(access_size - 1, 0);
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;
 
@@ -39,7 +41,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
         return 1;
     }
 
-    *r = ~0ul;
+    *r = invalid;
 
     return 0;
 }
-- 
2.45.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests
  2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
  2024-05-17 17:06 ` [PATCH v15 1/5] arm/vpci: honor access size when returning an error Stewart Hildebrand
@ 2024-05-17 17:06 ` Stewart Hildebrand
  2024-05-22  9:28   ` Roger Pau Monné
  2024-05-17 17:06 ` [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology Stewart Hildebrand
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Roger Pau Monné, Stewart Hildebrand,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
    don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
    regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
    disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU checks if INTx was mapped
    for a device. If it is not, then guest can't control
    PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU/dom0: rsvdp_mask

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
RFC: There is an unaddressed question for Roger: should we update the
     guest view of the PCI_COMMAND_INTX_DISABLE bit in
     msi.c/msix.c:control_write()? See prior discussion at [1].
     In my opinion, I think we should make sure that hardware state and
     the guest view are consistent (i.e. don't lie to the guest).

[1] https://lore.kernel.org/xen-devel/86b25777-788c-4b9a-8166-a6f8174bedc9@suse.com/

In v15:
- add Jan's R-b
- add blank line after declaration in msi.c:control_write()

In v14:
- check for 0->1 transition in INTX_DISABLE-setting logic in
  msi.c:control_write() to match msix.c:control_write()
- clear domU-controllable bits in header.c:init_header()

In v13:
- Update right away (don't defer) PCI_COMMAND_MEMORY bit in guest_cmd
  variable in cmd_write()
- Make comment single line in xen/drivers/vpci/msi.c:control_write()
- Rearrange memory decoding disabling snippet in init_header()

In v12:
- Rework patch using vpci_add_register_mask()
- Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except
  don't add the RO bits because they were RW in PCI LB 3.0 spec.
- Move and expand TODO comment about properly emulating bits
- Update guest_cmd in msi.c/msix.c:control_write()
- Simplify cmd_write(), thanks to rsvdp_mask
- Update commit description

In v11:
- Fix copy-paste mistake: vpci->msi should be vpci->msix
- Handle PCI_COMMAND_IO
- Fix condition for disabling INTx in the MSI-X code
- Show domU changes to only allowed bits
- Show PCI_COMMAND_MEMORY write only after P2M was altered
- Update comments in the code
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c  | 60 ++++++++++++++++++++++++++++++++++----
 xen/drivers/vpci/msi.c     |  9 ++++++
 xen/drivers/vpci/msix.c    |  7 +++++
 xen/include/xen/pci_regs.h |  1 +
 xen/include/xen/vpci.h     |  3 ++
 5 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 47648c395132..2491dbae8901 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -524,9 +524,21 @@ static void cf_check cmd_write(
 {
     struct vpci_header *header = data;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        const struct vpci *vpci = pdev->vpci;
+
+        if ( (vpci->msi && vpci->msi->enabled) ||
+             (vpci->msix && vpci->msix->enabled) )
+            cmd |= PCI_COMMAND_INTX_DISABLE;
+
+        header->guest_cmd = cmd;
+    }
+
     /*
      * Let Dom0 play with all the bits directly except for the memory
-     * decoding one.
+     * decoding one. Bits that are not allowed for DomU are already
+     * handled above and by the rsvdp_mask.
      */
     if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
         /*
@@ -540,6 +552,14 @@ static void cf_check cmd_write(
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static uint32_t cf_check guest_cmd_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    const struct vpci_header *header = data;
+
+    return header->guest_cmd;
+}
+
 static void cf_check bar_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
-    /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    /*
+     * Setup a handler for the command register.
+     *
+     * TODO: If support for emulated bits is added, re-visit how to handle
+     * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
+     */
+    rc = vpci_add_register_mask(pdev->vpci,
+                                is_hwdom ? vpci_hw_read16 : guest_cmd_read,
+                                cmd_write, PCI_COMMAND, 2, header, 0, 0,
+                                PCI_COMMAND_RSVDP_MASK |
+                                    (is_hwdom ? 0
+                                              : PCI_COMMAND_IO |
+                                                PCI_COMMAND_PARITY |
+                                                PCI_COMMAND_WAIT |
+                                                PCI_COMMAND_SERR |
+                                                PCI_COMMAND_FAST_BACK),
+                                0);
     if ( rc )
         return rc;
 
@@ -836,9 +870,23 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( pdev->ignore_bars )
         return 0;
 
-    /* Disable memory decoding before sizing. */
     cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
-    if ( cmd & PCI_COMMAND_MEMORY )
+
+    /*
+     * For DomUs, clear PCI_COMMAND_{MASTER,MEMORY,IO} and other
+     * DomU-controllable bits in PCI_COMMAND. Devices assigned to DomUs will
+     * start with memory decoding disabled, and modify_bars() will not be called
+     * at the end of this function.
+     */
+    if ( !is_hwdom )
+        cmd &= ~(PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_INVALIDATE |
+                 PCI_COMMAND_SPECIAL | PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
+                 PCI_COMMAND_IO);
+
+    header->guest_cmd = cmd;
+
+    /* Disable memory decoding before sizing. */
+    if ( !is_hwdom || (cmd & PCI_COMMAND_MEMORY) )
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
 
     for ( i = 0; i < num_bars; i++ )
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 30adcf7df05d..dd6620ec5674 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -57,6 +57,8 @@ static void cf_check control_write(
 
     if ( new_enabled )
     {
+        bool old_enabled = msi->enabled;
+
         /*
          * If the device is already enabled it means the number of
          * enabled messages has changed. Disable and re-enable the
@@ -70,6 +72,13 @@ static void cf_check control_write(
 
         if ( vpci_msi_arch_enable(msi, pdev, vectors) )
             return;
+
+        /* Make sure domU doesn't enable INTx while enabling MSI. */
+        if ( !old_enabled && !is_hardware_domain(pdev->domain) )
+        {
+            pci_intx(pdev, false);
+            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+        }
     }
     else
         vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 58c16ebdf283..fbe710ab92ef 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -135,6 +135,13 @@ static void cf_check control_write(
         }
     }
 
+    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
+    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
+    {
+        pci_intx(pdev, false);
+        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+    }
+
     msix->masked = new_masked;
     msix->enabled = new_enabled;
 
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 0bc18efabb74..250ba106dbd3 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -48,6 +48,7 @@
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
+#define  PCI_COMMAND_RSVDP_MASK	0xf800
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_IMM_READY	0x01	/* Immediate Readiness */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e4c972f35ed..2064d9354f5b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -107,6 +107,9 @@ struct vpci {
         } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
         /* At most 6 BARS + 1 expansion ROM BAR. */
 
+        /* Guest (domU only) view of the PCI_COMMAND register. */
+        uint16_t guest_cmd;
+
         /*
          * Store whether the ROM enable bit is set (doesn't imply ROM BAR
          * is mapped into guest p2m) if there's a ROM BAR on the device.
-- 
2.45.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology
  2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
  2024-05-17 17:06 ` [PATCH v15 1/5] arm/vpci: honor access size when returning an error Stewart Hildebrand
  2024-05-17 17:06 ` [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests Stewart Hildebrand
@ 2024-05-17 17:06 ` Stewart Hildebrand
  2024-05-22 10:33   ` Roger Pau Monné
  2024-05-17 17:06 ` [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests Stewart Hildebrand
  2024-05-17 17:06 ` [PATCH v15 5/5] xen/arm: account IO handlers for emulated PCI MSI-X Stewart Hildebrand
  4 siblings, 1 reply; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini,
	Roger Pau Monné, Stewart Hildebrand, Volodymyr Babchuk

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note, that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
In v15:
- add Jan's A-b
In v13:
- s/depends on/select/ in Kconfig
- check pdev->sbdf.fn instead of two booleans in add_virtual_device()
- comment #endifs in sched.h
- clarify comment about limits in vpci.h with seg/bus limit
In v11:
- Fixed code formatting
- Removed bogus write_unlock() call
- Fixed type for new_dev_number
In v10:
- Removed ASSERT(pcidevs_locked())
- Removed redundant code (local sbdf variable, clearing sbdf during
device removal, etc)
- Added __maybe_unused attribute to "out:" label
- Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
  first patch where it is used (previously was in "vpci: add hooks for
  PCI device assign/de-assign")
In v9:
- Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
In v8:
- Added write lock in add_virtual_device
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/Kconfig     |  4 +++
 xen/drivers/vpci/vpci.c | 57 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h | 10 +++++++-
 xen/include/xen/vpci.h  | 12 +++++++++
 4 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..20050e9bb8b3 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
 	bool
 
+config HAS_VPCI_GUEST_SUPPORT
+	bool
+	select HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..23722634d50b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    unsigned int new_dev_number;
+
+    if ( is_hardware_domain(d) )
+        return 0;
+
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    /*
+     * Each PCI bus supports 32 devices/slots at max or up to 256 when
+     * there are multi-function ones which are not yet supported.
+     */
+    if ( pdev->sbdf.fn )
+    {
+        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+                 &pdev->sbdf);
+        return -EOPNOTSUPP;
+    }
+    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+                                         VPCI_MAX_VIRT_DEV);
+    if ( new_dev_number == VPCI_MAX_VIRT_DEV )
+        return -ENOSPC;
+
+    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+    /*
+     * Both segment and bus number are 0:
+     *  - we emulate a single host bridge for the guest, e.g. segment 0
+     *  - with bus 0 the virtual devices are seen as embedded
+     *    endpoints behind the root complex
+     *
+     * TODO: add support for multi-function devices.
+     */
+    pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
+
+    return 0;
+}
+
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -49,6 +92,12 @@ void vpci_deassign_device(struct pci_dev *pdev)
     if ( !has_vpci(pdev->domain) || !pdev->vpci )
         return;
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
+        __clear_bit(pdev->vpci->guest_sbdf.dev,
+                    &pdev->domain->vpci_dev_assigned_map);
+#endif
+
     spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
     {
@@ -103,6 +152,13 @@ int vpci_assign_device(struct pci_dev *pdev)
     INIT_LIST_HEAD(&pdev->vpci->handlers);
     spin_lock_init(&pdev->vpci->lock);
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    pdev->vpci->guest_sbdf.sbdf = ~0;
+    rc = add_virtual_device(pdev);
+    if ( rc )
+        goto out;
+#endif
+
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
         rc = __start_vpci_array[i](pdev);
@@ -110,6 +166,7 @@ int vpci_assign_device(struct pci_dev *pdev)
             break;
     }
 
+ out: __maybe_unused;
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 132b84199598..2dcd1d1a4f8a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -486,7 +486,15 @@ struct domain
      * 2. pdev->vpci->lock
      */
     rwlock_t pci_lock;
-#endif
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * The bitmap which shows which device numbers are already used by the
+     * virtual PCI bus topology and is used to assign a unique SBDF to the
+     * next passed through virtual PCI device.
+     */
+    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+#endif /* CONFIG_HAS_PCI */
 
 #ifdef CONFIG_HAS_PASSTHROUGH
     struct domain_iommu iommu;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 2064d9354f5b..980aded26fc1 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -21,6 +21,14 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
+/*
+ * Maximum number of devices supported by the virtual bus topology:
+ * each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ * This limit implies only segment 0, bus 0 is supported.
+ */
+#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
+
 #define REGISTER_VPCI_INIT(x, p)                \
   static vpci_register_init_t *const x##_entry  \
                __used_section(".data.vpci." p) = (x)
@@ -175,6 +183,10 @@ struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /* Guest SBDF of the device. */
+    pci_sbdf_t guest_sbdf;
+#endif
 #endif
 };
 
-- 
2.45.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests
  2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
                   ` (2 preceding siblings ...)
  2024-05-17 17:06 ` [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology Stewart Hildebrand
@ 2024-05-17 17:06 ` Stewart Hildebrand
  2024-05-24 13:17   ` Julien Grall
  2024-05-17 17:06 ` [PATCH v15 5/5] xen/arm: account IO handlers for emulated PCI MSI-X Stewart Hildebrand
  4 siblings, 1 reply; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Stewart Hildebrand, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Roger Pau Monné, Volodymyr Babchuk

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are three  originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
In v15:
- base on top of ("arm/vpci: honor access size when returning an error")
In v11:
- Fixed format issues
- Added ASSERT_UNREACHABLE() to the dummy implementation of
vpci_translate_virtual_device()
- Moved variable in vpci_sbdf_from_gpa(), now it is easier to follow
the logic in the function
Since v9:
- Commend about required lock replaced with ASSERT()
- Style fixes
- call to vpci_translate_virtual_device folded into vpci_sbdf_from_gpa
Since v8:
- locks moved out of vpci_translate_virtual_device()
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c     | 45 ++++++++++++++++++++++++++++++++---------
 xen/drivers/vpci/vpci.c | 24 ++++++++++++++++++++++
 xen/include/xen/vpci.h  | 12 +++++++++++
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index b63a356bb4a8..516933bebfb3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -7,33 +7,53 @@
 
 #include <asm/mmio.h>
 
-static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
-                                     paddr_t gpa)
+static bool vpci_sbdf_from_gpa(struct domain *d,
+                               const struct pci_host_bridge *bridge,
+                               paddr_t gpa, pci_sbdf_t *sbdf)
 {
-    pci_sbdf_t sbdf;
+    bool translated = true;
+
+    ASSERT(sbdf);
 
     if ( bridge )
     {
-        sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
-        sbdf.seg = bridge->segment;
-        sbdf.bus += bridge->cfg->busn_start;
+        sbdf->sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+        sbdf->seg = bridge->segment;
+        sbdf->bus += bridge->cfg->busn_start;
     }
     else
-        sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+    {
+        /*
+         * For the passed through devices we need to map their virtual SBDF
+         * to the physical PCI device being passed through.
+         */
+        sbdf->sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+        read_lock(&d->pci_lock);
+        translated = vpci_translate_virtual_device(d, sbdf);
+        read_unlock(&d->pci_lock);
+    }
 
-    return sbdf;
+    return translated;
 }
 
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                           register_t *r, void *p)
 {
     struct pci_host_bridge *bridge = p;
-    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+    pci_sbdf_t sbdf;
     const unsigned int access_size = (1U << info->dabt.size) * 8;
     const register_t invalid = GENMASK_ULL(access_size - 1, 0);
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;
 
+    ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+    if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+    {
+        *r = invalid;
+        return 1;
+    }
+
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
     {
@@ -50,7 +70,12 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
                            register_t r, void *p)
 {
     struct pci_host_bridge *bridge = p;
-    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+    pci_sbdf_t sbdf;
+
+    ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+    if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+        return 1;
 
     return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
                            1U << info->dabt.size, r);
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 23722634d50b..98b294f09688 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -81,6 +81,30 @@ static int add_virtual_device(struct pci_dev *pdev)
     return 0;
 }
 
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+    const struct pci_dev *pdev;
+
+    ASSERT(!is_hardware_domain(d));
+    ASSERT(rw_is_locked(&d->pci_lock));
+
+    for_each_pdev ( d, pdev )
+    {
+        if ( pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf) )
+        {
+            /* Replace guest SBDF with the physical one. */
+            *sbdf = pdev->sbdf;
+            return true;
+        }
+    }
+
+    return false;
+}
+
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
 void vpci_deassign_device(struct pci_dev *pdev)
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 980aded26fc1..7e5a0f0c50c1 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -303,6 +303,18 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
 }
 #endif
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
+#else
+static inline bool vpci_translate_virtual_device(const struct domain *d,
+                                                 pci_sbdf_t *sbdf)
+{
+    ASSERT_UNREACHABLE();
+
+    return false;
+}
+#endif
+
 #endif
 
 /*
-- 
2.45.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v15 5/5] xen/arm: account IO handlers for emulated PCI MSI-X
  2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
                   ` (3 preceding siblings ...)
  2024-05-17 17:06 ` [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests Stewart Hildebrand
@ 2024-05-17 17:06 ` Stewart Hildebrand
  4 siblings, 0 replies; 11+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 17:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Stewart Hildebrand, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Julien Grall, Volodymyr Babchuk

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
MSI-X registers we need to explicitly tell that we have additional IO
handlers, so those are accounted.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
This depends on a constant defined in ("vpci: add initial support for
virtual PCI bus topology"), so cannot be committed without the
dependency.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 516933bebfb3..4779bbfa9be3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -132,6 +132,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+    unsigned int count;
+
     if ( !has_vpci(d) )
         return 0;
 
@@ -152,7 +154,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
      * For guests each host bridge requires one region to cover the
      * configuration space. At the moment, we only expose a single host bridge.
      */
-    return 1;
+    count = 1;
+
+    /*
+     * There's a single MSI-X MMIO handler that deals with both PBA
+     * and MSI-X tables per each PCI device being passed through.
+     * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+     */
+    if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+        count += VPCI_MAX_VIRT_DEV;
+
+    return count;
 }
 
 /*
-- 
2.45.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v15 1/5] arm/vpci: honor access size when returning an error
  2024-05-17 17:06 ` [PATCH v15 1/5] arm/vpci: honor access size when returning an error Stewart Hildebrand
@ 2024-05-21 13:45   ` Julien Grall
  0 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2024-05-21 13:45 UTC (permalink / raw)
  To: Stewart Hildebrand, xen-devel
  Cc: Volodymyr Babchuk, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel

Hi Stewart,

On 17/05/2024 18:06, Stewart Hildebrand wrote:
> From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> 
> Guest can try to read config space using different access sizes: 8,
> 16, 32, 64 bits. We need to take this into account when we are
> returning an error back to MMIO handler, otherwise it is possible to
> provide more data than requested: i.e. guest issues LDRB instruction
> to read one byte, but we are writing 0xFFFFFFFFFFFFFFFF in the target
> register.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests
  2024-05-17 17:06 ` [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests Stewart Hildebrand
@ 2024-05-22  9:28   ` Roger Pau Monné
  2024-05-22  9:35     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2024-05-22  9:28 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Volodymyr Babchuk

On Fri, May 17, 2024 at 01:06:12PM -0400, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> guest's (domU) view of this will want to be zero (for now), the host
> having set it to 1 should be preserved, or else we'd effectively be
> giving the domU control of the bit. Thus, PCI_COMMAND register needs
> proper emulation in order to honor host's settings.
> 
> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> Device Control" the reset state of the command register is typically 0,
> so when assigning a PCI device use 0 as the initial state for the
> guest's (domU) view of the command register.
> 
> Here is the full list of command register bits with notes about
> PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
> also documented here since that is our current reference implementation
> for PCI passthrough.
> 
> PCI_COMMAND_IO (bit 0)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware. QEMU sets this bit to 1 in
>     hardware if an I/O BAR is exposed to the guest.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
>     don't yet support I/O BARs for domUs.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_MEMORY (bit 1)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware. QEMU sets this bit to 1 in
>     hardware if a Memory BAR is exposed to the guest.
>   Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
>     regions.
>   Xen domU: For devices assigned to DomUs, memory decoding will be
>     disabled at the time of initialization.
> 
> PCI_COMMAND_MASTER (bit 2)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_SPECIAL (bit 3)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_INVALIDATE (bit 4)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_VGA_PALETTE (bit 5)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_PARITY (bit 6)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_WAIT (bit 7)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: hardwire to 0
>   QEMU: res_mask
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_SERR (bit 8)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_FAST_BACK (bit 9)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_INTX_DISABLE (bit 10)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
>     writes do not propagate to hardware. QEMU checks if INTx was mapped
>     for a device. If it is not, then guest can't control
>     PCI_COMMAND_INTX_DISABLE bit.
>   Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> Bits 11-15
>   PCIe 6.1: RsvdP
>   PCI LB 3.0: Reserved
>   QEMU: res_mask
>   Xen domU/dom0: rsvdp_mask
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: There is an unaddressed question for Roger: should we update the
>      guest view of the PCI_COMMAND_INTX_DISABLE bit in
>      msi.c/msix.c:control_write()? See prior discussion at [1].
>      In my opinion, I think we should make sure that hardware state and
>      the guest view are consistent (i.e. don't lie to the guest).
> 
> [1] https://lore.kernel.org/xen-devel/86b25777-788c-4b9a-8166-a6f8174bedc9@suse.com/

I think updating the guest view is helpful in case we need to debug
issues in the guest.

> 
> In v15:
> - add Jan's R-b
> - add blank line after declaration in msi.c:control_write()
> 
> In v14:
> - check for 0->1 transition in INTX_DISABLE-setting logic in
>   msi.c:control_write() to match msix.c:control_write()
> - clear domU-controllable bits in header.c:init_header()
> 
> In v13:
> - Update right away (don't defer) PCI_COMMAND_MEMORY bit in guest_cmd
>   variable in cmd_write()
> - Make comment single line in xen/drivers/vpci/msi.c:control_write()
> - Rearrange memory decoding disabling snippet in init_header()
> 
> In v12:
> - Rework patch using vpci_add_register_mask()
> - Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except
>   don't add the RO bits because they were RW in PCI LB 3.0 spec.
> - Move and expand TODO comment about properly emulating bits
> - Update guest_cmd in msi.c/msix.c:control_write()
> - Simplify cmd_write(), thanks to rsvdp_mask
> - Update commit description
> 
> In v11:
> - Fix copy-paste mistake: vpci->msi should be vpci->msix
> - Handle PCI_COMMAND_IO
> - Fix condition for disabling INTx in the MSI-X code
> - Show domU changes to only allowed bits
> - Show PCI_COMMAND_MEMORY write only after P2M was altered
> - Update comments in the code
> In v10:
> - Added cf_check attribute to guest_cmd_read
> - Removed warning about non-zero cmd
> - Updated comment MSI code regarding disabling INTX
> - Used ternary operator in vpci_add_register() call
> - Disable memory decoding for DomUs in init_bars()
> In v9:
> - Reworked guest_cmd_read
> - Added handling for more bits
> Since v6:
> - fold guest's logic into cmd_write
> - implement cmd_read, so we can report emulated INTx state to guests
> - introduce header->guest_cmd to hold the emulated state of the
>   PCI_COMMAND register for guests
> Since v5:
> - add additional check for MSI-X enabled while altering INTX bit
> - make sure INTx disabled while guests enable MSI/MSI-X
> Since v3:
> - gate more code on CONFIG_HAS_MSI
> - removed logic for the case when MSI/MSI-X not enabled
> ---
>  xen/drivers/vpci/header.c  | 60 ++++++++++++++++++++++++++++++++++----
>  xen/drivers/vpci/msi.c     |  9 ++++++
>  xen/drivers/vpci/msix.c    |  7 +++++
>  xen/include/xen/pci_regs.h |  1 +
>  xen/include/xen/vpci.h     |  3 ++
>  5 files changed, 74 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 47648c395132..2491dbae8901 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -524,9 +524,21 @@ static void cf_check cmd_write(
>  {
>      struct vpci_header *header = data;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        const struct vpci *vpci = pdev->vpci;
> +
> +        if ( (vpci->msi && vpci->msi->enabled) ||
> +             (vpci->msix && vpci->msix->enabled) )
> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> +
> +        header->guest_cmd = cmd;
> +    }
> +
>      /*
>       * Let Dom0 play with all the bits directly except for the memory
> -     * decoding one.
> +     * decoding one. Bits that are not allowed for DomU are already
> +     * handled above and by the rsvdp_mask.
>       */
>      if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
>          /*
> @@ -540,6 +552,14 @@ static void cf_check cmd_write(
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static uint32_t cf_check guest_cmd_read(
> +    const struct pci_dev *pdev, unsigned int reg, void *data)
> +{
> +    const struct vpci_header *header = data;
> +
> +    return header->guest_cmd;
> +}
> +
>  static void cf_check bar_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> @@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev)
>          return -EOPNOTSUPP;
>      }
>  
> -    /* Setup a handler for the command register. */
> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
> -                           2, header);
> +    /*
> +     * Setup a handler for the command register.
> +     *
> +     * TODO: If support for emulated bits is added, re-visit how to handle
> +     * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
> +     */
> +    rc = vpci_add_register_mask(pdev->vpci,
> +                                is_hwdom ? vpci_hw_read16 : guest_cmd_read,
> +                                cmd_write, PCI_COMMAND, 2, header, 0, 0,
> +                                PCI_COMMAND_RSVDP_MASK |
> +                                    (is_hwdom ? 0
> +                                              : PCI_COMMAND_IO |
> +                                                PCI_COMMAND_PARITY |
> +                                                PCI_COMMAND_WAIT |
> +                                                PCI_COMMAND_SERR |
> +                                                PCI_COMMAND_FAST_BACK),

We want to allow full access to the hw domain and only apply the
PCI_COMMAND_RSVDP_MASK when !is_hwdom in order to keep the current
behavior for dom0.

I don't think it makes a difference in practice, but we are very lax
in explicitly not applying any of such restrictions to dom0.

With that fixed:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests
  2024-05-22  9:28   ` Roger Pau Monné
@ 2024-05-22  9:35     ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2024-05-22  9:35 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Volodymyr Babchuk,
	Roger Pau Monné

On 22.05.2024 11:28, Roger Pau Monné wrote:
> On Fri, May 17, 2024 at 01:06:12PM -0400, Stewart Hildebrand wrote:
>> @@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev)
>>          return -EOPNOTSUPP;
>>      }
>>  
>> -    /* Setup a handler for the command register. */
>> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
>> -                           2, header);
>> +    /*
>> +     * Setup a handler for the command register.
>> +     *
>> +     * TODO: If support for emulated bits is added, re-visit how to handle
>> +     * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
>> +     */
>> +    rc = vpci_add_register_mask(pdev->vpci,
>> +                                is_hwdom ? vpci_hw_read16 : guest_cmd_read,
>> +                                cmd_write, PCI_COMMAND, 2, header, 0, 0,
>> +                                PCI_COMMAND_RSVDP_MASK |
>> +                                    (is_hwdom ? 0
>> +                                              : PCI_COMMAND_IO |
>> +                                                PCI_COMMAND_PARITY |
>> +                                                PCI_COMMAND_WAIT |
>> +                                                PCI_COMMAND_SERR |
>> +                                                PCI_COMMAND_FAST_BACK),
> 
> We want to allow full access to the hw domain and only apply the
> PCI_COMMAND_RSVDP_MASK when !is_hwdom in order to keep the current
> behavior for dom0.
> 
> I don't think it makes a difference in practice, but we are very lax
> in explicitly not applying any of such restrictions to dom0.
> 
> With that fixed:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Makes sense to me, so please feel free to retain my R-b with that adjustment.

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology
  2024-05-17 17:06 ` [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology Stewart Hildebrand
@ 2024-05-22 10:33   ` Roger Pau Monné
  0 siblings, 0 replies; 11+ messages in thread
From: Roger Pau Monné @ 2024-05-22 10:33 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Volodymyr Babchuk

On Fri, May 17, 2024 at 01:06:13PM -0400, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Please note, that at the moment only function 0 of a multifunction
> device can be passed through.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> In v15:
> - add Jan's A-b
> In v13:
> - s/depends on/select/ in Kconfig
> - check pdev->sbdf.fn instead of two booleans in add_virtual_device()
> - comment #endifs in sched.h
> - clarify comment about limits in vpci.h with seg/bus limit
> In v11:
> - Fixed code formatting
> - Removed bogus write_unlock() call
> - Fixed type for new_dev_number
> In v10:
> - Removed ASSERT(pcidevs_locked())
> - Removed redundant code (local sbdf variable, clearing sbdf during
> device removal, etc)
> - Added __maybe_unused attribute to "out:" label
> - Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
>   first patch where it is used (previously was in "vpci: add hooks for
>   PCI device assign/de-assign")
> In v9:
> - Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
> In v8:
> - Added write lock in add_virtual_device
> Since v6:
> - re-work wrt new locking scheme
> - OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
> Since v5:
> - s/vpci_add_virtual_device/add_virtual_device and make it static
> - call add_virtual_device from vpci_assign_device and do not use
>   REGISTER_VPCI_INIT machinery
> - add pcidevs_locked ASSERT
> - use DECLARE_BITMAP for vpci_dev_assigned_map
> Since v4:
> - moved and re-worked guest sbdf initializers
> - s/set_bit/__set_bit
> - s/clear_bit/__clear_bit
> - minor comment fix s/Virtual/Guest/
> - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
>   later for counting the number of MMIO handlers required for a guest
>   (Julien)
> Since v3:
>  - make use of VPCI_INIT
>  - moved all new code to vpci.c which belongs to it
>  - changed open-coded 31 to PCI_SLOT(~0)
>  - added comments and code to reject multifunction devices with
>    functions other than 0
>  - updated comment about vpci_dev_next and made it unsigned int
>  - implement roll back in case of error while assigning/deassigning devices
>  - s/dom%pd/%pd
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>     functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/drivers/Kconfig     |  4 +++
>  xen/drivers/vpci/vpci.c | 57 +++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/sched.h | 10 +++++++-
>  xen/include/xen/vpci.h  | 12 +++++++++
>  4 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
> index db94393f47a6..20050e9bb8b3 100644
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>  	bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> +	bool
> +	select HAS_VPCI
> +
>  endmenu
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 97e115dc5798..23722634d50b 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static int add_virtual_device(struct pci_dev *pdev)

This seems quite generic, IMO it would better named
`assign_{guest,virtual}_sbdf()` or similar, unless there are plans to
add more code here that's not strictly only about setting the guest
SBDF.

> +{
> +    struct domain *d = pdev->domain;
> +    unsigned int new_dev_number;
> +
> +    if ( is_hardware_domain(d) )
> +        return 0;
> +
> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));

Shouldn't the assert be done before the is_hardware_domain() check, so
that we assert that all possible paths (even those from dom0) have
taken the correct lock?

> +
> +    /*
> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
> +     * there are multi-function ones which are not yet supported.
> +     */
> +    if ( pdev->sbdf.fn )
> +    {
> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> +                 &pdev->sbdf);
> +        return -EOPNOTSUPP;
> +    }
> +    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> +                                         VPCI_MAX_VIRT_DEV);
> +    if ( new_dev_number == VPCI_MAX_VIRT_DEV )
> +        return -ENOSPC;
> +
> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
> +
> +    /*
> +     * Both segment and bus number are 0:
> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
> +     *  - with bus 0 the virtual devices are seen as embedded
> +     *    endpoints behind the root complex
> +     *
> +     * TODO: add support for multi-function devices.
> +     */
> +    pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
> +
> +    return 0;
> +}
> +
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> +
>  void vpci_deassign_device(struct pci_dev *pdev)
>  {
>      unsigned int i;
> @@ -49,6 +92,12 @@ void vpci_deassign_device(struct pci_dev *pdev)
>      if ( !has_vpci(pdev->domain) || !pdev->vpci )
>          return;
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
> +        __clear_bit(pdev->vpci->guest_sbdf.dev,
> +                    &pdev->domain->vpci_dev_assigned_map);
> +#endif
> +
>      spin_lock(&pdev->vpci->lock);
>      while ( !list_empty(&pdev->vpci->handlers) )
>      {
> @@ -103,6 +152,13 @@ int vpci_assign_device(struct pci_dev *pdev)
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>      spin_lock_init(&pdev->vpci->lock);
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    pdev->vpci->guest_sbdf.sbdf = ~0;

I think ~0 wants to be in a define here:

#define INVALID_GUEST_SBDF ~0

Or similar.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests
  2024-05-17 17:06 ` [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests Stewart Hildebrand
@ 2024-05-24 13:17   ` Julien Grall
  0 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2024-05-24 13:17 UTC (permalink / raw)
  To: Stewart Hildebrand, xen-devel
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Roger Pau Monné

Hi Stewart,

On 17/05/2024 18:06, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are three  originators for the PCI configuration space access:
> 1. The domain that owns physical host bridge: MMIO handlers are
> there so we can update vPCI register handlers with the values
> written by the hardware domain, e.g. physical view of the registers
> vs guest's view on the configuration space.
> 2. Guest access to the passed through PCI devices: we need to properly
> map virtual bus topology to the physical one, e.g. pass the configuration
> space access to the corresponding physical devices.
> 3. Emulated host PCI bridge access. It doesn't exist in the physical
> topology, e.g. it can't be mapped to some physical host bridge.
> So, all access to the host bridge itself needs to be trapped and
> emulated.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

With one remark below, for Arm:

Acked-by: Julien Grall <jgrall@amazon.com>

[...]

> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 23722634d50b..98b294f09688 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -81,6 +81,30 @@ static int add_virtual_device(struct pci_dev *pdev)
>       return 0;
>   }
>   
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
> +{
> +    const struct pci_dev *pdev;
> +
> +    ASSERT(!is_hardware_domain(d));
> +    ASSERT(rw_is_locked(&d->pci_lock));
> +
> +    for_each_pdev ( d, pdev )

(This doesn't need to be addressed now)


I see that we have other places with for_each_pdev() in the vCPI. So are 
we expecting the list to be smallish?

If not, then we may want to consider reworking the datastructure or put 
a limit on the number of PCI devices assigned.


> +    {
> +        if ( pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf) )
> +        {
> +            /* Replace guest SBDF with the physical one. */
> +            *sbdf = pdev->sbdf;
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>   
>   void vpci_deassign_device(struct pci_dev *pdev)
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 980aded26fc1..7e5a0f0c50c1 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -303,6 +303,18 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
>   }
>   #endif
>   
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
> +#else
> +static inline bool vpci_translate_virtual_device(const struct domain *d,
> +                                                 pci_sbdf_t *sbdf)
> +{
> +    ASSERT_UNREACHABLE();
> +
> +    return false;
> +}
> +#endif
> +
>   #endif
>   
>   /*

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-05-24 13:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-17 17:06 [PATCH v15 0/5] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
2024-05-17 17:06 ` [PATCH v15 1/5] arm/vpci: honor access size when returning an error Stewart Hildebrand
2024-05-21 13:45   ` Julien Grall
2024-05-17 17:06 ` [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests Stewart Hildebrand
2024-05-22  9:28   ` Roger Pau Monné
2024-05-22  9:35     ` Jan Beulich
2024-05-17 17:06 ` [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology Stewart Hildebrand
2024-05-22 10:33   ` Roger Pau Monné
2024-05-17 17:06 ` [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests Stewart Hildebrand
2024-05-24 13:17   ` Julien Grall
2024-05-17 17:06 ` [PATCH v15 5/5] xen/arm: account IO handlers for emulated PCI MSI-X Stewart Hildebrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.