All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/8] Support hiding capability when its initialization fails
@ 2025-07-04  7:07 Jiqian Chen
  2025-07-04  7:07 ` [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Hi,

This series is to
emulate extended capability list for dom0, including patch #1.
hide legacy and extended capability when its initialization fails, including patch #2, #3, #4.
remove all related registers and other resources when initializing capability fails, including patch #5, #6, #7, #8.

Best regards,
Jiqian Chen.
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
Jiqian Chen (8):
  vpci/header: Emulate extended capability list for dom0
  vpci: Refactor REGISTER_VPCI_INIT
  vpci: Hide legacy capability when it fails to initialize
  vpci: Hide extended capability when it fails to initialize
  vpci: Refactor vpci_remove_register to remove matched registers
  vpci/rebar: Free Rebar resources when init_rebar() fails
  vpci/msi: Free MSI resources when init_msi() fails
  vpci/msix: Free MSIX resources when init_msix() fails

 tools/tests/vpci/main.c    |   4 +-
 xen/arch/arm/xen.lds.S     |   3 +-
 xen/arch/ppc/xen.lds.S     |   3 +-
 xen/arch/riscv/xen.lds.S   |   3 +-
 xen/arch/x86/xen.lds.S     |   2 +-
 xen/drivers/vpci/header.c  |  60 +++++---
 xen/drivers/vpci/msi.c     | 111 ++++++++++++---
 xen/drivers/vpci/msix.c    |  63 ++++++++-
 xen/drivers/vpci/rebar.c   |  41 ++++--
 xen/drivers/vpci/vpci.c    | 279 +++++++++++++++++++++++++++++++++----
 xen/include/xen/pci_regs.h |   5 +-
 xen/include/xen/vpci.h     |  38 +++--
 xen/include/xen/xen.lds.h  |   2 +-
 13 files changed, 507 insertions(+), 107 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
@ 2025-07-04  7:07 ` Jiqian Chen
  2025-07-08 14:10   ` Jan Beulich
  2025-07-04  7:07 ` [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Add a new function to emulate extended capability list for dom0,
and call it in init_header(). So that it will be easy to hide a
extended capability whose initialization fails.

As for the extended capability list of domU, just move the logic
into above function and keep hiding it for domU.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v6->v7 changes:
* Change word "guest" to "DomU" in vpci_init_ext_capability_list().
* Change parameter of vpci_init_ext_capability_list() to be const.
* Delete check "if ( !header )" in the while loop of vpci_init_ext_capability_list().
* Change the loop from while to do while in vpci_init_ext_capability_list().

v5->v6 changes:
* Delete unnecessary parameter "ttl" in vpci_init_ext_capability_list()
  since vpci_add_register() can already detect the overlaps.

v4->v5 changes:
* Add check: if capability list of hardware has a overlap, print warning and return 0.

v3->v4 changes:
* Add check "if ( !header )   return 0;" to avoid adding handler for
  device that has no extended capabilities.

v2->v3 changes:
* In vpci_init_ext_capability_list(), when domain is domU, directly return after
  adding a handler(hiding all extended capability for domU).
* In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )"
  instead of "while ( pos && ttl-- )".
* Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 44 ++++++++++++++++++++++++++++++++-------
 xen/drivers/vpci/vpci.c   |  6 ++++++
 xen/include/xen/vpci.h    |  2 ++
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index d26cbba08ee1..8ee8052cd4a3 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
                                   PCI_STATUS_RSVDZ_MASK);
 }
 
+static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+
+    if ( !is_hardware_domain(pdev->domain) )
+        /* Extended capabilities read as zero, write ignore for DomU */
+        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                 pos, 4, (void *)0);
+
+    do
+    {
+        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
+        int rc;
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
+        if ( rc )
+            return rc;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    } while ( pos >= PCI_CFG_SPACE_SIZE );
+
+    return 0;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -888,14 +921,9 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    if ( !is_hwdom )
-    {
-        /* Extended capabilities read as zero, write ignore */
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
-                               (void *)0);
-        if ( rc )
-            return rc;
-    }
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
 
     if ( pdev->ignore_bars )
         return 0;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 09988f04c27c..8474c0e3b995 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -267,6 +267,12 @@ void cf_check vpci_hw_write16(
     pci_conf_write16(pdev->sbdf, reg, val);
 }
 
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write32(pdev->sbdf, reg, val);
+}
+
 int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
                            vpci_write_t *write_handler, unsigned int offset,
                            unsigned int size, void *data, uint32_t ro_mask,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index fc8d5b470b0b..61d16cc8b897 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -80,6 +80,8 @@ void cf_check vpci_hw_write8(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-07-04  7:07 ` [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
@ 2025-07-04  7:07 ` Jiqian Chen
  2025-07-08 14:52   ` Jan Beulich
  2025-07-21 14:37   ` Roger Pau Monné
  2025-07-04  7:07 ` [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

Refactor REGISTER_VPCI_INIT to contain more capability specific
information, this will benefit further follow-on changes to hide
capability when initialization fails.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

After refactor, the "priority" of initializing capabilities isn't
needed anymore, so delete its related codes.

Note:
Call vpci_make_msix_hole() in the end of init_msix() since the change
of sequence of init_header() and init_msix(). And delete the call of
vpci_make_msix_hole() in modify_decoding() since it is not needed.

The cleanup hook is also added in this change, even if it's still
unused. Further changes will make use of it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
There is a byte alignment problem in the array __start_vpci_array, which can be solved after
"[PATCH] x86: don't have gcc over-align data" is merged.
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v6->v7 changes:
* Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
  If change parameter of init hook to be const will affect init_msix, and it assigns pdev
  to struct vpci_msix, so keep no const to expanding the impact.
* Delete the vpci_make_msix_hole() call in modify_decoding().
* Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
* Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
  REGISTER_VPCI_CAPABILITY.

v5->v6 changes:
* Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
* Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
  move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
* Change _start/end_vpci_array[] to be const pointer array.

v4->v5 changes:
* Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
  REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
* Change cleanup hook of vpci_capability_t from void to int.

v3->v4 changes
* Delete the useless trailing dot of section ".data.vpci".
* Add description about priority since this patch removes the initializing priority of
  capabilities and priority is not needed anymore.
* Change the hook name from fini to cleanup.
* Change the name x and y to be finit and fclean.
* Remove the unnecessary check "!capability->init"

v2->v3 changes:
* This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
* Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
* Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
  failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/arch/arm/xen.lds.S    |  3 +--
 xen/arch/ppc/xen.lds.S    |  3 +--
 xen/arch/riscv/xen.lds.S  |  3 +--
 xen/arch/x86/xen.lds.S    |  2 +-
 xen/drivers/vpci/header.c | 16 +-------------
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   | 11 +++++++---
 xen/drivers/vpci/rebar.c  |  2 +-
 xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
 xen/include/xen/vpci.h    | 32 ++++++++++++++++++----------
 xen/include/xen/xen.lds.h |  2 +-
 11 files changed, 71 insertions(+), 49 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 5bfbe1e92c1e..9f30c3a13ed1 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -57,6 +57,7 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+       VPCI_ARRAY
        *(.data.rel.ro)
        *(.data.rel.ro.*)
 
@@ -64,8 +65,6 @@ SECTIONS
        __proc_info_start = .;
        *(.proc.info)
        __proc_info_end = .;
-
-       VPCI_ARRAY
   } :text
 
 #if defined(BUILD_ID)
diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
index 1366e2819eed..1de0b77fc6b9 100644
--- a/xen/arch/ppc/xen.lds.S
+++ b/xen/arch/ppc/xen.lds.S
@@ -51,11 +51,10 @@ SECTIONS
 
         *(.rodata)
         *(.rodata.*)
+        VPCI_ARRAY
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
index 8c3c06de01f6..edcadff90bfe 100644
--- a/xen/arch/riscv/xen.lds.S
+++ b/xen/arch/riscv/xen.lds.S
@@ -46,11 +46,10 @@ SECTIONS
 
         *(.rodata)
         *(.rodata.*)
+        VPCI_ARRAY
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 636c7768aa3c..8e9cac75b09e 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -135,6 +135,7 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+       VPCI_ARRAY
        *(.data.rel.ro)
        *(.data.rel.ro.*)
 
@@ -148,7 +149,6 @@ SECTIONS
        *(.note.gnu.build-id)
        __note_gnu_build_id_end = .;
 #endif
-       VPCI_ARRAY
   } PHDR(text)
 
 #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 8ee8052cd4a3..069253b5f721 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -122,19 +122,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
     bool map = cmd & PCI_COMMAND_MEMORY;
     unsigned int i;
 
-    /*
-     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
-     * can be trapped (and emulated) by Xen when the memory decoding bit is
-     * enabled.
-     *
-     * FIXME: punching holes after the p2m has been set up might be racy for
-     * DomU usage, needs to be revisited.
-     */
-#ifdef CONFIG_HAS_PCI_MSI
-    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
-        return;
-#endif
-
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
         struct vpci_bar *bar = &header->bars[i];
@@ -869,7 +856,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1065,7 +1052,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..c3eba4e14870 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_CAP(MSI, init_msi, NULL);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 74211301ba10..a1692b9d9f6a 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -589,7 +589,8 @@ static const struct hvm_mmio_ops vpci_msix_table_ops = {
     .write = msix_write,
 };
 
-int vpci_make_msix_hole(const struct pci_dev *pdev)
+/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
+static int vpci_make_msix_hole(const struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
     unsigned int i;
@@ -703,9 +704,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..3c18792d9bcd 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 8474c0e3b995..e7e5b64f1be4 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,8 +36,8 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern const vpci_capability_t __start_vpci_array[];
+extern const vpci_capability_t __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +83,32 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = &__start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos = 0;
+        int rc;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( is_hardware_domain(pdev->domain) )
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +154,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,14 +184,13 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    rc = vpci_init_capabilities(pdev);
 
- out: __maybe_unused;
+ out:
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 61d16cc8b897..61287e5d2e12 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (* init)(struct pci_dev *pdev);
+    int (* cleanup)(const struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +30,21 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
+    static const vpci_capability_t name##_entry \
+        __used_section(".data.rel.ro.vpci") = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+    }
+
+#define REGISTER_VPCI_CAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
+#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
@@ -206,9 +219,6 @@ struct vpci_vcpu {
 #ifdef __XEN__
 void vpci_dump_msi(void);
 
-/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
-int vpci_make_msix_hole(const struct pci_dev *pdev);
-
 /* Arch-specific vPCI MSI helpers. */
 void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
                         unsigned int entry, bool mask);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..eb86305c11c7 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.data.rel.ro.vpci)           \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-07-04  7:07 ` [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
  2025-07-04  7:07 ` [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-07-04  7:07 ` Jiqian Chen
  2025-07-21 15:48   ` Roger Pau Monné
  2025-07-04  7:07 ` [PATCH v7 4/8] vpci: Hide extended " Jiqian Chen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When vpci fails to initialize a legacy capability of device, it just
returns an error and vPCI gets disabled for the whole device.  That
most likely renders the device unusable, plus possibly causing issues
to Xen itself if guest attempts to program the native MSI or MSI-X
capabilities if present.

So, add new function to hide legacy capability when initialization
fails. And remove the failed legacy capability from the vpci emulated
legacy capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v6->v7 changes:
* Change the pointer parameter of vpci_get_register(),
  vpci_get_previous_cap_register() and vpci_capability_hide() to be const.

v5->v6 changes:
* Rename parameter rm to r in vpci_get_register().
* Use for loop to compact the code of vpci_get_previous_cap_register().
* Rename prev_next_r to prev_r in vpci_capability_hide(().
* Add printing when cap init, cleanup and hide fail.

v4->v5 changes:
* Modify vpci_get_register() to delete some unnecessary check, so that
  I don't need to move function vpci_register_cmp().
* Rename vpci_capability_mask() to vpci_capability_hide().

v3->v4 changes:
* Modify the commit message.
* In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
* Modify vpci_capability_mask() to return error instead of using ASSERT.
* Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
* Add check "if ( !offset )" in vpci_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize"
* Whole implementation changed because last version is wrong.
  This version adds a new helper function vpci_get_register() and uses it to get
  target handler and previous handler from vpci->handlers, then remove the target.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c | 109 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 108 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index e7e5b64f1be4..a91c3d4a1415 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -83,6 +83,88 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static struct vpci_register *vpci_get_register(const struct vpci *vpci,
+                                               unsigned int offset,
+                                               unsigned int size)
+{
+    struct vpci_register *r;
+
+    ASSERT(spin_is_locked(&vpci->lock));
+
+    list_for_each_entry ( r, &vpci->handlers, node )
+    {
+        if ( r->offset == offset && r->size == size )
+            return r;
+
+        if ( offset <= r->offset )
+            break;
+    }
+
+    return NULL;
+}
+
+static struct vpci_register *vpci_get_previous_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    uint32_t next;
+    struct vpci_register *r;
+
+    if ( offset < 0x40 )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
+          r = next >= 0x40 ? vpci_get_register(vpci,
+                                               next + PCI_CAP_LIST_NEXT, 1)
+                           : NULL )
+    {
+        next = (uint32_t)(uintptr_t)r->private;
+        ASSERT(next == (uintptr_t)r->private);
+        if ( next == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    struct vpci_register *prev_r, *next_r;
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    prev_r = vpci_get_previous_cap_register(vpci, offset);
+    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
+    if ( !prev_r || !next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_r->private = next_r->private;
+    /*
+     * Not calling vpci_remove_register() here is to avoid redoing
+     * the register search
+     */
+    list_del(&next_r->node);
+    spin_unlock(&vpci->lock);
+    xfree(next_r);
+
+    if ( !is_hardware_domain(pdev->domain) )
+        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -103,7 +185,32 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
 
         rc = capability->init(pdev);
         if ( rc )
-            return rc;
+        {
+            const char *type = is_ext ? "extended" : "legacy";
+
+            printk(XENLOG_WARNING "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
+                   pdev->domain, &pdev->sbdf, type, cap, rc);
+
+            if ( capability->cleanup )
+            {
+                rc = capability->cleanup(pdev);
+                if ( rc )
+                {
+                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
+                           pdev->domain, &pdev->sbdf, type, cap, rc);
+                    return rc;
+                }
+            }
+
+            if ( !is_ext )
+                rc = vpci_capability_hide(pdev, cap);
+            if ( rc )
+            {
+                printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
+                       pdev->domain, &pdev->sbdf, type, cap, rc);
+                return rc;
+            }
+        }
     }
 
     return 0;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 4/8] vpci: Hide extended capability when it fails to initialize
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (2 preceding siblings ...)
  2025-07-04  7:07 ` [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-07-04  7:07 ` Jiqian Chen
  2025-07-21 16:04   ` Roger Pau Monné
  2025-07-04  7:08 ` [PATCH v7 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

When vpci fails to initialize a extended capability of device, it
just returns an error and vPCI gets disabled for the whole device.

So, add function to hide extended capability when initialization
fails. And remove the failed extended capability handler from vpci
extended capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
Comment left over for the situation that "capability in 0x100 can't be removed case"
from Jan in last version, need Roger's input.

Jan:
Can we rely on OSes recognizing ID 0 as "just skip"?
Since the size isn't encoded in the header, there might be issues lurking here.
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v6->v7 changes:
* Change the pointer parameter of vpci_get_previous_ext_cap_register()
  and vpci_ext_capability_hide() to be const.

v5->v6 changes:
* Change to use for loop to compact code of vpci_get_previous_ext_cap_register().
* Rename parameter rm to r in vpci_ext_capability_hide().
* Change comment to describ the case that hide capability of position
  0x100U.

v4->v5 changes:
* Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
* Rename vpci_ext_capability_mask to vpci_ext_capability_hide.

v3->v4 changes:
* Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header)
  (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
* Modify the commit message.
* Change vpci_ext_capability_mask() to return error instead of using ASSERT.
* Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
* Add check "if ( !offset )" in vpci_ext_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize".
* Whole implementation changed because last version is wrong.
  This version gets target handler and previous handler from vpci->handlers, then remove the target.
* Note: a case in function vpci_ext_capability_mask() needs to be discussed,
  because it may change the offset of next capability when the offset of target
  capability is 0x100U(the first extended capability), my implementation is just to
  ignore and let hardware to handle the target capability.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c    | 88 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pci_regs.h |  5 ++-
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index a91c3d4a1415..8be4b53533a3 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -165,6 +165,92 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
     return 0;
 }
 
+static struct vpci_register *vpci_get_previous_ext_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+    struct vpci_register *r;
+
+    if ( offset <= PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, pos, 4); r;
+          r = pos > PCI_CFG_SPACE_SIZE ? vpci_get_register(vpci, pos, 4)
+                                       : NULL )
+    {
+        uint32_t header = (uint32_t)(uintptr_t)r->private;
+
+        ASSERT(header == (uintptr_t)r->private);
+
+        pos = PCI_EXT_CAP_NEXT(header);
+        if ( pos == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_ext_capability_hide(
+    const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
+    struct vpci_register *r, *prev_r;
+    struct vpci *vpci = pdev->vpci;
+    uint32_t header, pre_header;
+
+    if ( offset < PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    r = vpci_get_register(vpci, offset, 4);
+    if ( !r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    header = (uint32_t)(uintptr_t)r->private;
+    if ( offset == PCI_CFG_SPACE_SIZE )
+    {
+        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
+            r->private = (void *)(uintptr_t)0;
+        else
+            /*
+             * The first extended capability (0x100) can not be removed from
+             * the linked list, so instead mask its capability ID to return 0
+             * and force OSes to skip it.
+             */
+            r->private = (void *)(uintptr_t)(header & ~PCI_EXT_CAP_ID(header));
+
+        spin_unlock(&vpci->lock);
+        return 0;
+    }
+
+    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
+    if ( !prev_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    pre_header = (uint32_t)(uintptr_t)prev_r->private;
+    pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
+    pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
+    prev_r->private = (void *)(uintptr_t)pre_header;
+
+    list_del(&r->node);
+    spin_unlock(&vpci->lock);
+    xfree(r);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -204,6 +290,8 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
 
             if ( !is_ext )
                 rc = vpci_capability_hide(pdev, cap);
+            else
+                rc = vpci_ext_capability_hide(pdev, cap);
             if ( rc )
             {
                 printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 27b4f44eedf3..3b6963133dbd 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -448,7 +448,10 @@
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_NEXT_MASK		0xfff00000U
+/* Bottom two bits of next capability position are reserved. */
+#define PCI_EXT_CAP_NEXT(header) \
+    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xffcU)
 
 #define PCI_EXT_CAP_ID_ERR	1
 #define PCI_EXT_CAP_ID_VC	2
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 5/8] vpci: Refactor vpci_remove_register to remove matched registers
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (3 preceding siblings ...)
  2025-07-04  7:07 ` [PATCH v7 4/8] vpci: Hide extended " Jiqian Chen
@ 2025-07-04  7:08 ` Jiqian Chen
  2025-07-04  7:08 ` [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Anthony PERARD

vpci_remove_register() only supports removing a register in a time,
but the follow-on changes need to remove all registers within a range.
So, refactor it to support removing all registers in a given region.

And it is no issue to remove a non exist register, so remove the
__must_check prefix.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
---
v6->v7 changes:
No.

v5->v6 changes:
* Modify commit message.
* Add Roger's Reviewed-by.

v4->v5 changes:
No.

v3->v4 changes:
* Use list_for_each_entry_safe instead of list_for_each_entry.
* Return ERANGE if overlap.

v2->v3 changes:
* Add new check to return error if registers overlap but not inside range.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 tools/tests/vpci/main.c |  4 ++--
 xen/drivers/vpci/vpci.c | 38 ++++++++++++++++++++------------------
 xen/include/xen/vpci.h  |  4 ++--
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index 33223db3eb77..ca72877d60cd 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
                                   rsvdz_mask))
 
 #define VPCI_REMOVE_REG(off, size)                                          \
-    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
 
 #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
-    assert(vpci_remove_register(test_pdev.vpci, off, size))
+    assert(vpci_remove_registers(test_pdev.vpci, off, size))
 
 /* Read a 32b register using all possible sizes. */
 void multiread4_check(unsigned int reg, uint32_t val)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 8be4b53533a3..71ba30bbbd6b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -152,7 +152,7 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
 
     prev_r->private = next_r->private;
     /*
-     * Not calling vpci_remove_register() here is to avoid redoing
+     * Not calling vpci_remove_registers() here is to avoid redoing
      * the register search
      */
     list_del(&next_r->node);
@@ -160,7 +160,7 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
     xfree(next_r);
 
     if ( !is_hardware_domain(pdev->domain) )
-        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
 
     return 0;
 }
@@ -553,34 +553,36 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
-int vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                         unsigned int size)
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size)
 {
-    const struct vpci_register r = { .offset = offset, .size = size };
-    struct vpci_register *rm;
+    struct vpci_register *rm, *tmp;
+    unsigned int end = start + size;
 
     spin_lock(&vpci->lock);
-    list_for_each_entry ( rm, &vpci->handlers, node )
+    list_for_each_entry_safe ( rm, tmp, &vpci->handlers, node )
     {
-        int cmp = vpci_register_cmp(&r, rm);
-
-        /*
-         * NB: do not use a switch so that we can use break to
-         * get out of the list loop earlier if required.
-         */
-        if ( !cmp && rm->offset == offset && rm->size == size )
+        /* Remove rm if rm is inside the range. */
+        if ( rm->offset >= start && rm->offset + rm->size <= end )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
-            return 0;
+            continue;
         }
-        if ( cmp <= 0 )
+
+        /* Return error if registers overlap but not inside. */
+        if ( rm->offset + rm->size > start && rm->offset < end )
+        {
+            spin_unlock(&vpci->lock);
+            return -ERANGE;
+        }
+
+        if ( start < rm->offset )
             break;
     }
     spin_unlock(&vpci->lock);
 
-    return -ENOENT;
+    return 0;
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 61287e5d2e12..91bb407c728c 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -70,8 +70,8 @@ static inline int __must_check vpci_add_register(struct vpci *vpci,
                                   size, data, 0, 0, 0, 0);
 }
 
-int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                                      unsigned int size);
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size);
 
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (4 preceding siblings ...)
  2025-07-04  7:08 ` [PATCH v7 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-07-04  7:08 ` Jiqian Chen
  2025-07-21 16:08   ` Roger Pau Monné
  2025-07-04  7:08 ` [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-07-04  7:08 ` [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  7 siblings, 1 reply; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_rebar() fails, current logic return fail and free Rebar-related
resources in vpci_deassign_device(). But the previous new changes will
hide Rebar capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of Rebar.

To do that, implement cleanup function for Rebar.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v6->v7 changes:
* Change the pointer parameter of cleanup_rebar() to be const.
* Print error when vpci_remove_registers() fail in cleanup_rebar().

v5->v6 changes:
No.

v4->v5 changes:
* Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.

v3->v4 changes:
* Change function name from fini_rebar() to cleanup_rebar().
* Change the error number to be E2BIG and ENXIO in init_rebar().

v2->v3 changes:
* Use fini_rebar() to remove all register instead of in the failure path of init_rebar();

v1->v2 changes:
* Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/rebar.c | 41 +++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 3c18792d9bcd..39ab9c2eb7d9 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -49,6 +49,32 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
     bar->guest_addr = bar->addr;
 }
 
+static int cf_check cleanup_rebar(const struct pci_dev *pdev)
+{
+    int rc;
+    uint32_t ctrl;
+    unsigned int nbars;
+    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
+                                                        PCI_EXT_CAP_ID_REBAR);
+
+    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
+    nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
+
+    rc = vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
+                               PCI_REBAR_CTRL(nbars - 1));
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to remove Rebar handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_rebar(struct pci_dev *pdev)
 {
     uint32_t ctrl;
@@ -80,7 +106,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -E2BIG;
         }
 
         bar = &pdev->vpci->header.bars[index];
@@ -88,7 +114,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -ENXIO;
         }
 
         rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rebar_ctrl_write,
@@ -97,14 +123,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u fail to add reg of REBAR_CTRL rc=%d\n",
                    pdev->domain, &pdev->sbdf, index, rc);
-            /*
-             * Ideally we would hide the ReBar capability on error, but code
-             * for doing so still needs to be written. Use continue instead
-             * to keep any already setup register hooks, as returning an
-             * error will cause the hardware domain to get unmediated access
-             * to all device registers.
-             */
-            continue;
+            return rc;
         }
 
         bar->resizable_sizes =
@@ -118,7 +137,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
+REGISTER_VPCI_EXTCAP(REBAR, init_rebar, cleanup_rebar);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (5 preceding siblings ...)
  2025-07-04  7:08 ` [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
@ 2025-07-04  7:08 ` Jiqian Chen
  2025-07-08 15:13   ` Jan Beulich
  2025-07-21 16:21   ` Roger Pau Monné
  2025-07-04  7:08 ` [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  7 siblings, 2 replies; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msi() fails, current logic return fail and free MSI-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSI capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSI.

To do that, implement cleanup function for MSI.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v6->v7 changes:
* Change the pointer parameter of cleanup_msi() to be const.
* When vpci_remove_registers() in cleanup_msi() fails, not to return
  directly, instead try to free msi and re-add ctrl handler.
* Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
  init_msi() since we need that every handler realize that msi is NULL
  when msi is free but handlers are still in there.

v5->v6 changes:
No.

v4->v5 changes:
* Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi"
  since cleanup hook is changed to be int.
* Add a read-only register for MSI Control Register in the end of cleanup_msi.

v3->v4 changes:
* Change function name from fini_msi() to cleanup_msi().
* Remove unnecessary comment.
* Change to use XFREE to free vpci->msi.

v2->v3 changes:
* Remove all fail path, and use fini_msi() hook instead.
* Change the method to calculating the size of msi registers.

v1->v2 changes:
* Added a new function fini_msi to free all MSI resources instead of using an array
  to record registered registers.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msi.c | 111 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 94 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index c3eba4e14870..09b91a685df5 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -25,7 +25,11 @@
 static uint32_t cf_check control_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msi *msi = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return pci_conf_read16(pdev->sbdf, reg);
 
     return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
            MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
@@ -37,12 +41,16 @@ static uint32_t cf_check control_read(
 static void cf_check control_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msi *msi = data;
+    struct vpci *vpci = data;
+    struct vpci_msi *msi = vpci->msi;
     unsigned int vectors = min_t(uint8_t,
                                  1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
                                  pdev->msi_maxvec);
     bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
 
+    if ( !msi )
+        return;
+
     /*
      * No change if the enable field and the number of vectors is
      * the same or the device is not enabled, in which case the
@@ -101,7 +109,11 @@ static void update_msi(const struct pci_dev *pdev, struct vpci_msi *msi)
 static uint32_t cf_check address_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msi *msi = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return ~(uint32_t)0;
 
     return msi->address;
 }
@@ -109,7 +121,11 @@ static uint32_t cf_check address_read(
 static void cf_check address_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msi *msi = data;
+    struct vpci *vpci = data;
+    struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return;
 
     /* Clear low part. */
     msi->address &= ~0xffffffffULL;
@@ -122,7 +138,11 @@ static void cf_check address_write(
 static uint32_t cf_check address_hi_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msi *msi = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return ~(uint32_t)0;
 
     return msi->address >> 32;
 }
@@ -130,7 +150,11 @@ static uint32_t cf_check address_hi_read(
 static void cf_check address_hi_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msi *msi = data;
+    struct vpci *vpci = data;
+    struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return;
 
     /* Clear and update high part. */
     msi->address  = (uint32_t)msi->address;
@@ -143,7 +167,11 @@ static void cf_check address_hi_write(
 static uint32_t cf_check data_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msi *msi = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return ~(uint32_t)0;
 
     return msi->data;
 }
@@ -151,7 +179,11 @@ static uint32_t cf_check data_read(
 static void cf_check data_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msi *msi = data;
+    struct vpci *vpci = data;
+    struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return;
 
     msi->data = val;
 
@@ -162,7 +194,11 @@ static void cf_check data_write(
 static uint32_t cf_check mask_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msi *msi = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msi *msi = vpci->msi;
+
+    if ( !msi )
+        return ~(uint32_t)0;
 
     return msi->mask;
 }
@@ -170,9 +206,14 @@ static uint32_t cf_check mask_read(
 static void cf_check mask_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msi *msi = data;
-    uint32_t dmask = msi->mask ^ val;
+    struct vpci *vpci = data;
+    struct vpci_msi *msi = vpci->msi;
+    uint32_t dmask;
+
+    if ( !msi )
+        return;
 
+    dmask = msi->mask ^ val;
     if ( !dmask )
         return;
 
@@ -193,6 +234,42 @@ static void cf_check mask_write(
     msi->mask = val;
 }
 
+static int cf_check cleanup_msi(const struct pci_dev *pdev)
+{
+    int rc;
+    unsigned int end;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msi_pos = pdev->msi_pos;
+    const unsigned int ctrl = msi_control_reg(msi_pos);
+
+    if ( !msi_pos || !vpci->msi )
+        return 0;
+
+    if ( vpci->msi->masking )
+        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
+    else
+        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
+
+    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
+    if ( rc )
+        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    XFREE(vpci->msi);
+
+    /*
+     * The driver may not traverse the capability list and think device
+     * supports MSI by default. So here let the control register of MSI
+     * be Read-Only is to ensure MSI disabled.
+     */
+    rc = vpci_add_register(vpci, vpci_hw_read16, NULL, ctrl, 2, NULL);
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to add MSI ctrl handler rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_msi(struct pci_dev *pdev)
 {
     unsigned int pos = pdev->msi_pos;
@@ -207,7 +284,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
         return -ENOMEM;
 
     ret = vpci_add_register(pdev->vpci, control_read, control_write,
-                            msi_control_reg(pos), 2, pdev->vpci->msi);
+                            msi_control_reg(pos), 2, pdev->vpci);
     if ( ret )
         /*
          * NB: there's no need to free the msi struct or remove the register
@@ -235,20 +312,20 @@ static int cf_check init_msi(struct pci_dev *pdev)
     pdev->vpci->msi->masking = is_mask_bit_support(control);
 
     ret = vpci_add_register(pdev->vpci, address_read, address_write,
-                            msi_lower_address_reg(pos), 4, pdev->vpci->msi);
+                            msi_lower_address_reg(pos), 4, pdev->vpci);
     if ( ret )
         return ret;
 
     ret = vpci_add_register(pdev->vpci, data_read, data_write,
                             msi_data_reg(pos, pdev->vpci->msi->address64), 2,
-                            pdev->vpci->msi);
+                            pdev->vpci);
     if ( ret )
         return ret;
 
     if ( pdev->vpci->msi->address64 )
     {
         ret = vpci_add_register(pdev->vpci, address_hi_read, address_hi_write,
-                                msi_upper_address_reg(pos), 4, pdev->vpci->msi);
+                                msi_upper_address_reg(pos), 4, pdev->vpci);
         if ( ret )
             return ret;
     }
@@ -258,7 +335,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
         ret = vpci_add_register(pdev->vpci, mask_read, mask_write,
                                 msi_mask_bits_reg(pos,
                                                   pdev->vpci->msi->address64),
-                                4, pdev->vpci->msi);
+                                4, pdev->vpci);
         if ( ret )
             return ret;
         /*
@@ -270,7 +347,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_CAP(MSI, init_msi, NULL);
+REGISTER_VPCI_CAP(MSI, init_msi, cleanup_msi);
 
 void vpci_dump_msi(void)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (6 preceding siblings ...)
  2025-07-04  7:08 ` [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-07-04  7:08 ` Jiqian Chen
  2025-07-21 16:24   ` Roger Pau Monné
  7 siblings, 1 reply; 31+ messages in thread
From: Jiqian Chen @ 2025-07-04  7:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msix() fails, current logic return fail and free MSIX-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSIX capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSIX.

To do that, implement cleanup function for MSIX.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v6->v7 changes:
* Change the pointer parameter of cleanup_msix() to be const.
* When vpci_remove_registers() in cleanup_msix() fails, not to return
  directly, instead try to free msix and re-add ctrl handler.
* Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
  init_msix() since we need that every handler realize that msix is NULL
  when msix is freed but handlers are still in there.

v5->v6 changes:
* Change the logic to add dummy handler when !vpci->msix in cleanup_msix().

v4->v5 changes:
* Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
  since cleanup hook is changed to be int.
* Add a read-only register for MSIX Control Register in the end of cleanup_msix().

v3->v4 changes:
* Change function name from fini_msix() to cleanup_msix().
* Change to use XFREE to free vpci->msix.
* In cleanup function, change the sequence of check and remove action according to
  init_msix().

v2->v3 changes:
* Remove unnecessary clean operations in fini_msix().

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msix.c | 54 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index a1692b9d9f6a..114280337f3f 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -36,7 +36,11 @@
 static uint32_t cf_check control_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
-    const struct vpci_msix *msix = data;
+    const struct vpci *vpci = data;
+    const struct vpci_msix *msix = vpci->msix;
+
+    if ( !msix )
+        return pci_conf_read16(pdev->sbdf, reg);
 
     return (msix->max_entries - 1) |
            (msix->enabled ? PCI_MSIX_FLAGS_ENABLE : 0) |
@@ -74,12 +78,16 @@ static void update_entry(struct vpci_msix_entry *entry,
 static void cf_check control_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
-    struct vpci_msix *msix = data;
+    struct vpci *vpci = data;
+    struct vpci_msix *msix = vpci->msix;
     bool new_masked = val & PCI_MSIX_FLAGS_MASKALL;
     bool new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
     unsigned int i;
     int rc;
 
+    if ( !msix )
+        return;
+
     if ( new_masked == msix->masked && new_enabled == msix->enabled )
         return;
 
@@ -656,6 +664,44 @@ static int vpci_make_msix_hole(const struct pci_dev *pdev)
     return 0;
 }
 
+static int cf_check cleanup_msix(const struct pci_dev *pdev)
+{
+    int rc;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msix_pos = pdev->msix_pos;
+
+    if ( !msix_pos )
+        return 0;
+
+    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
+    if ( rc )
+        printk(XENLOG_WARNING "%pd %pp: fail to remove MSIX handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    if ( vpci->msix )
+    {
+        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
+            if ( vpci->msix->table[i] )
+                iounmap(vpci->msix->table[i]);
+
+        list_del(&vpci->msix->next);
+        XFREE(vpci->msix);
+    }
+
+    /*
+     * The driver may not traverse the capability list and think device
+     * supports MSIX by default. So here let the control register of MSIX
+     * be Read-Only is to ensure MSIX disabled.
+     */
+    rc = vpci_add_register(vpci, vpci_hw_read16, NULL,
+                           msix_control_reg(msix_pos), 2, NULL);
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to add MSIX ctrl handler rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
@@ -677,7 +723,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
         return -ENOMEM;
 
     rc = vpci_add_register(pdev->vpci, control_read, control_write,
-                           msix_control_reg(msix_offset), 2, msix);
+                           msix_control_reg(msix_offset), 2, pdev->vpci);
     if ( rc )
     {
         xfree(msix);
@@ -710,7 +756,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
 
     return rc;
 }
-REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
+REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-04  7:07 ` [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
@ 2025-07-08 14:10   ` Jan Beulich
  2025-07-09  5:29     ` Chen, Jiqian
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2025-07-08 14:10 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 04.07.2025 09:07, Jiqian Chen wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>                                    PCI_STATUS_RSVDZ_MASK);
>  }
>  
> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
> +{
> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
> +
> +    if ( !is_hardware_domain(pdev->domain) )
> +        /* Extended capabilities read as zero, write ignore for DomU */
> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                 pos, 4, (void *)0);
> +
> +    do
> +    {
> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
> +        int rc;
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
> +                               pos, 4, (void *)(uintptr_t)header);

If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
R-b. But this continues to look bogus to me: What use is it to allow writes
when Dom0 then can't read back any possible effect of such a write (in the
unexpected event that some of the bits were indeed writable)?

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-04  7:07 ` [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-07-08 14:52   ` Jan Beulich
  2025-07-21 14:37   ` Roger Pau Monné
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2025-07-08 14:52 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Huang Rui, Roger Pau Monné, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel

On 04.07.2025 09:07, Jiqian Chen wrote:
> Refactor REGISTER_VPCI_INIT to contain more capability specific
> information, this will benefit further follow-on changes to hide
> capability when initialization fails.
> 
> What's more, change the definition of init_header() since it is
> not a capability and it is needed for all devices' PCI config space.
> 
> After refactor, the "priority" of initializing capabilities isn't
> needed anymore, so delete its related codes.
> 
> Note:
> Call vpci_make_msix_hole() in the end of init_msix() since the change
> of sequence of init_header() and init_msix(). And delete the call of
> vpci_make_msix_hole() in modify_decoding() since it is not needed.
> 
> The cleanup hook is also added in this change, even if it's still
> unused. Further changes will make use of it.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> There is a byte alignment problem in the array __start_vpci_array, which can be solved after
> "[PATCH] x86: don't have gcc over-align data" is merged.

I was meaning to suggest a way to actually detect the issue at build time,
but sadly what's wanted to do so will first need adding to gas. Which now
will likely be only after 2.45 went out.

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-04  7:08 ` [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-07-08 15:13   ` Jan Beulich
  2025-07-09  6:10     ` Chen, Jiqian
  2025-07-21 16:21   ` Roger Pau Monné
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2025-07-08 15:13 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 04.07.2025 09:08, Jiqian Chen wrote:
> ---
> v6->v7 changes:
> * Change the pointer parameter of cleanup_msi() to be const.
> * When vpci_remove_registers() in cleanup_msi() fails, not to return
>   directly, instead try to free msi and re-add ctrl handler.
> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
>   init_msi() since we need that every handler realize that msi is NULL
>   when msi is free but handlers are still in there.

Imo this latter change would better have been a separate patch. I'm not
going to insist though (also I really can't, for not being a maintainer
of this file).

> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>      msi->mask = val;
>  }
>  
> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
> +{
> +    int rc;
> +    unsigned int end;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msi_pos = pdev->msi_pos;
> +    const unsigned int ctrl = msi_control_reg(msi_pos);
> +
> +    if ( !msi_pos || !vpci->msi )
> +        return 0;
> +
> +    if ( vpci->msi->masking )
> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
> +    else
> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;

What's this "- 2" for? If there's no masking support, you want to cut off
_at_ the mask register, not 2 bytes ahead of it? Just like you cut off at
the pending bits register when there is masking support.

> +    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
> +    if ( rc )
> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);
> +
> +    XFREE(vpci->msi);
> +
> +    /*
> +     * The driver may not traverse the capability list and think device
> +     * supports MSI by default. So here let the control register of MSI
> +     * be Read-Only is to ensure MSI disabled.
> +     */
> +    rc = vpci_add_register(vpci, vpci_hw_read16, NULL, ctrl, 2, NULL);
> +    if ( rc )
> +        printk(XENLOG_ERR "%pd %pp: fail to add MSI ctrl handler rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);

Imo the uses of XENLOG_ERR and XENLOG_WARNING want to change places. The latter
is extremely likely to be a follow-on failure from the first one failing. Plus
the latter failing is covered by what you add to control_read(). Which leaves
as the only case where this is really an error (and XENLOG_ERR might then be
warranted in both places) if the former succeeds and only the latter fails.

Hmm, then again vpci_init_capabilities() would too issue an error message in
that case. Perhaps keep as is then.

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-08 14:10   ` Jan Beulich
@ 2025-07-09  5:29     ` Chen, Jiqian
  2025-07-09  5:32       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-09  5:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Huang, Ray, Roger Pau Monné, xen-devel@lists.xenproject.org,
	Chen, Jiqian

On 2025/7/8 22:10, Jan Beulich wrote:
> On 04.07.2025 09:07, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>                                    PCI_STATUS_RSVDZ_MASK);
>>  }
>>  
>> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>> +{
>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
>> +
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +        /* Extended capabilities read as zero, write ignore for DomU */
>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                 pos, 4, (void *)0);
>> +
>> +    do
>> +    {
>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>> +        int rc;
>> +
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>> +                               pos, 4, (void *)(uintptr_t)header);
> 
> If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
> R-b. But this continues to look bogus to me: What use is it to allow writes
> when Dom0 then can't read back any possible effect of such a write (in the
> unexpected event that some of the bits were indeed writable)?
Oh, I got your concern.
What do you think about updating the header value after writing to hardware in write function?
Or you prefer to pass NULL here?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-09  5:29     ` Chen, Jiqian
@ 2025-07-09  5:32       ` Jan Beulich
  2025-07-09  5:34         ` Chen, Jiqian
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2025-07-09  5:32 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Huang, Ray, Roger Pau Monné, xen-devel@lists.xenproject.org

On 09.07.2025 07:29, Chen, Jiqian wrote:
> On 2025/7/8 22:10, Jan Beulich wrote:
>> On 04.07.2025 09:07, Jiqian Chen wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>                                    PCI_STATUS_RSVDZ_MASK);
>>>  }
>>>  
>>> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>>> +{
>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
>>> +
>>> +    if ( !is_hardware_domain(pdev->domain) )
>>> +        /* Extended capabilities read as zero, write ignore for DomU */
>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>> +                                 pos, 4, (void *)0);
>>> +
>>> +    do
>>> +    {
>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>> +        int rc;
>>> +
>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>> +                               pos, 4, (void *)(uintptr_t)header);
>>
>> If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
>> R-b. But this continues to look bogus to me: What use is it to allow writes
>> when Dom0 then can't read back any possible effect of such a write (in the
>> unexpected event that some of the bits were indeed writable)?
> Oh, I got your concern.
> What do you think about updating the header value after writing to hardware in write function?

That would imo be a layering violation. Once again that's something that you
primarily would need Roger's input on.

> Or you prefer to pass NULL here?

Yes, that's what I've been trying to argue for.

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-09  5:32       ` Jan Beulich
@ 2025-07-09  5:34         ` Chen, Jiqian
  2025-07-21 14:16           ` Roger Pau Monné
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-09  5:34 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Chen, Jiqian, Huang, Ray

On 2025/7/9 13:32, Jan Beulich wrote:
> On 09.07.2025 07:29, Chen, Jiqian wrote:
>> On 2025/7/8 22:10, Jan Beulich wrote:
>>> On 04.07.2025 09:07, Jiqian Chen wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>                                    PCI_STATUS_RSVDZ_MASK);
>>>>  }
>>>>  
>>>> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>>>> +{
>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
>>>> +
>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>> +        /* Extended capabilities read as zero, write ignore for DomU */
>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>> +                                 pos, 4, (void *)0);
>>>> +
>>>> +    do
>>>> +    {
>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>> +        int rc;
>>>> +
>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>
>>> If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
>>> R-b. But this continues to look bogus to me: What use is it to allow writes
>>> when Dom0 then can't read back any possible effect of such a write (in the
>>> unexpected event that some of the bits were indeed writable)?
>> Oh, I got your concern.
>> What do you think about updating the header value after writing to hardware in write function?
> 
> That would imo be a layering violation. Once again that's something that you
> primarily would need Roger's input on.
OK, wait for Roger's input.

> 
>> Or you prefer to pass NULL here?
> 
> Yes, that's what I've been trying to argue for.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-08 15:13   ` Jan Beulich
@ 2025-07-09  6:10     ` Chen, Jiqian
  2025-07-09  6:49       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-09  6:10 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: Huang, Ray, Chen, Jiqian, xen-devel@lists.xenproject.org

On 2025/7/8 23:13, Jan Beulich wrote:
> On 04.07.2025 09:08, Jiqian Chen wrote:
>> ---
>> v6->v7 changes:
>> * Change the pointer parameter of cleanup_msi() to be const.
>> * When vpci_remove_registers() in cleanup_msi() fails, not to return
>>   directly, instead try to free msi and re-add ctrl handler.
>> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
>>   init_msi() since we need that every handler realize that msi is NULL
>>   when msi is free but handlers are still in there.
> 
> Imo this latter change would better have been a separate patch. I'm not
> going to insist though (also I really can't, for not being a maintainer
> of this file).
Thanks. I Will do if Roger has the same opinion.

> 
>> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>>      msi->mask = val;
>>  }
>>  
>> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    unsigned int end;
>> +    struct vpci *vpci = pdev->vpci;
>> +    const unsigned int msi_pos = pdev->msi_pos;
>> +    const unsigned int ctrl = msi_control_reg(msi_pos);
>> +
>> +    if ( !msi_pos || !vpci->msi )
>> +        return 0;
>> +
>> +    if ( vpci->msi->masking )
>> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>> +    else
>> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
> 
> What's this "- 2" for? If there's no masking support, you want to cut off
> _at_ the mask register, not 2 bytes ahead of it? Just like you cut off at
> the pending bits register when there is masking support.
"-2" here is to cut the reserved 2 bytes of Message Data if there is no masking support.

> 
>> +    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
>> +    if ( rc )
>> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
>> +               pdev->domain, &pdev->sbdf, rc);
>> +
>> +    XFREE(vpci->msi);
>> +
>> +    /*
>> +     * The driver may not traverse the capability list and think device
>> +     * supports MSI by default. So here let the control register of MSI
>> +     * be Read-Only is to ensure MSI disabled.
>> +     */
>> +    rc = vpci_add_register(vpci, vpci_hw_read16, NULL, ctrl, 2, NULL);
>> +    if ( rc )
>> +        printk(XENLOG_ERR "%pd %pp: fail to add MSI ctrl handler rc=%d\n",
>> +               pdev->domain, &pdev->sbdf, rc);
> 
> Imo the uses of XENLOG_ERR and XENLOG_WARNING want to change places. The latter
> is extremely likely to be a follow-on failure from the first one failing. Plus
> the latter failing is covered by what you add to control_read(). Which leaves
> as the only case where this is really an error (and XENLOG_ERR might then be
> warranted in both places) if the former succeeds and only the latter fails.
> 
> Hmm, then again vpci_init_capabilities() would too issue an error message in
> that case. Perhaps keep as is then.
I am thinking maybe I need to add a check that "if ( rc == -EEXIST ) return 0;" here.
Since in that case, the failure is because vpci_remove_register fails to remove control handler and it can do the same thing we need here, so should return success.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-09  6:10     ` Chen, Jiqian
@ 2025-07-09  6:49       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2025-07-09  6:49 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Huang, Ray, xen-devel@lists.xenproject.org, Roger Pau Monné

On 09.07.2025 08:10, Chen, Jiqian wrote:
> On 2025/7/8 23:13, Jan Beulich wrote:
>> On 04.07.2025 09:08, Jiqian Chen wrote:
>>> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>>>      msi->mask = val;
>>>  }
>>>  
>>> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
>>> +{
>>> +    int rc;
>>> +    unsigned int end;
>>> +    struct vpci *vpci = pdev->vpci;
>>> +    const unsigned int msi_pos = pdev->msi_pos;
>>> +    const unsigned int ctrl = msi_control_reg(msi_pos);
>>> +
>>> +    if ( !msi_pos || !vpci->msi )
>>> +        return 0;
>>> +
>>> +    if ( vpci->msi->masking )
>>> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>>> +    else
>>> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
>>
>> What's this "- 2" for? If there's no masking support, you want to cut off
>> _at_ the mask register, not 2 bytes ahead of it? Just like you cut off at
>> the pending bits register when there is masking support.
> "-2" here is to cut the reserved 2 bytes of Message Data if there is no masking support.

Hmm, init_msi() doesn't intercept Extended Message Data when present. I
think that's wrong there, leading to the oddity here. (Imo such use of
hard coded numbers would almost always want to be accompanied by a
comment.)

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-09  5:34         ` Chen, Jiqian
@ 2025-07-21 14:16           ` Roger Pau Monné
  2025-07-23  6:45             ` Chen, Jiqian
  0 siblings, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 14:16 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: Jan Beulich, xen-devel@lists.xenproject.org, Huang, Ray

On Wed, Jul 09, 2025 at 05:34:28AM +0000, Chen, Jiqian wrote:
> On 2025/7/9 13:32, Jan Beulich wrote:
> > On 09.07.2025 07:29, Chen, Jiqian wrote:
> >> On 2025/7/8 22:10, Jan Beulich wrote:
> >>> On 04.07.2025 09:07, Jiqian Chen wrote:
> >>>> --- a/xen/drivers/vpci/header.c
> >>>> +++ b/xen/drivers/vpci/header.c
> >>>> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
> >>>>                                    PCI_STATUS_RSVDZ_MASK);
> >>>>  }
> >>>>  
> >>>> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
> >>>> +{
> >>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
> >>>> +
> >>>> +    if ( !is_hardware_domain(pdev->domain) )
> >>>> +        /* Extended capabilities read as zero, write ignore for DomU */
> >>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> >>>> +                                 pos, 4, (void *)0);
> >>>> +
> >>>> +    do
> >>>> +    {
> >>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
> >>>> +        int rc;
> >>>> +
> >>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
> >>>> +                               pos, 4, (void *)(uintptr_t)header);
> >>>
> >>> If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
> >>> R-b. But this continues to look bogus to me: What use is it to allow writes
> >>> when Dom0 then can't read back any possible effect of such a write (in the
> >>> unexpected event that some of the bits were indeed writable)?
> >> Oh, I got your concern.
> >> What do you think about updating the header value after writing to hardware in write function?
> 
> > That would imo be a layering violation. Once again that's something that you
> > primarily would need Roger's input on.
> OK, wait for Roger's input.

Hm, I see the asymmetry of allowing writes but not direct writes, my
thought was to give the hw domain as less interference as possibly,
hence my recommendation to use vpci_hw_write32().

I practice I think it's very unlikely that devices re-use reserved
bits in the capability register, so I'm fine with using NULL (thus
discarding the write).  We can always add more complex handling here
if we ever came across a device that requires it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-04  7:07 ` [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
  2025-07-08 14:52   ` Jan Beulich
@ 2025-07-21 14:37   ` Roger Pau Monné
  2025-07-23  7:20     ` Chen, Jiqian
  1 sibling, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 14:37 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Fri, Jul 04, 2025 at 03:07:57PM +0800, Jiqian Chen wrote:
> Refactor REGISTER_VPCI_INIT to contain more capability specific
> information, this will benefit further follow-on changes to hide
> capability when initialization fails.
> 
> What's more, change the definition of init_header() since it is
> not a capability and it is needed for all devices' PCI config space.
> 
> After refactor, the "priority" of initializing capabilities isn't
> needed anymore, so delete its related codes.
> 
> Note:
> Call vpci_make_msix_hole() in the end of init_msix() since the change
> of sequence of init_header() and init_msix(). And delete the call of
> vpci_make_msix_hole() in modify_decoding() since it is not needed.
> 
> The cleanup hook is also added in this change, even if it's still
> unused. Further changes will make use of it.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> There is a byte alignment problem in the array __start_vpci_array, which can be solved after
> "[PATCH] x86: don't have gcc over-align data" is merged.
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v6->v7 changes:
> * Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
>   If change parameter of init hook to be const will affect init_msix, and it assigns pdev
>   to struct vpci_msix, so keep no const to expanding the impact.
> * Delete the vpci_make_msix_hole() call in modify_decoding().
> * Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
> * Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
>   REGISTER_VPCI_CAPABILITY.
> 
> v5->v6 changes:
> * Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
> * Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
>   move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
> * Change _start/end_vpci_array[] to be const pointer array.
> 
> v4->v5 changes:
> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
>   REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
> * Change cleanup hook of vpci_capability_t from void to int.
> 
> v3->v4 changes
> * Delete the useless trailing dot of section ".data.vpci".
> * Add description about priority since this patch removes the initializing priority of
>   capabilities and priority is not needed anymore.
> * Change the hook name from fini to cleanup.
> * Change the name x and y to be finit and fclean.
> * Remove the unnecessary check "!capability->init"
> 
> v2->v3 changes:
> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
>   failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/arch/arm/xen.lds.S    |  3 +--
>  xen/arch/ppc/xen.lds.S    |  3 +--
>  xen/arch/riscv/xen.lds.S  |  3 +--
>  xen/arch/x86/xen.lds.S    |  2 +-
>  xen/drivers/vpci/header.c | 16 +-------------
>  xen/drivers/vpci/msi.c    |  2 +-
>  xen/drivers/vpci/msix.c   | 11 +++++++---
>  xen/drivers/vpci/rebar.c  |  2 +-
>  xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
>  xen/include/xen/vpci.h    | 32 ++++++++++++++++++----------
>  xen/include/xen/xen.lds.h |  2 +-
>  11 files changed, 71 insertions(+), 49 deletions(-)
> 
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 5bfbe1e92c1e..9f30c3a13ed1 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -57,6 +57,7 @@ SECTIONS
>  
>         *(.rodata)
>         *(.rodata.*)
> +       VPCI_ARRAY
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>  
> @@ -64,8 +65,6 @@ SECTIONS
>         __proc_info_start = .;
>         *(.proc.info)
>         __proc_info_end = .;
> -
> -       VPCI_ARRAY
>    } :text
>  
>  #if defined(BUILD_ID)
> diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
> index 1366e2819eed..1de0b77fc6b9 100644
> --- a/xen/arch/ppc/xen.lds.S
> +++ b/xen/arch/ppc/xen.lds.S
> @@ -51,11 +51,10 @@ SECTIONS
>  
>          *(.rodata)
>          *(.rodata.*)
> +        VPCI_ARRAY
>          *(.data.rel.ro)
>          *(.data.rel.ro.*)
>  
> -        VPCI_ARRAY
> -
>          . = ALIGN(POINTER_ALIGN);
>      } :text
>  
> diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
> index 8c3c06de01f6..edcadff90bfe 100644
> --- a/xen/arch/riscv/xen.lds.S
> +++ b/xen/arch/riscv/xen.lds.S
> @@ -46,11 +46,10 @@ SECTIONS
>  
>          *(.rodata)
>          *(.rodata.*)
> +        VPCI_ARRAY
>          *(.data.rel.ro)
>          *(.data.rel.ro.*)
>  
> -        VPCI_ARRAY
> -
>          . = ALIGN(POINTER_ALIGN);
>      } :text
>  
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 636c7768aa3c..8e9cac75b09e 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -135,6 +135,7 @@ SECTIONS
>  
>         *(.rodata)
>         *(.rodata.*)
> +       VPCI_ARRAY
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>  
> @@ -148,7 +149,6 @@ SECTIONS
>         *(.note.gnu.build-id)
>         __note_gnu_build_id_end = .;
>  #endif
> -       VPCI_ARRAY
>    } PHDR(text)
>  
>  #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 8ee8052cd4a3..069253b5f721 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -122,19 +122,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>      bool map = cmd & PCI_COMMAND_MEMORY;
>      unsigned int i;
>  
> -    /*
> -     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
> -     * can be trapped (and emulated) by Xen when the memory decoding bit is
> -     * enabled.
> -     *
> -     * FIXME: punching holes after the p2m has been set up might be racy for
> -     * DomU usage, needs to be revisited.
> -     */
> -#ifdef CONFIG_HAS_PCI_MSI
> -    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
> -        return;
> -#endif

I think you need to keep this.  What about BARs being repositioned by
dom0 over reserved region(s), and thus needing the MSI-X hole to be
craved out there?  It's not a common use-case, but we should support
dom0 moving BARs around.

I think you need both the added chunk in init_msix(), plus the code
above to not regress the current functionality.

>      for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>      {
>          struct vpci_bar *bar = &header->bars[i];
> @@ -869,7 +856,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> -static int cf_check init_header(struct pci_dev *pdev)
> +int vpci_init_header(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
>      uint64_t addr, size;
> @@ -1065,7 +1052,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>      return rc;
>  }
> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 66e5a8a116be..c3eba4e14870 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_CAP(MSI, init_msi, NULL);
>  
>  void vpci_dump_msi(void)
>  {
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 74211301ba10..a1692b9d9f6a 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -589,7 +589,8 @@ static const struct hvm_mmio_ops vpci_msix_table_ops = {
>      .write = msix_write,
>  };
>  
> -int vpci_make_msix_hole(const struct pci_dev *pdev)
> +/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
> +static int vpci_make_msix_hole(const struct pci_dev *pdev)
>  {
>      struct domain *d = pdev->domain;
>      unsigned int i;
> @@ -703,9 +704,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      pdev->vpci->msix = msix;
>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>  
> -    return 0;
> +    spin_lock(&pdev->vpci->lock);
> +    rc = vpci_make_msix_hole(pdev);
> +    spin_unlock(&pdev->vpci->lock);
> +
> +    return rc;
>  }
> -REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
> +REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> index 793937449af7..3c18792d9bcd 100644
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 8474c0e3b995..e7e5b64f1be4 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -36,8 +36,8 @@ struct vpci_register {
>  };
>  
>  #ifdef __XEN__
> -extern vpci_register_init_t *const __start_vpci_array[];
> -extern vpci_register_init_t *const __end_vpci_array[];
> +extern const vpci_capability_t __start_vpci_array[];
> +extern const vpci_capability_t __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> @@ -83,6 +83,32 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static int vpci_init_capabilities(struct pci_dev *pdev)
> +{
> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        const vpci_capability_t *capability = &__start_vpci_array[i];
> +        const unsigned int cap = capability->id;
> +        const bool is_ext = capability->is_ext;
> +        unsigned int pos = 0;
> +        int rc;
> +
> +        if ( !is_ext )
> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
> +        else if ( is_hardware_domain(pdev->domain) )
> +            pos = pci_find_ext_capability(pdev->sbdf, cap);
> +
> +        if ( !pos )
> +            continue;
> +
> +        rc = capability->init(pdev);
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    return 0;
> +}
> +
>  void vpci_deassign_device(struct pci_dev *pdev)
>  {
>      unsigned int i;
> @@ -128,7 +154,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
>  
>  int vpci_assign_device(struct pci_dev *pdev)
>  {
> -    unsigned int i;
>      const unsigned long *ro_map;
>      int rc = 0;
>  
> @@ -159,14 +184,13 @@ int vpci_assign_device(struct pci_dev *pdev)
>          goto out;
>  #endif
>  
> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> -    {
> -        rc = __start_vpci_array[i](pdev);
> -        if ( rc )
> -            break;
> -    }
> +    rc = vpci_init_header(pdev);
> +    if ( rc )
> +        goto out;
> +
> +    rc = vpci_init_capabilities(pdev);
>  
> - out: __maybe_unused;
> + out:
>      if ( rc )
>          vpci_deassign_device(pdev);
>  
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 61d16cc8b897..61287e5d2e12 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
>  typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
>                            uint32_t val, void *data);
>  
> -typedef int vpci_register_init_t(struct pci_dev *dev);
> -
> -#define VPCI_PRIORITY_HIGH      "1"
> -#define VPCI_PRIORITY_MIDDLE    "5"
> -#define VPCI_PRIORITY_LOW       "9"
> +typedef struct {
> +    unsigned int id;
> +    bool is_ext;
> +    int (* init)(struct pci_dev *pdev);
> +    int (* cleanup)(const struct pci_dev *pdev);
> +} vpci_capability_t;
>  
>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>  
> @@ -29,9 +30,21 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>   */
>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>  
> -#define REGISTER_VPCI_INIT(x, p)                \
> -  static vpci_register_init_t *const x##_entry  \
> -               __used_section(".data.vpci." p) = (x)
> +#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
> +    static const vpci_capability_t name##_entry \
> +        __used_section(".data.rel.ro.vpci") = { \
> +        .id = (cap), \
> +        .init = (finit), \
> +        .cleanup = (fclean), \
> +        .is_ext = (ext), \
> +    }
> +
> +#define REGISTER_VPCI_CAP(name, finit, fclean) \
> +    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
> +#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
> +    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
> +
> +int __must_check vpci_init_header(struct pci_dev *pdev);
>  
>  /* Assign vPCI to device by adding handlers. */
>  int __must_check vpci_assign_device(struct pci_dev *pdev);
> @@ -206,9 +219,6 @@ struct vpci_vcpu {
>  #ifdef __XEN__
>  void vpci_dump_msi(void);
>  
> -/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
> -int vpci_make_msix_hole(const struct pci_dev *pdev);
> -
>  /* Arch-specific vPCI MSI helpers. */
>  void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
>                          unsigned int entry, bool mask);
> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> index 793d0e11450c..eb86305c11c7 100644
> --- a/xen/include/xen/xen.lds.h
> +++ b/xen/include/xen/xen.lds.h
> @@ -188,7 +188,7 @@
>  #define VPCI_ARRAY               \
>         . = ALIGN(POINTER_ALIGN); \
>         __start_vpci_array = .;   \
> -       *(SORT(.data.vpci.*))     \
> +       *(.data.rel.ro.vpci)           \

Indentation of the trailing '\' seems to be off?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-04  7:07 ` [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-07-21 15:48   ` Roger Pau Monné
  2025-07-23  7:33     ` Chen, Jiqian
  0 siblings, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 15:48 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Fri, Jul 04, 2025 at 03:07:58PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a legacy capability of device, it just
> returns an error and vPCI gets disabled for the whole device.  That
> most likely renders the device unusable, plus possibly causing issues
> to Xen itself if guest attempts to program the native MSI or MSI-X
> capabilities if present.
> 
> So, add new function to hide legacy capability when initialization
> fails. And remove the failed legacy capability from the vpci emulated
> legacy capability list.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v6->v7 changes:
> * Change the pointer parameter of vpci_get_register(),
>   vpci_get_previous_cap_register() and vpci_capability_hide() to be const.
> 
> v5->v6 changes:
> * Rename parameter rm to r in vpci_get_register().
> * Use for loop to compact the code of vpci_get_previous_cap_register().
> * Rename prev_next_r to prev_r in vpci_capability_hide(().
> * Add printing when cap init, cleanup and hide fail.
> 
> v4->v5 changes:
> * Modify vpci_get_register() to delete some unnecessary check, so that
>   I don't need to move function vpci_register_cmp().
> * Rename vpci_capability_mask() to vpci_capability_hide().
> 
> v3->v4 changes:
> * Modify the commit message.
> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
> * Modify vpci_capability_mask() to return error instead of using ASSERT.
> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
> * Add check "if ( !offset )" in vpci_capability_mask().
> 
> v2->v3 changes:
> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
> * Whole implementation changed because last version is wrong.
>   This version adds a new helper function vpci_get_register() and uses it to get
>   target handler and previous handler from vpci->handlers, then remove the target.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>   remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/vpci.c | 109 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index e7e5b64f1be4..a91c3d4a1415 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -83,6 +83,88 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static struct vpci_register *vpci_get_register(const struct vpci *vpci,
> +                                               unsigned int offset,
> +                                               unsigned int size)
> +{
> +    struct vpci_register *r;
> +
> +    ASSERT(spin_is_locked(&vpci->lock));
> +
> +    list_for_each_entry ( r, &vpci->handlers, node )
> +    {
> +        if ( r->offset == offset && r->size == size )
> +            return r;
> +
> +        if ( offset <= r->offset )
> +            break;
> +    }
> +
> +    return NULL;
> +}
> +
> +static struct vpci_register *vpci_get_previous_cap_register(
> +    const struct vpci *vpci, unsigned int offset)
> +{
> +    uint32_t next;
> +    struct vpci_register *r;
> +
> +    if ( offset < 0x40 )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return NULL;
> +    }
> +
> +    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
> +          r = next >= 0x40 ? vpci_get_register(vpci,
> +                                               next + PCI_CAP_LIST_NEXT, 1)
> +                           : NULL )
> +    {
> +        next = (uint32_t)(uintptr_t)r->private;

Both 'next' type and the explicit truncation done here would better
use "unsigned int" to match the type of the input offset parameter?

> +        ASSERT(next == (uintptr_t)r->private);
> +        if ( next == offset )
> +            break;
> +    }
> +
> +    return r;
> +}
> +
> +static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
> +{
> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
> +    struct vpci_register *prev_r, *next_r;
> +    struct vpci *vpci = pdev->vpci;
> +
> +    if ( !offset )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&vpci->lock);
> +    prev_r = vpci_get_previous_cap_register(vpci, offset);
> +    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
> +    if ( !prev_r || !next_r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    prev_r->private = next_r->private;
> +    /*
> +     * Not calling vpci_remove_register() here is to avoid redoing
> +     * the register search
> +     */
> +    list_del(&next_r->node);
> +    spin_unlock(&vpci->lock);
> +    xfree(next_r);
> +
> +    if ( !is_hardware_domain(pdev->domain) )
> +        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
> +
> +    return 0;
> +}
> +
>  static int vpci_init_capabilities(struct pci_dev *pdev)
>  {
>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> @@ -103,7 +185,32 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>  
>          rc = capability->init(pdev);
>          if ( rc )
> -            return rc;
> +        {
> +            const char *type = is_ext ? "extended" : "legacy";
> +
> +            printk(XENLOG_WARNING "%pd %pp: init %s cap %u fail rc=%d, mask it\n",

Nit: in order to avoid the strictly speaking overly long line here you
could split it like:

            printk(XENLOG_WARNING
                   "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
                   pdev->domain, ...

> +                   pdev->domain, &pdev->sbdf, type, cap, rc);
> +
> +            if ( capability->cleanup )
> +            {
> +                rc = capability->cleanup(pdev);
> +                if ( rc )
> +                {
> +                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
> +                           pdev->domain, &pdev->sbdf, type, cap, rc);

I think it would be fine to not return error here for the hardware
domain, and try to mask the capability even if cleanup() has failed?

For the hardware domain it's likely better to not fail and attempt to
mask, rather than fail and then end up exposing the device without any
kind of vPCI mediation.  For domU the proposed behavior is fine.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 4/8] vpci: Hide extended capability when it fails to initialize
  2025-07-04  7:07 ` [PATCH v7 4/8] vpci: Hide extended " Jiqian Chen
@ 2025-07-21 16:04   ` Roger Pau Monné
  0 siblings, 0 replies; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 16:04 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Fri, Jul 04, 2025 at 03:07:59PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a extended capability of device, it
> just returns an error and vPCI gets disabled for the whole device.
> 
> So, add function to hide extended capability when initialization
> fails. And remove the failed extended capability handler from vpci
> extended capability list.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> Comment left over for the situation that "capability in 0x100 can't be removed case"
> from Jan in last version, need Roger's input.
> 
> Jan:
> Can we rely on OSes recognizing ID 0 as "just skip"?
> Since the size isn't encoded in the header, there might be issues lurking here.

I think it's the best we can aim to do TBH, I don't see any other way
to workaround this.  Size is indeed not encoded, but the next
capability pointer doesn't need to know the size of the current
capability to fetch the next one.

> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v6->v7 changes:
> * Change the pointer parameter of vpci_get_previous_ext_cap_register()
>   and vpci_ext_capability_hide() to be const.
> 
> v5->v6 changes:
> * Change to use for loop to compact code of vpci_get_previous_ext_cap_register().
> * Rename parameter rm to r in vpci_ext_capability_hide().
> * Change comment to describ the case that hide capability of position
>   0x100U.
> 
> v4->v5 changes:
> * Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
> * Rename vpci_ext_capability_mask to vpci_ext_capability_hide.
> 
> v3->v4 changes:
> * Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header)
>   (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
> * Modify the commit message.
> * Change vpci_ext_capability_mask() to return error instead of using ASSERT.
> * Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
> * Add check "if ( !offset )" in vpci_ext_capability_mask().
> 
> v2->v3 changes:
> * Separated from the last version patch "vpci: Hide capability when it fails to initialize".
> * Whole implementation changed because last version is wrong.
>   This version gets target handler and previous handler from vpci->handlers, then remove the target.
> * Note: a case in function vpci_ext_capability_mask() needs to be discussed,
>   because it may change the offset of next capability when the offset of target
>   capability is 0x100U(the first extended capability), my implementation is just to
>   ignore and let hardware to handle the target capability.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>   remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/vpci.c    | 88 ++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/pci_regs.h |  5 ++-
>  2 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index a91c3d4a1415..8be4b53533a3 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -165,6 +165,92 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
>      return 0;
>  }
>  
> +static struct vpci_register *vpci_get_previous_ext_cap_register(
> +    const struct vpci *vpci, unsigned int offset)
> +{
> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
> +    struct vpci_register *r;
> +
> +    if ( offset <= PCI_CFG_SPACE_SIZE )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return NULL;
> +    }
> +
> +    for ( r = vpci_get_register(vpci, pos, 4); r;
> +          r = pos > PCI_CFG_SPACE_SIZE ? vpci_get_register(vpci, pos, 4)
> +                                       : NULL )
> +    {
> +        uint32_t header = (uint32_t)(uintptr_t)r->private;
> +
> +        ASSERT(header == (uintptr_t)r->private);
> +
> +        pos = PCI_EXT_CAP_NEXT(header);
> +        if ( pos == offset )
> +            break;
> +    }
> +
> +    return r;
> +}
> +
> +static int vpci_ext_capability_hide(
> +    const struct pci_dev *pdev, unsigned int cap)
> +{
> +    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
> +    struct vpci_register *r, *prev_r;
> +    struct vpci *vpci = pdev->vpci;
> +    uint32_t header, pre_header;
> +
> +    if ( offset < PCI_CFG_SPACE_SIZE )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&vpci->lock);
> +    r = vpci_get_register(vpci, offset, 4);
> +    if ( !r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    header = (uint32_t)(uintptr_t)r->private;
> +    if ( offset == PCI_CFG_SPACE_SIZE )
> +    {
> +        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
> +            r->private = (void *)(uintptr_t)0;
> +        else
> +            /*
> +             * The first extended capability (0x100) can not be removed from
> +             * the linked list, so instead mask its capability ID to return 0
> +             * and force OSes to skip it.

s/and force/hopefully forcing/.  I think that's the best Xen can
possibly do in this situation.  Unless someone has a more reliable
idea.

With that adjusted:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-07-04  7:08 ` [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
@ 2025-07-21 16:08   ` Roger Pau Monné
  0 siblings, 0 replies; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 16:08 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Fri, Jul 04, 2025 at 03:08:01PM +0800, Jiqian Chen wrote:
> When init_rebar() fails, current logic return fail and free Rebar-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide Rebar capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of Rebar.
> 
> To do that, implement cleanup function for Rebar.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-04  7:08 ` [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-07-08 15:13   ` Jan Beulich
@ 2025-07-21 16:21   ` Roger Pau Monné
  2025-07-23  7:54     ` Chen, Jiqian
  1 sibling, 1 reply; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 16:21 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Fri, Jul 04, 2025 at 03:08:02PM +0800, Jiqian Chen wrote:
> When init_msi() fails, current logic return fail and free MSI-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSI capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSI.
> 
> To do that, implement cleanup function for MSI.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v6->v7 changes:
> * Change the pointer parameter of cleanup_msi() to be const.
> * When vpci_remove_registers() in cleanup_msi() fails, not to return
>   directly, instead try to free msi and re-add ctrl handler.
> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
>   init_msi() since we need that every handler realize that msi is NULL
>   when msi is free but handlers are still in there.
> 
> v5->v6 changes:
> No.
> 
> v4->v5 changes:
> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi"
>   since cleanup hook is changed to be int.
> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
> 
> v3->v4 changes:
> * Change function name from fini_msi() to cleanup_msi().
> * Remove unnecessary comment.
> * Change to use XFREE to free vpci->msi.
> 
> v2->v3 changes:
> * Remove all fail path, and use fini_msi() hook instead.
> * Change the method to calculating the size of msi registers.
> 
> v1->v2 changes:
> * Added a new function fini_msi to free all MSI resources instead of using an array
>   to record registered registers.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/msi.c | 111 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 94 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index c3eba4e14870..09b91a685df5 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -25,7 +25,11 @@
>  static uint32_t cf_check control_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msi *msi = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return pci_conf_read16(pdev->sbdf, reg);
>  
>      return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
>             MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
> @@ -37,12 +41,16 @@ static uint32_t cf_check control_read(
>  static void cf_check control_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msi *msi = data;
> +    struct vpci *vpci = data;
> +    struct vpci_msi *msi = vpci->msi;
>      unsigned int vectors = min_t(uint8_t,
>                                   1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
>                                   pdev->msi_maxvec);
>      bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
>  
> +    if ( !msi )
> +        return;
> +
>      /*
>       * No change if the enable field and the number of vectors is
>       * the same or the device is not enabled, in which case the
> @@ -101,7 +109,11 @@ static void update_msi(const struct pci_dev *pdev, struct vpci_msi *msi)
>  static uint32_t cf_check address_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msi *msi = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return ~(uint32_t)0;
>  
>      return msi->address;
>  }
> @@ -109,7 +121,11 @@ static uint32_t cf_check address_read(
>  static void cf_check address_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msi *msi = data;
> +    struct vpci *vpci = data;
> +    struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return;
>  
>      /* Clear low part. */
>      msi->address &= ~0xffffffffULL;
> @@ -122,7 +138,11 @@ static void cf_check address_write(
>  static uint32_t cf_check address_hi_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msi *msi = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return ~(uint32_t)0;
>  
>      return msi->address >> 32;
>  }
> @@ -130,7 +150,11 @@ static uint32_t cf_check address_hi_read(
>  static void cf_check address_hi_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msi *msi = data;
> +    struct vpci *vpci = data;
> +    struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return;
>  
>      /* Clear and update high part. */
>      msi->address  = (uint32_t)msi->address;
> @@ -143,7 +167,11 @@ static void cf_check address_hi_write(
>  static uint32_t cf_check data_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msi *msi = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return ~(uint32_t)0;
>  
>      return msi->data;
>  }
> @@ -151,7 +179,11 @@ static uint32_t cf_check data_read(
>  static void cf_check data_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msi *msi = data;
> +    struct vpci *vpci = data;
> +    struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return;
>  
>      msi->data = val;
>  
> @@ -162,7 +194,11 @@ static void cf_check data_write(
>  static uint32_t cf_check mask_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msi *msi = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msi *msi = vpci->msi;
> +
> +    if ( !msi )
> +        return ~(uint32_t)0;
>  
>      return msi->mask;
>  }
> @@ -170,9 +206,14 @@ static uint32_t cf_check mask_read(
>  static void cf_check mask_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msi *msi = data;
> -    uint32_t dmask = msi->mask ^ val;
> +    struct vpci *vpci = data;
> +    struct vpci_msi *msi = vpci->msi;
> +    uint32_t dmask;
> +
> +    if ( !msi )
> +        return;
>  
> +    dmask = msi->mask ^ val;
>      if ( !dmask )
>          return;
>  
> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>      msi->mask = val;
>  }
>  
> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
> +{
> +    int rc;
> +    unsigned int end;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msi_pos = pdev->msi_pos;
> +    const unsigned int ctrl = msi_control_reg(msi_pos);
> +
> +    if ( !msi_pos || !vpci->msi )
> +        return 0;
> +
> +    if ( vpci->msi->masking )
> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
> +    else
> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
> +
> +    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
> +    if ( rc )
> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);

I think you could possibly do this as:

if ( rc )
{
    printk(...);
    ASSERT_UNREACHABLE();
    return rc;
}

Given the code in vpci_remove_registers() an error in the removal of
registers would likely imply memory corruption, at which point it's
best to fully disable the device.  That would allow you having to
modify all the handlers to pass vpci instead of msi structs.

That will avoid a lot of the extra code churn of having to change the
handler parameters.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-04  7:08 ` [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
@ 2025-07-21 16:24   ` Roger Pau Monné
  0 siblings, 0 replies; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-21 16:24 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Fri, Jul 04, 2025 at 03:08:03PM +0800, Jiqian Chen wrote:
> When init_msix() fails, current logic return fail and free MSIX-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSIX capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSIX.
> 
> To do that, implement cleanup function for MSIX.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v6->v7 changes:
> * Change the pointer parameter of cleanup_msix() to be const.
> * When vpci_remove_registers() in cleanup_msix() fails, not to return
>   directly, instead try to free msix and re-add ctrl handler.
> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
>   init_msix() since we need that every handler realize that msix is NULL
>   when msix is freed but handlers are still in there.
> 
> v5->v6 changes:
> * Change the logic to add dummy handler when !vpci->msix in cleanup_msix().
> 
> v4->v5 changes:
> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
>   since cleanup hook is changed to be int.
> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
> 
> v3->v4 changes:
> * Change function name from fini_msix() to cleanup_msix().
> * Change to use XFREE to free vpci->msix.
> * In cleanup function, change the sequence of check and remove action according to
>   init_msix().
> 
> v2->v3 changes:
> * Remove unnecessary clean operations in fini_msix().
> 
> v1->v2 changes:
> new patch.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/msix.c | 54 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index a1692b9d9f6a..114280337f3f 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -36,7 +36,11 @@
>  static uint32_t cf_check control_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
>  {
> -    const struct vpci_msix *msix = data;
> +    const struct vpci *vpci = data;
> +    const struct vpci_msix *msix = vpci->msix;
> +
> +    if ( !msix )
> +        return pci_conf_read16(pdev->sbdf, reg);
>  
>      return (msix->max_entries - 1) |
>             (msix->enabled ? PCI_MSIX_FLAGS_ENABLE : 0) |
> @@ -74,12 +78,16 @@ static void update_entry(struct vpci_msix_entry *entry,
>  static void cf_check control_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> -    struct vpci_msix *msix = data;
> +    struct vpci *vpci = data;
> +    struct vpci_msix *msix = vpci->msix;
>      bool new_masked = val & PCI_MSIX_FLAGS_MASKALL;
>      bool new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
>      unsigned int i;
>      int rc;
>  
> +    if ( !msix )
> +        return;
> +
>      if ( new_masked == msix->masked && new_enabled == msix->enabled )
>          return;
>  
> @@ -656,6 +664,44 @@ static int vpci_make_msix_hole(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> +static int cf_check cleanup_msix(const struct pci_dev *pdev)
> +{
> +    int rc;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msix_pos = pdev->msix_pos;
> +
> +    if ( !msix_pos )
> +        return 0;
> +
> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
> +    if ( rc )
> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSIX handlers rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);

The same comment as in the previous patch: vpci_remove_registers()
returning an error would likely imply memory corruption, and hence
it's best to just return error and avoid having to modify the
handlers.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-21 14:16           ` Roger Pau Monné
@ 2025-07-23  6:45             ` Chen, Jiqian
  0 siblings, 0 replies; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-23  6:45 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/7/21 22:16, Roger Pau Monné wrote:
> On Wed, Jul 09, 2025 at 05:34:28AM +0000, Chen, Jiqian wrote:
>> On 2025/7/9 13:32, Jan Beulich wrote:
>>> On 09.07.2025 07:29, Chen, Jiqian wrote:
>>>> On 2025/7/8 22:10, Jan Beulich wrote:
>>>>> On 04.07.2025 09:07, Jiqian Chen wrote:
>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>> @@ -836,6 +836,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>                                    PCI_STATUS_RSVDZ_MASK);
>>>>>>  }
>>>>>>  
>>>>>> +static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>>>>>> +{
>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
>>>>>> +
>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>> +        /* Extended capabilities read as zero, write ignore for DomU */
>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>> +                                 pos, 4, (void *)0);
>>>>>> +
>>>>>> +    do
>>>>>> +    {
>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>> +        int rc;
>>>>>> +
>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>
>>>>> If it wasn't for this use of vpci_hw_write32(), I'd be happy to provide my
>>>>> R-b. But this continues to look bogus to me: What use is it to allow writes
>>>>> when Dom0 then can't read back any possible effect of such a write (in the
>>>>> unexpected event that some of the bits were indeed writable)?
>>>> Oh, I got your concern.
>>>> What do you think about updating the header value after writing to hardware in write function?
>>
>>> That would imo be a layering violation. Once again that's something that you
>>> primarily would need Roger's input on.
>> OK, wait for Roger's input.
> 
> Hm, I see the asymmetry of allowing writes but not direct writes, my
> thought was to give the hw domain as less interference as possibly,
> hence my recommendation to use vpci_hw_write32().
> 
> I practice I think it's very unlikely that devices re-use reserved
> bits in the capability register, so I'm fine with using NULL (thus
> discarding the write).  We can always add more complex handling here
> if we ever came across a device that requires it.
OK, I will delete vpci_hw_write32() in next version.
Thanks.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-21 14:37   ` Roger Pau Monné
@ 2025-07-23  7:20     ` Chen, Jiqian
  2025-07-23  8:19       ` Roger Pau Monné
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-23  7:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/7/21 22:37, Roger Pau Monné wrote:
> On Fri, Jul 04, 2025 at 03:07:57PM +0800, Jiqian Chen wrote:
>> Refactor REGISTER_VPCI_INIT to contain more capability specific
>> information, this will benefit further follow-on changes to hide
>> capability when initialization fails.
>>
>> What's more, change the definition of init_header() since it is
>> not a capability and it is needed for all devices' PCI config space.
>>
>> After refactor, the "priority" of initializing capabilities isn't
>> needed anymore, so delete its related codes.
>>
>> Note:
>> Call vpci_make_msix_hole() in the end of init_msix() since the change
>> of sequence of init_header() and init_msix(). And delete the call of
>> vpci_make_msix_hole() in modify_decoding() since it is not needed.
>>
>> The cleanup hook is also added in this change, even if it's still
>> unused. Further changes will make use of it.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> There is a byte alignment problem in the array __start_vpci_array, which can be solved after
>> "[PATCH] x86: don't have gcc over-align data" is merged.
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> cc: Anthony PERARD <anthony.perard@vates.tech>
>> cc: Michal Orzel <michal.orzel@amd.com>
>> cc: Jan Beulich <jbeulich@suse.com>
>> cc: Julien Grall <julien@xen.org>
>> cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> v6->v7 changes:
>> * Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
>>   If change parameter of init hook to be const will affect init_msix, and it assigns pdev
>>   to struct vpci_msix, so keep no const to expanding the impact.
>> * Delete the vpci_make_msix_hole() call in modify_decoding().
>> * Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
>> * Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
>>   REGISTER_VPCI_CAPABILITY.
>>
>> v5->v6 changes:
>> * Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
>> * Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
>>   move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
>> * Change _start/end_vpci_array[] to be const pointer array.
>>
>> v4->v5 changes:
>> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
>>   REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
>> * Change cleanup hook of vpci_capability_t from void to int.
>>
>> v3->v4 changes
>> * Delete the useless trailing dot of section ".data.vpci".
>> * Add description about priority since this patch removes the initializing priority of
>>   capabilities and priority is not needed anymore.
>> * Change the hook name from fini to cleanup.
>> * Change the name x and y to be finit and fclean.
>> * Remove the unnecessary check "!capability->init"
>>
>> v2->v3 changes:
>> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
>> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
>> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
>>
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
>>   failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/arch/arm/xen.lds.S    |  3 +--
>>  xen/arch/ppc/xen.lds.S    |  3 +--
>>  xen/arch/riscv/xen.lds.S  |  3 +--
>>  xen/arch/x86/xen.lds.S    |  2 +-
>>  xen/drivers/vpci/header.c | 16 +-------------
>>  xen/drivers/vpci/msi.c    |  2 +-
>>  xen/drivers/vpci/msix.c   | 11 +++++++---
>>  xen/drivers/vpci/rebar.c  |  2 +-
>>  xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
>>  xen/include/xen/vpci.h    | 32 ++++++++++++++++++----------
>>  xen/include/xen/xen.lds.h |  2 +-
>>  11 files changed, 71 insertions(+), 49 deletions(-)
>>
>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
>> index 5bfbe1e92c1e..9f30c3a13ed1 100644
>> --- a/xen/arch/arm/xen.lds.S
>> +++ b/xen/arch/arm/xen.lds.S
>> @@ -57,6 +57,7 @@ SECTIONS
>>  
>>         *(.rodata)
>>         *(.rodata.*)
>> +       VPCI_ARRAY
>>         *(.data.rel.ro)
>>         *(.data.rel.ro.*)
>>  
>> @@ -64,8 +65,6 @@ SECTIONS
>>         __proc_info_start = .;
>>         *(.proc.info)
>>         __proc_info_end = .;
>> -
>> -       VPCI_ARRAY
>>    } :text
>>  
>>  #if defined(BUILD_ID)
>> diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
>> index 1366e2819eed..1de0b77fc6b9 100644
>> --- a/xen/arch/ppc/xen.lds.S
>> +++ b/xen/arch/ppc/xen.lds.S
>> @@ -51,11 +51,10 @@ SECTIONS
>>  
>>          *(.rodata)
>>          *(.rodata.*)
>> +        VPCI_ARRAY
>>          *(.data.rel.ro)
>>          *(.data.rel.ro.*)
>>  
>> -        VPCI_ARRAY
>> -
>>          . = ALIGN(POINTER_ALIGN);
>>      } :text
>>  
>> diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
>> index 8c3c06de01f6..edcadff90bfe 100644
>> --- a/xen/arch/riscv/xen.lds.S
>> +++ b/xen/arch/riscv/xen.lds.S
>> @@ -46,11 +46,10 @@ SECTIONS
>>  
>>          *(.rodata)
>>          *(.rodata.*)
>> +        VPCI_ARRAY
>>          *(.data.rel.ro)
>>          *(.data.rel.ro.*)
>>  
>> -        VPCI_ARRAY
>> -
>>          . = ALIGN(POINTER_ALIGN);
>>      } :text
>>  
>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>> index 636c7768aa3c..8e9cac75b09e 100644
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -135,6 +135,7 @@ SECTIONS
>>  
>>         *(.rodata)
>>         *(.rodata.*)
>> +       VPCI_ARRAY
>>         *(.data.rel.ro)
>>         *(.data.rel.ro.*)
>>  
>> @@ -148,7 +149,6 @@ SECTIONS
>>         *(.note.gnu.build-id)
>>         __note_gnu_build_id_end = .;
>>  #endif
>> -       VPCI_ARRAY
>>    } PHDR(text)
>>  
>>  #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 8ee8052cd4a3..069253b5f721 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -122,19 +122,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>      bool map = cmd & PCI_COMMAND_MEMORY;
>>      unsigned int i;
>>  
>> -    /*
>> -     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
>> -     * can be trapped (and emulated) by Xen when the memory decoding bit is
>> -     * enabled.
>> -     *
>> -     * FIXME: punching holes after the p2m has been set up might be racy for
>> -     * DomU usage, needs to be revisited.
>> -     */
>> -#ifdef CONFIG_HAS_PCI_MSI
>> -    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
>> -        return;
>> -#endif
> 
> I think you need to keep this.  What about BARs being repositioned by
> dom0 over reserved region(s), and thus needing the MSI-X hole to be
> craved out there?  It's not a common use-case, but we should support
> dom0 moving BARs around.
> 
> I think you need both the added chunk in init_msix(), plus the code
> above to not regress the current functionality.
OK, will do.
As Jan required me to add some comment to describe the chunk in init_msix() if not to delete here.
Do you think below is appropriate?

    /*
     * To make sure there's a hole for the MSIX table/PBA in the p2m since
     * init_msix is called after init_header. Here and the calling in another
     * place are not redundant, another is to support dom0 moving BARs.
     */
    spin_lock(&pdev->vpci->lock);
    rc = vpci_make_msix_hole(pdev);
    spin_unlock(&pdev->vpci->lock);

    return rc;
}
REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);

> 
>>      for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>      {
>>          struct vpci_bar *bar = &header->bars[i];
>> @@ -869,7 +856,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>>      return 0;
>>  }
>>  
>> -static int cf_check init_header(struct pci_dev *pdev)
>> +int vpci_init_header(struct pci_dev *pdev)
>>  {
>>      uint16_t cmd;
>>      uint64_t addr, size;
>> @@ -1065,7 +1052,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>      return rc;
>>  }
>> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 66e5a8a116be..c3eba4e14870 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_CAP(MSI, init_msi, NULL);
>>  
>>  void vpci_dump_msi(void)
>>  {
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 74211301ba10..a1692b9d9f6a 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -589,7 +589,8 @@ static const struct hvm_mmio_ops vpci_msix_table_ops = {
>>      .write = msix_write,
>>  };
>>  
>> -int vpci_make_msix_hole(const struct pci_dev *pdev)
>> +/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
>> +static int vpci_make_msix_hole(const struct pci_dev *pdev)
>>  {
>>      struct domain *d = pdev->domain;
>>      unsigned int i;
>> @@ -703,9 +704,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>      pdev->vpci->msix = msix;
>>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>>  
>> -    return 0;
>> +    spin_lock(&pdev->vpci->lock);
>> +    rc = vpci_make_msix_hole(pdev);
>> +    spin_unlock(&pdev->vpci->lock);
>> +
>> +    return rc;
>>  }
>> -REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
>> +REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
>> index 793937449af7..3c18792d9bcd 100644
>> --- a/xen/drivers/vpci/rebar.c
>> +++ b/xen/drivers/vpci/rebar.c
>> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 8474c0e3b995..e7e5b64f1be4 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -36,8 +36,8 @@ struct vpci_register {
>>  };
>>  
>>  #ifdef __XEN__
>> -extern vpci_register_init_t *const __start_vpci_array[];
>> -extern vpci_register_init_t *const __end_vpci_array[];
>> +extern const vpci_capability_t __start_vpci_array[];
>> +extern const vpci_capability_t __end_vpci_array[];
>>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>  
>>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> @@ -83,6 +83,32 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static int vpci_init_capabilities(struct pci_dev *pdev)
>> +{
>> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> +    {
>> +        const vpci_capability_t *capability = &__start_vpci_array[i];
>> +        const unsigned int cap = capability->id;
>> +        const bool is_ext = capability->is_ext;
>> +        unsigned int pos = 0;
>> +        int rc;
>> +
>> +        if ( !is_ext )
>> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
>> +        else if ( is_hardware_domain(pdev->domain) )
>> +            pos = pci_find_ext_capability(pdev->sbdf, cap);
>> +
>> +        if ( !pos )
>> +            continue;
>> +
>> +        rc = capability->init(pdev);
>> +        if ( rc )
>> +            return rc;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  void vpci_deassign_device(struct pci_dev *pdev)
>>  {
>>      unsigned int i;
>> @@ -128,7 +154,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
>>  
>>  int vpci_assign_device(struct pci_dev *pdev)
>>  {
>> -    unsigned int i;
>>      const unsigned long *ro_map;
>>      int rc = 0;
>>  
>> @@ -159,14 +184,13 @@ int vpci_assign_device(struct pci_dev *pdev)
>>          goto out;
>>  #endif
>>  
>> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
>> -    {
>> -        rc = __start_vpci_array[i](pdev);
>> -        if ( rc )
>> -            break;
>> -    }
>> +    rc = vpci_init_header(pdev);
>> +    if ( rc )
>> +        goto out;
>> +
>> +    rc = vpci_init_capabilities(pdev);
>>  
>> - out: __maybe_unused;
>> + out:
>>      if ( rc )
>>          vpci_deassign_device(pdev);
>>  
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 61d16cc8b897..61287e5d2e12 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
>>  typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
>>                            uint32_t val, void *data);
>>  
>> -typedef int vpci_register_init_t(struct pci_dev *dev);
>> -
>> -#define VPCI_PRIORITY_HIGH      "1"
>> -#define VPCI_PRIORITY_MIDDLE    "5"
>> -#define VPCI_PRIORITY_LOW       "9"
>> +typedef struct {
>> +    unsigned int id;
>> +    bool is_ext;
>> +    int (* init)(struct pci_dev *pdev);
>> +    int (* cleanup)(const struct pci_dev *pdev);
>> +} vpci_capability_t;
>>  
>>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>>  
>> @@ -29,9 +30,21 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>>   */
>>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>>  
>> -#define REGISTER_VPCI_INIT(x, p)                \
>> -  static vpci_register_init_t *const x##_entry  \
>> -               __used_section(".data.vpci." p) = (x)
>> +#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
>> +    static const vpci_capability_t name##_entry \
>> +        __used_section(".data.rel.ro.vpci") = { \
>> +        .id = (cap), \
>> +        .init = (finit), \
>> +        .cleanup = (fclean), \
>> +        .is_ext = (ext), \
>> +    }
>> +
>> +#define REGISTER_VPCI_CAP(name, finit, fclean) \
>> +    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
>> +#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
>> +    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
>> +
>> +int __must_check vpci_init_header(struct pci_dev *pdev);
>>  
>>  /* Assign vPCI to device by adding handlers. */
>>  int __must_check vpci_assign_device(struct pci_dev *pdev);
>> @@ -206,9 +219,6 @@ struct vpci_vcpu {
>>  #ifdef __XEN__
>>  void vpci_dump_msi(void);
>>  
>> -/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
>> -int vpci_make_msix_hole(const struct pci_dev *pdev);
>> -
>>  /* Arch-specific vPCI MSI helpers. */
>>  void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
>>                          unsigned int entry, bool mask);
>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>> index 793d0e11450c..eb86305c11c7 100644
>> --- a/xen/include/xen/xen.lds.h
>> +++ b/xen/include/xen/xen.lds.h
>> @@ -188,7 +188,7 @@
>>  #define VPCI_ARRAY               \
>>         . = ALIGN(POINTER_ALIGN); \
>>         __start_vpci_array = .;   \
>> -       *(SORT(.data.vpci.*))     \
>> +       *(.data.rel.ro.vpci)           \
> 
> Indentation of the trailing '\' seems to be off?
Will change.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-21 15:48   ` Roger Pau Monné
@ 2025-07-23  7:33     ` Chen, Jiqian
  2025-07-23  8:15       ` Roger Pau Monné
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-23  7:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/7/21 23:48, Roger Pau Monné wrote:
> On Fri, Jul 04, 2025 at 03:07:58PM +0800, Jiqian Chen wrote:
>> When vpci fails to initialize a legacy capability of device, it just
>> returns an error and vPCI gets disabled for the whole device.  That
>> most likely renders the device unusable, plus possibly causing issues
>> to Xen itself if guest attempts to program the native MSI or MSI-X
>> capabilities if present.
>>
>> So, add new function to hide legacy capability when initialization
>> fails. And remove the failed legacy capability from the vpci emulated
>> legacy capability list.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v6->v7 changes:
>> * Change the pointer parameter of vpci_get_register(),
>>   vpci_get_previous_cap_register() and vpci_capability_hide() to be const.
>>
>> v5->v6 changes:
>> * Rename parameter rm to r in vpci_get_register().
>> * Use for loop to compact the code of vpci_get_previous_cap_register().
>> * Rename prev_next_r to prev_r in vpci_capability_hide(().
>> * Add printing when cap init, cleanup and hide fail.
>>
>> v4->v5 changes:
>> * Modify vpci_get_register() to delete some unnecessary check, so that
>>   I don't need to move function vpci_register_cmp().
>> * Rename vpci_capability_mask() to vpci_capability_hide().
>>
>> v3->v4 changes:
>> * Modify the commit message.
>> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
>> * Modify vpci_capability_mask() to return error instead of using ASSERT.
>> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
>> * Add check "if ( !offset )" in vpci_capability_mask().
>>
>> v2->v3 changes:
>> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
>> * Whole implementation changed because last version is wrong.
>>   This version adds a new helper function vpci_get_register() and uses it to get
>>   target handler and previous handler from vpci->handlers, then remove the target.
>>
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>>   remove failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/vpci.c | 109 +++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 108 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index e7e5b64f1be4..a91c3d4a1415 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -83,6 +83,88 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static struct vpci_register *vpci_get_register(const struct vpci *vpci,
>> +                                               unsigned int offset,
>> +                                               unsigned int size)
>> +{
>> +    struct vpci_register *r;
>> +
>> +    ASSERT(spin_is_locked(&vpci->lock));
>> +
>> +    list_for_each_entry ( r, &vpci->handlers, node )
>> +    {
>> +        if ( r->offset == offset && r->size == size )
>> +            return r;
>> +
>> +        if ( offset <= r->offset )
>> +            break;
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static struct vpci_register *vpci_get_previous_cap_register(
>> +    const struct vpci *vpci, unsigned int offset)
>> +{
>> +    uint32_t next;
>> +    struct vpci_register *r;
>> +
>> +    if ( offset < 0x40 )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return NULL;
>> +    }
>> +
>> +    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
>> +          r = next >= 0x40 ? vpci_get_register(vpci,
>> +                                               next + PCI_CAP_LIST_NEXT, 1)
>> +                           : NULL )
>> +    {
>> +        next = (uint32_t)(uintptr_t)r->private;
> 
> Both 'next' type and the explicit truncation done here would better
> use "unsigned int" to match the type of the input offset parameter?
Make sense, will change.

> 
>> +        ASSERT(next == (uintptr_t)r->private);
>> +        if ( next == offset )
>> +            break;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
>> +{
>> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
>> +    struct vpci_register *prev_r, *next_r;
>> +    struct vpci *vpci = pdev->vpci;
>> +
>> +    if ( !offset )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return 0;
>> +    }
>> +
>> +    spin_lock(&vpci->lock);
>> +    prev_r = vpci_get_previous_cap_register(vpci, offset);
>> +    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
>> +    if ( !prev_r || !next_r )
>> +    {
>> +        spin_unlock(&vpci->lock);
>> +        return -ENODEV;
>> +    }
>> +
>> +    prev_r->private = next_r->private;
>> +    /*
>> +     * Not calling vpci_remove_register() here is to avoid redoing
>> +     * the register search
>> +     */
>> +    list_del(&next_r->node);
>> +    spin_unlock(&vpci->lock);
>> +    xfree(next_r);
>> +
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
>> +
>> +    return 0;
>> +}
>> +
>>  static int vpci_init_capabilities(struct pci_dev *pdev)
>>  {
>>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> @@ -103,7 +185,32 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>>  
>>          rc = capability->init(pdev);
>>          if ( rc )
>> -            return rc;
>> +        {
>> +            const char *type = is_ext ? "extended" : "legacy";
>> +
>> +            printk(XENLOG_WARNING "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
> 
> Nit: in order to avoid the strictly speaking overly long line here you
> could split it like:
> 
>             printk(XENLOG_WARNING
>                    "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
>                    pdev->domain, ...
Will do.
> 
>> +                   pdev->domain, &pdev->sbdf, type, cap, rc);
>> +
>> +            if ( capability->cleanup )
>> +            {
>> +                rc = capability->cleanup(pdev);
>> +                if ( rc )
>> +                {
>> +                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
>> +                           pdev->domain, &pdev->sbdf, type, cap, rc);
> 
> I think it would be fine to not return error here for the hardware
> domain, and try to mask the capability even if cleanup() has failed?
> 
> For the hardware domain it's likely better to not fail and attempt to
> mask, rather than fail and then end up exposing the device without any
> kind of vPCI mediation.  For domU the proposed behavior is fine.
Got it.
So here should be?
                if ( rc && !is_hardware_domain(pdev->domain) )
                {
                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
                           pdev->domain, &pdev->sbdf, type, cap, rc);
                    return rc;
                }
or
                if ( rc )
                {
                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
                           pdev->domain, &pdev->sbdf, type, cap, rc);
                    if ( !is_hardware_domain(pdev->domain) )
                        return rc;
                }

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-21 16:21   ` Roger Pau Monné
@ 2025-07-23  7:54     ` Chen, Jiqian
  2025-07-23  9:39       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Chen, Jiqian @ 2025-07-23  7:54 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/7/22 00:21, Roger Pau Monné wrote:
> On Fri, Jul 04, 2025 at 03:08:02PM +0800, Jiqian Chen wrote:
>> When init_msi() fails, current logic return fail and free MSI-related
>> resources in vpci_deassign_device(). But the previous new changes will
>> hide MSI capability and return success, it can't reach
>> vpci_deassign_device() to remove resources if hiding success, so those
>> resources must be removed in cleanup function of MSI.
>>
>> To do that, implement cleanup function for MSI.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v6->v7 changes:
>> * Change the pointer parameter of cleanup_msi() to be const.
>> * When vpci_remove_registers() in cleanup_msi() fails, not to return
>>   directly, instead try to free msi and re-add ctrl handler.
>> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
>>   init_msi() since we need that every handler realize that msi is NULL
>>   when msi is free but handlers are still in there.
>>
>> v5->v6 changes:
>> No.
>>
>> v4->v5 changes:
>> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi"
>>   since cleanup hook is changed to be int.
>> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
>>
>> v3->v4 changes:
>> * Change function name from fini_msi() to cleanup_msi().
>> * Remove unnecessary comment.
>> * Change to use XFREE to free vpci->msi.
>>
>> v2->v3 changes:
>> * Remove all fail path, and use fini_msi() hook instead.
>> * Change the method to calculating the size of msi registers.
>>
>> v1->v2 changes:
>> * Added a new function fini_msi to free all MSI resources instead of using an array
>>   to record registered registers.
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/msi.c | 111 ++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 94 insertions(+), 17 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index c3eba4e14870..09b91a685df5 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -25,7 +25,11 @@
>>  static uint32_t cf_check control_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>  {
>> -    const struct vpci_msi *msi = data;
>> +    const struct vpci *vpci = data;
>> +    const struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return pci_conf_read16(pdev->sbdf, reg);
>>  
>>      return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
>>             MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
>> @@ -37,12 +41,16 @@ static uint32_t cf_check control_read(
>>  static void cf_check control_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> -    struct vpci_msi *msi = data;
>> +    struct vpci *vpci = data;
>> +    struct vpci_msi *msi = vpci->msi;
>>      unsigned int vectors = min_t(uint8_t,
>>                                   1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
>>                                   pdev->msi_maxvec);
>>      bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
>>  
>> +    if ( !msi )
>> +        return;
>> +
>>      /*
>>       * No change if the enable field and the number of vectors is
>>       * the same or the device is not enabled, in which case the
>> @@ -101,7 +109,11 @@ static void update_msi(const struct pci_dev *pdev, struct vpci_msi *msi)
>>  static uint32_t cf_check address_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>  {
>> -    const struct vpci_msi *msi = data;
>> +    const struct vpci *vpci = data;
>> +    const struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return ~(uint32_t)0;
>>  
>>      return msi->address;
>>  }
>> @@ -109,7 +121,11 @@ static uint32_t cf_check address_read(
>>  static void cf_check address_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> -    struct vpci_msi *msi = data;
>> +    struct vpci *vpci = data;
>> +    struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return;
>>  
>>      /* Clear low part. */
>>      msi->address &= ~0xffffffffULL;
>> @@ -122,7 +138,11 @@ static void cf_check address_write(
>>  static uint32_t cf_check address_hi_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>  {
>> -    const struct vpci_msi *msi = data;
>> +    const struct vpci *vpci = data;
>> +    const struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return ~(uint32_t)0;
>>  
>>      return msi->address >> 32;
>>  }
>> @@ -130,7 +150,11 @@ static uint32_t cf_check address_hi_read(
>>  static void cf_check address_hi_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> -    struct vpci_msi *msi = data;
>> +    struct vpci *vpci = data;
>> +    struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return;
>>  
>>      /* Clear and update high part. */
>>      msi->address  = (uint32_t)msi->address;
>> @@ -143,7 +167,11 @@ static void cf_check address_hi_write(
>>  static uint32_t cf_check data_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>  {
>> -    const struct vpci_msi *msi = data;
>> +    const struct vpci *vpci = data;
>> +    const struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return ~(uint32_t)0;
>>  
>>      return msi->data;
>>  }
>> @@ -151,7 +179,11 @@ static uint32_t cf_check data_read(
>>  static void cf_check data_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> -    struct vpci_msi *msi = data;
>> +    struct vpci *vpci = data;
>> +    struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return;
>>  
>>      msi->data = val;
>>  
>> @@ -162,7 +194,11 @@ static void cf_check data_write(
>>  static uint32_t cf_check mask_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>  {
>> -    const struct vpci_msi *msi = data;
>> +    const struct vpci *vpci = data;
>> +    const struct vpci_msi *msi = vpci->msi;
>> +
>> +    if ( !msi )
>> +        return ~(uint32_t)0;
>>  
>>      return msi->mask;
>>  }
>> @@ -170,9 +206,14 @@ static uint32_t cf_check mask_read(
>>  static void cf_check mask_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> -    struct vpci_msi *msi = data;
>> -    uint32_t dmask = msi->mask ^ val;
>> +    struct vpci *vpci = data;
>> +    struct vpci_msi *msi = vpci->msi;
>> +    uint32_t dmask;
>> +
>> +    if ( !msi )
>> +        return;
>>  
>> +    dmask = msi->mask ^ val;
>>      if ( !dmask )
>>          return;
>>  
>> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>>      msi->mask = val;
>>  }
>>  
>> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    unsigned int end;
>> +    struct vpci *vpci = pdev->vpci;
>> +    const unsigned int msi_pos = pdev->msi_pos;
>> +    const unsigned int ctrl = msi_control_reg(msi_pos);
>> +
>> +    if ( !msi_pos || !vpci->msi )
>> +        return 0;
>> +
>> +    if ( vpci->msi->masking )
>> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>> +    else
>> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
>> +
>> +    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
>> +    if ( rc )
>> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
>> +               pdev->domain, &pdev->sbdf, rc);
> 
> I think you could possibly do this as:
> 
> if ( rc )
> {
>     printk(...);
>     ASSERT_UNREACHABLE();
>     return rc;
> }
> 
> Given the code in vpci_remove_registers() an error in the removal of
> registers would likely imply memory corruption, at which point it's
> best to fully disable the device.  That would allow you having to
> modify all the handlers to pass vpci instead of msi structs.
> 
> That will avoid a lot of the extra code churn of having to change the
> handler parameters.
OK, got it.
Since Jan proposed this consideration in v6, I need to ask for his opinion.
Hi Jan, do you fine with this change?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-23  7:33     ` Chen, Jiqian
@ 2025-07-23  8:15       ` Roger Pau Monné
  0 siblings, 0 replies; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-23  8:15 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Wed, Jul 23, 2025 at 07:33:04AM +0000, Chen, Jiqian wrote:
> On 2025/7/21 23:48, Roger Pau Monné wrote:
> > On Fri, Jul 04, 2025 at 03:07:58PM +0800, Jiqian Chen wrote:
> >> +                   pdev->domain, &pdev->sbdf, type, cap, rc);
> >> +
> >> +            if ( capability->cleanup )
> >> +            {
> >> +                rc = capability->cleanup(pdev);
> >> +                if ( rc )
> >> +                {
> >> +                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
> >> +                           pdev->domain, &pdev->sbdf, type, cap, rc);
> > 
> > I think it would be fine to not return error here for the hardware
> > domain, and try to mask the capability even if cleanup() has failed?
> > 
> > For the hardware domain it's likely better to not fail and attempt to
> > mask, rather than fail and then end up exposing the device without any
> > kind of vPCI mediation.  For domU the proposed behavior is fine.
> Got it.
> So here should be?
>                 if ( rc && !is_hardware_domain(pdev->domain) )
>                 {
>                     printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
>                            pdev->domain, &pdev->sbdf, type, cap, rc);
>                     return rc;
>                 }
> or
>                 if ( rc )
>                 {
>                     printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
>                            pdev->domain, &pdev->sbdf, type, cap, rc);
>                     if ( !is_hardware_domain(pdev->domain) )
>                         return rc;
>                 }

The last one will be my preference, we want to log the error, just
not forward it to the caller.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-23  7:20     ` Chen, Jiqian
@ 2025-07-23  8:19       ` Roger Pau Monné
  0 siblings, 0 replies; 31+ messages in thread
From: Roger Pau Monné @ 2025-07-23  8:19 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini

On Wed, Jul 23, 2025 at 07:20:27AM +0000, Chen, Jiqian wrote:
> On 2025/7/21 22:37, Roger Pau Monné wrote:
> > On Fri, Jul 04, 2025 at 03:07:57PM +0800, Jiqian Chen wrote:
> >> Refactor REGISTER_VPCI_INIT to contain more capability specific
> >> information, this will benefit further follow-on changes to hide
> >> capability when initialization fails.
> >>
> >> What's more, change the definition of init_header() since it is
> >> not a capability and it is needed for all devices' PCI config space.
> >>
> >> After refactor, the "priority" of initializing capabilities isn't
> >> needed anymore, so delete its related codes.
> >>
> >> Note:
> >> Call vpci_make_msix_hole() in the end of init_msix() since the change
> >> of sequence of init_header() and init_msix(). And delete the call of
> >> vpci_make_msix_hole() in modify_decoding() since it is not needed.
> >>
> >> The cleanup hook is also added in this change, even if it's still
> >> unused. Further changes will make use of it.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> There is a byte alignment problem in the array __start_vpci_array, which can be solved after
> >> "[PATCH] x86: don't have gcc over-align data" is merged.
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >> cc: Anthony PERARD <anthony.perard@vates.tech>
> >> cc: Michal Orzel <michal.orzel@amd.com>
> >> cc: Jan Beulich <jbeulich@suse.com>
> >> cc: Julien Grall <julien@xen.org>
> >> cc: Stefano Stabellini <sstabellini@kernel.org>
> >> ---
> >> v6->v7 changes:
> >> * Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
> >>   If change parameter of init hook to be const will affect init_msix, and it assigns pdev
> >>   to struct vpci_msix, so keep no const to expanding the impact.
> >> * Delete the vpci_make_msix_hole() call in modify_decoding().
> >> * Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
> >> * Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
> >>   REGISTER_VPCI_CAPABILITY.
> >>
> >> v5->v6 changes:
> >> * Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
> >> * Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
> >>   move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
> >> * Change _start/end_vpci_array[] to be const pointer array.
> >>
> >> v4->v5 changes:
> >> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
> >>   REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
> >> * Change cleanup hook of vpci_capability_t from void to int.
> >>
> >> v3->v4 changes
> >> * Delete the useless trailing dot of section ".data.vpci".
> >> * Add description about priority since this patch removes the initializing priority of
> >>   capabilities and priority is not needed anymore.
> >> * Change the hook name from fini to cleanup.
> >> * Change the name x and y to be finit and fclean.
> >> * Remove the unnecessary check "!capability->init"
> >>
> >> v2->v3 changes:
> >> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
> >> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
> >> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
> >>
> >> v1->v2 changes:
> >> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> >> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
> >>   failed capability from list.
> >> * Called vpci_make_msix_hole() in the end of init_msix().
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/arch/arm/xen.lds.S    |  3 +--
> >>  xen/arch/ppc/xen.lds.S    |  3 +--
> >>  xen/arch/riscv/xen.lds.S  |  3 +--
> >>  xen/arch/x86/xen.lds.S    |  2 +-
> >>  xen/drivers/vpci/header.c | 16 +-------------
> >>  xen/drivers/vpci/msi.c    |  2 +-
> >>  xen/drivers/vpci/msix.c   | 11 +++++++---
> >>  xen/drivers/vpci/rebar.c  |  2 +-
> >>  xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
> >>  xen/include/xen/vpci.h    | 32 ++++++++++++++++++----------
> >>  xen/include/xen/xen.lds.h |  2 +-
> >>  11 files changed, 71 insertions(+), 49 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> >> index 5bfbe1e92c1e..9f30c3a13ed1 100644
> >> --- a/xen/arch/arm/xen.lds.S
> >> +++ b/xen/arch/arm/xen.lds.S
> >> @@ -57,6 +57,7 @@ SECTIONS
> >>  
> >>         *(.rodata)
> >>         *(.rodata.*)
> >> +       VPCI_ARRAY
> >>         *(.data.rel.ro)
> >>         *(.data.rel.ro.*)
> >>  
> >> @@ -64,8 +65,6 @@ SECTIONS
> >>         __proc_info_start = .;
> >>         *(.proc.info)
> >>         __proc_info_end = .;
> >> -
> >> -       VPCI_ARRAY
> >>    } :text
> >>  
> >>  #if defined(BUILD_ID)
> >> diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
> >> index 1366e2819eed..1de0b77fc6b9 100644
> >> --- a/xen/arch/ppc/xen.lds.S
> >> +++ b/xen/arch/ppc/xen.lds.S
> >> @@ -51,11 +51,10 @@ SECTIONS
> >>  
> >>          *(.rodata)
> >>          *(.rodata.*)
> >> +        VPCI_ARRAY
> >>          *(.data.rel.ro)
> >>          *(.data.rel.ro.*)
> >>  
> >> -        VPCI_ARRAY
> >> -
> >>          . = ALIGN(POINTER_ALIGN);
> >>      } :text
> >>  
> >> diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
> >> index 8c3c06de01f6..edcadff90bfe 100644
> >> --- a/xen/arch/riscv/xen.lds.S
> >> +++ b/xen/arch/riscv/xen.lds.S
> >> @@ -46,11 +46,10 @@ SECTIONS
> >>  
> >>          *(.rodata)
> >>          *(.rodata.*)
> >> +        VPCI_ARRAY
> >>          *(.data.rel.ro)
> >>          *(.data.rel.ro.*)
> >>  
> >> -        VPCI_ARRAY
> >> -
> >>          . = ALIGN(POINTER_ALIGN);
> >>      } :text
> >>  
> >> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> >> index 636c7768aa3c..8e9cac75b09e 100644
> >> --- a/xen/arch/x86/xen.lds.S
> >> +++ b/xen/arch/x86/xen.lds.S
> >> @@ -135,6 +135,7 @@ SECTIONS
> >>  
> >>         *(.rodata)
> >>         *(.rodata.*)
> >> +       VPCI_ARRAY
> >>         *(.data.rel.ro)
> >>         *(.data.rel.ro.*)
> >>  
> >> @@ -148,7 +149,6 @@ SECTIONS
> >>         *(.note.gnu.build-id)
> >>         __note_gnu_build_id_end = .;
> >>  #endif
> >> -       VPCI_ARRAY
> >>    } PHDR(text)
> >>  
> >>  #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> >> index 8ee8052cd4a3..069253b5f721 100644
> >> --- a/xen/drivers/vpci/header.c
> >> +++ b/xen/drivers/vpci/header.c
> >> @@ -122,19 +122,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
> >>      bool map = cmd & PCI_COMMAND_MEMORY;
> >>      unsigned int i;
> >>  
> >> -    /*
> >> -     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
> >> -     * can be trapped (and emulated) by Xen when the memory decoding bit is
> >> -     * enabled.
> >> -     *
> >> -     * FIXME: punching holes after the p2m has been set up might be racy for
> >> -     * DomU usage, needs to be revisited.
> >> -     */
> >> -#ifdef CONFIG_HAS_PCI_MSI
> >> -    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
> >> -        return;
> >> -#endif
> > 
> > I think you need to keep this.  What about BARs being repositioned by
> > dom0 over reserved region(s), and thus needing the MSI-X hole to be
> > craved out there?  It's not a common use-case, but we should support
> > dom0 moving BARs around.
> > 
> > I think you need both the added chunk in init_msix(), plus the code
> > above to not regress the current functionality.
> OK, will do.
> As Jan required me to add some comment to describe the chunk in init_msix() if not to delete here.
> Do you think below is appropriate?
> 
>     /*
>      * To make sure there's a hole for the MSIX table/PBA in the p2m since
>      * init_msix is called after init_header. Here and the calling in another
>      * place are not redundant, another is to support dom0 moving BARs.
>      */
>     spin_lock(&pdev->vpci->lock);
>     rc = vpci_make_msix_hole(pdev);
>     spin_unlock(&pdev->vpci->lock);

I would use:

/*
 * vPCI header initialization will have mapped the whole BAR into the
 * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
 * the MSI-X table here, so that Xen can trap accesses.
 */

I think referencing "another is to support dom0..." is not helpful,
and likely to get out of sync if we ever change that code.  If
anything, the comment in modify_decoding() needs updating, but not via
a cross reference from a different context.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-23  7:54     ` Chen, Jiqian
@ 2025-07-23  9:39       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2025-07-23  9:39 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Roger Pau Monné

On 23.07.2025 09:54, Chen, Jiqian wrote:
> On 2025/7/22 00:21, Roger Pau Monné wrote:
>> On Fri, Jul 04, 2025 at 03:08:02PM +0800, Jiqian Chen wrote:
>>> When init_msi() fails, current logic return fail and free MSI-related
>>> resources in vpci_deassign_device(). But the previous new changes will
>>> hide MSI capability and return success, it can't reach
>>> vpci_deassign_device() to remove resources if hiding success, so those
>>> resources must be removed in cleanup function of MSI.
>>>
>>> To do that, implement cleanup function for MSI.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>>> ---
>>> v6->v7 changes:
>>> * Change the pointer parameter of cleanup_msi() to be const.
>>> * When vpci_remove_registers() in cleanup_msi() fails, not to return
>>>   directly, instead try to free msi and re-add ctrl handler.
>>> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
>>>   init_msi() since we need that every handler realize that msi is NULL
>>>   when msi is free but handlers are still in there.
>>>
>>> v5->v6 changes:
>>> No.
>>>
>>> v4->v5 changes:
>>> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi"
>>>   since cleanup hook is changed to be int.
>>> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
>>>
>>> v3->v4 changes:
>>> * Change function name from fini_msi() to cleanup_msi().
>>> * Remove unnecessary comment.
>>> * Change to use XFREE to free vpci->msi.
>>>
>>> v2->v3 changes:
>>> * Remove all fail path, and use fini_msi() hook instead.
>>> * Change the method to calculating the size of msi registers.
>>>
>>> v1->v2 changes:
>>> * Added a new function fini_msi to free all MSI resources instead of using an array
>>>   to record registered registers.
>>>
>>> Best regards,
>>> Jiqian Chen.
>>> ---
>>>  xen/drivers/vpci/msi.c | 111 ++++++++++++++++++++++++++++++++++-------
>>>  1 file changed, 94 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>>> index c3eba4e14870..09b91a685df5 100644
>>> --- a/xen/drivers/vpci/msi.c
>>> +++ b/xen/drivers/vpci/msi.c
>>> @@ -25,7 +25,11 @@
>>>  static uint32_t cf_check control_read(
>>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>>  {
>>> -    const struct vpci_msi *msi = data;
>>> +    const struct vpci *vpci = data;
>>> +    const struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return pci_conf_read16(pdev->sbdf, reg);
>>>  
>>>      return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
>>>             MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
>>> @@ -37,12 +41,16 @@ static uint32_t cf_check control_read(
>>>  static void cf_check control_write(
>>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>>  {
>>> -    struct vpci_msi *msi = data;
>>> +    struct vpci *vpci = data;
>>> +    struct vpci_msi *msi = vpci->msi;
>>>      unsigned int vectors = min_t(uint8_t,
>>>                                   1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
>>>                                   pdev->msi_maxvec);
>>>      bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
>>>  
>>> +    if ( !msi )
>>> +        return;
>>> +
>>>      /*
>>>       * No change if the enable field and the number of vectors is
>>>       * the same or the device is not enabled, in which case the
>>> @@ -101,7 +109,11 @@ static void update_msi(const struct pci_dev *pdev, struct vpci_msi *msi)
>>>  static uint32_t cf_check address_read(
>>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>>  {
>>> -    const struct vpci_msi *msi = data;
>>> +    const struct vpci *vpci = data;
>>> +    const struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return ~(uint32_t)0;
>>>  
>>>      return msi->address;
>>>  }
>>> @@ -109,7 +121,11 @@ static uint32_t cf_check address_read(
>>>  static void cf_check address_write(
>>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>>  {
>>> -    struct vpci_msi *msi = data;
>>> +    struct vpci *vpci = data;
>>> +    struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return;
>>>  
>>>      /* Clear low part. */
>>>      msi->address &= ~0xffffffffULL;
>>> @@ -122,7 +138,11 @@ static void cf_check address_write(
>>>  static uint32_t cf_check address_hi_read(
>>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>>  {
>>> -    const struct vpci_msi *msi = data;
>>> +    const struct vpci *vpci = data;
>>> +    const struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return ~(uint32_t)0;
>>>  
>>>      return msi->address >> 32;
>>>  }
>>> @@ -130,7 +150,11 @@ static uint32_t cf_check address_hi_read(
>>>  static void cf_check address_hi_write(
>>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>>  {
>>> -    struct vpci_msi *msi = data;
>>> +    struct vpci *vpci = data;
>>> +    struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return;
>>>  
>>>      /* Clear and update high part. */
>>>      msi->address  = (uint32_t)msi->address;
>>> @@ -143,7 +167,11 @@ static void cf_check address_hi_write(
>>>  static uint32_t cf_check data_read(
>>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>>  {
>>> -    const struct vpci_msi *msi = data;
>>> +    const struct vpci *vpci = data;
>>> +    const struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return ~(uint32_t)0;
>>>  
>>>      return msi->data;
>>>  }
>>> @@ -151,7 +179,11 @@ static uint32_t cf_check data_read(
>>>  static void cf_check data_write(
>>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>>  {
>>> -    struct vpci_msi *msi = data;
>>> +    struct vpci *vpci = data;
>>> +    struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return;
>>>  
>>>      msi->data = val;
>>>  
>>> @@ -162,7 +194,11 @@ static void cf_check data_write(
>>>  static uint32_t cf_check mask_read(
>>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>>>  {
>>> -    const struct vpci_msi *msi = data;
>>> +    const struct vpci *vpci = data;
>>> +    const struct vpci_msi *msi = vpci->msi;
>>> +
>>> +    if ( !msi )
>>> +        return ~(uint32_t)0;
>>>  
>>>      return msi->mask;
>>>  }
>>> @@ -170,9 +206,14 @@ static uint32_t cf_check mask_read(
>>>  static void cf_check mask_write(
>>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>>  {
>>> -    struct vpci_msi *msi = data;
>>> -    uint32_t dmask = msi->mask ^ val;
>>> +    struct vpci *vpci = data;
>>> +    struct vpci_msi *msi = vpci->msi;
>>> +    uint32_t dmask;
>>> +
>>> +    if ( !msi )
>>> +        return;
>>>  
>>> +    dmask = msi->mask ^ val;
>>>      if ( !dmask )
>>>          return;
>>>  
>>> @@ -193,6 +234,42 @@ static void cf_check mask_write(
>>>      msi->mask = val;
>>>  }
>>>  
>>> +static int cf_check cleanup_msi(const struct pci_dev *pdev)
>>> +{
>>> +    int rc;
>>> +    unsigned int end;
>>> +    struct vpci *vpci = pdev->vpci;
>>> +    const unsigned int msi_pos = pdev->msi_pos;
>>> +    const unsigned int ctrl = msi_control_reg(msi_pos);
>>> +
>>> +    if ( !msi_pos || !vpci->msi )
>>> +        return 0;
>>> +
>>> +    if ( vpci->msi->masking )
>>> +        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>>> +    else
>>> +        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
>>> +
>>> +    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
>>> +    if ( rc )
>>> +        printk(XENLOG_WARNING "%pd %pp: fail to remove MSI handlers rc=%d\n",
>>> +               pdev->domain, &pdev->sbdf, rc);
>>
>> I think you could possibly do this as:
>>
>> if ( rc )
>> {
>>     printk(...);
>>     ASSERT_UNREACHABLE();
>>     return rc;
>> }
>>
>> Given the code in vpci_remove_registers() an error in the removal of
>> registers would likely imply memory corruption, at which point it's
>> best to fully disable the device.  That would allow you having to
>> modify all the handlers to pass vpci instead of msi structs.
>>
>> That will avoid a lot of the extra code churn of having to change the
>> handler parameters.
> OK, got it.
> Since Jan proposed this consideration in v6, I need to ask for his opinion.
> Hi Jan, do you fine with this change?

If that's what Roger prefers, so be it. (Imo if we suspected memory
corruption, we'd better halt the system. I agree though that in
practice vpci_remove_registers() shouldn't fail here.)

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2025-07-23  9:39 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-04  7:07 [PATCH v7 0/8] Support hiding capability when its initialization fails Jiqian Chen
2025-07-04  7:07 ` [PATCH v7 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
2025-07-08 14:10   ` Jan Beulich
2025-07-09  5:29     ` Chen, Jiqian
2025-07-09  5:32       ` Jan Beulich
2025-07-09  5:34         ` Chen, Jiqian
2025-07-21 14:16           ` Roger Pau Monné
2025-07-23  6:45             ` Chen, Jiqian
2025-07-04  7:07 ` [PATCH v7 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
2025-07-08 14:52   ` Jan Beulich
2025-07-21 14:37   ` Roger Pau Monné
2025-07-23  7:20     ` Chen, Jiqian
2025-07-23  8:19       ` Roger Pau Monné
2025-07-04  7:07 ` [PATCH v7 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
2025-07-21 15:48   ` Roger Pau Monné
2025-07-23  7:33     ` Chen, Jiqian
2025-07-23  8:15       ` Roger Pau Monné
2025-07-04  7:07 ` [PATCH v7 4/8] vpci: Hide extended " Jiqian Chen
2025-07-21 16:04   ` Roger Pau Monné
2025-07-04  7:08 ` [PATCH v7 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
2025-07-04  7:08 ` [PATCH v7 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
2025-07-21 16:08   ` Roger Pau Monné
2025-07-04  7:08 ` [PATCH v7 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
2025-07-08 15:13   ` Jan Beulich
2025-07-09  6:10     ` Chen, Jiqian
2025-07-09  6:49       ` Jan Beulich
2025-07-21 16:21   ` Roger Pau Monné
2025-07-23  7:54     ` Chen, Jiqian
2025-07-23  9:39       ` Jan Beulich
2025-07-04  7:08 ` [PATCH v7 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
2025-07-21 16:24   ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.