All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/8] Support hiding capability when its initialization fails
@ 2025-07-24  5:49 Jiqian Chen
  2025-07-24  5:49 ` [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Hi,

This series is to
emulate extended capability list for dom0, including patch #1.
hide legacy and extended capability when its initialization fails, including patch #2, #3, #4.
remove all related registers and other resources when initializing capability fails, including patch #5, #6, #7, #8.

Best regards,
Jiqian Chen.
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
Jiqian Chen (8):
  vpci/header: Emulate extended capability list for dom0
  vpci: Refactor REGISTER_VPCI_INIT
  vpci: Hide legacy capability when it fails to initialize
  vpci: Hide extended capability when it fails to initialize
  vpci: Refactor vpci_remove_register to remove matched registers
  vpci/rebar: Free Rebar resources when init_rebar() fails
  vpci/msi: Free MSI resources when init_msi() fails
  vpci/msix: Free MSIX resources when init_msix() fails

 tools/tests/vpci/main.c    |   4 +-
 xen/arch/arm/xen.lds.S     |   3 +-
 xen/arch/ppc/xen.lds.S     |   3 +-
 xen/arch/riscv/xen.lds.S   |   3 +-
 xen/arch/x86/xen.lds.S     |   2 +-
 xen/drivers/vpci/header.c  |  47 +++++--
 xen/drivers/vpci/msi.c     |  46 ++++++-
 xen/drivers/vpci/msix.c    |  55 +++++++-
 xen/drivers/vpci/rebar.c   |  41 ++++--
 xen/drivers/vpci/vpci.c    | 275 +++++++++++++++++++++++++++++++++----
 xen/include/xen/pci_regs.h |   5 +-
 xen/include/xen/vpci.h     |  33 +++--
 xen/include/xen/xen.lds.h  |   2 +-
 13 files changed, 448 insertions(+), 71 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
@ 2025-07-24  5:49 ` Jiqian Chen
  2025-07-24  8:09   ` Jan Beulich
  2025-07-24  5:50 ` [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Add a new function to emulate extended capability list for dom0,
and call it in init_header(). So that it will be easy to hide an
extended capability whose initialization fails.

As for the extended capability list of domU, just move the logic
into above function and keep hiding it for domU.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v7->v8 changes:
* Remove the unnecessary function vpci_hw_write32, since it can cause the value of the
  extended capability header cached in vPCI of dom0 is inconsistent with the hardware.

v6->v7 changes:
* Change word "guest" to "DomU" in vpci_init_ext_capability_list().
* Change parameter of vpci_init_ext_capability_list() to be const.
* Delete check "if ( !header )" in the while loop of vpci_init_ext_capability_list().
* Change the loop from while to do while in vpci_init_ext_capability_list().

v5->v6 changes:
* Delete unnecessary parameter "ttl" in vpci_init_ext_capability_list()
  since vpci_add_register() can already detect the overlaps.

v4->v5 changes:
* Add check: if capability list of hardware has a overlap, print warning and return 0.

v3->v4 changes:
* Add check "if ( !header )   return 0;" to avoid adding handler for
  device that has no extended capabilities.

v2->v3 changes:
* In vpci_init_ext_capability_list(), when domain is domU, directly return after
  adding a handler(hiding all extended capability for domU).
* In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )"
  instead of "while ( pos && ttl-- )".
* Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 44 ++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index bb76e707992c..f537f3f25d2a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -825,6 +825,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
                                   PCI_STATUS_RSVDZ_MASK);
 }
 
+static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+
+    if ( !is_hardware_domain(pdev->domain) )
+        /* Extended capabilities read as zero, write ignore for DomU */
+        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                 pos, 4, (void *)0);
+
+    do
+    {
+        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
+        int rc;
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
+        if ( rc )
+            return rc;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    } while ( pos >= PCI_CFG_SPACE_SIZE );
+
+    return 0;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -877,14 +910,9 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    if ( !is_hwdom )
-    {
-        /* Extended capabilities read as zero, write ignore */
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
-                               (void *)0);
-        if ( rc )
-            return rc;
-    }
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
 
     if ( pdev->ignore_bars )
         return 0;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-07-24  5:49 ` [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24 14:28   ` Roger Pau Monné
  2025-07-24  5:50 ` [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

Refactor REGISTER_VPCI_INIT to contain more capability specific
information, this will benefit further follow-on changes to hide
capability when initialization fails.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

After refactor, the "priority" of initializing capabilities isn't
needed anymore, so delete its related codes.

Note:
Call vpci_make_msix_hole() in the end of init_msix() since the change
of sequence of init_header() and init_msix().

The cleanup hook is also added in this change, even if it's still
unused. Further changes will make use of it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v7->v8 changes:
* Recover vpci_make_msix_hole() call in modify_decoding(), which was  deleted by wrong.
* Add some comment to describe why need to add vpci_make_msix_hole() in init_msix().

v6->v7 changes:
* Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
  If change parameter of init hook to be const will affect init_msix, and it assigns pdev
  to struct vpci_msix, so keep no const to expanding the impact.
* Delete the vpci_make_msix_hole() call in modify_decoding().
* Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
* Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
  REGISTER_VPCI_CAPABILITY.

v5->v6 changes:
* Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
* Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
  move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
* Change _start/end_vpci_array[] to be const pointer array.

v4->v5 changes:
* Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
  REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
* Change cleanup hook of vpci_capability_t from void to int.

v3->v4 changes
* Delete the useless trailing dot of section ".data.vpci".
* Add description about priority since this patch removes the initializing priority of
  capabilities and priority is not needed anymore.
* Change the hook name from fini to cleanup.
* Change the name x and y to be finit and fclean.
* Remove the unnecessary check "!capability->init"

v2->v3 changes:
* This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
* Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
* Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
  failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/arch/arm/xen.lds.S    |  3 +--
 xen/arch/ppc/xen.lds.S    |  3 +--
 xen/arch/riscv/xen.lds.S  |  3 +--
 xen/arch/x86/xen.lds.S    |  2 +-
 xen/drivers/vpci/header.c |  3 +--
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   | 13 ++++++++++--
 xen/drivers/vpci/rebar.c  |  2 +-
 xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
 xen/include/xen/vpci.h    | 29 +++++++++++++++++++-------
 xen/include/xen/xen.lds.h |  2 +-
 11 files changed, 74 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 5bfbe1e92c1e..9f30c3a13ed1 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -57,6 +57,7 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+       VPCI_ARRAY
        *(.data.rel.ro)
        *(.data.rel.ro.*)
 
@@ -64,8 +65,6 @@ SECTIONS
        __proc_info_start = .;
        *(.proc.info)
        __proc_info_end = .;
-
-       VPCI_ARRAY
   } :text
 
 #if defined(BUILD_ID)
diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
index 1366e2819eed..1de0b77fc6b9 100644
--- a/xen/arch/ppc/xen.lds.S
+++ b/xen/arch/ppc/xen.lds.S
@@ -51,11 +51,10 @@ SECTIONS
 
         *(.rodata)
         *(.rodata.*)
+        VPCI_ARRAY
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
index 8c3c06de01f6..edcadff90bfe 100644
--- a/xen/arch/riscv/xen.lds.S
+++ b/xen/arch/riscv/xen.lds.S
@@ -46,11 +46,10 @@ SECTIONS
 
         *(.rodata)
         *(.rodata.*)
+        VPCI_ARRAY
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 636c7768aa3c..8e9cac75b09e 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -135,6 +135,7 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+       VPCI_ARRAY
        *(.data.rel.ro)
        *(.data.rel.ro.*)
 
@@ -148,7 +149,6 @@ SECTIONS
        *(.note.gnu.build-id)
        __note_gnu_build_id_end = .;
 #endif
-       VPCI_ARRAY
   } PHDR(text)
 
 #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f537f3f25d2a..469f4977441a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -858,7 +858,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1054,7 +1054,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..c3eba4e14870 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_CAP(MSI, init_msi, NULL);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 74211301ba10..eb3e7dcd1068 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -703,9 +703,18 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    /*
+     * vPCI header initialization will have mapped the whole BAR into the
+     * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
+     * the MSI-X table here, so that Xen can trap accesses.
+     */
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..3c18792d9bcd 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 09988f04c27c..7778acee0df6 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,8 +36,8 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern const vpci_capability_t __start_vpci_array[];
+extern const vpci_capability_t __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +83,32 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = &__start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos = 0;
+        int rc;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( is_hardware_domain(pdev->domain) )
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +154,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,14 +184,13 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    rc = vpci_init_capabilities(pdev);
 
- out: __maybe_unused;
+ out:
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6a481a4e89d3..17cfecb0aabf 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (* init)(struct pci_dev *pdev);
+    int (* cleanup)(const struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +30,21 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
+    static const vpci_capability_t name##_entry \
+        __used_section(".data.rel.ro.vpci") = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+    }
+
+#define REGISTER_VPCI_CAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
+#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..b126dfe88792 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.data.rel.ro.vpci)      \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-07-24  5:49 ` [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
  2025-07-24  5:50 ` [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24 14:47   ` Roger Pau Monné
  2025-07-24  5:50 ` [PATCH v8 4/8] vpci: Hide extended " Jiqian Chen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When vpci fails to initialize a legacy capability of device, it just
returns an error and vPCI gets disabled for the whole device.  That
most likely renders the device unusable, plus possibly causing issues
to Xen itself if guest attempts to program the native MSI or MSI-X
capabilities if present.

So, add new function to hide legacy capability when initialization
fails. And remove the failed legacy capability from the vpci emulated
legacy capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v7->v8 changes:
* Change the type of next from uint31_t to unsigned int in vpci_get_previous_cap_register().
* Change to not return error when cleanup fail for dom0.

v6->v7 changes:
* Change the pointer parameter of vpci_get_register(),
  vpci_get_previous_cap_register() and vpci_capability_hide() to be const.

v5->v6 changes:
* Rename parameter rm to r in vpci_get_register().
* Use for loop to compact the code of vpci_get_previous_cap_register().
* Rename prev_next_r to prev_r in vpci_capability_hide(().
* Add printing when cap init, cleanup and hide fail.

v4->v5 changes:
* Modify vpci_get_register() to delete some unnecessary check, so that
  I don't need to move function vpci_register_cmp().
* Rename vpci_capability_mask() to vpci_capability_hide().

v3->v4 changes:
* Modify the commit message.
* In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
* Modify vpci_capability_mask() to return error instead of using ASSERT.
* Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
* Add check "if ( !offset )" in vpci_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize"
* Whole implementation changed because last version is wrong.
  This version adds a new helper function vpci_get_register() and uses it to get
  target handler and previous handler from vpci->handlers, then remove the target.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c | 111 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 7778acee0df6..9960b11cf2c9 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -83,6 +83,88 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static struct vpci_register *vpci_get_register(const struct vpci *vpci,
+                                               unsigned int offset,
+                                               unsigned int size)
+{
+    struct vpci_register *r;
+
+    ASSERT(spin_is_locked(&vpci->lock));
+
+    list_for_each_entry ( r, &vpci->handlers, node )
+    {
+        if ( r->offset == offset && r->size == size )
+            return r;
+
+        if ( offset <= r->offset )
+            break;
+    }
+
+    return NULL;
+}
+
+static struct vpci_register *vpci_get_previous_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    unsigned int next;
+    struct vpci_register *r;
+
+    if ( offset < 0x40 )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
+          r = next >= 0x40 ? vpci_get_register(vpci,
+                                               next + PCI_CAP_LIST_NEXT, 1)
+                           : NULL )
+    {
+        next = (unsigned int)(uintptr_t)r->private;
+        ASSERT(next == (uintptr_t)r->private);
+        if ( next == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    struct vpci_register *prev_r, *next_r;
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    prev_r = vpci_get_previous_cap_register(vpci, offset);
+    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
+    if ( !prev_r || !next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_r->private = next_r->private;
+    /*
+     * Not calling vpci_remove_register() here is to avoid redoing
+     * the register search
+     */
+    list_del(&next_r->node);
+    spin_unlock(&vpci->lock);
+    xfree(next_r);
+
+    if ( !is_hardware_domain(pdev->domain) )
+        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -103,7 +185,34 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
 
         rc = capability->init(pdev);
         if ( rc )
-            return rc;
+        {
+            const char *type = is_ext ? "extended" : "legacy";
+
+            printk(XENLOG_WARNING
+                   "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
+                   pdev->domain, &pdev->sbdf, type, cap, rc);
+
+            if ( capability->cleanup )
+            {
+                rc = capability->cleanup(pdev);
+                if ( rc )
+                {
+                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
+                           pdev->domain, &pdev->sbdf, type, cap, rc);
+                    if ( !is_hardware_domain(pdev->domain) )
+                        return rc;
+                }
+            }
+
+            if ( !is_ext )
+                rc = vpci_capability_hide(pdev, cap);
+            if ( rc )
+            {
+                printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
+                       pdev->domain, &pdev->sbdf, type, cap, rc);
+                return rc;
+            }
+        }
     }
 
     return 0;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 4/8] vpci: Hide extended capability when it fails to initialize
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (2 preceding siblings ...)
  2025-07-24  5:50 ` [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24  5:50 ` [PATCH v8 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

When vpci fails to initialize an extended capability of device, it
just returns an error and vPCI gets disabled for the whole device.

So, add function to hide extended capability when initialization
fails. And remove the failed extended capability handler from vpci
extended capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v7->v8 changes:
* s/and force/hopefully forcing/ in vpci_ext_capability_hide().
* Add Roger's Reviewed-by.

v6->v7 changes:
* Change the pointer parameter of vpci_get_previous_ext_cap_register()
  and vpci_ext_capability_hide() to be const.

v5->v6 changes:
* Change to use for loop to compact code of vpci_get_previous_ext_cap_register().
* Rename parameter rm to r in vpci_ext_capability_hide().
* Change comment to describ the case that hide capability of position
  0x100U.

v4->v5 changes:
* Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
* Rename vpci_ext_capability_mask to vpci_ext_capability_hide.

v3->v4 changes:
* Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header)
  (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
* Modify the commit message.
* Change vpci_ext_capability_mask() to return error instead of using ASSERT.
* Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
* Add check "if ( !offset )" in vpci_ext_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize".
* Whole implementation changed because last version is wrong.
  This version gets target handler and previous handler from vpci->handlers, then remove the target.
* Note: a case in function vpci_ext_capability_mask() needs to be discussed,
  because it may change the offset of next capability when the offset of target
  capability is 0x100U(the first extended capability), my implementation is just to
  ignore and let hardware to handle the target capability.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c    | 88 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/pci_regs.h |  5 ++-
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 9960b11cf2c9..26cda5b3262a 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -165,6 +165,92 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
     return 0;
 }
 
+static struct vpci_register *vpci_get_previous_ext_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+    struct vpci_register *r;
+
+    if ( offset <= PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, pos, 4); r;
+          r = pos > PCI_CFG_SPACE_SIZE ? vpci_get_register(vpci, pos, 4)
+                                       : NULL )
+    {
+        uint32_t header = (uint32_t)(uintptr_t)r->private;
+
+        ASSERT(header == (uintptr_t)r->private);
+
+        pos = PCI_EXT_CAP_NEXT(header);
+        if ( pos == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_ext_capability_hide(
+    const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
+    struct vpci_register *r, *prev_r;
+    struct vpci *vpci = pdev->vpci;
+    uint32_t header, pre_header;
+
+    if ( offset < PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    r = vpci_get_register(vpci, offset, 4);
+    if ( !r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    header = (uint32_t)(uintptr_t)r->private;
+    if ( offset == PCI_CFG_SPACE_SIZE )
+    {
+        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
+            r->private = (void *)(uintptr_t)0;
+        else
+            /*
+             * The first extended capability (0x100) can not be removed from
+             * the linked list, so instead mask its capability ID to return 0
+             * hopefully forcing OSes to skip it.
+             */
+            r->private = (void *)(uintptr_t)(header & ~PCI_EXT_CAP_ID(header));
+
+        spin_unlock(&vpci->lock);
+        return 0;
+    }
+
+    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
+    if ( !prev_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    pre_header = (uint32_t)(uintptr_t)prev_r->private;
+    pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
+    pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
+    prev_r->private = (void *)(uintptr_t)pre_header;
+
+    list_del(&r->node);
+    spin_unlock(&vpci->lock);
+    xfree(r);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -206,6 +292,8 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
 
             if ( !is_ext )
                 rc = vpci_capability_hide(pdev, cap);
+            else
+                rc = vpci_ext_capability_hide(pdev, cap);
             if ( rc )
             {
                 printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 27b4f44eedf3..3b6963133dbd 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -448,7 +448,10 @@
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_NEXT_MASK		0xfff00000U
+/* Bottom two bits of next capability position are reserved. */
+#define PCI_EXT_CAP_NEXT(header) \
+    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xffcU)
 
 #define PCI_EXT_CAP_ID_ERR	1
 #define PCI_EXT_CAP_ID_VC	2
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 5/8] vpci: Refactor vpci_remove_register to remove matched registers
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (3 preceding siblings ...)
  2025-07-24  5:50 ` [PATCH v8 4/8] vpci: Hide extended " Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24  5:50 ` [PATCH v8 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Anthony PERARD

vpci_remove_register() only supports removing a register in a time,
but the follow-on changes need to remove all registers within a range.
So, refactor it to support removing all registers in a given region.

And it is no issue to remove a non exist register, so remove the
__must_check prefix.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
---
v7->v8 changes:
v6->v7 changes:
No.

v5->v6 changes:
* Modify commit message.
* Add Roger's Reviewed-by.

v4->v5 changes:
No.

v3->v4 changes:
* Use list_for_each_entry_safe instead of list_for_each_entry.
* Return ERANGE if overlap.

v2->v3 changes:
* Add new check to return error if registers overlap but not inside range.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 tools/tests/vpci/main.c |  4 ++--
 xen/drivers/vpci/vpci.c | 38 ++++++++++++++++++++------------------
 xen/include/xen/vpci.h  |  4 ++--
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index 33223db3eb77..ca72877d60cd 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
                                   rsvdz_mask))
 
 #define VPCI_REMOVE_REG(off, size)                                          \
-    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
 
 #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
-    assert(vpci_remove_register(test_pdev.vpci, off, size))
+    assert(vpci_remove_registers(test_pdev.vpci, off, size))
 
 /* Read a 32b register using all possible sizes. */
 void multiread4_check(unsigned int reg, uint32_t val)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 26cda5b3262a..dd9ef166b412 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -152,7 +152,7 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
 
     prev_r->private = next_r->private;
     /*
-     * Not calling vpci_remove_register() here is to avoid redoing
+     * Not calling vpci_remove_registers() here is to avoid redoing
      * the register search
      */
     list_del(&next_r->node);
@@ -160,7 +160,7 @@ static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
     xfree(next_r);
 
     if ( !is_hardware_domain(pdev->domain) )
-        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
 
     return 0;
 }
@@ -549,34 +549,36 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
-int vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                         unsigned int size)
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size)
 {
-    const struct vpci_register r = { .offset = offset, .size = size };
-    struct vpci_register *rm;
+    struct vpci_register *rm, *tmp;
+    unsigned int end = start + size;
 
     spin_lock(&vpci->lock);
-    list_for_each_entry ( rm, &vpci->handlers, node )
+    list_for_each_entry_safe ( rm, tmp, &vpci->handlers, node )
     {
-        int cmp = vpci_register_cmp(&r, rm);
-
-        /*
-         * NB: do not use a switch so that we can use break to
-         * get out of the list loop earlier if required.
-         */
-        if ( !cmp && rm->offset == offset && rm->size == size )
+        /* Remove rm if rm is inside the range. */
+        if ( rm->offset >= start && rm->offset + rm->size <= end )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
-            return 0;
+            continue;
         }
-        if ( cmp <= 0 )
+
+        /* Return error if registers overlap but not inside. */
+        if ( rm->offset + rm->size > start && rm->offset < end )
+        {
+            spin_unlock(&vpci->lock);
+            return -ERANGE;
+        }
+
+        if ( start < rm->offset )
             break;
     }
     spin_unlock(&vpci->lock);
 
-    return -ENOENT;
+    return 0;
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 17cfecb0aabf..514a0ce39133 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -70,8 +70,8 @@ static inline int __must_check vpci_add_register(struct vpci *vpci,
                                   size, data, 0, 0, 0, 0);
 }
 
-int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                                      unsigned int size);
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size);
 
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (4 preceding siblings ...)
  2025-07-24  5:50 ` [PATCH v8 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24  5:50 ` [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-07-24  5:50 ` [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  7 siblings, 0 replies; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_rebar() fails, current logic return fail and free Rebar-related
resources in vpci_deassign_device(). But the previous new changes will
hide Rebar capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of Rebar.

To do that, implement cleanup function for Rebar.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v7->v8 changes:
* Add Roger's Reviewed-by.

v6->v7 changes:
* Change the pointer parameter of cleanup_rebar() to be const.
* Print error when vpci_remove_registers() fail in cleanup_rebar().

v5->v6 changes:
No.

v4->v5 changes:
* Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.

v3->v4 changes:
* Change function name from fini_rebar() to cleanup_rebar().
* Change the error number to be E2BIG and ENXIO in init_rebar().

v2->v3 changes:
* Use fini_rebar() to remove all register instead of in the failure path of init_rebar();

v1->v2 changes:
* Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/rebar.c | 41 +++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 3c18792d9bcd..39ab9c2eb7d9 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -49,6 +49,32 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
     bar->guest_addr = bar->addr;
 }
 
+static int cf_check cleanup_rebar(const struct pci_dev *pdev)
+{
+    int rc;
+    uint32_t ctrl;
+    unsigned int nbars;
+    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
+                                                        PCI_EXT_CAP_ID_REBAR);
+
+    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
+    nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
+
+    rc = vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
+                               PCI_REBAR_CTRL(nbars - 1));
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to remove Rebar handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_rebar(struct pci_dev *pdev)
 {
     uint32_t ctrl;
@@ -80,7 +106,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -E2BIG;
         }
 
         bar = &pdev->vpci->header.bars[index];
@@ -88,7 +114,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -ENXIO;
         }
 
         rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rebar_ctrl_write,
@@ -97,14 +123,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u fail to add reg of REBAR_CTRL rc=%d\n",
                    pdev->domain, &pdev->sbdf, index, rc);
-            /*
-             * Ideally we would hide the ReBar capability on error, but code
-             * for doing so still needs to be written. Use continue instead
-             * to keep any already setup register hooks, as returning an
-             * error will cause the hardware domain to get unmediated access
-             * to all device registers.
-             */
-            continue;
+            return rc;
         }
 
         bar->resizable_sizes =
@@ -118,7 +137,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_EXTCAP(REBAR, init_rebar, NULL);
+REGISTER_VPCI_EXTCAP(REBAR, init_rebar, cleanup_rebar);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (5 preceding siblings ...)
  2025-07-24  5:50 ` [PATCH v8 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24 14:52   ` Roger Pau Monné
  2025-07-24  5:50 ` [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  7 siblings, 1 reply; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msi() fails, current logic return fail and free MSI-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSI capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSI.

To do that, implement cleanup function for MSI.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v7->v8 changes:
* Add a comment to describe why "-2" in cleanup_msi().
* Given the code in vpci_remove_registers() an error in the removal of
  registers would likely imply memory corruption, at which point it's
  best to fully disable the device. So, Rollback the last two modifications of v7.

v6->v7 changes:
* Change the pointer parameter of cleanup_msi() to be const.
* When vpci_remove_registers() in cleanup_msi() fails, not to return
  directly, instead try to free msi and re-add ctrl handler.
* Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msi in
  init_msi() since we need that every handler realize that msi is NULL
  when msi is free but handlers are still in there.

v5->v6 changes:
No.

v4->v5 changes:
* Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi"
  since cleanup hook is changed to be int.
* Add a read-only register for MSI Control Register in the end of cleanup_msi.

v3->v4 changes:
* Change function name from fini_msi() to cleanup_msi().
* Remove unnecessary comment.
* Change to use XFREE to free vpci->msi.

v2->v3 changes:
* Remove all fail path, and use fini_msi() hook instead.
* Change the method to calculating the size of msi registers.

v1->v2 changes:
* Added a new function fini_msi to free all MSI resources instead of using an array
  to record registered registers.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msi.c | 46 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index c3eba4e14870..ad5138c4cb5e 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -193,6 +193,50 @@ static void cf_check mask_write(
     msi->mask = val;
 }
 
+static int cf_check cleanup_msi(const struct pci_dev *pdev)
+{
+    int rc;
+    unsigned int end;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msi_pos = pdev->msi_pos;
+    const unsigned int ctrl = msi_control_reg(msi_pos);
+
+    if ( !msi_pos || !vpci->msi )
+        return 0;
+
+    if ( vpci->msi->masking )
+        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
+    else
+        /*
+         * "-2" here is to cut the reserved 2 bytes of Message Data when
+         * there is no masking support.
+         */
+        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
+
+    rc = vpci_remove_registers(vpci, ctrl, end - ctrl);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "%pd %pp: fail to remove MSI handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+        ASSERT_UNREACHABLE();
+        return rc;
+    }
+
+    XFREE(vpci->msi);
+
+    /*
+     * The driver may not traverse the capability list and think device
+     * supports MSI by default. So here let the control register of MSI
+     * be Read-Only is to ensure MSI disabled.
+     */
+    rc = vpci_add_register(vpci, vpci_hw_read16, NULL, ctrl, 2, NULL);
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to add MSI ctrl handler rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_msi(struct pci_dev *pdev)
 {
     unsigned int pos = pdev->msi_pos;
@@ -270,7 +314,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_CAP(MSI, init_msi, NULL);
+REGISTER_VPCI_CAP(MSI, init_msi, cleanup_msi);
 
 void vpci_dump_msi(void)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (6 preceding siblings ...)
  2025-07-24  5:50 ` [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-07-24  5:50 ` Jiqian Chen
  2025-07-24 15:59   ` Roger Pau Monné
  7 siblings, 1 reply; 20+ messages in thread
From: Jiqian Chen @ 2025-07-24  5:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msix() fails, current logic return fail and free MSIX-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSIX capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSIX.

To do that, implement cleanup function for MSIX.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v7->v8 changes:
* Given the code in vpci_remove_registers() an error in the removal of
  registers would likely imply memory corruption, at which point it's
  best to fully disable the device. So, Rollback the last two modifications of v7.

v6->v7 changes:
* Change the pointer parameter of cleanup_msix() to be const.
* When vpci_remove_registers() in cleanup_msix() fails, not to return
  directly, instead try to free msix and re-add ctrl handler.
* Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
  init_msix() since we need that every handler realize that msix is NULL
  when msix is freed but handlers are still in there.

v5->v6 changes:
* Change the logic to add dummy handler when !vpci->msix in cleanup_msix().

v4->v5 changes:
* Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
  since cleanup hook is changed to be int.
* Add a read-only register for MSIX Control Register in the end of cleanup_msix().

v3->v4 changes:
* Change function name from fini_msix() to cleanup_msix().
* Change to use XFREE to free vpci->msix.
* In cleanup function, change the sequence of check and remove action according to
  init_msix().

v2->v3 changes:
* Remove unnecessary clean operations in fini_msix().

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msix.c | 44 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index eb3e7dcd1068..8ab159969da6 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -655,6 +655,48 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
     return 0;
 }
 
+static int cf_check cleanup_msix(const struct pci_dev *pdev)
+{
+    int rc;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msix_pos = pdev->msix_pos;
+
+    if ( !msix_pos )
+        return 0;
+
+    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "%pd %pp: fail to remove MSIX handlers rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+        ASSERT_UNREACHABLE();
+        return rc;
+    }
+
+    if ( vpci->msix )
+    {
+        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
+            if ( vpci->msix->table[i] )
+                iounmap(vpci->msix->table[i]);
+
+        list_del(&vpci->msix->next);
+        XFREE(vpci->msix);
+    }
+
+    /*
+     * The driver may not traverse the capability list and think device
+     * supports MSIX by default. So here let the control register of MSIX
+     * be Read-Only is to ensure MSIX disabled.
+     */
+    rc = vpci_add_register(vpci, vpci_hw_read16, NULL,
+                           msix_control_reg(msix_pos), 2, NULL);
+    if ( rc )
+        printk(XENLOG_ERR "%pd %pp: fail to add MSIX ctrl handler rc=%d\n",
+               pdev->domain, &pdev->sbdf, rc);
+
+    return rc;
+}
+
 static int cf_check init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
@@ -714,7 +756,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
 
     return rc;
 }
-REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
+REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-24  5:49 ` [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
@ 2025-07-24  8:09   ` Jan Beulich
  2025-07-24 14:16     ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2025-07-24  8:09 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 24.07.2025 07:49, Jiqian Chen wrote:
> Add a new function to emulate extended capability list for dom0,
> and call it in init_header(). So that it will be easy to hide an
> extended capability whose initialization fails.
> 
> As for the extended capability list of domU, just move the logic
> into above function and keep hiding it for domU.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0
  2025-07-24  8:09   ` Jan Beulich
@ 2025-07-24 14:16     ` Roger Pau Monné
  0 siblings, 0 replies; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-24 14:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jiqian Chen, Huang Rui, xen-devel

On Thu, Jul 24, 2025 at 10:09:43AM +0200, Jan Beulich wrote:
> On 24.07.2025 07:49, Jiqian Chen wrote:
> > Add a new function to emulate extended capability list for dom0,
> > and call it in init_header(). So that it will be easy to hide an
> > extended capability whose initialization fails.
> > 
> > As for the extended capability list of domU, just move the logic
> > into above function and keep hiding it for domU.
> > 
> > Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citirx.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-24  5:50 ` [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-07-24 14:28   ` Roger Pau Monné
  2025-07-25  3:15     ` Chen, Jiqian
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-24 14:28 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Thu, Jul 24, 2025 at 01:50:00PM +0800, Jiqian Chen wrote:
> Refactor REGISTER_VPCI_INIT to contain more capability specific
> information, this will benefit further follow-on changes to hide
> capability when initialization fails.
> 
> What's more, change the definition of init_header() since it is
> not a capability and it is needed for all devices' PCI config space.
> 
> After refactor, the "priority" of initializing capabilities isn't
> needed anymore, so delete its related codes.
> 
> Note:
> Call vpci_make_msix_hole() in the end of init_msix() since the change
> of sequence of init_header() and init_msix().
> 
> The cleanup hook is also added in this change, even if it's still
> unused. Further changes will make use of it.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v7->v8 changes:
> * Recover vpci_make_msix_hole() call in modify_decoding(), which was  deleted by wrong.
> * Add some comment to describe why need to add vpci_make_msix_hole() in init_msix().
> 
> v6->v7 changes:
> * Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
>   If change parameter of init hook to be const will affect init_msix, and it assigns pdev
>   to struct vpci_msix, so keep no const to expanding the impact.
> * Delete the vpci_make_msix_hole() call in modify_decoding().
> * Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
> * Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
>   REGISTER_VPCI_CAPABILITY.
> 
> v5->v6 changes:
> * Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
> * Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
>   move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
> * Change _start/end_vpci_array[] to be const pointer array.
> 
> v4->v5 changes:
> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
>   REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
> * Change cleanup hook of vpci_capability_t from void to int.
> 
> v3->v4 changes
> * Delete the useless trailing dot of section ".data.vpci".
> * Add description about priority since this patch removes the initializing priority of
>   capabilities and priority is not needed anymore.
> * Change the hook name from fini to cleanup.
> * Change the name x and y to be finit and fclean.
> * Remove the unnecessary check "!capability->init"
> 
> v2->v3 changes:
> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
>   failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/arch/arm/xen.lds.S    |  3 +--
>  xen/arch/ppc/xen.lds.S    |  3 +--
>  xen/arch/riscv/xen.lds.S  |  3 +--
>  xen/arch/x86/xen.lds.S    |  2 +-
>  xen/drivers/vpci/header.c |  3 +--
>  xen/drivers/vpci/msi.c    |  2 +-
>  xen/drivers/vpci/msix.c   | 13 ++++++++++--
>  xen/drivers/vpci/rebar.c  |  2 +-
>  xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
>  xen/include/xen/vpci.h    | 29 +++++++++++++++++++-------
>  xen/include/xen/xen.lds.h |  2 +-
>  11 files changed, 74 insertions(+), 32 deletions(-)
> 
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 5bfbe1e92c1e..9f30c3a13ed1 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -57,6 +57,7 @@ SECTIONS
>  
>         *(.rodata)
>         *(.rodata.*)
> +       VPCI_ARRAY
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>  
> @@ -64,8 +65,6 @@ SECTIONS
>         __proc_info_start = .;
>         *(.proc.info)
>         __proc_info_end = .;
> -
> -       VPCI_ARRAY
>    } :text
>  
>  #if defined(BUILD_ID)
> diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
> index 1366e2819eed..1de0b77fc6b9 100644
> --- a/xen/arch/ppc/xen.lds.S
> +++ b/xen/arch/ppc/xen.lds.S
> @@ -51,11 +51,10 @@ SECTIONS
>  
>          *(.rodata)
>          *(.rodata.*)
> +        VPCI_ARRAY
>          *(.data.rel.ro)
>          *(.data.rel.ro.*)
>  
> -        VPCI_ARRAY
> -
>          . = ALIGN(POINTER_ALIGN);
>      } :text
>  
> diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
> index 8c3c06de01f6..edcadff90bfe 100644
> --- a/xen/arch/riscv/xen.lds.S
> +++ b/xen/arch/riscv/xen.lds.S
> @@ -46,11 +46,10 @@ SECTIONS
>  
>          *(.rodata)
>          *(.rodata.*)
> +        VPCI_ARRAY
>          *(.data.rel.ro)
>          *(.data.rel.ro.*)
>  
> -        VPCI_ARRAY
> -
>          . = ALIGN(POINTER_ALIGN);
>      } :text
>  
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 636c7768aa3c..8e9cac75b09e 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -135,6 +135,7 @@ SECTIONS
>  
>         *(.rodata)
>         *(.rodata.*)
> +       VPCI_ARRAY
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
>  
> @@ -148,7 +149,6 @@ SECTIONS
>         *(.note.gnu.build-id)
>         __note_gnu_build_id_end = .;
>  #endif
> -       VPCI_ARRAY
>    } PHDR(text)
>  
>  #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f537f3f25d2a..469f4977441a 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -858,7 +858,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> -static int cf_check init_header(struct pci_dev *pdev)
> +int vpci_init_header(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
>      uint64_t addr, size;
> @@ -1054,7 +1054,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>      return rc;
>  }
> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 66e5a8a116be..c3eba4e14870 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_CAP(MSI, init_msi, NULL);
>  
>  void vpci_dump_msi(void)
>  {
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 74211301ba10..eb3e7dcd1068 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -703,9 +703,18 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      pdev->vpci->msix = msix;
>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>  
> -    return 0;
> +    /*
> +     * vPCI header initialization will have mapped the whole BAR into the
> +     * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
> +     * the MSI-X table here, so that Xen can trap accesses.
> +     */
> +    spin_lock(&pdev->vpci->lock);
> +    rc = vpci_make_msix_hole(pdev);
> +    spin_unlock(&pdev->vpci->lock);

I should have asked in the last version, but why do you take the vPCI
lock here?

The path that ASSERTs the lock is held should never be taken when
called from init_msix().  Is there some path I'm missing in
vpci_make_msix_hole() that requires the vCPI lock to be held?

The rest LGTM.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize
  2025-07-24  5:50 ` [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-07-24 14:47   ` Roger Pau Monné
  0 siblings, 0 replies; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-24 14:47 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Thu, Jul 24, 2025 at 01:50:01PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a legacy capability of device, it just
> returns an error and vPCI gets disabled for the whole device.  That
> most likely renders the device unusable, plus possibly causing issues
> to Xen itself if guest attempts to program the native MSI or MSI-X
> capabilities if present.
> 
> So, add new function to hide legacy capability when initialization
> fails. And remove the failed legacy capability from the vpci emulated
> legacy capability list.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v7->v8 changes:
> * Change the type of next from uint31_t to unsigned int in vpci_get_previous_cap_register().
> * Change to not return error when cleanup fail for dom0.
> 
> v6->v7 changes:
> * Change the pointer parameter of vpci_get_register(),
>   vpci_get_previous_cap_register() and vpci_capability_hide() to be const.
> 
> v5->v6 changes:
> * Rename parameter rm to r in vpci_get_register().
> * Use for loop to compact the code of vpci_get_previous_cap_register().
> * Rename prev_next_r to prev_r in vpci_capability_hide(().
> * Add printing when cap init, cleanup and hide fail.
> 
> v4->v5 changes:
> * Modify vpci_get_register() to delete some unnecessary check, so that
>   I don't need to move function vpci_register_cmp().
> * Rename vpci_capability_mask() to vpci_capability_hide().
> 
> v3->v4 changes:
> * Modify the commit message.
> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
> * Modify vpci_capability_mask() to return error instead of using ASSERT.
> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
> * Add check "if ( !offset )" in vpci_capability_mask().
> 
> v2->v3 changes:
> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
> * Whole implementation changed because last version is wrong.
>   This version adds a new helper function vpci_get_register() and uses it to get
>   target handler and previous handler from vpci->handlers, then remove the target.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>   remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/vpci.c | 111 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 7778acee0df6..9960b11cf2c9 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -83,6 +83,88 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static struct vpci_register *vpci_get_register(const struct vpci *vpci,
> +                                               unsigned int offset,
> +                                               unsigned int size)
> +{
> +    struct vpci_register *r;
> +
> +    ASSERT(spin_is_locked(&vpci->lock));
> +
> +    list_for_each_entry ( r, &vpci->handlers, node )
> +    {
> +        if ( r->offset == offset && r->size == size )
> +            return r;
> +
> +        if ( offset <= r->offset )
> +            break;
> +    }
> +
> +    return NULL;
> +}
> +
> +static struct vpci_register *vpci_get_previous_cap_register(
> +    const struct vpci *vpci, unsigned int offset)
> +{
> +    unsigned int next;
> +    struct vpci_register *r;
> +
> +    if ( offset < 0x40 )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return NULL;
> +    }
> +
> +    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
> +          r = next >= 0x40 ? vpci_get_register(vpci,
> +                                               next + PCI_CAP_LIST_NEXT, 1)
> +                           : NULL )
> +    {
> +        next = (unsigned int)(uintptr_t)r->private;
> +        ASSERT(next == (uintptr_t)r->private);
> +        if ( next == offset )
> +            break;
> +    }
> +
> +    return r;
> +}
> +
> +static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
> +{
> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
> +    struct vpci_register *prev_r, *next_r;
> +    struct vpci *vpci = pdev->vpci;
> +
> +    if ( !offset )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&vpci->lock);
> +    prev_r = vpci_get_previous_cap_register(vpci, offset);
> +    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
> +    if ( !prev_r || !next_r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    prev_r->private = next_r->private;
> +    /*
> +     * Not calling vpci_remove_register() here is to avoid redoing
> +     * the register search

I would prefer a full stop after the sentence here, but since it's
still a single sentence multi line comment, I don't think it's
strictly necessary.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-07-24  5:50 ` [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-07-24 14:52   ` Roger Pau Monné
  0 siblings, 0 replies; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-24 14:52 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Thu, Jul 24, 2025 at 01:50:05PM +0800, Jiqian Chen wrote:
> When init_msi() fails, current logic return fail and free MSI-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSI capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSI.
> 
> To do that, implement cleanup function for MSI.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-24  5:50 ` [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
@ 2025-07-24 15:59   ` Roger Pau Monné
  2025-07-25  2:50     ` Chen, Jiqian
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-24 15:59 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Thu, Jul 24, 2025 at 01:50:06PM +0800, Jiqian Chen wrote:
> When init_msix() fails, current logic return fail and free MSIX-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSIX capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSIX.

The text here is a bit convoluted IMO.  It would be clearer as:

When MSI-X initialization fails vPCI will hide the capability, but
remove of handlers and data won't be performed until the device is
deassigned.  Introduce a MSI-X cleanup hook that will be called when
initialization fails to cleanup MSI-X related hooks and free it's
associated data.

As all supported capabilities have been switched to use the cleanup
hooks call those from vpci_deassign_device() instead of open-code the
capability specific cleanup in there.

(see below for the reasoning behind the last paragrpah).

> To do that, implement cleanup function for MSIX.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v7->v8 changes:
> * Given the code in vpci_remove_registers() an error in the removal of
>   registers would likely imply memory corruption, at which point it's
>   best to fully disable the device. So, Rollback the last two modifications of v7.
> 
> v6->v7 changes:
> * Change the pointer parameter of cleanup_msix() to be const.
> * When vpci_remove_registers() in cleanup_msix() fails, not to return
>   directly, instead try to free msix and re-add ctrl handler.
> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
>   init_msix() since we need that every handler realize that msix is NULL
>   when msix is freed but handlers are still in there.
> 
> v5->v6 changes:
> * Change the logic to add dummy handler when !vpci->msix in cleanup_msix().
> 
> v4->v5 changes:
> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
>   since cleanup hook is changed to be int.
> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
> 
> v3->v4 changes:
> * Change function name from fini_msix() to cleanup_msix().
> * Change to use XFREE to free vpci->msix.
> * In cleanup function, change the sequence of check and remove action according to
>   init_msix().
> 
> v2->v3 changes:
> * Remove unnecessary clean operations in fini_msix().
> 
> v1->v2 changes:
> new patch.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/msix.c | 44 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index eb3e7dcd1068..8ab159969da6 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -655,6 +655,48 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> +static int cf_check cleanup_msix(const struct pci_dev *pdev)
> +{
> +    int rc;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msix_pos = pdev->msix_pos;
> +
> +    if ( !msix_pos )
> +        return 0;
> +
> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
> +    if ( rc )
> +    {
> +        printk(XENLOG_ERR "%pd %pp: fail to remove MSIX handlers rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);
> +        ASSERT_UNREACHABLE();
> +        return rc;
> +    }
> +
> +    if ( vpci->msix )
> +    {
> +        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> +            if ( vpci->msix->table[i] )
> +                iounmap(vpci->msix->table[i]);
> +
> +        list_del(&vpci->msix->next);
> +        XFREE(vpci->msix);
> +    }
> +
> +    /*
> +     * The driver may not traverse the capability list and think device
> +     * supports MSIX by default. So here let the control register of MSIX
> +     * be Read-Only is to ensure MSIX disabled.
> +     */
> +    rc = vpci_add_register(vpci, vpci_hw_read16, NULL,
> +                           msix_control_reg(msix_pos), 2, NULL);
> +    if ( rc )
> +        printk(XENLOG_ERR "%pd %pp: fail to add MSIX ctrl handler rc=%d\n",
> +               pdev->domain, &pdev->sbdf, rc);
> +
> +    return rc;
> +}
> +
>  static int cf_check init_msix(struct pci_dev *pdev)
>  {
>      struct domain *d = pdev->domain;
> @@ -714,7 +756,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
>  
>      return rc;
>  }
> -REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
> +REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);

Don't you need to also call the cleanup hooks in
vpci_deassign_device() and remove the open-coded cleaning of MSI-X
done there?

Otherwise the code here is duplicated with the code in
vpci_deassign_device() for MSI-X cleanup (apart from it being a bit of
a layering violation to open-code the MSI-X cleanup in there).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-24 15:59   ` Roger Pau Monné
@ 2025-07-25  2:50     ` Chen, Jiqian
  2025-07-25  8:06       ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Chen, Jiqian @ 2025-07-25  2:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/7/24 23:59, Roger Pau Monné wrote:
> On Thu, Jul 24, 2025 at 01:50:06PM +0800, Jiqian Chen wrote:
>> When init_msix() fails, current logic return fail and free MSIX-related
>> resources in vpci_deassign_device(). But the previous new changes will
>> hide MSIX capability and return success, it can't reach
>> vpci_deassign_device() to remove resources if hiding success, so those
>> resources must be removed in cleanup function of MSIX.
> 
> The text here is a bit convoluted IMO.  It would be clearer as:
> 
> When MSI-X initialization fails vPCI will hide the capability, but
> remove of handlers and data won't be performed until the device is
> deassigned.  Introduce a MSI-X cleanup hook that will be called when
> initialization fails to cleanup MSI-X related hooks and free it's
> associated data.
> 
> As all supported capabilities have been switched to use the cleanup
> hooks call those from vpci_deassign_device() instead of open-code the
> capability specific cleanup in there.
Thanks, will change.

> 
> (see below for the reasoning behind the last paragrpah).
> 
>> To do that, implement cleanup function for MSIX.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v7->v8 changes:
>> * Given the code in vpci_remove_registers() an error in the removal of
>>   registers would likely imply memory corruption, at which point it's
>>   best to fully disable the device. So, Rollback the last two modifications of v7.
>>
>> v6->v7 changes:
>> * Change the pointer parameter of cleanup_msix() to be const.
>> * When vpci_remove_registers() in cleanup_msix() fails, not to return
>>   directly, instead try to free msix and re-add ctrl handler.
>> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
>>   init_msix() since we need that every handler realize that msix is NULL
>>   when msix is freed but handlers are still in there.
>>
>> v5->v6 changes:
>> * Change the logic to add dummy handler when !vpci->msix in cleanup_msix().
>>
>> v4->v5 changes:
>> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
>>   since cleanup hook is changed to be int.
>> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
>>
>> v3->v4 changes:
>> * Change function name from fini_msix() to cleanup_msix().
>> * Change to use XFREE to free vpci->msix.
>> * In cleanup function, change the sequence of check and remove action according to
>>   init_msix().
>>
>> v2->v3 changes:
>> * Remove unnecessary clean operations in fini_msix().
>>
>> v1->v2 changes:
>> new patch.
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/msix.c | 44 ++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index eb3e7dcd1068..8ab159969da6 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -655,6 +655,48 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
>>      return 0;
>>  }
>>  
>> +static int cf_check cleanup_msix(const struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    struct vpci *vpci = pdev->vpci;
>> +    const unsigned int msix_pos = pdev->msix_pos;
>> +
>> +    if ( !msix_pos )
>> +        return 0;
>> +
>> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
>> +    if ( rc )
>> +    {
>> +        printk(XENLOG_ERR "%pd %pp: fail to remove MSIX handlers rc=%d\n",
>> +               pdev->domain, &pdev->sbdf, rc);
>> +        ASSERT_UNREACHABLE();
>> +        return rc;
>> +    }
>> +
>> +    if ( vpci->msix )
>> +    {
>> +        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
>> +            if ( vpci->msix->table[i] )
>> +                iounmap(vpci->msix->table[i]);
>> +
>> +        list_del(&vpci->msix->next);
Should I need to move this line above " for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )" ?
Because I noticed that is what it be in vpci_deassign_device.

>> +        XFREE(vpci->msix);
>> +    }
>> +
>> +    /*
>> +     * The driver may not traverse the capability list and think device
>> +     * supports MSIX by default. So here let the control register of MSIX
>> +     * be Read-Only is to ensure MSIX disabled.
>> +     */
>> +    rc = vpci_add_register(vpci, vpci_hw_read16, NULL,
>> +                           msix_control_reg(msix_pos), 2, NULL);
>> +    if ( rc )
>> +        printk(XENLOG_ERR "%pd %pp: fail to add MSIX ctrl handler rc=%d\n",
>> +               pdev->domain, &pdev->sbdf, rc);
>> +
>> +    return rc;
>> +}
>> +
>>  static int cf_check init_msix(struct pci_dev *pdev)
>>  {
>>      struct domain *d = pdev->domain;
>> @@ -714,7 +756,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> -REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
>> +REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);
> 
> Don't you need to also call the cleanup hooks in
> vpci_deassign_device() and remove the open-coded cleaning of MSI-X
> done there?
Oh, will do.
How do I process the return value of cleanup_msix in vpci_deassign_device?
Just print an error when it fails and continue to do other deassign actions?

> 
> Otherwise the code here is duplicated with the code in
> vpci_deassign_device() for MSI-X cleanup (apart from it being a bit of
> a layering violation to open-code the MSI-X cleanup in there).
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-24 14:28   ` Roger Pau Monné
@ 2025-07-25  3:15     ` Chen, Jiqian
  2025-07-25  8:08       ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Chen, Jiqian @ 2025-07-25  3:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/7/24 22:28, Roger Pau Monné wrote:
> On Thu, Jul 24, 2025 at 01:50:00PM +0800, Jiqian Chen wrote:
>> Refactor REGISTER_VPCI_INIT to contain more capability specific
>> information, this will benefit further follow-on changes to hide
>> capability when initialization fails.
>>
>> What's more, change the definition of init_header() since it is
>> not a capability and it is needed for all devices' PCI config space.
>>
>> After refactor, the "priority" of initializing capabilities isn't
>> needed anymore, so delete its related codes.
>>
>> Note:
>> Call vpci_make_msix_hole() in the end of init_msix() since the change
>> of sequence of init_header() and init_msix().
>>
>> The cleanup hook is also added in this change, even if it's still
>> unused. Further changes will make use of it.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> cc: Anthony PERARD <anthony.perard@vates.tech>
>> cc: Michal Orzel <michal.orzel@amd.com>
>> cc: Jan Beulich <jbeulich@suse.com>
>> cc: Julien Grall <julien@xen.org>
>> cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> v7->v8 changes:
>> * Recover vpci_make_msix_hole() call in modify_decoding(), which was  deleted by wrong.
>> * Add some comment to describe why need to add vpci_make_msix_hole() in init_msix().
>>
>> v6->v7 changes:
>> * Change the pointer parameter of cleanup hook of vpci_capability_t to be const.
>>   If change parameter of init hook to be const will affect init_msix, and it assigns pdev
>>   to struct vpci_msix, so keep no const to expanding the impact.
>> * Delete the vpci_make_msix_hole() call in modify_decoding().
>> * Change __start_vpci_array from vpci_capability_t* array to vpci_capability_t array.
>> * Change the name "finit##_t" to be "name##_entry" and add a "name" parameter to macro
>>   REGISTER_VPCI_CAPABILITY.
>>
>> v5->v6 changes:
>> * Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
>> * Move vpci_capability_t entry from ".data.vpci" to ".data.rel.ro.vpci" and
>>   move the instances of VPCI_ARRAY in the linker scripts before *(.data.rel.ro).
>> * Change _start/end_vpci_array[] to be const pointer array.
>>
>> v4->v5 changes:
>> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
>>   REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
>> * Change cleanup hook of vpci_capability_t from void to int.
>>
>> v3->v4 changes
>> * Delete the useless trailing dot of section ".data.vpci".
>> * Add description about priority since this patch removes the initializing priority of
>>   capabilities and priority is not needed anymore.
>> * Change the hook name from fini to cleanup.
>> * Change the name x and y to be finit and fclean.
>> * Remove the unnecessary check "!capability->init"
>>
>> v2->v3 changes:
>> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
>> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
>> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
>>
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
>>   failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/arch/arm/xen.lds.S    |  3 +--
>>  xen/arch/ppc/xen.lds.S    |  3 +--
>>  xen/arch/riscv/xen.lds.S  |  3 +--
>>  xen/arch/x86/xen.lds.S    |  2 +-
>>  xen/drivers/vpci/header.c |  3 +--
>>  xen/drivers/vpci/msi.c    |  2 +-
>>  xen/drivers/vpci/msix.c   | 13 ++++++++++--
>>  xen/drivers/vpci/rebar.c  |  2 +-
>>  xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
>>  xen/include/xen/vpci.h    | 29 +++++++++++++++++++-------
>>  xen/include/xen/xen.lds.h |  2 +-
>>  11 files changed, 74 insertions(+), 32 deletions(-)
>>
>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
>> index 5bfbe1e92c1e..9f30c3a13ed1 100644
>> --- a/xen/arch/arm/xen.lds.S
>> +++ b/xen/arch/arm/xen.lds.S
>> @@ -57,6 +57,7 @@ SECTIONS
>>  
>>         *(.rodata)
>>         *(.rodata.*)
>> +       VPCI_ARRAY
>>         *(.data.rel.ro)
>>         *(.data.rel.ro.*)
>>  
>> @@ -64,8 +65,6 @@ SECTIONS
>>         __proc_info_start = .;
>>         *(.proc.info)
>>         __proc_info_end = .;
>> -
>> -       VPCI_ARRAY
>>    } :text
>>  
>>  #if defined(BUILD_ID)
>> diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
>> index 1366e2819eed..1de0b77fc6b9 100644
>> --- a/xen/arch/ppc/xen.lds.S
>> +++ b/xen/arch/ppc/xen.lds.S
>> @@ -51,11 +51,10 @@ SECTIONS
>>  
>>          *(.rodata)
>>          *(.rodata.*)
>> +        VPCI_ARRAY
>>          *(.data.rel.ro)
>>          *(.data.rel.ro.*)
>>  
>> -        VPCI_ARRAY
>> -
>>          . = ALIGN(POINTER_ALIGN);
>>      } :text
>>  
>> diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
>> index 8c3c06de01f6..edcadff90bfe 100644
>> --- a/xen/arch/riscv/xen.lds.S
>> +++ b/xen/arch/riscv/xen.lds.S
>> @@ -46,11 +46,10 @@ SECTIONS
>>  
>>          *(.rodata)
>>          *(.rodata.*)
>> +        VPCI_ARRAY
>>          *(.data.rel.ro)
>>          *(.data.rel.ro.*)
>>  
>> -        VPCI_ARRAY
>> -
>>          . = ALIGN(POINTER_ALIGN);
>>      } :text
>>  
>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>> index 636c7768aa3c..8e9cac75b09e 100644
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -135,6 +135,7 @@ SECTIONS
>>  
>>         *(.rodata)
>>         *(.rodata.*)
>> +       VPCI_ARRAY
>>         *(.data.rel.ro)
>>         *(.data.rel.ro.*)
>>  
>> @@ -148,7 +149,6 @@ SECTIONS
>>         *(.note.gnu.build-id)
>>         __note_gnu_build_id_end = .;
>>  #endif
>> -       VPCI_ARRAY
>>    } PHDR(text)
>>  
>>  #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index f537f3f25d2a..469f4977441a 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -858,7 +858,7 @@ static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
>>      return 0;
>>  }
>>  
>> -static int cf_check init_header(struct pci_dev *pdev)
>> +int vpci_init_header(struct pci_dev *pdev)
>>  {
>>      uint16_t cmd;
>>      uint64_t addr, size;
>> @@ -1054,7 +1054,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>      return rc;
>>  }
>> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 66e5a8a116be..c3eba4e14870 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_CAP(MSI, init_msi, NULL);
>>  
>>  void vpci_dump_msi(void)
>>  {
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 74211301ba10..eb3e7dcd1068 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -703,9 +703,18 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>      pdev->vpci->msix = msix;
>>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>>  
>> -    return 0;
>> +    /*
>> +     * vPCI header initialization will have mapped the whole BAR into the
>> +     * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
>> +     * the MSI-X table here, so that Xen can trap accesses.
>> +     */
>> +    spin_lock(&pdev->vpci->lock);
>> +    rc = vpci_make_msix_hole(pdev);
>> +    spin_unlock(&pdev->vpci->lock);
> 
> I should have asked in the last version, but why do you take the vPCI
> lock here?
> 
> The path that ASSERTs the lock is held should never be taken when
> called from init_msix().  Is there some path I'm missing in
> vpci_make_msix_hole() that requires the vCPI lock to be held?
> 
> The rest LGTM.
Sorry to forget to delete this.
Feel free to change when submit.
Or I will change by sending a new version.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails
  2025-07-25  2:50     ` Chen, Jiqian
@ 2025-07-25  8:06       ` Roger Pau Monné
  0 siblings, 0 replies; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-25  8:06 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Fri, Jul 25, 2025 at 02:50:36AM +0000, Chen, Jiqian wrote:
> On 2025/7/24 23:59, Roger Pau Monné wrote:
> > On Thu, Jul 24, 2025 at 01:50:06PM +0800, Jiqian Chen wrote:
> >> When init_msix() fails, current logic return fail and free MSIX-related
> >> resources in vpci_deassign_device(). But the previous new changes will
> >> hide MSIX capability and return success, it can't reach
> >> vpci_deassign_device() to remove resources if hiding success, so those
> >> resources must be removed in cleanup function of MSIX.
> > 
> > The text here is a bit convoluted IMO.  It would be clearer as:
> > 
> > When MSI-X initialization fails vPCI will hide the capability, but
> > remove of handlers and data won't be performed until the device is
> > deassigned.  Introduce a MSI-X cleanup hook that will be called when
> > initialization fails to cleanup MSI-X related hooks and free it's
> > associated data.
> > 
> > As all supported capabilities have been switched to use the cleanup
> > hooks call those from vpci_deassign_device() instead of open-code the
> > capability specific cleanup in there.
> Thanks, will change.
> 
> > 
> > (see below for the reasoning behind the last paragrpah).
> > 
> >> To do that, implement cleanup function for MSIX.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v7->v8 changes:
> >> * Given the code in vpci_remove_registers() an error in the removal of
> >>   registers would likely imply memory corruption, at which point it's
> >>   best to fully disable the device. So, Rollback the last two modifications of v7.
> >>
> >> v6->v7 changes:
> >> * Change the pointer parameter of cleanup_msix() to be const.
> >> * When vpci_remove_registers() in cleanup_msix() fails, not to return
> >>   directly, instead try to free msix and re-add ctrl handler.
> >> * Pass pdev->vpci into vpci_add_register() instead of pdev->vpci->msix in
> >>   init_msix() since we need that every handler realize that msix is NULL
> >>   when msix is freed but handlers are still in there.
> >>
> >> v5->v6 changes:
> >> * Change the logic to add dummy handler when !vpci->msix in cleanup_msix().
> >>
> >> v4->v5 changes:
> >> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix"
> >>   since cleanup hook is changed to be int.
> >> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
> >>
> >> v3->v4 changes:
> >> * Change function name from fini_msix() to cleanup_msix().
> >> * Change to use XFREE to free vpci->msix.
> >> * In cleanup function, change the sequence of check and remove action according to
> >>   init_msix().
> >>
> >> v2->v3 changes:
> >> * Remove unnecessary clean operations in fini_msix().
> >>
> >> v1->v2 changes:
> >> new patch.
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/drivers/vpci/msix.c | 44 ++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 43 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> >> index eb3e7dcd1068..8ab159969da6 100644
> >> --- a/xen/drivers/vpci/msix.c
> >> +++ b/xen/drivers/vpci/msix.c
> >> @@ -655,6 +655,48 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
> >>      return 0;
> >>  }
> >>  
> >> +static int cf_check cleanup_msix(const struct pci_dev *pdev)
> >> +{
> >> +    int rc;
> >> +    struct vpci *vpci = pdev->vpci;
> >> +    const unsigned int msix_pos = pdev->msix_pos;
> >> +
> >> +    if ( !msix_pos )
> >> +        return 0;
> >> +
> >> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
> >> +    if ( rc )
> >> +    {
> >> +        printk(XENLOG_ERR "%pd %pp: fail to remove MSIX handlers rc=%d\n",
> >> +               pdev->domain, &pdev->sbdf, rc);
> >> +        ASSERT_UNREACHABLE();
> >> +        return rc;
> >> +    }
> >> +
> >> +    if ( vpci->msix )
> >> +    {
> >> +        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> >> +            if ( vpci->msix->table[i] )
> >> +                iounmap(vpci->msix->table[i]);
> >> +
> >> +        list_del(&vpci->msix->next);
> Should I need to move this line above " for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )" ?
> Because I noticed that is what it be in vpci_deassign_device.

Yes, indeed, that would be preferable.

> >> +        XFREE(vpci->msix);
> >> +    }
> >> +
> >> +    /*
> >> +     * The driver may not traverse the capability list and think device
> >> +     * supports MSIX by default. So here let the control register of MSIX
> >> +     * be Read-Only is to ensure MSIX disabled.
> >> +     */
> >> +    rc = vpci_add_register(vpci, vpci_hw_read16, NULL,
> >> +                           msix_control_reg(msix_pos), 2, NULL);
> >> +    if ( rc )
> >> +        printk(XENLOG_ERR "%pd %pp: fail to add MSIX ctrl handler rc=%d\n",
> >> +               pdev->domain, &pdev->sbdf, rc);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >>  static int cf_check init_msix(struct pci_dev *pdev)
> >>  {
> >>      struct domain *d = pdev->domain;
> >> @@ -714,7 +756,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
> >>  
> >>      return rc;
> >>  }
> >> -REGISTER_VPCI_CAP(MSIX, init_msix, NULL);
> >> +REGISTER_VPCI_CAP(MSIX, init_msix, cleanup_msix);
> > 
> > Don't you need to also call the cleanup hooks in
> > vpci_deassign_device() and remove the open-coded cleaning of MSI-X
> > done there?
> Oh, will do.
> How do I process the return value of cleanup_msix in vpci_deassign_device?
> Just print an error when it fails and continue to do other deassign actions?

Yeah, I don't think there's much else that can be done.  Printing an
error and continuing should be fine.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-25  3:15     ` Chen, Jiqian
@ 2025-07-25  8:08       ` Roger Pau Monné
  2025-07-25  8:22         ` Chen, Jiqian
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2025-07-25  8:08 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini

On Fri, Jul 25, 2025 at 03:15:13AM +0000, Chen, Jiqian wrote:
> On 2025/7/24 22:28, Roger Pau Monné wrote:
> > On Thu, Jul 24, 2025 at 01:50:00PM +0800, Jiqian Chen wrote:
> >> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> >> index 74211301ba10..eb3e7dcd1068 100644
> >> --- a/xen/drivers/vpci/msix.c
> >> +++ b/xen/drivers/vpci/msix.c
> >> @@ -703,9 +703,18 @@ static int cf_check init_msix(struct pci_dev *pdev)
> >>      pdev->vpci->msix = msix;
> >>      list_add(&msix->next, &d->arch.hvm.msix_tables);
> >>  
> >> -    return 0;
> >> +    /*
> >> +     * vPCI header initialization will have mapped the whole BAR into the
> >> +     * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
> >> +     * the MSI-X table here, so that Xen can trap accesses.
> >> +     */
> >> +    spin_lock(&pdev->vpci->lock);
> >> +    rc = vpci_make_msix_hole(pdev);
> >> +    spin_unlock(&pdev->vpci->lock);
> > 
> > I should have asked in the last version, but why do you take the vPCI
> > lock here?
> > 
> > The path that ASSERTs the lock is held should never be taken when
> > called from init_msix().  Is there some path I'm missing in
> > vpci_make_msix_hole() that requires the vCPI lock to be held?
> > 
> > The rest LGTM.
> Sorry to forget to delete this.
> Feel free to change when submit.
> Or I will change by sending a new version.

Can you ensure it also works without the locking?  I think so, but I
haven't tested myself.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT
  2025-07-25  8:08       ` Roger Pau Monné
@ 2025-07-25  8:22         ` Chen, Jiqian
  0 siblings, 0 replies; 20+ messages in thread
From: Chen, Jiqian @ 2025-07-25  8:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/7/25 16:08, Roger Pau Monné wrote:
> On Fri, Jul 25, 2025 at 03:15:13AM +0000, Chen, Jiqian wrote:
>> On 2025/7/24 22:28, Roger Pau Monné wrote:
>>> On Thu, Jul 24, 2025 at 01:50:00PM +0800, Jiqian Chen wrote:
>>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>>>> index 74211301ba10..eb3e7dcd1068 100644
>>>> --- a/xen/drivers/vpci/msix.c
>>>> +++ b/xen/drivers/vpci/msix.c
>>>> @@ -703,9 +703,18 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>>>      pdev->vpci->msix = msix;
>>>>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>>>>  
>>>> -    return 0;
>>>> +    /*
>>>> +     * vPCI header initialization will have mapped the whole BAR into the
>>>> +     * p2m, as MSI-X capability was not yet initialized.  Crave a hole for
>>>> +     * the MSI-X table here, so that Xen can trap accesses.
>>>> +     */
>>>> +    spin_lock(&pdev->vpci->lock);
>>>> +    rc = vpci_make_msix_hole(pdev);
>>>> +    spin_unlock(&pdev->vpci->lock);
>>>
>>> I should have asked in the last version, but why do you take the vPCI
>>> lock here?
>>>
>>> The path that ASSERTs the lock is held should never be taken when
>>> called from init_msix().  Is there some path I'm missing in
>>> vpci_make_msix_hole() that requires the vCPI lock to be held?
>>>
>>> The rest LGTM.
>> Sorry to forget to delete this.
>> Feel free to change when submit.
>> Or I will change by sending a new version.
> 
> Can you ensure it also works without the locking?  I think so, but I
> haven't tested myself.
Yes, before I replied to you last email.
I have tested locally. MSI-X and other things work fine.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-07-25  8:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-24  5:49 [PATCH v8 0/8] Support hiding capability when its initialization fails Jiqian Chen
2025-07-24  5:49 ` [PATCH v8 1/8] vpci/header: Emulate extended capability list for dom0 Jiqian Chen
2025-07-24  8:09   ` Jan Beulich
2025-07-24 14:16     ` Roger Pau Monné
2025-07-24  5:50 ` [PATCH v8 2/8] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
2025-07-24 14:28   ` Roger Pau Monné
2025-07-25  3:15     ` Chen, Jiqian
2025-07-25  8:08       ` Roger Pau Monné
2025-07-25  8:22         ` Chen, Jiqian
2025-07-24  5:50 ` [PATCH v8 3/8] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
2025-07-24 14:47   ` Roger Pau Monné
2025-07-24  5:50 ` [PATCH v8 4/8] vpci: Hide extended " Jiqian Chen
2025-07-24  5:50 ` [PATCH v8 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
2025-07-24  5:50 ` [PATCH v8 6/8] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
2025-07-24  5:50 ` [PATCH v8 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
2025-07-24 14:52   ` Roger Pau Monné
2025-07-24  5:50 ` [PATCH v8 8/8] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
2025-07-24 15:59   ` Roger Pau Monné
2025-07-25  2:50     ` Chen, Jiqian
2025-07-25  8:06       ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.