[PATCH v5 0/5] vPCI: extended capability handling

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 0/5] vPCI: extended capability handling
@ 2026-02-25 11:41 Jan Beulich
  2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:41 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Roger Pau Monné, Stewart Hildebrand

This is a follow-on to 'vPCI: avoid bogus "overlap in extended cap list"
warnings', addressing further issues noted there.

v5: One new patch and some other re-work. See individual patches.

1: introduce private header
2: move vpci_init_capabilities() to a separate file
3: move capability-list init
4: ReBAR: improve cleanup
5: re-init extended-capabilities when MMCFG availability changed

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 1/5] vPCI: introduce private header
  2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
@ 2026-02-25 11:43 ` Jan Beulich
  2026-02-26  3:33   ` Stewart Hildebrand
  2026-03-03 16:44   ` Roger Pau Monné
  2026-02-25 11:43 ` [PATCH v5 2/5] vPCI: move vpci_init_capabilities() to a separate file Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:43 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org
  Cc: Roger Pau Monné, Stewart Hildebrand, Andrew Cooper,
	Julien Grall, Stefano Stabellini, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Anthony PERARD

Before adding more private stuff to xen/vpci.h, split it up. In order to
be able to include the private header first in a CU, the per-arch struct
decls also need to move (to new asm/vpci.h files).

While adjusting the test harness'es Makefile, also switch the pre-existing
header symlink-ing rule to a pattern one.

Apart from in the test harness code, things only move; no functional
change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Subsequently, at least on x86 more stuff may want moving into asm/vpci.h.
---
v5: Add new generated header to test harness clean rule and to .gitignore.
    Also move vpci_init_header().
v4: New.

--- a/.gitignore
+++ b/.gitignore
@@ -154,6 +154,7 @@ tools/tests/x86_emulator/test_x86_emulat
 tools/tests/x86_emulator/x86_emulate
 tools/tests/x86_emulator/xop*.[ch]
 tools/tests/vpci/list.h
+tools/tests/vpci/private.h
 tools/tests/vpci/vpci.[hc]
 tools/tests/vpci/test_vpci
 tools/xcutils/lsevtchn
--- a/tools/tests/vpci/Makefile
+++ b/tools/tests/vpci/Makefile
@@ -14,12 +14,12 @@ else
 	$(warning HOSTCC != CC, will not run test)
 endif
 
-$(TARGET): vpci.c vpci.h list.h main.c emul.h
-	$(CC) $(CFLAGS_xeninclude) -g -o $@ vpci.c main.c
+$(TARGET): vpci.c vpci.h list.h private.h main.c emul.h
+	$(CC) $(CFLAGS_xeninclude) -include emul.h -g -o $@ vpci.c main.c
 
 .PHONY: clean
 clean:
-	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
+	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h private.h
 
 .PHONY: distclean
 distclean: clean
@@ -34,10 +34,10 @@ uninstall:
 	$(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
 
 vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
-	# Remove includes and add the test harness header
-	sed -e '/#include/d' -e '1s/^/#include "emul.h"/' <$< >$@
+	sed -e '/#include/d' <$< >$@
+
+private.h: %.h: $(XEN_ROOT)/xen/drivers/vpci/%.h
+	sed -e '/#include/d' <$< >$@
 
-list.h: $(XEN_ROOT)/xen/include/xen/list.h
-vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
-list.h vpci.h:
+list.h vpci.h: %.h: $(XEN_ROOT)/xen/include/xen/%.h
 	sed -e '/#include/d' <$< >$@
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -86,6 +86,7 @@ typedef union {
 
 #define CONFIG_HAS_VPCI
 #include "vpci.h"
+#include "private.h"
 
 #define __hwdom_init
 
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -31,14 +31,6 @@ struct arch_pci_dev {
     struct device dev;
 };
 
-/* Arch-specific MSI data for vPCI. */
-struct vpci_arch_msi {
-};
-
-/* Arch-specific MSI-X entry data for vPCI. */
-struct vpci_arch_msix_entry {
-};
-
 /*
  * Because of the header cross-dependencies, e.g. we need both
  * struct pci_dev and struct arch_pci_dev at the same time, this cannot be
--- /dev/null
+++ b/xen/arch/arm/include/asm/vpci.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef ARM_VPCI_H
+#define ARM_VPCI_H
+
+/* Arch-specific MSI data for vPCI. */
+struct vpci_arch_msi {
+};
+
+/* Arch-specific MSI-X entry data for vPCI. */
+struct vpci_arch_msix_entry {
+};
+
+#endif /* ARM_VPCI_H */
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -97,17 +97,6 @@ void msixtbl_init(struct domain *d);
 static inline void msixtbl_init(struct domain *d) {}
 #endif
 
-/* Arch-specific MSI data for vPCI. */
-struct vpci_arch_msi {
-    int pirq;
-    bool bound;
-};
-
-/* Arch-specific MSI-X entry data for vPCI. */
-struct vpci_arch_msix_entry {
-    int pirq;
-};
-
 void stdvga_init(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
--- /dev/null
+++ b/xen/arch/x86/include/asm/vpci.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef X86_VPCI_H
+#define X86_VPCI_H
+
+#include <xen/stdbool.h>
+
+/* Arch-specific MSI data for vPCI. */
+struct vpci_arch_msi {
+    int pirq;
+    bool bound;
+};
+
+/* Arch-specific MSI-X entry data for vPCI. */
+struct vpci_arch_msix_entry {
+    int pirq;
+};
+
+#endif /* X86_VPCI_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -17,11 +17,12 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "private.h"
+
 #include <xen/iocap.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
-#include <xen/vpci.h>
 
 #include <xsm/xsm.h>
 
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -16,9 +16,10 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "private.h"
+
 #include <xen/sched.h>
 #include <xen/softirq.h>
-#include <xen/vpci.h>
 
 #include <asm/msi.h>
 
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -17,10 +17,11 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "private.h"
+
 #include <xen/io.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
-#include <xen/vpci.h>
 
 #include <asm/msi.h>
 #include <asm/p2m.h>
--- /dev/null
+++ b/xen/drivers/vpci/private.h
@@ -0,0 +1,126 @@
+#ifndef VPCI_PRIVATE_H
+#define VPCI_PRIVATE_H
+
+#include <xen/vpci.h>
+
+typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
+                             void *data);
+
+typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
+                          uint32_t val, void *data);
+
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (* init)(struct pci_dev *pdev);
+    int (* cleanup)(const struct pci_dev *pdev, bool hide);
+} vpci_capability_t;
+
+#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
+    static const vpci_capability_t name##_entry \
+        __used_section(".data.rel.ro.vpci") = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+    }
+
+#define REGISTER_VPCI_CAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
+#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
+
+/* Add/remove a register handler. */
+int __must_check vpci_add_register_mask(struct vpci *vpci,
+                                        vpci_read_t *read_handler,
+                                        vpci_write_t *write_handler,
+                                        unsigned int offset, unsigned int size,
+                                        void *data, uint32_t ro_mask,
+                                        uint32_t rw1c_mask, uint32_t rsvdp_mask,
+                                        uint32_t rsvdz_mask);
+int __must_check vpci_add_register(struct vpci *vpci,
+                                   vpci_read_t *read_handler,
+                                   vpci_write_t *write_handler,
+                                   unsigned int offset, unsigned int size,
+                                   void *data);
+
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size);
+
+/* Helper to return the value passed in data. */
+uint32_t cf_check vpci_read_val(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+
+/* Passthrough handlers. */
+uint32_t cf_check vpci_hw_read8(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+uint32_t cf_check vpci_hw_read16(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+uint32_t cf_check vpci_hw_read32(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_hw_write8(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
+void cf_check vpci_hw_write16(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
+
+#ifdef __XEN__
+/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
+int vpci_make_msix_hole(const struct pci_dev *pdev);
+
+/*
+ * Helper functions to fetch MSIX related data. They are used by both the
+ * emulated MSIX code and the BAR handlers.
+ */
+static inline paddr_t vmsix_table_host_base(const struct vpci *vpci,
+                                            unsigned int nr)
+{
+    return vpci->header.bars[vpci->msix->tables[nr] & PCI_MSIX_BIRMASK].addr;
+}
+
+static inline paddr_t vmsix_table_host_addr(const struct vpci *vpci,
+                                            unsigned int nr)
+{
+    return vmsix_table_host_base(vpci, nr) +
+           (vpci->msix->tables[nr] & ~PCI_MSIX_BIRMASK);
+}
+
+static inline paddr_t vmsix_table_base(const struct vpci *vpci, unsigned int nr)
+{
+    return vpci->header.bars[vpci->msix->tables[nr] &
+                             PCI_MSIX_BIRMASK].guest_addr;
+}
+
+static inline paddr_t vmsix_table_addr(const struct vpci *vpci, unsigned int nr)
+{
+    return vmsix_table_base(vpci, nr) +
+           (vpci->msix->tables[nr] & ~PCI_MSIX_BIRMASK);
+}
+
+/*
+ * Note regarding the size calculation of the PBA: the spec mentions "The last
+ * QWORD will not necessarily be fully populated", so it implies that the PBA
+ * size is 64-bit aligned.
+ */
+static inline size_t vmsix_table_size(const struct vpci *vpci, unsigned int nr)
+{
+    return
+        (nr == VPCI_MSIX_TABLE) ? vpci->msix->max_entries * PCI_MSIX_ENTRY_SIZE
+                                : ROUNDUP(DIV_ROUND_UP(vpci->msix->max_entries,
+                                                       8), 8);
+}
+
+#endif /* __XEN__ */
+
+#endif /* VPCI_PRIVATE_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -5,8 +5,9 @@
  * Author: Jiqian Chen <Jiqian.Chen@amd.com>
  */
 
+#include "private.h"
+
 #include <xen/sched.h>
-#include <xen/vpci.h>
 
 static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
                                       unsigned int reg,
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -17,8 +17,9 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "private.h"
+
 #include <xen/sched.h>
-#include <xen/vpci.h>
 #include <xen/vmap.h>
 
 /* Internal struct to store the emulated PCI registers. */
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -7,18 +7,7 @@
 #include <xen/types.h>
 #include <xen/list.h>
 
-typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
-                             void *data);
-
-typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
-                          uint32_t val, void *data);
-
-typedef struct {
-    unsigned int id;
-    bool is_ext;
-    int (* init)(struct pci_dev *pdev);
-    int (* cleanup)(const struct pci_dev *pdev, bool hide);
-} vpci_capability_t;
+#include <asm/vpci.h>
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -30,66 +19,17 @@ typedef struct {
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_CAPABILITY(cap, name, finit, fclean, ext) \
-    static const vpci_capability_t name##_entry \
-        __used_section(".data.rel.ro.vpci") = { \
-        .id = (cap), \
-        .init = (finit), \
-        .cleanup = (fclean), \
-        .is_ext = (ext), \
-    }
-
-#define REGISTER_VPCI_CAP(name, finit, fclean) \
-    REGISTER_VPCI_CAPABILITY(PCI_CAP_ID_##name, name, finit, fclean, false)
-#define REGISTER_VPCI_EXTCAP(name, finit, fclean) \
-    REGISTER_VPCI_CAPABILITY(PCI_EXT_CAP_ID_##name, name, finit, fclean, true)
-
-int __must_check vpci_init_header(struct pci_dev *pdev);
-
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
 
-/* Add/remove a register handler. */
-int __must_check vpci_add_register_mask(struct vpci *vpci,
-                                        vpci_read_t *read_handler,
-                                        vpci_write_t *write_handler,
-                                        unsigned int offset, unsigned int size,
-                                        void *data, uint32_t ro_mask,
-                                        uint32_t rw1c_mask, uint32_t rsvdp_mask,
-                                        uint32_t rsvdz_mask);
-int __must_check vpci_add_register(struct vpci *vpci,
-                                   vpci_read_t *read_handler,
-                                   vpci_write_t *write_handler,
-                                   unsigned int offset, unsigned int size,
-                                   void *data);
-
-int vpci_remove_registers(struct vpci *vpci, unsigned int start,
-                          unsigned int size);
-
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data);
 
-/* Helper to return the value passed in data. */
-uint32_t cf_check vpci_read_val(
-    const struct pci_dev *pdev, unsigned int reg, void *data);
-
-/* Passthrough handlers. */
-uint32_t cf_check vpci_hw_read8(
-    const struct pci_dev *pdev, unsigned int reg, void *data);
-uint32_t cf_check vpci_hw_read16(
-    const struct pci_dev *pdev, unsigned int reg, void *data);
-uint32_t cf_check vpci_hw_read32(
-    const struct pci_dev *pdev, unsigned int reg, void *data);
-void cf_check vpci_hw_write8(
-    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
-void cf_check vpci_hw_write16(
-    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
-
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
  * should not run.
@@ -213,9 +153,6 @@ struct vpci_vcpu {
 #ifdef __XEN__
 void vpci_dump_msi(void);
 
-/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
-int vpci_make_msix_hole(const struct pci_dev *pdev);
-
 /* Arch-specific vPCI MSI helpers. */
 void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
                         unsigned int entry, bool mask);
@@ -238,48 +175,6 @@ int __must_check vpci_msix_arch_disable_
 void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry);
 int vpci_msix_arch_print(const struct vpci_msix *msix);
 
-/*
- * Helper functions to fetch MSIX related data. They are used by both the
- * emulated MSIX code and the BAR handlers.
- */
-static inline paddr_t vmsix_table_host_base(const struct vpci *vpci,
-                                            unsigned int nr)
-{
-    return vpci->header.bars[vpci->msix->tables[nr] & PCI_MSIX_BIRMASK].addr;
-}
-
-static inline paddr_t vmsix_table_host_addr(const struct vpci *vpci,
-                                            unsigned int nr)
-{
-    return vmsix_table_host_base(vpci, nr) +
-           (vpci->msix->tables[nr] & ~PCI_MSIX_BIRMASK);
-}
-
-static inline paddr_t vmsix_table_base(const struct vpci *vpci, unsigned int nr)
-{
-    return vpci->header.bars[vpci->msix->tables[nr] &
-                             PCI_MSIX_BIRMASK].guest_addr;
-}
-
-static inline paddr_t vmsix_table_addr(const struct vpci *vpci, unsigned int nr)
-{
-    return vmsix_table_base(vpci, nr) +
-           (vpci->msix->tables[nr] & ~PCI_MSIX_BIRMASK);
-}
-
-/*
- * Note regarding the size calculation of the PBA: the spec mentions "The last
- * QWORD will not necessarily be fully populated", so it implies that the PBA
- * size is 64-bit aligned.
- */
-static inline size_t vmsix_table_size(const struct vpci *vpci, unsigned int nr)
-{
-    return
-        (nr == VPCI_MSIX_TABLE) ? vpci->msix->max_entries * PCI_MSIX_ENTRY_SIZE
-                                : ROUNDUP(DIV_ROUND_UP(vpci->msix->max_entries,
-                                                       8), 8);
-}
-
 static inline unsigned int vmsix_entry_nr(const struct vpci_msix *msix,
                                           const struct vpci_msix_entry *entry)
 {



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 2/5] vPCI: move vpci_init_capabilities() to a separate file
  2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
  2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
@ 2026-02-25 11:43 ` Jan Beulich
  2026-02-25 11:43 ` [PATCH v5 3/5] vPCI: move capability-list init Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:43 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Roger Pau Monné, Stewart Hildebrand

Let's keep capability handling together. Start with moving
vpci_init_capabilities() and its helpers, plus splitting out of its
cleanup counterpart.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
vpci_get_register(), while now only used by cap.c, didn't look like it
would want moving there.
---
v4: New.

--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1,2 +1,3 @@
+obj-y += cap.o
 obj-y += vpci.o header.o rebar.o
 obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
--- /dev/null
+++ b/xen/drivers/vpci/cap.c
@@ -0,0 +1,252 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Capability handling for guest PCI configuration space.
+ */
+
+#include "private.h"
+
+#include <xen/sched.h>
+
+extern const vpci_capability_t __start_vpci_array[];
+extern const vpci_capability_t __end_vpci_array[];
+#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
+
+static struct vpci_register *vpci_get_previous_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    unsigned int next;
+    struct vpci_register *r;
+
+    if ( offset < 0x40 )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
+          r = next >= 0x40 ? vpci_get_register(vpci,
+                                               next + PCI_CAP_LIST_NEXT, 1)
+                           : NULL )
+    {
+        next = (unsigned int)(uintptr_t)r->private;
+        ASSERT(next == (uintptr_t)r->private);
+        if ( next == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    struct vpci_register *prev_r, *next_r;
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    prev_r = vpci_get_previous_cap_register(vpci, offset);
+    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
+    if ( !prev_r || !next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_r->private = next_r->private;
+    /*
+     * Not calling vpci_remove_registers() here is to avoid redoing
+     * the register search.
+     */
+    list_del(&next_r->node);
+    spin_unlock(&vpci->lock);
+    xfree(next_r);
+
+    if ( !is_hardware_domain(pdev->domain) )
+        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
+
+    return 0;
+}
+
+static struct vpci_register *vpci_get_previous_ext_cap_register(
+    const struct vpci *vpci, unsigned int offset)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+    struct vpci_register *r;
+
+    if ( offset <= PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    for ( r = vpci_get_register(vpci, pos, 4); r;
+          r = pos > PCI_CFG_SPACE_SIZE ? vpci_get_register(vpci, pos, 4)
+                                       : NULL )
+    {
+        uint32_t header = (uint32_t)(uintptr_t)r->private;
+
+        ASSERT(header == (uintptr_t)r->private);
+
+        pos = PCI_EXT_CAP_NEXT(header);
+        if ( pos == offset )
+            break;
+    }
+
+    return r;
+}
+
+static int vpci_ext_capability_hide(
+    const struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_ext_capability(pdev, cap);
+    struct vpci_register *r, *prev_r;
+    struct vpci *vpci = pdev->vpci;
+    uint32_t header, pre_header;
+
+    if ( offset < PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    r = vpci_get_register(vpci, offset, 4);
+    if ( !r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    header = (uint32_t)(uintptr_t)r->private;
+    if ( offset == PCI_CFG_SPACE_SIZE )
+    {
+        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
+            r->private = (void *)0;
+        else
+            /*
+             * The first extended capability (0x100) can not be removed from
+             * the linked list, so instead mask its capability ID to return 0
+             * hopefully forcing OSes to skip it.
+             */
+            r->private = (void *)(uintptr_t)(header & ~PCI_EXT_CAP_ID(header));
+
+        spin_unlock(&vpci->lock);
+        return 0;
+    }
+
+    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
+    if ( !prev_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    pre_header = (uint32_t)(uintptr_t)prev_r->private;
+    pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
+    pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
+    prev_r->private = (void *)(uintptr_t)pre_header;
+
+    list_del(&r->node);
+    spin_unlock(&vpci->lock);
+    xfree(r);
+
+    return 0;
+}
+
+int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = &__start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos = 0;
+        int rc;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( is_hardware_domain(pdev->domain) )
+            pos = pci_find_ext_capability(pdev, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+        {
+            const char *type = is_ext ? "extended" : "legacy";
+
+            printk(XENLOG_WARNING
+                   "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
+                   pdev->domain, &pdev->sbdf, type, cap, rc);
+
+            if ( capability->cleanup )
+            {
+                rc = capability->cleanup(pdev, true);
+                if ( rc )
+                {
+                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
+                           pdev->domain, &pdev->sbdf, type, cap, rc);
+                    if ( !is_hardware_domain(pdev->domain) )
+                        return rc;
+                }
+            }
+
+            if ( !is_ext )
+                rc = vpci_capability_hide(pdev, cap);
+            else
+                rc = vpci_ext_capability_hide(pdev, cap);
+            if ( rc )
+            {
+                printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
+                       pdev->domain, &pdev->sbdf, type, cap, rc);
+                return rc;
+            }
+        }
+    }
+
+    return 0;
+}
+
+void vpci_cleanup_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = &__start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        unsigned int pos = 0;
+
+        if ( !capability->cleanup )
+            continue;
+
+        if ( !capability->is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( is_hardware_domain(pdev->domain) )
+            pos = pci_find_ext_capability(pdev, cap);
+        if ( pos )
+        {
+            int rc = capability->cleanup(pdev, false);
+
+            if ( rc )
+                printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
+                       pdev->domain, &pdev->sbdf,
+                       capability->is_ext ? "extended" : "legacy", cap, rc);
+        }
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/drivers/vpci/private.h
+++ b/xen/drivers/vpci/private.h
@@ -9,6 +9,20 @@ typedef uint32_t vpci_read_t(const struc
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
+/* Internal struct to store the emulated PCI registers. */
+struct vpci_register {
+    vpci_read_t *read;
+    vpci_write_t *write;
+    unsigned int size;
+    unsigned int offset;
+    void *private;
+    struct list_head node;
+    uint32_t ro_mask;
+    uint32_t rw1c_mask;
+    uint32_t rsvdp_mask;
+    uint32_t rsvdz_mask;
+};
+
 typedef struct {
     unsigned int id;
     bool is_ext;
@@ -32,6 +46,9 @@ typedef struct {
 
 int __must_check vpci_init_header(struct pci_dev *pdev);
 
+int vpci_init_capabilities(struct pci_dev *pdev);
+void vpci_cleanup_capabilities(struct pci_dev *pdev);
+
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
                                         vpci_read_t *read_handler,
@@ -49,6 +66,10 @@ int __must_check vpci_add_register(struc
 int vpci_remove_registers(struct vpci *vpci, unsigned int start,
                           unsigned int size);
 
+struct vpci_register *vpci_get_register(const struct vpci *vpci,
+                                        unsigned int offset,
+                                        unsigned int size);
+
 /* Helper to return the value passed in data. */
 uint32_t cf_check vpci_read_val(
     const struct pci_dev *pdev, unsigned int reg, void *data);
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -22,24 +22,7 @@
 #include <xen/sched.h>
 #include <xen/vmap.h>
 
-/* Internal struct to store the emulated PCI registers. */
-struct vpci_register {
-    vpci_read_t *read;
-    vpci_write_t *write;
-    unsigned int size;
-    unsigned int offset;
-    void *private;
-    struct list_head node;
-    uint32_t ro_mask;
-    uint32_t rw1c_mask;
-    uint32_t rsvdp_mask;
-    uint32_t rsvdz_mask;
-};
-
 #ifdef __XEN__
-extern const vpci_capability_t __start_vpci_array[];
-extern const vpci_capability_t __end_vpci_array[];
-#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
 static int assign_virtual_sbdf(struct pci_dev *pdev)
@@ -84,9 +67,9 @@ static int assign_virtual_sbdf(struct pc
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
-static struct vpci_register *vpci_get_register(const struct vpci *vpci,
-                                               unsigned int offset,
-                                               unsigned int size)
+struct vpci_register *vpci_get_register(const struct vpci *vpci,
+                                        unsigned int offset,
+                                        unsigned int size)
 {
     struct vpci_register *r;
 
@@ -104,209 +87,6 @@ static struct vpci_register *vpci_get_re
     return NULL;
 }
 
-static struct vpci_register *vpci_get_previous_cap_register(
-    const struct vpci *vpci, unsigned int offset)
-{
-    unsigned int next;
-    struct vpci_register *r;
-
-    if ( offset < 0x40 )
-    {
-        ASSERT_UNREACHABLE();
-        return NULL;
-    }
-
-    for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
-          r = next >= 0x40 ? vpci_get_register(vpci,
-                                               next + PCI_CAP_LIST_NEXT, 1)
-                           : NULL )
-    {
-        next = (unsigned int)(uintptr_t)r->private;
-        ASSERT(next == (uintptr_t)r->private);
-        if ( next == offset )
-            break;
-    }
-
-    return r;
-}
-
-static int vpci_capability_hide(const struct pci_dev *pdev, unsigned int cap)
-{
-    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
-    struct vpci_register *prev_r, *next_r;
-    struct vpci *vpci = pdev->vpci;
-
-    if ( !offset )
-    {
-        ASSERT_UNREACHABLE();
-        return 0;
-    }
-
-    spin_lock(&vpci->lock);
-    prev_r = vpci_get_previous_cap_register(vpci, offset);
-    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
-    if ( !prev_r || !next_r )
-    {
-        spin_unlock(&vpci->lock);
-        return -ENODEV;
-    }
-
-    prev_r->private = next_r->private;
-    /*
-     * Not calling vpci_remove_registers() here is to avoid redoing
-     * the register search.
-     */
-    list_del(&next_r->node);
-    spin_unlock(&vpci->lock);
-    xfree(next_r);
-
-    if ( !is_hardware_domain(pdev->domain) )
-        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
-
-    return 0;
-}
-
-static struct vpci_register *vpci_get_previous_ext_cap_register(
-    const struct vpci *vpci, unsigned int offset)
-{
-    unsigned int pos = PCI_CFG_SPACE_SIZE;
-    struct vpci_register *r;
-
-    if ( offset <= PCI_CFG_SPACE_SIZE )
-    {
-        ASSERT_UNREACHABLE();
-        return NULL;
-    }
-
-    for ( r = vpci_get_register(vpci, pos, 4); r;
-          r = pos > PCI_CFG_SPACE_SIZE ? vpci_get_register(vpci, pos, 4)
-                                       : NULL )
-    {
-        uint32_t header = (uint32_t)(uintptr_t)r->private;
-
-        ASSERT(header == (uintptr_t)r->private);
-
-        pos = PCI_EXT_CAP_NEXT(header);
-        if ( pos == offset )
-            break;
-    }
-
-    return r;
-}
-
-static int vpci_ext_capability_hide(
-    const struct pci_dev *pdev, unsigned int cap)
-{
-    const unsigned int offset = pci_find_ext_capability(pdev, cap);
-    struct vpci_register *r, *prev_r;
-    struct vpci *vpci = pdev->vpci;
-    uint32_t header, pre_header;
-
-    if ( offset < PCI_CFG_SPACE_SIZE )
-    {
-        ASSERT_UNREACHABLE();
-        return 0;
-    }
-
-    spin_lock(&vpci->lock);
-    r = vpci_get_register(vpci, offset, 4);
-    if ( !r )
-    {
-        spin_unlock(&vpci->lock);
-        return -ENODEV;
-    }
-
-    header = (uint32_t)(uintptr_t)r->private;
-    if ( offset == PCI_CFG_SPACE_SIZE )
-    {
-        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
-            r->private = (void *)0;
-        else
-            /*
-             * The first extended capability (0x100) can not be removed from
-             * the linked list, so instead mask its capability ID to return 0
-             * hopefully forcing OSes to skip it.
-             */
-            r->private = (void *)(uintptr_t)(header & ~PCI_EXT_CAP_ID(header));
-
-        spin_unlock(&vpci->lock);
-        return 0;
-    }
-
-    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
-    if ( !prev_r )
-    {
-        spin_unlock(&vpci->lock);
-        return -ENODEV;
-    }
-
-    pre_header = (uint32_t)(uintptr_t)prev_r->private;
-    pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
-    pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
-    prev_r->private = (void *)(uintptr_t)pre_header;
-
-    list_del(&r->node);
-    spin_unlock(&vpci->lock);
-    xfree(r);
-
-    return 0;
-}
-
-static int vpci_init_capabilities(struct pci_dev *pdev)
-{
-    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        const vpci_capability_t *capability = &__start_vpci_array[i];
-        const unsigned int cap = capability->id;
-        const bool is_ext = capability->is_ext;
-        unsigned int pos = 0;
-        int rc;
-
-        if ( !is_ext )
-            pos = pci_find_cap_offset(pdev->sbdf, cap);
-        else if ( is_hardware_domain(pdev->domain) )
-            pos = pci_find_ext_capability(pdev, cap);
-
-        if ( !pos )
-            continue;
-
-        rc = capability->init(pdev);
-        if ( rc )
-        {
-            const char *type = is_ext ? "extended" : "legacy";
-
-            printk(XENLOG_WARNING
-                   "%pd %pp: init %s cap %u fail rc=%d, mask it\n",
-                   pdev->domain, &pdev->sbdf, type, cap, rc);
-
-            if ( capability->cleanup )
-            {
-                rc = capability->cleanup(pdev, true);
-                if ( rc )
-                {
-                    printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
-                           pdev->domain, &pdev->sbdf, type, cap, rc);
-                    if ( !is_hardware_domain(pdev->domain) )
-                        return rc;
-                }
-            }
-
-            if ( !is_ext )
-                rc = vpci_capability_hide(pdev, cap);
-            else
-                rc = vpci_ext_capability_hide(pdev, cap);
-            if ( rc )
-            {
-                printk(XENLOG_ERR "%pd %pp: hide %s cap %u fail rc=%d\n",
-                       pdev->domain, &pdev->sbdf, type, cap, rc);
-                return rc;
-            }
-        }
-    }
-
-    return 0;
-}
-
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -322,29 +102,7 @@ void vpci_deassign_device(struct pci_dev
                     &pdev->domain->vpci_dev_assigned_map);
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        const vpci_capability_t *capability = &__start_vpci_array[i];
-        const unsigned int cap = capability->id;
-        unsigned int pos = 0;
-
-        if ( !capability->cleanup )
-            continue;
-
-        if ( !capability->is_ext )
-            pos = pci_find_cap_offset(pdev->sbdf, cap);
-        else if ( is_hardware_domain(pdev->domain) )
-            pos = pci_find_ext_capability(pdev, cap);
-        if ( pos )
-        {
-            int rc = capability->cleanup(pdev, false);
-
-            if ( rc )
-                printk(XENLOG_ERR "%pd %pp: clean %s cap %u fail rc=%d\n",
-                       pdev->domain, &pdev->sbdf,
-                       capability->is_ext ? "extended" : "legacy", cap, rc);
-        }
-    }
+    vpci_cleanup_capabilities(pdev);
 
     spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 3/5] vPCI: move capability-list init
  2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
  2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
  2026-02-25 11:43 ` [PATCH v5 2/5] vPCI: move vpci_init_capabilities() to a separate file Jan Beulich
@ 2026-02-25 11:43 ` Jan Beulich
  2026-02-25 11:44 ` [PATCH v5 4/5] vPCI/ReBAR: improve cleanup Jan Beulich
  2026-02-25 11:44 ` [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed Jan Beulich
  4 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:43 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Roger Pau Monné, Stewart Hildebrand

... both for when the functions are invoked and where they live in source.
Don't invoke them directly in vpci_init_header(), but instead first thing
in vpci_init_capabilities().

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
v4: New.

--- a/xen/drivers/vpci/cap.c
+++ b/xen/drivers/vpci/cap.c
@@ -159,15 +159,150 @@ static int vpci_ext_capability_hide(
     return 0;
 }
 
+static int vpci_init_capability_list(struct pci_dev *pdev)
+{
+    int rc;
+    bool mask_cap_list = false;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
+
+    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    {
+        /* Only expose capabilities to the guest that vPCI can handle. */
+        unsigned int next, ttl = 48;
+        static const unsigned int supported_caps[] = {
+            PCI_CAP_ID_MSI,
+            PCI_CAP_ID_MSIX,
+        };
+        /*
+         * For dom0, we should expose all capabilities instead of a fixed
+         * capabilities array, so setting n to 0 here is to get the next
+         * capability position directly in pci_find_next_cap_ttl.
+         */
+        const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(supported_caps);
+
+        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                     supported_caps, n, &ttl);
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                               is_hwdom ? vpci_hw_write8 : NULL,
+                               PCI_CAPABILITY_LIST, 1,
+                               (void *)(uintptr_t)next);
+        if ( rc )
+            return rc;
+
+        next &= ~3;
+
+        if ( !next && !is_hwdom )
+            /*
+             * If we don't have any supported capabilities to expose to the
+             * guest, mask the PCI_STATUS_CAP_LIST bit in the status
+             * register.
+             */
+            mask_cap_list = true;
+
+        while ( next && ttl )
+        {
+            unsigned int pos = next;
+
+            next = pci_find_next_cap_ttl(pdev->sbdf,
+                                         pos + PCI_CAP_LIST_NEXT,
+                                         supported_caps, n, &ttl);
+
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                       pos + PCI_CAP_LIST_ID, 1, NULL);
+                if ( rc )
+                    return rc;
+            }
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                                   is_hwdom ? vpci_hw_write8 : NULL,
+                                   pos + PCI_CAP_LIST_NEXT, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            next &= ~3;
+        }
+    }
+
+    /* Return early for the hw domain, no masking of PCI_STATUS. */
+    if ( is_hwdom )
+        return 0;
+
+    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
+    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                  PCI_STATUS, 2, NULL,
+                                  PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                  PCI_STATUS_RW1C_MASK,
+                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
+                                  PCI_STATUS_RSVDZ_MASK);
+}
+
+static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+
+    if ( !pdev->ext_cfg )
+        return 0;
+
+    if ( !is_hardware_domain(pdev->domain) )
+        /* Extended capabilities read as zero, write ignore for DomU */
+        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                 pos, 4, (void *)0);
+
+    do
+    {
+        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
+        int rc;
+
+        if ( header == 0xffffffffU )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: broken extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
+        if ( rc )
+            return rc;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    } while ( pos >= PCI_CFG_SPACE_SIZE );
+
+    return 0;
+}
+
 int vpci_init_capabilities(struct pci_dev *pdev)
 {
+    int rc;
+
+    rc = vpci_init_capability_list(pdev);
+    if ( rc )
+        return rc;
+
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
+
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
     {
         const vpci_capability_t *capability = &__start_vpci_array[i];
         const unsigned int cap = capability->id;
         const bool is_ext = capability->is_ext;
         unsigned int pos = 0;
-        int rc;
 
         if ( !is_ext )
             pos = pci_find_cap_offset(pdev->sbdf, cap);
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -744,132 +744,6 @@ static int bar_add_rangeset(const struct
     return !bar->mem ? -ENOMEM : 0;
 }
 
-static int vpci_init_capability_list(struct pci_dev *pdev)
-{
-    int rc;
-    bool mask_cap_list = false;
-    bool is_hwdom = is_hardware_domain(pdev->domain);
-
-    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
-    {
-        /* Only expose capabilities to the guest that vPCI can handle. */
-        unsigned int next, ttl = 48;
-        static const unsigned int supported_caps[] = {
-            PCI_CAP_ID_MSI,
-            PCI_CAP_ID_MSIX,
-        };
-        /*
-         * For dom0, we should expose all capabilities instead of a fixed
-         * capabilities array, so setting n to 0 here is to get the next
-         * capability position directly in pci_find_next_cap_ttl.
-         */
-        const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(supported_caps);
-
-        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                     supported_caps, n, &ttl);
-
-        rc = vpci_add_register(pdev->vpci, vpci_read_val,
-                               is_hwdom ? vpci_hw_write8 : NULL,
-                               PCI_CAPABILITY_LIST, 1,
-                               (void *)(uintptr_t)next);
-        if ( rc )
-            return rc;
-
-        next &= ~3;
-
-        if ( !next && !is_hwdom )
-            /*
-             * If we don't have any supported capabilities to expose to the
-             * guest, mask the PCI_STATUS_CAP_LIST bit in the status
-             * register.
-             */
-            mask_cap_list = true;
-
-        while ( next && ttl )
-        {
-            unsigned int pos = next;
-
-            next = pci_find_next_cap_ttl(pdev->sbdf,
-                                         pos + PCI_CAP_LIST_NEXT,
-                                         supported_caps, n, &ttl);
-
-            if ( !is_hwdom )
-            {
-                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                       pos + PCI_CAP_LIST_ID, 1, NULL);
-                if ( rc )
-                    return rc;
-            }
-
-            rc = vpci_add_register(pdev->vpci, vpci_read_val,
-                                   is_hwdom ? vpci_hw_write8 : NULL,
-                                   pos + PCI_CAP_LIST_NEXT, 1,
-                                   (void *)(uintptr_t)next);
-            if ( rc )
-                return rc;
-
-            next &= ~3;
-        }
-    }
-
-    /* Return early for the hw domain, no masking of PCI_STATUS. */
-    if ( is_hwdom )
-        return 0;
-
-    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
-    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
-                                  PCI_STATUS, 2, NULL,
-                                  PCI_STATUS_RO_MASK &
-                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
-                                  PCI_STATUS_RW1C_MASK,
-                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
-                                  PCI_STATUS_RSVDZ_MASK);
-}
-
-static int vpci_init_ext_capability_list(const struct pci_dev *pdev)
-{
-    unsigned int pos = PCI_CFG_SPACE_SIZE;
-
-    if ( !pdev->ext_cfg )
-        return 0;
-
-    if ( !is_hardware_domain(pdev->domain) )
-        /* Extended capabilities read as zero, write ignore for DomU */
-        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                 pos, 4, (void *)0);
-
-    do
-    {
-        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
-        int rc;
-
-        if ( header == 0xffffffffU )
-        {
-            printk(XENLOG_WARNING
-                   "%pd %pp: broken extended cap list, offset %#x\n",
-                   pdev->domain, &pdev->sbdf, pos);
-            return 0;
-        }
-
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                               pos, 4, (void *)(uintptr_t)header);
-        if ( rc == -EEXIST )
-        {
-            printk(XENLOG_WARNING
-                   "%pd %pp: overlap in extended cap list, offset %#x\n",
-                   pdev->domain, &pdev->sbdf, pos);
-            return 0;
-        }
-
-        if ( rc )
-            return rc;
-
-        pos = PCI_EXT_CAP_NEXT(header);
-    } while ( pos >= PCI_CFG_SPACE_SIZE );
-
-    return 0;
-}
-
 int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -918,14 +792,6 @@ int vpci_init_header(struct pci_dev *pde
     if ( rc )
         return rc;
 
-    rc = vpci_init_capability_list(pdev);
-    if ( rc )
-        return rc;
-
-    rc = vpci_init_ext_capability_list(pdev);
-    if ( rc )
-        return rc;
-
     if ( pdev->ignore_bars )
         return 0;
 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 4/5] vPCI/ReBAR: improve cleanup
  2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
                   ` (2 preceding siblings ...)
  2026-02-25 11:43 ` [PATCH v5 3/5] vPCI: move capability-list init Jan Beulich
@ 2026-02-25 11:44 ` Jan Beulich
  2026-03-04 13:46   ` Roger Pau Monné
  2026-02-25 11:44 ` [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed Jan Beulich
  4 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:44 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Roger Pau Monné, Stewart Hildebrand

We cannot assume extended config space to (still) be accessible when
cleaning up. Necessary value need caching instead. In fact, as the caller
also cannot look up extended capabilities, the cleanup function needs to
cope with being called when there's no ReBAR capability at all.

As kind of a side effect nbars being 0 (which init_rebar() doesn't
explicitly reject) no longer results in a bogus call to
vpci_remove_registers().

Fixes: ee459aeac096 ("vpci/rebar: Implement cleanup function for Rebar")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v5: New.

--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -53,17 +53,12 @@ static void cf_check rebar_ctrl_write(co
 static int cf_check cleanup_rebar(const struct pci_dev *pdev, bool hide)
 {
     int rc;
-    uint32_t ctrl;
-    unsigned int nbars;
-    unsigned int rebar_offset = pci_find_ext_capability(pdev,
-                                                        PCI_EXT_CAP_ID_REBAR);
+    unsigned int nbars = pdev->vpci->rebar.nbars;
+    unsigned int rebar_offset = pdev->vpci->rebar.offset;
 
-    if ( !hide )
+    if ( !rebar_offset || !nbars || !hide )
         return 0;
 
-    ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
-    nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
-
     rc = vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
                                PCI_REBAR_CTRL(nbars - 1));
     if ( rc )
@@ -121,6 +116,10 @@ static int cf_check init_rebar(struct pc
 
     ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
     nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
+
+    pdev->vpci->rebar.offset = rebar_offset;
+    pdev->vpci->rebar.nbars  = nbars;
+
     for ( unsigned int i = 0; i < nbars; i++ )
     {
         int rc;
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -135,6 +135,13 @@ struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+
+    /* Resizable BARs data */
+    struct vpci_rebar {
+        unsigned int offset:12;
+        unsigned int nbars:3;
+    } rebar;
+
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
     /* Guest SBDF of the device. */
 #define INVALID_GUEST_SBDF ((pci_sbdf_t){ .sbdf = ~0U })



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
                   ` (3 preceding siblings ...)
  2026-02-25 11:44 ` [PATCH v5 4/5] vPCI/ReBAR: improve cleanup Jan Beulich
@ 2026-02-25 11:44 ` Jan Beulich
  2026-03-04 15:06   ` Roger Pau Monné
  4 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-02-25 11:44 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Roger Pau Monné, Stewart Hildebrand

When Dom0 informs us about MMCFG usability, this may change whether
extended capabilities are available (accessible) for devices. Zap what
might be on record, and re-initialize things.

No synchronization is added for the case where devices may already be in
use. That'll need sorting when (a) DomU support was added and (b) DomU-s
may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).

vpci_cleanup_capabilities() also shouldn't have used
pci_find_ext_capability(), as already when the function was introduced
extended config space may not have been (properly) accessible anymore,
no matter whether it was during init. Extended capability cleanup hooks
need to cope with being called when the respective capability doesn't
exist (and hence the corresponding ->init() hook was never called).

Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
vpci_reinit_ext_capabilities()'es return value is checked only to log an
error; it doesn't feel quite right to fail the hypercall because of this.
Roger brought up the idea of de-assigning the device in such a case, but
if a driver doesn't use extended capabilities the device would likely
continue to work fine, for Dom0 this probably wouldn't be quite right
anyway, and it's also unclear whether calling deassign_device() could be
done from this context. Something like what pci_check_disable_device()
does may be an option, if we really think we need to "break" the device.

The use of is_hardware_domain() in vpci_cleanup_capabilities() was
uncommented and hence is left so. Shouldn't there be a DomU-related TODO
or FIXME?
---
v5: Don't use pci_find_ext_capability() in vpci_cleanup_capabilities().
    Add assertion in vpci_reinit_ext_capabilities().
v4: Make sure ->cleanup() and ->init() are invoked.
v3: New.

--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -8,6 +8,8 @@
 #include <xen/guest_access.h>
 #include <xen/iocap.h>
 #include <xen/serial.h>
+#include <xen/vpci.h>
+
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
@@ -169,8 +171,17 @@ int cf_check physdev_check_pci_extcfg(st
 
     ASSERT(pdev->seg == info->segment);
     if ( pdev->bus >= info->start_bus && pdev->bus <= info->end_bus )
+    {
+        int rc;
+
         pci_check_extcfg(pdev);
 
+        rc = vpci_reinit_ext_capabilities(pdev);
+        if ( rc )
+            gprintk(XENLOG_ERR, "%pp(%pd): vPCI extcap reinit failed: %d\n",
+                    &pdev->sbdf, pdev->domain, rc);
+    }
+
     return 0;
 }
 #endif /* COMPAT */
--- a/xen/drivers/vpci/cap.c
+++ b/xen/drivers/vpci/cap.c
@@ -285,13 +285,16 @@ static int vpci_init_ext_capability_list
     return 0;
 }
 
-int vpci_init_capabilities(struct pci_dev *pdev)
+int vpci_init_capabilities(struct pci_dev *pdev, bool ext_only)
 {
     int rc;
 
-    rc = vpci_init_capability_list(pdev);
-    if ( rc )
-        return rc;
+    if ( !ext_only )
+    {
+        rc = vpci_init_capability_list(pdev);
+        if ( rc )
+            return rc;
+    }
 
     rc = vpci_init_ext_capability_list(pdev);
     if ( rc )
@@ -305,7 +308,7 @@ int vpci_init_capabilities(struct pci_de
         unsigned int pos = 0;
 
         if ( !is_ext )
-            pos = pci_find_cap_offset(pdev->sbdf, cap);
+            pos = !ext_only ? pci_find_cap_offset(pdev->sbdf, cap) : 0;
         else if ( is_hardware_domain(pdev->domain) )
             pos = pci_find_ext_capability(pdev, cap);
 
@@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
     return 0;
 }
 
-void vpci_cleanup_capabilities(struct pci_dev *pdev)
+void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
     {
         const vpci_capability_t *capability = &__start_vpci_array[i];
         const unsigned int cap = capability->id;
-        unsigned int pos = 0;
 
         if ( !capability->cleanup )
             continue;
 
-        if ( !capability->is_ext )
-            pos = pci_find_cap_offset(pdev->sbdf, cap);
-        else if ( is_hardware_domain(pdev->domain) )
-            pos = pci_find_ext_capability(pdev, cap);
-        if ( pos )
+        /*
+         * Cannot call pci_find_ext_capability() here, as extended config
+         * space may (no longer) be accessible.
+         */
+        if ( capability->is_ext
+             ? is_hardware_domain(pdev->domain)
+             : !ext_only && pci_find_cap_offset(pdev->sbdf, cap) )
         {
             int rc = capability->cleanup(pdev, false);
 
@@ -376,6 +380,28 @@ void vpci_cleanup_capabilities(struct pc
     }
 }
 
+int vpci_reinit_ext_capabilities(struct pci_dev *pdev)
+{
+    if ( !pdev->vpci )
+        return 0;
+
+    /*
+     * FIXME: DomU support is missing.  For already running domains we may
+     * need to pause them around the entire re-evaluation of extended config
+     * space accessibility.
+     */
+    if ( pdev->domain )
+        ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_io);
+
+    vpci_cleanup_capabilities(pdev, true);
+
+    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
+                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
+        ASSERT_UNREACHABLE();
+
+    return vpci_init_capabilities(pdev, true);
+}
+
 /*
  * Local variables:
  * mode: C
--- a/xen/drivers/vpci/private.h
+++ b/xen/drivers/vpci/private.h
@@ -46,8 +46,8 @@ typedef struct {
 
 int __must_check vpci_init_header(struct pci_dev *pdev);
 
-int vpci_init_capabilities(struct pci_dev *pdev);
-void vpci_cleanup_capabilities(struct pci_dev *pdev);
+int vpci_init_capabilities(struct pci_dev *pdev, bool ext_only);
+void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -102,7 +102,7 @@ void vpci_deassign_device(struct pci_dev
                     &pdev->domain->vpci_dev_assigned_map);
 #endif
 
-    vpci_cleanup_capabilities(pdev);
+    vpci_cleanup_capabilities(pdev, false);
 
     spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
@@ -159,7 +159,7 @@ int vpci_assign_device(struct pci_dev *p
     if ( rc )
         goto out;
 
-    rc = vpci_init_capabilities(pdev);
+    rc = vpci_init_capabilities(pdev, false);
 
  out:
     if ( rc )
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -25,6 +25,8 @@ int __must_check vpci_assign_device(stru
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
 
+int vpci_reinit_ext_capabilities(struct pci_dev *pdev);
+
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
@@ -202,6 +204,11 @@ bool vpci_ecam_read(pci_sbdf_t sbdf, uns
 #else /* !CONFIG_HAS_VPCI */
 struct vpci_vcpu {};
 
+static inline int vpci_reinit_ext_capabilities(struct pci_dev *pdev)
+{
+    return 0;
+}
+
 static inline int vpci_assign_device(struct pci_dev *pdev)
 {
     return 0;



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/5] vPCI: introduce private header
  2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
@ 2026-02-26  3:33   ` Stewart Hildebrand
  2026-03-03 16:44   ` Roger Pau Monné
  1 sibling, 0 replies; 19+ messages in thread
From: Stewart Hildebrand @ 2026-02-26  3:33 UTC (permalink / raw)
  To: Jan Beulich, xen-devel@lists.xenproject.org
  Cc: Roger Pau Monné, Andrew Cooper, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk, Bertrand Marquis,
	Michal Orzel, Anthony PERARD

On 2/25/26 06:43, Jan Beulich wrote:
> Before adding more private stuff to xen/vpci.h, split it up. In order to
> be able to include the private header first in a CU, the per-arch struct
> decls also need to move (to new asm/vpci.h files).
> 
> While adjusting the test harness'es Makefile, also switch the pre-existing
> header symlink-ing rule to a pattern one.
> 
> Apart from in the test harness code, things only move; no functional
> change intended.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/5] vPCI: introduce private header
  2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
  2026-02-26  3:33   ` Stewart Hildebrand
@ 2026-03-03 16:44   ` Roger Pau Monné
  2026-03-03 16:48     ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-03 16:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand, Andrew Cooper,
	Julien Grall, Stefano Stabellini, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Anthony PERARD

On Wed, Feb 25, 2026 at 12:43:01PM +0100, Jan Beulich wrote:
> Before adding more private stuff to xen/vpci.h, split it up. In order to
> be able to include the private header first in a CU, the per-arch struct
> decls also need to move (to new asm/vpci.h files).
> 
> While adjusting the test harness'es Makefile, also switch the pre-existing
> header symlink-ing rule to a pattern one.
> 
> Apart from in the test harness code, things only move; no functional
> change intended.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

One comment below.  The Ack stands regardless of whether you want to
change it or not.

> ---
> Subsequently, at least on x86 more stuff may want moving into asm/vpci.h.
> ---
> v5: Add new generated header to test harness clean rule and to .gitignore.
>     Also move vpci_init_header().
> v4: New.
> 
> --- a/.gitignore
> +++ b/.gitignore
> @@ -154,6 +154,7 @@ tools/tests/x86_emulator/test_x86_emulat
>  tools/tests/x86_emulator/x86_emulate
>  tools/tests/x86_emulator/xop*.[ch]
>  tools/tests/vpci/list.h
> +tools/tests/vpci/private.h
>  tools/tests/vpci/vpci.[hc]
>  tools/tests/vpci/test_vpci
>  tools/xcutils/lsevtchn
> --- a/tools/tests/vpci/Makefile
> +++ b/tools/tests/vpci/Makefile
> @@ -14,12 +14,12 @@ else
>  	$(warning HOSTCC != CC, will not run test)
>  endif
>  
> -$(TARGET): vpci.c vpci.h list.h main.c emul.h
> -	$(CC) $(CFLAGS_xeninclude) -g -o $@ vpci.c main.c
> +$(TARGET): vpci.c vpci.h list.h private.h main.c emul.h
> +	$(CC) $(CFLAGS_xeninclude) -include emul.h -g -o $@ vpci.c main.c
>  
>  .PHONY: clean
>  clean:
> -	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
> +	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h private.h
>  
>  .PHONY: distclean
>  distclean: clean
> @@ -34,10 +34,10 @@ uninstall:
>  	$(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
>  
>  vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> -	# Remove includes and add the test harness header
> -	sed -e '/#include/d' -e '1s/^/#include "emul.h"/' <$< >$@
> +	sed -e '/#include/d' <$< >$@
> +
> +private.h: %.h: $(XEN_ROOT)/xen/drivers/vpci/%.h
> +	sed -e '/#include/d' <$< >$@

Nit: if you are changing/adding those we might as well do
/^#[[:space:]]*include/d, as sometimes we add spaces if the header
inclusion is conditional.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/5] vPCI: introduce private header
  2026-03-03 16:44   ` Roger Pau Monné
@ 2026-03-03 16:48     ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2026-03-03 16:48 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand, Andrew Cooper,
	Julien Grall, Stefano Stabellini, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Anthony PERARD

On 03.03.2026 17:44, Roger Pau Monné wrote:
> On Wed, Feb 25, 2026 at 12:43:01PM +0100, Jan Beulich wrote:
>> Before adding more private stuff to xen/vpci.h, split it up. In order to
>> be able to include the private header first in a CU, the per-arch struct
>> decls also need to move (to new asm/vpci.h files).
>>
>> While adjusting the test harness'es Makefile, also switch the pre-existing
>> header symlink-ing rule to a pattern one.
>>
>> Apart from in the test harness code, things only move; no functional
>> change intended.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/tools/tests/vpci/Makefile
>> +++ b/tools/tests/vpci/Makefile
>> @@ -14,12 +14,12 @@ else
>>  	$(warning HOSTCC != CC, will not run test)
>>  endif
>>  
>> -$(TARGET): vpci.c vpci.h list.h main.c emul.h
>> -	$(CC) $(CFLAGS_xeninclude) -g -o $@ vpci.c main.c
>> +$(TARGET): vpci.c vpci.h list.h private.h main.c emul.h
>> +	$(CC) $(CFLAGS_xeninclude) -include emul.h -g -o $@ vpci.c main.c
>>  
>>  .PHONY: clean
>>  clean:
>> -	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
>> +	rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h private.h
>>  
>>  .PHONY: distclean
>>  distclean: clean
>> @@ -34,10 +34,10 @@ uninstall:
>>  	$(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
>>  
>>  vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
>> -	# Remove includes and add the test harness header
>> -	sed -e '/#include/d' -e '1s/^/#include "emul.h"/' <$< >$@
>> +	sed -e '/#include/d' <$< >$@
>> +
>> +private.h: %.h: $(XEN_ROOT)/xen/drivers/vpci/%.h
>> +	sed -e '/#include/d' <$< >$@
> 
> Nit: if you are changing/adding those we might as well do
> /^#[[:space:]]*include/d, as sometimes we add spaces if the header
> inclusion is conditional.

Editing these two like you say would leave an inconsistency with what's
further down (and what isn't being touched). I think I'd hence prefer
that to be a separate change. Nevertheless, I agree with the intention.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] vPCI/ReBAR: improve cleanup
  2026-02-25 11:44 ` [PATCH v5 4/5] vPCI/ReBAR: improve cleanup Jan Beulich
@ 2026-03-04 13:46   ` Roger Pau Monné
  0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-04 13:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Wed, Feb 25, 2026 at 12:44:22PM +0100, Jan Beulich wrote:
> We cannot assume extended config space to (still) be accessible when
> cleaning up. Necessary value need caching instead. In fact, as the caller
> also cannot look up extended capabilities, the cleanup function needs to
> cope with being called when there's no ReBAR capability at all.

I see that you adjust the code in vpci_cleanup_capabilities() in the
next patch, so that it also doesn't check for extended capability
presence as part of cleanup.

> As kind of a side effect nbars being 0 (which init_rebar() doesn't
> explicitly reject) no longer results in a bogus call to
> vpci_remove_registers().
> 
> Fixes: ee459aeac096 ("vpci/rebar: Implement cleanup function for Rebar")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-02-25 11:44 ` [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed Jan Beulich
@ 2026-03-04 15:06   ` Roger Pau Monné
  2026-03-04 15:39     ` Jan Beulich
  2026-03-05  9:00     ` Jan Beulich
  0 siblings, 2 replies; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-04 15:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
> When Dom0 informs us about MMCFG usability, this may change whether
> extended capabilities are available (accessible) for devices. Zap what
> might be on record, and re-initialize things.
> 
> No synchronization is added for the case where devices may already be in
> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
> 
> vpci_cleanup_capabilities() also shouldn't have used
> pci_find_ext_capability(), as already when the function was introduced
> extended config space may not have been (properly) accessible anymore,
> no matter whether it was during init. Extended capability cleanup hooks
> need to cope with being called when the respective capability doesn't
> exist (and hence the corresponding ->init() hook was never called).
> 
> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> vpci_reinit_ext_capabilities()'es return value is checked only to log an
> error; it doesn't feel quite right to fail the hypercall because of this.
> Roger brought up the idea of de-assigning the device in such a case, but
> if a driver doesn't use extended capabilities the device would likely
> continue to work fine, for Dom0 this probably wouldn't be quite right
> anyway, and it's also unclear whether calling deassign_device() could be
> done from this context. Something like what pci_check_disable_device()
> does may be an option, if we really think we need to "break" the device.

We may want to add a note there, stating that we have considered all
possible options, and hiding the capability and hoping the owner
domain would continue to work as expected seems the less bad of all of
them?

> The use of is_hardware_domain() in vpci_cleanup_capabilities() was
> uncommented and hence is left so. Shouldn't there be a DomU-related TODO
> or FIXME?

Hm, yes, possibly.  I think limiting extended space availability to
the hardware domain only has been done "just" because we have no
extended capabilities to expose to domUs so far, and I don't think we
even setup the extended capability list in the domU case.

> ---
> v5: Don't use pci_find_ext_capability() in vpci_cleanup_capabilities().
>     Add assertion in vpci_reinit_ext_capabilities().
> v4: Make sure ->cleanup() and ->init() are invoked.
> v3: New.
> 
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -8,6 +8,8 @@
>  #include <xen/guest_access.h>
>  #include <xen/iocap.h>
>  #include <xen/serial.h>
> +#include <xen/vpci.h>
> +
>  #include <asm/current.h>
>  #include <asm/io_apic.h>
>  #include <asm/msi.h>
> @@ -169,8 +171,17 @@ int cf_check physdev_check_pci_extcfg(st
>  
>      ASSERT(pdev->seg == info->segment);
>      if ( pdev->bus >= info->start_bus && pdev->bus <= info->end_bus )
> +    {
> +        int rc;
> +
>          pci_check_extcfg(pdev);
>  
> +        rc = vpci_reinit_ext_capabilities(pdev);
> +        if ( rc )
> +            gprintk(XENLOG_ERR, "%pp(%pd): vPCI extcap reinit failed: %d\n",
> +                    &pdev->sbdf, pdev->domain, rc);
> +    }
> +
>      return 0;
>  }
>  #endif /* COMPAT */
> --- a/xen/drivers/vpci/cap.c
> +++ b/xen/drivers/vpci/cap.c
> @@ -285,13 +285,16 @@ static int vpci_init_ext_capability_list
>      return 0;
>  }
>  
> -int vpci_init_capabilities(struct pci_dev *pdev)
> +int vpci_init_capabilities(struct pci_dev *pdev, bool ext_only)
>  {
>      int rc;
>  
> -    rc = vpci_init_capability_list(pdev);
> -    if ( rc )
> -        return rc;
> +    if ( !ext_only )
> +    {
> +        rc = vpci_init_capability_list(pdev);
> +        if ( rc )
> +            return rc;
> +    }
>  
>      rc = vpci_init_ext_capability_list(pdev);
>      if ( rc )
> @@ -305,7 +308,7 @@ int vpci_init_capabilities(struct pci_de
>          unsigned int pos = 0;
>  
>          if ( !is_ext )
> -            pos = pci_find_cap_offset(pdev->sbdf, cap);
> +            pos = !ext_only ? pci_find_cap_offset(pdev->sbdf, cap) : 0;
>          else if ( is_hardware_domain(pdev->domain) )
>              pos = pci_find_ext_capability(pdev, cap);
>  
> @@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
>      return 0;
>  }
>  
> -void vpci_cleanup_capabilities(struct pci_dev *pdev)
> +void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
>  {

You could short-circuit the function here, ie:

if ( ext_only && !is_hardware_domain(pdev->domain) )
    return;

But I'm not sure that would simplify the code of the function much?
Likewise for vpci_init_capabilities().

>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>      {
>          const vpci_capability_t *capability = &__start_vpci_array[i];
>          const unsigned int cap = capability->id;
> -        unsigned int pos = 0;
>  
>          if ( !capability->cleanup )
>              continue;
>  
> -        if ( !capability->is_ext )
> -            pos = pci_find_cap_offset(pdev->sbdf, cap);
> -        else if ( is_hardware_domain(pdev->domain) )
> -            pos = pci_find_ext_capability(pdev, cap);
> -        if ( pos )
> +        /*
> +         * Cannot call pci_find_ext_capability() here, as extended config
> +         * space may (no longer) be accessible.
> +         */
> +        if ( capability->is_ext
> +             ? is_hardware_domain(pdev->domain)
> +             : !ext_only && pci_find_cap_offset(pdev->sbdf, cap) )

Given the changes you have done to the reBAR cleanup function, we
could even call capability->cleanup() on domUs, as the handler has to
deal with uninitialized capabilities either way?

>          {
>              int rc = capability->cleanup(pdev, false);
>  
> @@ -376,6 +380,28 @@ void vpci_cleanup_capabilities(struct pc
>      }
>  }
>  
> +int vpci_reinit_ext_capabilities(struct pci_dev *pdev)
> +{
> +    if ( !pdev->vpci )
> +        return 0;
> +
> +    /*
> +     * FIXME: DomU support is missing.  For already running domains we may
> +     * need to pause them around the entire re-evaluation of extended config
> +     * space accessibility.
> +     */
> +    if ( pdev->domain )
> +        ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_io);

Is this to cope around races?  I don't think it's a valid state to
have pdev->vpci != NULL and pdev->domain == NULL?

Neither you can have pdev->domain == dom_io and pdev->vpci != NULL?

> +
> +    vpci_cleanup_capabilities(pdev, true);
> +
> +    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
> +                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
> +        ASSERT_UNREACHABLE();

Ideally this would better be done the other way around.  We first
remove the handlers, and the cleanup the capabilities.  Just to ensure
no stray handler could end up having cached references to data that's
been freed by vpci_cleanup_capabilities().

And we should take the write_lock(&pdev->domain->pci_lock).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-04 15:06   ` Roger Pau Monné
@ 2026-03-04 15:39     ` Jan Beulich
  2026-03-04 16:53       ` Roger Pau Monné
  2026-03-05  9:00     ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-03-04 15:39 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On 04.03.2026 16:06, Roger Pau Monné wrote:
> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
>> When Dom0 informs us about MMCFG usability, this may change whether
>> extended capabilities are available (accessible) for devices. Zap what
>> might be on record, and re-initialize things.
>>
>> No synchronization is added for the case where devices may already be in
>> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
>> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
>>
>> vpci_cleanup_capabilities() also shouldn't have used
>> pci_find_ext_capability(), as already when the function was introduced
>> extended config space may not have been (properly) accessible anymore,
>> no matter whether it was during init. Extended capability cleanup hooks
>> need to cope with being called when the respective capability doesn't
>> exist (and hence the corresponding ->init() hook was never called).
>>
>> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> vpci_reinit_ext_capabilities()'es return value is checked only to log an
>> error; it doesn't feel quite right to fail the hypercall because of this.
>> Roger brought up the idea of de-assigning the device in such a case, but
>> if a driver doesn't use extended capabilities the device would likely
>> continue to work fine, for Dom0 this probably wouldn't be quite right
>> anyway, and it's also unclear whether calling deassign_device() could be
>> done from this context. Something like what pci_check_disable_device()
>> does may be an option, if we really think we need to "break" the device.
> 
> We may want to add a note there, stating that we have considered all
> possible options, and hiding the capability and hoping the owner
> domain would continue to work as expected seems the less bad of all of
> them?

I'll see what I can do.

>> The use of is_hardware_domain() in vpci_cleanup_capabilities() was
>> uncommented and hence is left so. Shouldn't there be a DomU-related TODO
>> or FIXME?
> 
> Hm, yes, possibly.  I think limiting extended space availability to
> the hardware domain only has been done "just" because we have no
> extended capabilities to expose to domUs so far, and I don't think we
> even setup the extended capability list in the domU case.

Considering how many things there are to be done for DomU support, I
think it would help if every place where e.g. is_hardware_domain() is
used only as surrogate would be properly annotated. Or perhaps properly
named predicates could be introduced right away, so one can actually go
hunt for all of them. Then again is_hardware_domain() is also something
one can go hunt for.

>> @@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
>>      return 0;
>>  }
>>  
>> -void vpci_cleanup_capabilities(struct pci_dev *pdev)
>> +void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
>>  {
> 
> You could short-circuit the function here, ie:
> 
> if ( ext_only && !is_hardware_domain(pdev->domain) )
>     return;
> 
> But I'm not sure that would simplify the code of the function much?
> Likewise for vpci_init_capabilities().

Such a short-circuit would need replacing / dropping once DomU support is
added. I was hoping the chosen arrangement would make for a little less
churn at that time. I'll listen to your advice, though, just that the
question gives the impression you're not quite sure either.

>>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>>      {
>>          const vpci_capability_t *capability = &__start_vpci_array[i];
>>          const unsigned int cap = capability->id;
>> -        unsigned int pos = 0;
>>  
>>          if ( !capability->cleanup )
>>              continue;
>>  
>> -        if ( !capability->is_ext )
>> -            pos = pci_find_cap_offset(pdev->sbdf, cap);
>> -        else if ( is_hardware_domain(pdev->domain) )
>> -            pos = pci_find_ext_capability(pdev, cap);
>> -        if ( pos )
>> +        /*
>> +         * Cannot call pci_find_ext_capability() here, as extended config
>> +         * space may (no longer) be accessible.
>> +         */
>> +        if ( capability->is_ext
>> +             ? is_hardware_domain(pdev->domain)
>> +             : !ext_only && pci_find_cap_offset(pdev->sbdf, cap) )
> 
> Given the changes you have done to the reBAR cleanup function, we
> could even call capability->cleanup() on domUs, as the handler has to
> deal with uninitialized capabilities either way?

Hmm, yes, looks like we could.

>> @@ -376,6 +380,28 @@ void vpci_cleanup_capabilities(struct pc
>>      }
>>  }
>>  
>> +int vpci_reinit_ext_capabilities(struct pci_dev *pdev)
>> +{
>> +    if ( !pdev->vpci )
>> +        return 0;
>> +
>> +    /*
>> +     * FIXME: DomU support is missing.  For already running domains we may
>> +     * need to pause them around the entire re-evaluation of extended config
>> +     * space accessibility.
>> +     */
>> +    if ( pdev->domain )
>> +        ASSERT(pdev->domain == hardware_domain || pdev->domain == dom_io);
> 
> Is this to cope around races?  I don't think it's a valid state to
> have pdev->vpci != NULL and pdev->domain == NULL?

Knowing that I had seen pdev-s with NULL domains, I'm perhaps overly cautious.
Yes, ->vpci being non-NULL ought to demand a proper owner.

> Neither you can have pdev->domain == dom_io and pdev->vpci != NULL?

Same here, looks like I went too far.

>> +
>> +    vpci_cleanup_capabilities(pdev, true);
>> +
>> +    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
>> +                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
>> +        ASSERT_UNREACHABLE();
> 
> Ideally this would better be done the other way around.  We first
> remove the handlers, and the cleanup the capabilities.  Just to ensure
> no stray handler could end up having cached references to data that's
> been freed by vpci_cleanup_capabilities().

And maybe not just that: For the hwdom case cleanup_rebar() adds new handlers,
which we'd wrongly purge again right away. (Because we pass "false" for "hide",
this isn't an active issue right now.)

> And we should take the write_lock(&pdev->domain->pci_lock).

Now this is a request that I'm struggling with some. I can see that callers
of vpci_{init,cleanup}_capabilities() assert that the lock is being held, yet
it's not quite clear to me why that's needed. Shouldn't vPCI internals all
synchronize on the vPCI lock of the domain?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-04 15:39     ` Jan Beulich
@ 2026-03-04 16:53       ` Roger Pau Monné
  2026-03-05  8:40         ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-04 16:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Wed, Mar 04, 2026 at 04:39:00PM +0100, Jan Beulich wrote:
> On 04.03.2026 16:06, Roger Pau Monné wrote:
> > On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
> >> When Dom0 informs us about MMCFG usability, this may change whether
> >> extended capabilities are available (accessible) for devices. Zap what
> >> might be on record, and re-initialize things.
> >>
> >> No synchronization is added for the case where devices may already be in
> >> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
> >> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
> >>
> >> vpci_cleanup_capabilities() also shouldn't have used
> >> pci_find_ext_capability(), as already when the function was introduced
> >> extended config space may not have been (properly) accessible anymore,
> >> no matter whether it was during init. Extended capability cleanup hooks
> >> need to cope with being called when the respective capability doesn't
> >> exist (and hence the corresponding ->init() hook was never called).
> >>
> >> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> vpci_reinit_ext_capabilities()'es return value is checked only to log an
> >> error; it doesn't feel quite right to fail the hypercall because of this.
> >> Roger brought up the idea of de-assigning the device in such a case, but
> >> if a driver doesn't use extended capabilities the device would likely
> >> continue to work fine, for Dom0 this probably wouldn't be quite right
> >> anyway, and it's also unclear whether calling deassign_device() could be
> >> done from this context. Something like what pci_check_disable_device()
> >> does may be an option, if we really think we need to "break" the device.
> > 
> > We may want to add a note there, stating that we have considered all
> > possible options, and hiding the capability and hoping the owner
> > domain would continue to work as expected seems the less bad of all of
> > them?
> 
> I'll see what I can do.
> 
> >> The use of is_hardware_domain() in vpci_cleanup_capabilities() was
> >> uncommented and hence is left so. Shouldn't there be a DomU-related TODO
> >> or FIXME?
> > 
> > Hm, yes, possibly.  I think limiting extended space availability to
> > the hardware domain only has been done "just" because we have no
> > extended capabilities to expose to domUs so far, and I don't think we
> > even setup the extended capability list in the domU case.
> 
> Considering how many things there are to be done for DomU support, I
> think it would help if every place where e.g. is_hardware_domain() is
> used only as surrogate would be properly annotated. Or perhaps properly
> named predicates could be introduced right away, so one can actually go
> hunt for all of them. Then again is_hardware_domain() is also something
> one can go hunt for.

I would mind having more concrete predicates, even if right now they
are some kind of dummy checks.  However it might be difficult to
formulate those know without having the full picture of what domU
support requires.

> >> @@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
> >>      return 0;
> >>  }
> >>  
> >> -void vpci_cleanup_capabilities(struct pci_dev *pdev)
> >> +void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
> >>  {
> > 
> > You could short-circuit the function here, ie:
> > 
> > if ( ext_only && !is_hardware_domain(pdev->domain) )
> >     return;
> > 
> > But I'm not sure that would simplify the code of the function much?
> > Likewise for vpci_init_capabilities().
> 
> Such a short-circuit would need replacing / dropping once DomU support is
> added. I was hoping the chosen arrangement would make for a little less
> churn at that time. I'll listen to your advice, though, just that the
> question gives the impression you're not quite sure either.

Yeah, I wasn't fully sure.  IT would be nice if we could add those
short circuits now, and then once domU support is in place we just
remove teh shortcuts and it works for domU also.  But I fear more
changes will be needed anyway, at which point the short-circuit is
not that attractive to use.

> >> +
> >> +    vpci_cleanup_capabilities(pdev, true);
> >> +
> >> +    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
> >> +                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
> >> +        ASSERT_UNREACHABLE();
> > 
> > Ideally this would better be done the other way around.  We first
> > remove the handlers, and the cleanup the capabilities.  Just to ensure
> > no stray handler could end up having cached references to data that's
> > been freed by vpci_cleanup_capabilities().
> 
> And maybe not just that: For the hwdom case cleanup_rebar() adds new handlers,
> which we'd wrongly purge again right away. (Because we pass "false" for "hide",
> this isn't an active issue right now.)
> 
> > And we should take the write_lock(&pdev->domain->pci_lock).
> 
> Now this is a request that I'm struggling with some. I can see that callers
> of vpci_{init,cleanup}_capabilities() assert that the lock is being held, yet
> it's not quite clear to me why that's needed. Shouldn't vPCI internals all
> synchronize on the vPCI lock of the domain?

Right, the callers of the handlers already hold the locks, and the
removal of the handlers should also hold the locks.  The point of
taking the d->pci_lock is to avoid the device from being removed
while there are vPCI accesses against it being done.  The vPCI lock is
fine for vPCI internals, but functions that deal with addition or
removal of devices need the d->pci_lock to avoid races with possibly
freeing pdev->vpci while in use.

I think you are right, and for the usage here (that doesn't add or
remove pdev->vpci itself), the internal vPCI lock should be enough.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-04 16:53       ` Roger Pau Monné
@ 2026-03-05  8:40         ` Jan Beulich
  2026-03-05  9:19           ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-03-05  8:40 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On 04.03.2026 17:53, Roger Pau Monné wrote:
> On Wed, Mar 04, 2026 at 04:39:00PM +0100, Jan Beulich wrote:
>> On 04.03.2026 16:06, Roger Pau Monné wrote:
>>> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
>>>> @@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
>>>>      return 0;
>>>>  }
>>>>  
>>>> -void vpci_cleanup_capabilities(struct pci_dev *pdev)
>>>> +void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
>>>>  {
>>>
>>> You could short-circuit the function here, ie:
>>>
>>> if ( ext_only && !is_hardware_domain(pdev->domain) )
>>>     return;
>>>
>>> But I'm not sure that would simplify the code of the function much?
>>> Likewise for vpci_init_capabilities().
>>
>> Such a short-circuit would need replacing / dropping once DomU support is
>> added. I was hoping the chosen arrangement would make for a little less
>> churn at that time. I'll listen to your advice, though, just that the
>> question gives the impression you're not quite sure either.
> 
> Yeah, I wasn't fully sure.  IT would be nice if we could add those
> short circuits now, and then once domU support is in place we just
> remove teh shortcuts and it works for domU also.  But I fear more
> changes will be needed anyway, at which point the short-circuit is
> not that attractive to use.

As per your other request (calling ->cleanup() even for DomU-s) the use of
is_hardware_domain() would go away anyway, and the function would be ready
for use for DomU-s as well.

>>>> +
>>>> +    vpci_cleanup_capabilities(pdev, true);
>>>> +
>>>> +    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
>>>> +                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
>>>> +        ASSERT_UNREACHABLE();
>>>
>>> Ideally this would better be done the other way around.  We first
>>> remove the handlers, and the cleanup the capabilities.  Just to ensure
>>> no stray handler could end up having cached references to data that's
>>> been freed by vpci_cleanup_capabilities().
>>
>> And maybe not just that: For the hwdom case cleanup_rebar() adds new handlers,
>> which we'd wrongly purge again right away. (Because we pass "false" for "hide",
>> this isn't an active issue right now.)
>>
>>> And we should take the write_lock(&pdev->domain->pci_lock).
>>
>> Now this is a request that I'm struggling with some. I can see that callers
>> of vpci_{init,cleanup}_capabilities() assert that the lock is being held, yet
>> it's not quite clear to me why that's needed. Shouldn't vPCI internals all
>> synchronize on the vPCI lock of the domain?
> 
> Right, the callers of the handlers already hold the locks, and the
> removal of the handlers should also hold the locks.  The point of
> taking the d->pci_lock is to avoid the device from being removed
> while there are vPCI accesses against it being done.  The vPCI lock is
> fine for vPCI internals, but functions that deal with addition or
> removal of devices need the d->pci_lock to avoid races with possibly
> freeing pdev->vpci while in use.
> 
> I think you are right, and for the usage here (that doesn't add or
> remove pdev->vpci itself), the internal vPCI lock should be enough.

Well, we could take two positions: Either we say that as we're being called
from a context where the PCI device is being operated on anyway, we can
assume it can't go away. Then no further locking would be needed here. Or
we want to explicitly guard against that, in which case (seeing that
nothing is added / removed), d->pci_lock may want read-locking?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-04 15:06   ` Roger Pau Monné
  2026-03-04 15:39     ` Jan Beulich
@ 2026-03-05  9:00     ` Jan Beulich
  2026-03-05  9:22       ` Roger Pau Monné
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-03-05  9:00 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On 04.03.2026 16:06, Roger Pau Monné wrote:
> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
>> When Dom0 informs us about MMCFG usability, this may change whether
>> extended capabilities are available (accessible) for devices. Zap what
>> might be on record, and re-initialize things.
>>
>> No synchronization is added for the case where devices may already be in
>> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
>> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
>>
>> vpci_cleanup_capabilities() also shouldn't have used
>> pci_find_ext_capability(), as already when the function was introduced
>> extended config space may not have been (properly) accessible anymore,
>> no matter whether it was during init. Extended capability cleanup hooks
>> need to cope with being called when the respective capability doesn't
>> exist (and hence the corresponding ->init() hook was never called).
>>
>> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> vpci_reinit_ext_capabilities()'es return value is checked only to log an
>> error; it doesn't feel quite right to fail the hypercall because of this.
>> Roger brought up the idea of de-assigning the device in such a case, but
>> if a driver doesn't use extended capabilities the device would likely
>> continue to work fine, for Dom0 this probably wouldn't be quite right
>> anyway, and it's also unclear whether calling deassign_device() could be
>> done from this context. Something like what pci_check_disable_device()
>> does may be an option, if we really think we need to "break" the device.
> 
> We may want to add a note there, stating that we have considered all
> possible options, and hiding the capability and hoping the owner
> domain would continue to work as expected seems the less bad of all of
> them?

While adding that note it occurred to me that in order to keep the device
as functioning as possible, in the re-init case vpci_init_capabilities()
might better not bail upon encountering a failure, but accumulate the
error while continuing its loop in a best-effort manner. Thoughts? (One
of the two return-s is already guarded by !is_hardware_domain(), so that
could be left alone for the immediate purpose.)

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-05  8:40         ` Jan Beulich
@ 2026-03-05  9:19           ` Roger Pau Monné
  0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-05  9:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Thu, Mar 05, 2026 at 09:40:32AM +0100, Jan Beulich wrote:
> On 04.03.2026 17:53, Roger Pau Monné wrote:
> > On Wed, Mar 04, 2026 at 04:39:00PM +0100, Jan Beulich wrote:
> >> On 04.03.2026 16:06, Roger Pau Monné wrote:
> >>> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
> >>>> @@ -349,22 +352,23 @@ int vpci_init_capabilities(struct pci_de
> >>>>      return 0;
> >>>>  }
> >>>>  
> >>>> -void vpci_cleanup_capabilities(struct pci_dev *pdev)
> >>>> +void vpci_cleanup_capabilities(struct pci_dev *pdev, bool ext_only)
> >>>>  {
> >>>
> >>> You could short-circuit the function here, ie:
> >>>
> >>> if ( ext_only && !is_hardware_domain(pdev->domain) )
> >>>     return;
> >>>
> >>> But I'm not sure that would simplify the code of the function much?
> >>> Likewise for vpci_init_capabilities().
> >>
> >> Such a short-circuit would need replacing / dropping once DomU support is
> >> added. I was hoping the chosen arrangement would make for a little less
> >> churn at that time. I'll listen to your advice, though, just that the
> >> question gives the impression you're not quite sure either.
> > 
> > Yeah, I wasn't fully sure.  IT would be nice if we could add those
> > short circuits now, and then once domU support is in place we just
> > remove teh shortcuts and it works for domU also.  But I fear more
> > changes will be needed anyway, at which point the short-circuit is
> > not that attractive to use.
> 
> As per your other request (calling ->cleanup() even for DomU-s) the use of
> is_hardware_domain() would go away anyway, and the function would be ready
> for use for DomU-s as well.

OK, so that one is (possibly) sorted then.

> >>>> +
> >>>> +    vpci_cleanup_capabilities(pdev, true);
> >>>> +
> >>>> +    if ( vpci_remove_registers(pdev->vpci, PCI_CFG_SPACE_SIZE,
> >>>> +                               PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) )
> >>>> +        ASSERT_UNREACHABLE();
> >>>
> >>> Ideally this would better be done the other way around.  We first
> >>> remove the handlers, and the cleanup the capabilities.  Just to ensure
> >>> no stray handler could end up having cached references to data that's
> >>> been freed by vpci_cleanup_capabilities().
> >>
> >> And maybe not just that: For the hwdom case cleanup_rebar() adds new handlers,
> >> which we'd wrongly purge again right away. (Because we pass "false" for "hide",
> >> this isn't an active issue right now.)
> >>
> >>> And we should take the write_lock(&pdev->domain->pci_lock).
> >>
> >> Now this is a request that I'm struggling with some. I can see that callers
> >> of vpci_{init,cleanup}_capabilities() assert that the lock is being held, yet
> >> it's not quite clear to me why that's needed. Shouldn't vPCI internals all
> >> synchronize on the vPCI lock of the domain?
> > 
> > Right, the callers of the handlers already hold the locks, and the
> > removal of the handlers should also hold the locks.  The point of
> > taking the d->pci_lock is to avoid the device from being removed
> > while there are vPCI accesses against it being done.  The vPCI lock is
> > fine for vPCI internals, but functions that deal with addition or
> > removal of devices need the d->pci_lock to avoid races with possibly
> > freeing pdev->vpci while in use.
> > 
> > I think you are right, and for the usage here (that doesn't add or
> > remove pdev->vpci itself), the internal vPCI lock should be enough.
> 
> Well, we could take two positions: Either we say that as we're being called
> from a context where the PCI device is being operated on anyway, we can
> assume it can't go away. Then no further locking would be needed here. Or
> we want to explicitly guard against that, in which case (seeing that
> nothing is added / removed), d->pci_lock may want read-locking?

In this context the caller is already holding the pcidevs lock, so
there's no risk of the device being de-assigned.

My reasoning for taking the d->pci_lock in write mode was to prevent
any concurrent access to vPCI while the changes to the emulated config
space is taken place, but you are right that the vPCI lock if used
properly should prevent races.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-05  9:00     ` Jan Beulich
@ 2026-03-05  9:22       ` Roger Pau Monné
  2026-03-05  9:45         ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-05  9:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Thu, Mar 05, 2026 at 10:00:13AM +0100, Jan Beulich wrote:
> On 04.03.2026 16:06, Roger Pau Monné wrote:
> > On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
> >> When Dom0 informs us about MMCFG usability, this may change whether
> >> extended capabilities are available (accessible) for devices. Zap what
> >> might be on record, and re-initialize things.
> >>
> >> No synchronization is added for the case where devices may already be in
> >> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
> >> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
> >>
> >> vpci_cleanup_capabilities() also shouldn't have used
> >> pci_find_ext_capability(), as already when the function was introduced
> >> extended config space may not have been (properly) accessible anymore,
> >> no matter whether it was during init. Extended capability cleanup hooks
> >> need to cope with being called when the respective capability doesn't
> >> exist (and hence the corresponding ->init() hook was never called).
> >>
> >> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> vpci_reinit_ext_capabilities()'es return value is checked only to log an
> >> error; it doesn't feel quite right to fail the hypercall because of this.
> >> Roger brought up the idea of de-assigning the device in such a case, but
> >> if a driver doesn't use extended capabilities the device would likely
> >> continue to work fine, for Dom0 this probably wouldn't be quite right
> >> anyway, and it's also unclear whether calling deassign_device() could be
> >> done from this context. Something like what pci_check_disable_device()
> >> does may be an option, if we really think we need to "break" the device.
> > 
> > We may want to add a note there, stating that we have considered all
> > possible options, and hiding the capability and hoping the owner
> > domain would continue to work as expected seems the less bad of all of
> > them?
> 
> While adding that note it occurred to me that in order to keep the device
> as functioning as possible, in the re-init case vpci_init_capabilities()
> might better not bail upon encountering a failure, but accumulate the
> error while continuing its loop in a best-effort manner. Thoughts? (One
> of the two return-s is already guarded by !is_hardware_domain(), so that
> could be left alone for the immediate purpose.)

Right, yes, that would be preferable.  We already print a message for
the failed to init capabilities, so there's no need to print another
one in the caller.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-05  9:22       ` Roger Pau Monné
@ 2026-03-05  9:45         ` Jan Beulich
  2026-03-05 10:20           ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2026-03-05  9:45 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On 05.03.2026 10:22, Roger Pau Monné wrote:
> On Thu, Mar 05, 2026 at 10:00:13AM +0100, Jan Beulich wrote:
>> On 04.03.2026 16:06, Roger Pau Monné wrote:
>>> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
>>>> When Dom0 informs us about MMCFG usability, this may change whether
>>>> extended capabilities are available (accessible) for devices. Zap what
>>>> might be on record, and re-initialize things.
>>>>
>>>> No synchronization is added for the case where devices may already be in
>>>> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
>>>> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
>>>>
>>>> vpci_cleanup_capabilities() also shouldn't have used
>>>> pci_find_ext_capability(), as already when the function was introduced
>>>> extended config space may not have been (properly) accessible anymore,
>>>> no matter whether it was during init. Extended capability cleanup hooks
>>>> need to cope with being called when the respective capability doesn't
>>>> exist (and hence the corresponding ->init() hook was never called).
>>>>
>>>> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> vpci_reinit_ext_capabilities()'es return value is checked only to log an
>>>> error; it doesn't feel quite right to fail the hypercall because of this.
>>>> Roger brought up the idea of de-assigning the device in such a case, but
>>>> if a driver doesn't use extended capabilities the device would likely
>>>> continue to work fine, for Dom0 this probably wouldn't be quite right
>>>> anyway, and it's also unclear whether calling deassign_device() could be
>>>> done from this context. Something like what pci_check_disable_device()
>>>> does may be an option, if we really think we need to "break" the device.
>>>
>>> We may want to add a note there, stating that we have considered all
>>> possible options, and hiding the capability and hoping the owner
>>> domain would continue to work as expected seems the less bad of all of
>>> them?
>>
>> While adding that note it occurred to me that in order to keep the device
>> as functioning as possible, in the re-init case vpci_init_capabilities()
>> might better not bail upon encountering a failure, but accumulate the
>> error while continuing its loop in a best-effort manner. Thoughts? (One
>> of the two return-s is already guarded by !is_hardware_domain(), so that
>> could be left alone for the immediate purpose.)
> 
> Right, yes, that would be preferable.  We already print a message for
> the failed to init capabilities, so there's no need to print another
> one in the caller.

Hmm, that's another aspect I didn't consider. Yes, the log message in the
caller is redundant with the present code structure. If we expect that to
remain like that, I can drop logging anything from
physdev_check_pci_extcfg(). Which then re-raises the question whether
vpci_reinit_ext_capabilities() might better return void. At which point
the comment I put in physdev_check_pci_extcfg() (upon your request) would
want to move there.

But my earlier question went in a different direction, and you didn't
comment on that at all.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed
  2026-03-05  9:45         ` Jan Beulich
@ 2026-03-05 10:20           ` Roger Pau Monné
  0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-05 10:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand

On Thu, Mar 05, 2026 at 10:45:29AM +0100, Jan Beulich wrote:
> On 05.03.2026 10:22, Roger Pau Monné wrote:
> > On Thu, Mar 05, 2026 at 10:00:13AM +0100, Jan Beulich wrote:
> >> On 04.03.2026 16:06, Roger Pau Monné wrote:
> >>> On Wed, Feb 25, 2026 at 12:44:44PM +0100, Jan Beulich wrote:
> >>>> When Dom0 informs us about MMCFG usability, this may change whether
> >>>> extended capabilities are available (accessible) for devices. Zap what
> >>>> might be on record, and re-initialize things.
> >>>>
> >>>> No synchronization is added for the case where devices may already be in
> >>>> use. That'll need sorting when (a) DomU support was added and (b) DomU-s
> >>>> may run already while Dom0 / hwdom still boots (dom0less, Hyperlaunch).
> >>>>
> >>>> vpci_cleanup_capabilities() also shouldn't have used
> >>>> pci_find_ext_capability(), as already when the function was introduced
> >>>> extended config space may not have been (properly) accessible anymore,
> >>>> no matter whether it was during init. Extended capability cleanup hooks
> >>>> need to cope with being called when the respective capability doesn't
> >>>> exist (and hence the corresponding ->init() hook was never called).
> >>>>
> >>>> Fixes: 70e6dace747e ("vpci: Use cleanup to free capability resource during deassign")
> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>> ---
> >>>> vpci_reinit_ext_capabilities()'es return value is checked only to log an
> >>>> error; it doesn't feel quite right to fail the hypercall because of this.
> >>>> Roger brought up the idea of de-assigning the device in such a case, but
> >>>> if a driver doesn't use extended capabilities the device would likely
> >>>> continue to work fine, for Dom0 this probably wouldn't be quite right
> >>>> anyway, and it's also unclear whether calling deassign_device() could be
> >>>> done from this context. Something like what pci_check_disable_device()
> >>>> does may be an option, if we really think we need to "break" the device.
> >>>
> >>> We may want to add a note there, stating that we have considered all
> >>> possible options, and hiding the capability and hoping the owner
> >>> domain would continue to work as expected seems the less bad of all of
> >>> them?
> >>
> >> While adding that note it occurred to me that in order to keep the device
> >> as functioning as possible, in the re-init case vpci_init_capabilities()
> >> might better not bail upon encountering a failure, but accumulate the
> >> error while continuing its loop in a best-effort manner. Thoughts? (One
> >> of the two return-s is already guarded by !is_hardware_domain(), so that
> >> could be left alone for the immediate purpose.)
> > 
> > Right, yes, that would be preferable.  We already print a message for
> > the failed to init capabilities, so there's no need to print another
> > one in the caller.
> 
> Hmm, that's another aspect I didn't consider. Yes, the log message in the
> caller is redundant with the present code structure. If we expect that to
> remain like that, I can drop logging anything from
> physdev_check_pci_extcfg(). Which then re-raises the question whether
> vpci_reinit_ext_capabilities() might better return void. At which point
> the comment I put in physdev_check_pci_extcfg() (upon your request) would
> want to move there.
> 
> But my earlier question went in a different direction, and you didn't
> comment on that at all.

Yes, I think we should accumulate errors.  One device failing doesn't
mean the rest will also fail.  We should continue the loop.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-03-05 10:21 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-25 11:41 [PATCH v5 0/5] vPCI: extended capability handling Jan Beulich
2026-02-25 11:43 ` [PATCH v5 1/5] vPCI: introduce private header Jan Beulich
2026-02-26  3:33   ` Stewart Hildebrand
2026-03-03 16:44   ` Roger Pau Monné
2026-03-03 16:48     ` Jan Beulich
2026-02-25 11:43 ` [PATCH v5 2/5] vPCI: move vpci_init_capabilities() to a separate file Jan Beulich
2026-02-25 11:43 ` [PATCH v5 3/5] vPCI: move capability-list init Jan Beulich
2026-02-25 11:44 ` [PATCH v5 4/5] vPCI/ReBAR: improve cleanup Jan Beulich
2026-03-04 13:46   ` Roger Pau Monné
2026-02-25 11:44 ` [PATCH v5 5/5] vPCI: re-init extended-capabilities when MMCFG availability changed Jan Beulich
2026-03-04 15:06   ` Roger Pau Monné
2026-03-04 15:39     ` Jan Beulich
2026-03-04 16:53       ` Roger Pau Monné
2026-03-05  8:40         ` Jan Beulich
2026-03-05  9:19           ` Roger Pau Monné
2026-03-05  9:00     ` Jan Beulich
2026-03-05  9:22       ` Roger Pau Monné
2026-03-05  9:45         ` Jan Beulich
2026-03-05 10:20           ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.