All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/10] Support hiding capability when its initialization fails
@ 2025-05-26  9:45 Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Hi,

This series is to
emulate legacy and extended capability list for dom0, including patch #1, #2, #3.
hide legacy and extended capability when its initialization fails, including patch #4, #5, #6.
remove all related registers and other resources when initializing capability fails, including patch #7, #8, #9, #10.

Best regards,
Jiqian Chen.
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
Jiqian Chen (10):
  vpci/header: Move emulating cap list logic into new function
  vpci/header: Emulate legacy capability list for dom0
  vpci/header: Emulate extended capability list for dom0
  vpci: Refactor REGISTER_VPCI_INIT
  vpci: Hide legacy capability when it fails to initialize
  vpci: Hide extended capability when it fails to initialize
  vpci: Refactor vpci_remove_register to remove matched registers
  vpci/rebar: Free Rebar resources when init_rebar() fails
  vpci/msi: Free MSI resources when init_msi() fails
  vpci/msix: Free MSIX resources when init_msix() fails

 tools/tests/vpci/main.c    |   4 +-
 xen/drivers/vpci/header.c  | 195 +++++++++++++++---------
 xen/drivers/vpci/msi.c     |  29 +++-
 xen/drivers/vpci/msix.c    |  35 ++++-
 xen/drivers/vpci/rebar.c   |  35 +++--
 xen/drivers/vpci/vpci.c    | 293 +++++++++++++++++++++++++++++++++----
 xen/include/xen/pci_regs.h |   5 +-
 xen/include/xen/vpci.h     |  38 +++--
 xen/include/xen/xen.lds.h  |   2 +-
 9 files changed, 510 insertions(+), 126 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v5 01/10] vpci/header: Move emulating cap list logic into new function
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

No functional changes.
Follow-on changes will benifit from this.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
No.

v3->v4 changes:
* Add Acked-by of Roger.

v2->v3 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 138 ++++++++++++++++++++------------------
 1 file changed, 73 insertions(+), 65 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 1f48f2aac64e..0fb3cfa6a376 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -754,6 +754,75 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
     return !bar->mem ? -ENOMEM : 0;
 }
 
+static int vpci_init_capability_list(struct pci_dev *pdev)
+{
+    int rc;
+    bool mask_cap_list = false;
+
+    if ( !is_hardware_domain(pdev->domain) &&
+         pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    {
+        /* Only expose capabilities to the guest that vPCI can handle. */
+        unsigned int next, ttl = 48;
+        static const unsigned int supported_caps[] = {
+            PCI_CAP_ID_MSI,
+            PCI_CAP_ID_MSIX,
+        };
+
+        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                     supported_caps,
+                                     ARRAY_SIZE(supported_caps), &ttl);
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               PCI_CAPABILITY_LIST, 1,
+                               (void *)(uintptr_t)next);
+        if ( rc )
+            return rc;
+
+        next &= ~3;
+
+        if ( !next )
+            /*
+             * If we don't have any supported capabilities to expose to the
+             * guest, mask the PCI_STATUS_CAP_LIST bit in the status
+             * register.
+             */
+            mask_cap_list = true;
+
+        while ( next && ttl )
+        {
+            unsigned int pos = next;
+
+            next = pci_find_next_cap_ttl(pdev->sbdf,
+                                         pos + PCI_CAP_LIST_NEXT,
+                                         supported_caps,
+                                         ARRAY_SIZE(supported_caps), &ttl);
+
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                   pos + PCI_CAP_LIST_ID, 1, NULL);
+            if ( rc )
+                return rc;
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   pos + PCI_CAP_LIST_NEXT, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            next &= ~3;
+        }
+    }
+
+    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
+    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                  PCI_STATUS, 2, NULL,
+                                  PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                  PCI_STATUS_RW1C_MASK,
+                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
+                                  PCI_STATUS_RSVDZ_MASK);
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -762,7 +831,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
-    bool mask_cap_list = false;
     bool is_hwdom = is_hardware_domain(pdev->domain);
 
     ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
@@ -803,61 +871,12 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    rc = vpci_init_capability_list(pdev);
+    if ( rc )
+        return rc;
+
     if ( !is_hwdom )
     {
-        if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
-        {
-            /* Only expose capabilities to the guest that vPCI can handle. */
-            unsigned int next, ttl = 48;
-            static const unsigned int supported_caps[] = {
-                PCI_CAP_ID_MSI,
-                PCI_CAP_ID_MSIX,
-            };
-
-            next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                         supported_caps,
-                                         ARRAY_SIZE(supported_caps), &ttl);
-
-            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                   PCI_CAPABILITY_LIST, 1,
-                                   (void *)(uintptr_t)next);
-            if ( rc )
-                return rc;
-
-            next &= ~3;
-
-            if ( !next )
-                /*
-                 * If we don't have any supported capabilities to expose to the
-                 * guest, mask the PCI_STATUS_CAP_LIST bit in the status
-                 * register.
-                 */
-                mask_cap_list = true;
-
-            while ( next && ttl )
-            {
-                unsigned int pos = next;
-
-                next = pci_find_next_cap_ttl(pdev->sbdf,
-                                             pos + PCI_CAP_LIST_NEXT,
-                                             supported_caps,
-                                             ARRAY_SIZE(supported_caps), &ttl);
-
-                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                       pos + PCI_CAP_LIST_ID, 1, NULL);
-                if ( rc )
-                    return rc;
-
-                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                       pos + PCI_CAP_LIST_NEXT, 1,
-                                       (void *)(uintptr_t)next);
-                if ( rc )
-                    return rc;
-
-                next &= ~3;
-            }
-        }
-
         /* Extended capabilities read as zero, write ignore */
         rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
                                (void *)0);
@@ -865,17 +884,6 @@ static int cf_check init_header(struct pci_dev *pdev)
             return rc;
     }
 
-    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
-    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
-                                PCI_STATUS, 2, NULL,
-                                PCI_STATUS_RO_MASK &
-                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
-                                PCI_STATUS_RW1C_MASK,
-                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
-                                PCI_STATUS_RSVDZ_MASK);
-    if ( rc )
-        return rc;
-
     if ( pdev->ignore_bars )
         return 0;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 10:54   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 03/10] vpci/header: Emulate extended " Jiqian Chen
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Current logic of emulating legacy capability list is only for domU.
So, expand it to emulate for dom0 too. Then it will be easy to hide
a capability whose initialization fails in a function.

And restrict adding PCI_STATUS register only for domU since dom0
has no limitation to access that register.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Return early if dom0, so that I didn't need to change the exiting return chunk.

v3->v4 changes:
* Also pass supported_caps to pci_find_next_cap_ttl() for dom0 since the n is zero when dom0,
  and add a comment to explain it.
* Restrict adding PCI_STATUS register only for domU since dom0 has no limitation to access that register.
* For dom0 register handler, set vpci_hw_write8 to it instead of NULL.

v2->v3 changes:
* Not to add handler of PCI_CAP_LIST_ID when domain is dom0.

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 39 ++++++++++++++++++++++++++-------------
 xen/drivers/vpci/vpci.c   |  6 ++++++
 xen/include/xen/vpci.h    |  2 ++
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0fb3cfa6a376..d26cbba08ee1 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -758,9 +758,9 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 {
     int rc;
     bool mask_cap_list = false;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
-    if ( !is_hardware_domain(pdev->domain) &&
-         pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
     {
         /* Only expose capabilities to the guest that vPCI can handle. */
         unsigned int next, ttl = 48;
@@ -768,12 +768,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
             PCI_CAP_ID_MSI,
             PCI_CAP_ID_MSIX,
         };
+        /*
+         * For dom0, we should expose all capabilities instead of a fixed
+         * capabilities array, so setting n to 0 here is to get the next
+         * capability position directly in pci_find_next_cap_ttl.
+         */
+        const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(supported_caps);
 
         next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                     supported_caps,
-                                     ARRAY_SIZE(supported_caps), &ttl);
+                                     supported_caps, n, &ttl);
 
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+        rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                               is_hwdom ? vpci_hw_write8 : NULL,
                                PCI_CAPABILITY_LIST, 1,
                                (void *)(uintptr_t)next);
         if ( rc )
@@ -781,7 +787,7 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 
         next &= ~3;
 
-        if ( !next )
+        if ( !next && !is_hwdom )
             /*
              * If we don't have any supported capabilities to expose to the
              * guest, mask the PCI_STATUS_CAP_LIST bit in the status
@@ -795,15 +801,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 
             next = pci_find_next_cap_ttl(pdev->sbdf,
                                          pos + PCI_CAP_LIST_NEXT,
-                                         supported_caps,
-                                         ARRAY_SIZE(supported_caps), &ttl);
+                                         supported_caps, n, &ttl);
 
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                   pos + PCI_CAP_LIST_ID, 1, NULL);
-            if ( rc )
-                return rc;
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                       pos + PCI_CAP_LIST_ID, 1, NULL);
+                if ( rc )
+                    return rc;
+            }
 
-            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+            rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                                   is_hwdom ? vpci_hw_write8 : NULL,
                                    pos + PCI_CAP_LIST_NEXT, 1,
                                    (void *)(uintptr_t)next);
             if ( rc )
@@ -813,6 +822,10 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
         }
     }
 
+    /* Return early for the hw domain, no masking of PCI_STATUS. */
+    if ( is_hwdom )
+        return 0;
+
     /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
     return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
                                   PCI_STATUS, 2, NULL,
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d2f0f97e0a04..09988f04c27c 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -255,6 +255,12 @@ uint32_t cf_check vpci_hw_read32(
     return pci_conf_read32(pdev->sbdf, reg);
 }
 
+void cf_check vpci_hw_write8(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write8(pdev->sbdf, reg, val);
+}
+
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 475981cb8155..fc8d5b470b0b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -76,6 +76,8 @@ uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(
     const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_hw_write8(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 11:24   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Add a new function to emulate extended capability list for dom0,
and call it in init_header(). So that it will be easy to hide a
extended capability whose initialization fails.

As for the extended capability list of domU, just move the logic
into above function and keep hiding it for domU.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Add check: if capability list of hardware has a overlap, print warning and return 0.

v3->v4 changes:
* Add check "if ( !header )   return 0;" to avoid adding handler for
  device that has no extended capabilities.

v2->v3 changes:
* In vpci_init_ext_capability_list(), when domain is domU, directly return after adding a handler(hiding all extended capability for domU).
* In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )" instead of "while ( pos && ttl-- )".
* Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 47 ++++++++++++++++++++++++++++++++-------
 xen/drivers/vpci/vpci.c   |  6 +++++
 xen/include/xen/vpci.h    |  2 ++
 3 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index d26cbba08ee1..4b2f761c9c24 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -836,6 +836,42 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
                                   PCI_STATUS_RSVDZ_MASK);
 }
 
+static int vpci_init_ext_capability_list(struct pci_dev *pdev)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
+
+    if ( !is_hardware_domain(pdev->domain) )
+        /* Extended capabilities read as zero, write ignore for guest */
+        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                 pos, 4, (void *)0);
+
+    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
+    {
+        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
+        int rc;
+
+        if ( !header )
+            return 0;
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
+        if ( rc )
+            return rc;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
+
+    return 0;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -888,14 +924,9 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    if ( !is_hwdom )
-    {
-        /* Extended capabilities read as zero, write ignore */
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
-                               (void *)0);
-        if ( rc )
-            return rc;
-    }
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
 
     if ( pdev->ignore_bars )
         return 0;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 09988f04c27c..8474c0e3b995 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -267,6 +267,12 @@ void cf_check vpci_hw_write16(
     pci_conf_write16(pdev->sbdf, reg, val);
 }
 
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write32(pdev->sbdf, reg, val);
+}
+
 int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
                            vpci_write_t *write_handler, unsigned int offset,
                            unsigned int size, void *data, uint32_t ro_mask,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index fc8d5b470b0b..61d16cc8b897 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -80,6 +80,8 @@ void cf_check vpci_hw_write8(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (2 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 03/10] vpci/header: Emulate extended " Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 12:50   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

Refactor REGISTER_VPCI_INIT to contain more capability specific
information, this is benefit for follow-on changes to hide capability
when initialization fails.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

After refactor, the "priority" of initializing capabilities isn't
needed anymore, so delete its related codes.

Note:
Call vpci_make_msix_hole() in the end of init_msix() since the change
of sequence of init_header() and init_msix().

The cleanup hook is also added in this change, even if it's still
unused. Further changes will make use of it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v4->v5 changes:
* Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
* Change cleanup hook of vpci_capability_t from void to int.

v3->v4 changes
* Delete the useless trailing dot of section ".data.vpci".
* Add description about priority since this patch removes the initializing priority of capabilities and priority is not needed anymore.
* Change the hook name from fini to cleanup.
* Change the name x and y to be finit and fclean.
* Remove the unnecessary check "!capability->init"

v2->v3 changes:
* This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
* Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
* Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c |  3 +--
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   |  8 +++++--
 xen/drivers/vpci/rebar.c  |  2 +-
 xen/drivers/vpci/vpci.c   | 46 ++++++++++++++++++++++++++++++---------
 xen/include/xen/vpci.h    | 30 ++++++++++++++++++-------
 xen/include/xen/xen.lds.h |  2 +-
 7 files changed, 68 insertions(+), 25 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 4b2f761c9c24..9fa1cda23151 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -872,7 +872,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1068,7 +1068,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..2d45c7867de7 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 74211301ba10..674815ead025 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -703,9 +703,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..9cafd80ca2c9 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTCAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 8474c0e3b995..2861557e06d2 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,8 +36,8 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern vpci_capability_t *const __start_vpci_array[];
+extern vpci_capability_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +83,34 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = __start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos;
+        int rc;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( !is_hardware_domain(pdev->domain) )
+            continue;
+        else
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +156,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,14 +186,13 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    rc = vpci_init_capabilities(pdev);
 
- out: __maybe_unused;
+ out:
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 61d16cc8b897..e5e3f23ca39c 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (*init)(struct pci_dev *pdev);
+    int (*cleanup)(struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +30,22 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_PCI_CAPABILITY(cap, finit, fclean, ext) \
+  static vpci_capability_t finit##_t = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+  }; \
+  static vpci_capability_t *const finit##_entry  \
+               __used_section(".data.vpci") = &finit##_t
+
+#define REGISTER_VPCI_CAP(cap, finit, fclean) \
+                REGISTER_PCI_CAPABILITY(cap, finit, fclean, false)
+#define REGISTER_VPCI_EXTCAP(cap, finit, fclean) \
+                REGISTER_PCI_CAPABILITY(cap, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..84ec506b00da 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.data.vpci)             \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (3 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 13:35   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 06/10] vpci: Hide extended " Jiqian Chen
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When vpci fails to initialize a legacy capability of device, it just
returns an error and vPCI gets disabled for the whole device.  That
most likely renders the device unusable, plus possibly causing issues
to Xen itself if guest attempts to program the native MSI or MSI-X
capabilities if present.

So, add new function to hide legacy capability when initialization
fails. And remove the failed legacy capability from the vpci emulated
legacy capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Modify vpci_get_register() to delete some unnecessary check, so that I don't need to move function vpci_register_cmp().
* Rename vpci_capability_mask() to vpci_capability_hide().

v3->v4 changes:
* Modify the commit message.
* In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
* Modify vpci_capability_mask() to return error instead of using ASSERT.
* Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
* Add check "if ( !offset )" in vpci_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize"
* Whole implementation changed because last version is wrong.
  This version adds a new helper function vpci_get_register() and uses it to get
  target handler and previous handler from vpci->handlers, then remove the target.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c | 117 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 113 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 2861557e06d2..60e7654ec377 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -83,6 +83,99 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static struct vpci_register *vpci_get_register(struct vpci *vpci,
+                                               unsigned int offset,
+                                               unsigned int size)
+{
+    struct vpci_register *rm;
+
+    ASSERT(spin_is_locked(&vpci->lock));
+
+    list_for_each_entry ( rm, &vpci->handlers, node )
+    {
+        if ( rm->offset == offset && rm->size == size )
+            return rm;
+
+        if ( offset <= rm->offset )
+            break;
+    }
+
+    return NULL;
+}
+
+static struct vpci_register *vpci_get_previous_cap_register(
+    struct vpci *vpci, unsigned int offset)
+{
+    uint32_t next;
+    struct vpci_register *r;
+
+    if ( offset < 0x40 )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1);
+    if ( !r )
+        return NULL;
+
+    next = (uint32_t)(uintptr_t)r->private;
+    while ( next >= 0x40 && next != offset )
+    {
+        r = vpci_get_register(vpci, next + PCI_CAP_LIST_NEXT, 1);
+        if ( !r )
+            return NULL;
+        next = (uint32_t)(uintptr_t)r->private;
+    }
+
+    if ( next < 0x40 )
+        return NULL;
+
+    return r;
+}
+
+static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    struct vpci_register *prev_next_r, *next_r;
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
+    if ( !next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_next_r = vpci_get_previous_cap_register(vpci, offset);
+    if ( !prev_next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_next_r->private = next_r->private;
+    /*
+     * Not calling vpci_remove_register() here is to avoid redoing
+     * the register search
+     */
+    list_del(&next_r->node);
+    spin_unlock(&vpci->lock);
+    xfree(next_r);
+
+    if ( !is_hardware_domain(pdev->domain) )
+        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -91,7 +184,6 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
         const unsigned int cap = capability->id;
         const bool is_ext = capability->is_ext;
         unsigned int pos;
-        int rc;
 
         if ( !is_ext )
             pos = pci_find_cap_offset(pdev->sbdf, cap);
@@ -103,9 +195,26 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
         if ( !pos )
             continue;
 
-        rc = capability->init(pdev);
-        if ( rc )
-            return rc;
+        if ( capability->init(pdev) )
+        {
+            int rc;
+
+            if ( capability->cleanup ) {
+                rc = capability->cleanup(pdev);
+                if ( rc )
+                    return rc;
+            }
+
+            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail, mask it\n",
+                   pdev->domain, &pdev->sbdf,
+                   is_ext ? "extended" : "legacy", cap);
+            if ( !is_ext )
+            {
+                rc = vpci_capability_hide(pdev, cap);
+                if ( rc )
+                    return rc;
+            }
+        }
     }
 
     return 0;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (4 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 14:47   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

When vpci fails to initialize a extended capability of device, it
just returns an error and vPCI gets disabled for the whole device.

So, add function to hide extended capability when initialization
fails. And remove the failed extended capability handler from vpci
extended capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v4->v5 changes:
* Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
* Rename vpci_ext_capability_mask to vpci_ext_capability_hide.

v3->v4 changes:
* Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header) (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
* Modify the commit message.
* Change vpci_ext_capability_mask() to return error instead of using ASSERT.
* Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
* Add check "if ( !offset )" in vpci_ext_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize".
* Whole implementation changed because last version is wrong.
  This version gets target handler and previous handler from vpci->handlers, then remove the target.
* Note: a case in function vpci_ext_capability_mask() needs to be discussed,
  because it may change the offset of next capability when the offset of target
  capability is 0x100U(the first extended capability), my implementation is just to
  ignore and let hardware to handle the target capability.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c    | 100 +++++++++++++++++++++++++++++++++++--
 xen/include/xen/pci_regs.h |   5 +-
 2 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 60e7654ec377..2d4794ff3dea 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -176,6 +176,98 @@ static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
     return 0;
 }
 
+static struct vpci_register *vpci_get_previous_ext_cap_register(
+    struct vpci *vpci, unsigned int offset)
+{
+    uint32_t header;
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+    struct vpci_register *r;
+
+    if ( offset <= PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    r = vpci_get_register(vpci, pos, 4);
+    if ( !r )
+        return NULL;
+
+    header = (uint32_t)(uintptr_t)r->private;
+    pos = PCI_EXT_CAP_NEXT(header);
+    while ( pos > PCI_CFG_SPACE_SIZE && pos != offset )
+    {
+        r = vpci_get_register(vpci, pos, 4);
+        if ( !r )
+            return NULL;
+        header = (uint32_t)(uintptr_t)r->private;
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
+
+    if ( pos <= PCI_CFG_SPACE_SIZE )
+        return NULL;
+
+    return r;
+}
+
+static int vpci_ext_capability_hide(struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
+    struct vpci_register *rm, *prev_r;
+    struct vpci *vpci = pdev->vpci;
+    uint32_t header, pre_header;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    rm = vpci_get_register(vpci, offset, 4);
+    if ( !rm )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    header = (uint32_t)(uintptr_t)rm->private;
+    if ( offset == PCI_CFG_SPACE_SIZE )
+    {
+        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
+            rm->private = (void *)(uintptr_t)0;
+        else
+            /*
+             * If this case removes target capability of position 0x100U, then
+             * it needs to move the next capability to be in position 0x100U,
+             * that would cause the offset of next capability in vpci different
+             * from the hardware, then cause error accesses, so here chooses to
+             * set the capability ID part to be zero.
+             */
+            rm->private = (void *)(uintptr_t)(header &
+                                              ~PCI_EXT_CAP_ID(header));
+
+        spin_unlock(&vpci->lock);
+        return 0;
+    }
+
+    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
+    if ( !prev_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    pre_header = (uint32_t)(uintptr_t)prev_r->private;
+    prev_r->private = (void *)(uintptr_t)((pre_header &
+                                           ~PCI_EXT_CAP_NEXT_MASK) |
+                                          (header & PCI_EXT_CAP_NEXT_MASK));
+    list_del(&rm->node);
+    spin_unlock(&vpci->lock);
+    xfree(rm);
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -209,11 +301,11 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
                    pdev->domain, &pdev->sbdf,
                    is_ext ? "extended" : "legacy", cap);
             if ( !is_ext )
-            {
                 rc = vpci_capability_hide(pdev, cap);
-                if ( rc )
-                    return rc;
-            }
+            else
+                rc = vpci_ext_capability_hide(pdev, cap);
+            if ( rc )
+                return rc;
         }
     }
 
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 27b4f44eedf3..3b6963133dbd 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -448,7 +448,10 @@
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_NEXT_MASK		0xfff00000U
+/* Bottom two bits of next capability position are reserved. */
+#define PCI_EXT_CAP_NEXT(header) \
+    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xffcU)
 
 #define PCI_EXT_CAP_ID_ERR	1
 #define PCI_EXT_CAP_ID_VC	2
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (5 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 06/10] vpci: Hide extended " Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 15:00   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Anthony PERARD

vpci_remove_register() only supports removing a register in a time,
but the follow-on changes need to remove all registers within a range.
So, refactor it to support removing all matched registers in a calling
time.

And it is no matter to remove a non exist register, so remove the
__must_check prefix.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
---
v4->v5 changes:
No.

v3->v4 changes:
* Use list_for_each_entry_safe instead of list_for_each_entry.
* Return ERANGE if overlap.

v2->v3 changes:
* Add new check to return error if registers overlap but not inside range.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 tools/tests/vpci/main.c |  4 ++--
 xen/drivers/vpci/vpci.c | 38 ++++++++++++++++++++------------------
 xen/include/xen/vpci.h  |  4 ++--
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index 33223db3eb77..ca72877d60cd 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
                                   rsvdz_mask))
 
 #define VPCI_REMOVE_REG(off, size)                                          \
-    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
 
 #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
-    assert(vpci_remove_register(test_pdev.vpci, off, size))
+    assert(vpci_remove_registers(test_pdev.vpci, off, size))
 
 /* Read a 32b register using all possible sizes. */
 void multiread4_check(unsigned int reg, uint32_t val)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 2d4794ff3dea..d3e9a76d0cf6 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -163,7 +163,7 @@ static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
 
     prev_next_r->private = next_r->private;
     /*
-     * Not calling vpci_remove_register() here is to avoid redoing
+     * Not calling vpci_remove_registers() here is to avoid redoing
      * the register search
      */
     list_del(&next_r->node);
@@ -171,7 +171,7 @@ static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
     xfree(next_r);
 
     if ( !is_hardware_domain(pdev->domain) )
-        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
 
     return 0;
 }
@@ -561,34 +561,36 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
-int vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                         unsigned int size)
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size)
 {
-    const struct vpci_register r = { .offset = offset, .size = size };
-    struct vpci_register *rm;
+    struct vpci_register *rm, *tmp;
+    unsigned int end = start + size;
 
     spin_lock(&vpci->lock);
-    list_for_each_entry ( rm, &vpci->handlers, node )
+    list_for_each_entry_safe ( rm, tmp, &vpci->handlers, node )
     {
-        int cmp = vpci_register_cmp(&r, rm);
-
-        /*
-         * NB: do not use a switch so that we can use break to
-         * get out of the list loop earlier if required.
-         */
-        if ( !cmp && rm->offset == offset && rm->size == size )
+        /* Remove rm if rm is inside the range. */
+        if ( rm->offset >= start && rm->offset + rm->size <= end )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
-            return 0;
+            continue;
         }
-        if ( cmp <= 0 )
+
+        /* Return error if registers overlap but not inside. */
+        if ( rm->offset + rm->size > start && rm->offset < end )
+        {
+            spin_unlock(&vpci->lock);
+            return -ERANGE;
+        }
+
+        if ( start < rm->offset )
             break;
     }
     spin_unlock(&vpci->lock);
 
-    return -ENOENT;
+    return 0;
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e5e3f23ca39c..67a831d8e9a0 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -71,8 +71,8 @@ static inline int __must_check vpci_add_register(struct vpci *vpci,
                                   size, data, 0, 0, 0, 0);
 }
 
-int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                                      unsigned int size);
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size);
 
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (6 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 15:14   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-05-26  9:45 ` [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_rebar() fails, current logic return fail and free Rebar-related
resources in vpci_deassign_device(). But the previous new changes will
hide Rebar capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of Rebar.

To do that, implement cleanup function for Rebar.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.

v3->v4 changes:
* Change function name from fini_rebar() to cleanup_rebar().
* Change the error number to be E2BIG and ENXIO in init_rebar().

v2->v3 changes:
* Use fini_rebar() to remove all register instead of in the failure path of init_rebar();

v1->v2 changes:
* Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/rebar.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 9cafd80ca2c9..4b1892fab3d6 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
     bar->guest_addr = bar->addr;
 }
 
+static int cf_check cleanup_rebar(struct pci_dev *pdev)
+{
+    uint32_t ctrl;
+    unsigned int nbars;
+    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
+                                                        PCI_EXT_CAP_ID_REBAR);
+
+    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
+    nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
+
+    return vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
+                                 PCI_REBAR_CTRL(nbars - 1));
+}
+
 static int cf_check init_rebar(struct pci_dev *pdev)
 {
     uint32_t ctrl;
@@ -80,7 +100,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -E2BIG;
         }
 
         bar = &pdev->vpci->header.bars[index];
@@ -88,7 +108,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -ENXIO;
         }
 
         rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rebar_ctrl_write,
@@ -97,14 +117,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u fail to add reg of REBAR_CTRL rc=%d\n",
                    pdev->domain, &pdev->sbdf, index, rc);
-            /*
-             * Ideally we would hide the ReBar capability on error, but code
-             * for doing so still needs to be written. Use continue instead
-             * to keep any already setup register hooks, as returning an
-             * error will cause the hardware domain to get unmediated access
-             * to all device registers.
-             */
-            continue;
+            return rc;
         }
 
         bar->resizable_sizes =
@@ -118,7 +131,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_EXTCAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
+REGISTER_VPCI_EXTCAP(PCI_EXT_CAP_ID_REBAR, init_rebar, cleanup_rebar);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (7 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 15:19   ` Roger Pau Monné
  2025-05-26  9:45 ` [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msi() fails, current logic return fail and free MSI-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSI capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSI.

To do that, implement cleanup function for MSI.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi" since cleanup hook is changed to be int.
* Add a read-only register for MSI Control Register in the end of cleanup_msi.

v3->v4 changes:
* Change function name from fini_msi() to cleanup_msi().
* Remove unnecessary comment.
* Change to use XFREE to free vpci->msi.

v2->v3 changes:
* Remove all fail path, and use fini_msi() hook instead.
* Change the method to calculating the size of msi registers.

v1->v2 changes:
* Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msi.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 2d45c7867de7..4e106c39efae 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -193,6 +193,33 @@ static void cf_check mask_write(
     msi->mask = val;
 }
 
+static int cf_check cleanup_msi(struct pci_dev *pdev)
+{
+    int rc;
+    unsigned int end, size;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msi_pos = pdev->msi_pos;
+    const unsigned int ctrl = msi_control_reg(msi_pos);
+
+    if ( !msi_pos || !vpci->msi )
+        return 0;
+
+    if ( vpci->msi->masking )
+        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
+    else
+        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
+
+    size = end - ctrl;
+
+    rc = vpci_remove_registers(vpci, ctrl, size);
+    if ( rc )
+        return rc;
+
+    XFREE(vpci->msi);
+
+    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL, ctrl, 2, NULL);
+}
+
 static int cf_check init_msi(struct pci_dev *pdev)
 {
     unsigned int pos = pdev->msi_pos;
@@ -270,7 +297,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSI, init_msi, cleanup_msi);
 
 void vpci_dump_msi(void)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails
  2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (8 preceding siblings ...)
  2025-05-26  9:45 ` [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-05-26  9:45 ` Jiqian Chen
  2025-06-05 15:34   ` Roger Pau Monné
  9 siblings, 1 reply; 44+ messages in thread
From: Jiqian Chen @ 2025-05-26  9:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msix() fails, current logic return fail and free MSIX-related
resources in vpci_deassign_device(). But the previous new changes will
hide MSIX capability and return success, it can't reach
vpci_deassign_device() to remove resources if hiding success, so those
resources must be removed in cleanup function of MSIX.

To do that, implement cleanup function for MSIX.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v4->v5 changes:
* Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix" since cleanup hook is changed to be int.
* Add a read-only register for MSIX Control Register in the end of cleanup_msix().

v3->v4 changes:
* Change function name from fini_msix() to cleanup_msix().
* Change to use XFREE to free vpci->msix.
* In cleanup function, change the sequence of check and remove action according to init_msix().

v2->v3 changes:
* Remove unnecessary clean operations in fini_msix().

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msix.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 674815ead025..cf79320d3b6f 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -655,6 +655,33 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
     return 0;
 }
 
+static int cf_check cleanup_msix(struct pci_dev *pdev)
+{
+    int rc;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msix_pos = pdev->msix_pos;
+
+    if ( !msix_pos )
+        return 0;
+
+    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
+    if ( rc )
+        return rc;
+
+    if ( !vpci->msix )
+        return 0;
+
+    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
+        if ( vpci->msix->table[i] )
+            iounmap(vpci->msix->table[i]);
+
+    list_del(&vpci->msix->next);
+    XFREE(vpci->msix);
+
+    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL,
+                             msix_control_reg(msix_pos), 2, NULL);
+}
+
 static int cf_check init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
@@ -709,7 +736,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
 
     return rc;
 }
-REGISTER_VPCI_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSIX, init_msix, cleanup_msix);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0
  2025-05-26  9:45 ` [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
@ 2025-06-05 10:54   ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 10:54 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:51PM +0800, Jiqian Chen wrote:
> Current logic of emulating legacy capability list is only for domU.
> So, expand it to emulate for dom0 too. Then it will be easy to hide
> a capability whose initialization fails in a function.
> 
> And restrict adding PCI_STATUS register only for domU since dom0
> has no limitation to access that register.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-26  9:45 ` [PATCH v5 03/10] vpci/header: Emulate extended " Jiqian Chen
@ 2025-06-05 11:24   ` Roger Pau Monné
  2025-06-06  4:05     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 11:24 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:52PM +0800, Jiqian Chen wrote:
> Add a new function to emulate extended capability list for dom0,
> and call it in init_header(). So that it will be easy to hide a
> extended capability whose initialization fails.
> 
> As for the extended capability list of domU, just move the logic
> into above function and keep hiding it for domU.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v4->v5 changes:
> * Add check: if capability list of hardware has a overlap, print warning and return 0.
> 
> v3->v4 changes:
> * Add check "if ( !header )   return 0;" to avoid adding handler for
>   device that has no extended capabilities.
> 
> v2->v3 changes:
> * In vpci_init_ext_capability_list(), when domain is domU, directly return after adding a handler(hiding all extended capability for domU).
> * In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )" instead of "while ( pos && ttl-- )".
> * Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.
> 
> v1->v2 changes:
> new patch
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c | 47 ++++++++++++++++++++++++++++++++-------
>  xen/drivers/vpci/vpci.c   |  6 +++++
>  xen/include/xen/vpci.h    |  2 ++
>  3 files changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index d26cbba08ee1..4b2f761c9c24 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -836,6 +836,42 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>                                    PCI_STATUS_RSVDZ_MASK);
>  }
>  
> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> +{
> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;

I think you can drop the ttl variable, as said by Jan in a previous
review, the purpose of that counter is to detect overlaps in the PCIe
config space, but for the context here the call to vpci_add_register()
already serves that purpose by returning -EEXIST.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-26  9:45 ` [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-06-05 12:50   ` Roger Pau Monné
  2025-06-05 12:59     ` Jan Beulich
  2025-06-06  6:29     ` Chen, Jiqian
  0 siblings, 2 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 12:50 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote:
> Refactor REGISTER_VPCI_INIT to contain more capability specific
> information, this is benefit for follow-on changes to hide capability

"this will benefit further follow-on..."

I think it's clearer.

> when initialization fails.
> 
> What's more, change the definition of init_header() since it is
> not a capability and it is needed for all devices' PCI config space.
> 
> After refactor, the "priority" of initializing capabilities isn't
> needed anymore, so delete its related codes.
> 
> Note:
> Call vpci_make_msix_hole() in the end of init_msix() since the change
> of sequence of init_header() and init_msix().
> 
> The cleanup hook is also added in this change, even if it's still
> unused. Further changes will make use of it.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v4->v5 changes:
> * Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
> * Change cleanup hook of vpci_capability_t from void to int.
> 
> v3->v4 changes
> * Delete the useless trailing dot of section ".data.vpci".
> * Add description about priority since this patch removes the initializing priority of capabilities and priority is not needed anymore.
> * Change the hook name from fini to cleanup.
> * Change the name x and y to be finit and fclean.
> * Remove the unnecessary check "!capability->init"
> 
> v2->v3 changes:
> * This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
> * Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
> * Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c |  3 +--
>  xen/drivers/vpci/msi.c    |  2 +-
>  xen/drivers/vpci/msix.c   |  8 +++++--
>  xen/drivers/vpci/rebar.c  |  2 +-
>  xen/drivers/vpci/vpci.c   | 46 ++++++++++++++++++++++++++++++---------
>  xen/include/xen/vpci.h    | 30 ++++++++++++++++++-------
>  xen/include/xen/xen.lds.h |  2 +-
>  7 files changed, 68 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 4b2f761c9c24..9fa1cda23151 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -872,7 +872,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>      return 0;
>  }
>  
> -static int cf_check init_header(struct pci_dev *pdev)
> +int vpci_init_header(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
>      uint64_t addr, size;
> @@ -1068,7 +1068,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>      return rc;
>  }
> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 66e5a8a116be..2d45c7867de7 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
>  
>  void vpci_dump_msi(void)
>  {
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 74211301ba10..674815ead025 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -703,9 +703,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      pdev->vpci->msix = msix;
>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>  
> -    return 0;
> +    spin_lock(&pdev->vpci->lock);
> +    rc = vpci_make_msix_hole(pdev);
> +    spin_unlock(&pdev->vpci->lock);
> +
> +    return rc;
>  }
> -REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
> +REGISTER_VPCI_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> index 793937449af7..9cafd80ca2c9 100644
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_EXTCAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 8474c0e3b995..2861557e06d2 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -36,8 +36,8 @@ struct vpci_register {
>  };
>  
>  #ifdef __XEN__
> -extern vpci_register_init_t *const __start_vpci_array[];
> -extern vpci_register_init_t *const __end_vpci_array[];
> +extern vpci_capability_t *const __start_vpci_array[];
> +extern vpci_capability_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> @@ -83,6 +83,34 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static int vpci_init_capabilities(struct pci_dev *pdev)
> +{
> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        const vpci_capability_t *capability = __start_vpci_array[i];
> +        const unsigned int cap = capability->id;
> +        const bool is_ext = capability->is_ext;
> +        unsigned int pos;
> +        int rc;
> +
> +        if ( !is_ext )
> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
> +        else if ( !is_hardware_domain(pdev->domain) )
> +            continue;
> +        else
> +            pos = pci_find_ext_capability(pdev->sbdf, cap);

Nit: it's more compact if you do something like:

        unsigned int pos = 0;

        if ( !is_ext )
            pos = pci_find_cap_offset(pdev->sbdf, cap);
        else if ( is_hardware_domain(pdev->domain) )
            pos = pci_find_ext_capability(pdev->sbdf, cap);

        if ( !pos )
            continue;

> +
> +        if ( !pos )
> +            continue;
> +
> +        rc = capability->init(pdev);
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    return 0;
> +}
> +
>  void vpci_deassign_device(struct pci_dev *pdev)
>  {
>      unsigned int i;
> @@ -128,7 +156,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
>  
>  int vpci_assign_device(struct pci_dev *pdev)
>  {
> -    unsigned int i;
>      const unsigned long *ro_map;
>      int rc = 0;
>  
> @@ -159,14 +186,13 @@ int vpci_assign_device(struct pci_dev *pdev)
>          goto out;
>  #endif
>  
> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> -    {
> -        rc = __start_vpci_array[i](pdev);
> -        if ( rc )
> -            break;
> -    }
> +    rc = vpci_init_header(pdev);
> +    if ( rc )
> +        goto out;
> +
> +    rc = vpci_init_capabilities(pdev);
>  
> - out: __maybe_unused;
> + out:
>      if ( rc )
>          vpci_deassign_device(pdev);
>  
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 61d16cc8b897..e5e3f23ca39c 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
>  typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
>                            uint32_t val, void *data);
>  
> -typedef int vpci_register_init_t(struct pci_dev *dev);
> -
> -#define VPCI_PRIORITY_HIGH      "1"
> -#define VPCI_PRIORITY_MIDDLE    "5"
> -#define VPCI_PRIORITY_LOW       "9"
> +typedef struct {
> +    unsigned int id;
> +    bool is_ext;
> +    int (*init)(struct pci_dev *pdev);
> +    int (*cleanup)(struct pci_dev *pdev);
> +} vpci_capability_t;
>  
>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>  
> @@ -29,9 +30,22 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>   */
>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>  
> -#define REGISTER_VPCI_INIT(x, p)                \
> -  static vpci_register_init_t *const x##_entry  \
> -               __used_section(".data.vpci." p) = (x)
> +#define REGISTER_PCI_CAPABILITY(cap, finit, fclean, ext) \

REGISTER_VPCI_CAPABILITY() if possible (note the added V).

> +  static vpci_capability_t finit##_t = { \

static const?

> +        .id = (cap), \
> +        .init = (finit), \
> +        .cleanup = (fclean), \
> +        .is_ext = (ext), \

Indent should be 4 spaces here I think.

> +  }; \
> +  static vpci_capability_t *const finit##_entry  \
> +               __used_section(".data.vpci") = &finit##_t

IMO this should better use .rodata instead of .data.  Not that it
matters much in practice, as we place it in .rodata anyway.  Note
however you will have to move the placement of the VPCI_ARRAY in the
linker script ahead of *(.rodata.*), otherwise that section match will
consume the vPCI data.

> +
> +#define REGISTER_VPCI_CAP(cap, finit, fclean) \
> +                REGISTER_PCI_CAPABILITY(cap, finit, fclean, false)
> +#define REGISTER_VPCI_EXTCAP(cap, finit, fclean) \
> +                REGISTER_PCI_CAPABILITY(cap, finit, fclean, true)

Since you are modifying those, can use 4 spaces as indentation?
There's no need to have such padding.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-05 12:50   ` Roger Pau Monné
@ 2025-06-05 12:59     ` Jan Beulich
  2025-06-06  6:29     ` Chen, Jiqian
  1 sibling, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2025-06-05 12:59 UTC (permalink / raw)
  To: Roger Pau Monné, Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Julien Grall, Stefano Stabellini

On 05.06.2025 14:50, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -36,8 +36,8 @@ struct vpci_register {
>>  };
>>  
>>  #ifdef __XEN__
>> -extern vpci_register_init_t *const __start_vpci_array[];
>> -extern vpci_register_init_t *const __end_vpci_array[];
>> +extern vpci_capability_t *const __start_vpci_array[];
>> +extern vpci_capability_t *const __end_vpci_array[];

These also want to gain another const, to match ...

>> @@ -29,9 +30,22 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>>   */
>>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>>  
>> -#define REGISTER_VPCI_INIT(x, p)                \
>> -  static vpci_register_init_t *const x##_entry  \
>> -               __used_section(".data.vpci." p) = (x)
>> +#define REGISTER_PCI_CAPABILITY(cap, finit, fclean, ext) \
> 
> REGISTER_VPCI_CAPABILITY() if possible (note the added V).
> 
>> +  static vpci_capability_t finit##_t = { \
> 
> static const?

... this.

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-26  9:45 ` [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-06-05 13:35   ` Roger Pau Monné
  2025-06-06  7:03     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 13:35 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:54PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a legacy capability of device, it just
> returns an error and vPCI gets disabled for the whole device.  That
> most likely renders the device unusable, plus possibly causing issues
> to Xen itself if guest attempts to program the native MSI or MSI-X
> capabilities if present.
> 
> So, add new function to hide legacy capability when initialization
> fails. And remove the failed legacy capability from the vpci emulated
> legacy capability list.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v4->v5 changes:
> * Modify vpci_get_register() to delete some unnecessary check, so that I don't need to move function vpci_register_cmp().
> * Rename vpci_capability_mask() to vpci_capability_hide().
> 
> v3->v4 changes:
> * Modify the commit message.
> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
> * Modify vpci_capability_mask() to return error instead of using ASSERT.
> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
> * Add check "if ( !offset )" in vpci_capability_mask().
> 
> v2->v3 changes:
> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
> * Whole implementation changed because last version is wrong.
>   This version adds a new helper function vpci_get_register() and uses it to get
>   target handler and previous handler from vpci->handlers, then remove the target.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>   remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/vpci.c | 117 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 113 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 2861557e06d2..60e7654ec377 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -83,6 +83,99 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static struct vpci_register *vpci_get_register(struct vpci *vpci,
> +                                               unsigned int offset,
> +                                               unsigned int size)
> +{
> +    struct vpci_register *rm;

Nit: I think you re-used part of the code from vpci_remove_register()
that names the local variable rm (because it's for removal).  Here it
would better to just name it 'r' (for register).

> +
> +    ASSERT(spin_is_locked(&vpci->lock));
> +
> +    list_for_each_entry ( rm, &vpci->handlers, node )
> +    {
> +        if ( rm->offset == offset && rm->size == size )
> +            return rm;
> +
> +        if ( offset <= rm->offset )
> +            break;
> +    }
> +
> +    return NULL;
> +}
> +
> +static struct vpci_register *vpci_get_previous_cap_register(
> +    struct vpci *vpci, unsigned int offset)
> +{
> +    uint32_t next;
> +    struct vpci_register *r;
> +
> +    if ( offset < 0x40 )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return NULL;
> +    }
> +
> +    r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1);
> +    if ( !r )
> +        return NULL;
> +
> +    next = (uint32_t)(uintptr_t)r->private;
> +    while ( next >= 0x40 && next != offset )
> +    {
> +        r = vpci_get_register(vpci, next + PCI_CAP_LIST_NEXT, 1);
> +        if ( !r )
> +            return NULL;
> +        next = (uint32_t)(uintptr_t)r->private;
> +    }
> +
> +    if ( next < 0x40 )
> +        return NULL;

The code below I think it's a bit simpler (and compact) by having a
single return instead of multiple ones:

for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
      r = next >= 0x40 ? vpci_get_register(vpci,
                                           next + PCI_CAP_LIST_NEXT, 1)
                       : NULL )
{
    next = (uint32_t)(uintptr_t)r->private;
    ASSERT(next == (uintptr_t)r->private);
    if ( next == offset )
        break;
}

return r;

I haven't tested it however.

> +
> +    return r;
> +}
> +
> +static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
> +{
> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
> +    struct vpci_register *prev_next_r, *next_r;

I think it might be clear to just name this prev_r, next_r,
prev_next_r is IMO a bit confusing.  I understand it refers to the next
capability register in the previous capability, and I think prev_r
might be enough.

> +    struct vpci *vpci = pdev->vpci;
> +
> +    if ( !offset )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&vpci->lock);
> +    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
> +    if ( !next_r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    prev_next_r = vpci_get_previous_cap_register(vpci, offset);
> +    if ( !prev_next_r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }

You can join both the above into a single check IMO:

next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
prev_r = vpci_get_previous_cap_register(vpci, offset);
if ( !next_r || !prev_r )
{
    spin_unlock(&vpci->lock);
    return -ENODEV;
}

There's no benefit in using two equal error condition checks (just
makes the code longer)

> +
> +    prev_next_r->private = next_r->private;
> +    /*
> +     * Not calling vpci_remove_register() here is to avoid redoing
> +     * the register search
> +     */
> +    list_del(&next_r->node);
> +    spin_unlock(&vpci->lock);
> +    xfree(next_r);
> +
> +    if ( !is_hardware_domain(pdev->domain) )
> +        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
> +
> +    return 0;
> +}
> +
>  static int vpci_init_capabilities(struct pci_dev *pdev)
>  {
>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> @@ -91,7 +184,6 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>          const unsigned int cap = capability->id;
>          const bool is_ext = capability->is_ext;
>          unsigned int pos;
> -        int rc;
>  
>          if ( !is_ext )
>              pos = pci_find_cap_offset(pdev->sbdf, cap);
> @@ -103,9 +195,26 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>          if ( !pos )
>              continue;
>  
> -        rc = capability->init(pdev);
> -        if ( rc )
> -            return rc;
> +        if ( capability->init(pdev) )

I think you want to store rc here to print it in the warning message
below.

> +        {
> +            int rc;
> +
> +            if ( capability->cleanup ) {
> +                rc = capability->cleanup(pdev);
> +                if ( rc )
> +                    return rc;

Here where both init and cleanup failed, you simply don't print any
message.

> +            }
> +
> +            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail, mask it\n",
> +                   pdev->domain, &pdev->sbdf,
> +                   is_ext ? "extended" : "legacy", cap);

I think this needs to be done ahead of the cleanup(), and print the
returned error code.  Overall we need messages printed when any of
those fails, as that makes it easier to debug when things go wrong.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-26  9:45 ` [PATCH v5 06/10] vpci: Hide extended " Jiqian Chen
@ 2025-06-05 14:47   ` Roger Pau Monné
  2025-06-06  8:30     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 14:47 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Mon, May 26, 2025 at 05:45:55PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a extended capability of device, it
> just returns an error and vPCI gets disabled for the whole device.
> 
> So, add function to hide extended capability when initialization
> fails. And remove the failed extended capability handler from vpci
> extended capability list.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v4->v5 changes:
> * Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
> * Rename vpci_ext_capability_mask to vpci_ext_capability_hide.
> 
> v3->v4 changes:
> * Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header) (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
> * Modify the commit message.
> * Change vpci_ext_capability_mask() to return error instead of using ASSERT.
> * Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
> * Add check "if ( !offset )" in vpci_ext_capability_mask().
> 
> v2->v3 changes:
> * Separated from the last version patch "vpci: Hide capability when it fails to initialize".
> * Whole implementation changed because last version is wrong.
>   This version gets target handler and previous handler from vpci->handlers, then remove the target.
> * Note: a case in function vpci_ext_capability_mask() needs to be discussed,
>   because it may change the offset of next capability when the offset of target
>   capability is 0x100U(the first extended capability), my implementation is just to
>   ignore and let hardware to handle the target capability.
> 
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>   remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/vpci.c    | 100 +++++++++++++++++++++++++++++++++++--
>  xen/include/xen/pci_regs.h |   5 +-
>  2 files changed, 100 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 60e7654ec377..2d4794ff3dea 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -176,6 +176,98 @@ static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
>      return 0;
>  }
>  
> +static struct vpci_register *vpci_get_previous_ext_cap_register(
> +    struct vpci *vpci, unsigned int offset)
> +{
> +    uint32_t header;
> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
> +    struct vpci_register *r;
> +
> +    if ( offset <= PCI_CFG_SPACE_SIZE )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return NULL;
> +    }
> +
> +    r = vpci_get_register(vpci, pos, 4);
> +    if ( !r )
> +        return NULL;
> +
> +    header = (uint32_t)(uintptr_t)r->private;
> +    pos = PCI_EXT_CAP_NEXT(header);
> +    while ( pos > PCI_CFG_SPACE_SIZE && pos != offset )
> +    {
> +        r = vpci_get_register(vpci, pos, 4);
> +        if ( !r )
> +            return NULL;
> +        header = (uint32_t)(uintptr_t)r->private;
> +        pos = PCI_EXT_CAP_NEXT(header);
> +    }
> +
> +    if ( pos <= PCI_CFG_SPACE_SIZE )
> +        return NULL;

Same comment as in the previous patch, I think the proposed for loop
there can also be used here to reduce a bit the code size (and unify
the return paths).

> +
> +    return r;
> +}
> +
> +static int vpci_ext_capability_hide(struct pci_dev *pdev, unsigned int cap)
> +{
> +    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
> +    struct vpci_register *rm, *prev_r;

s/rm/r/

> +    struct vpci *vpci = pdev->vpci;
> +    uint32_t header, pre_header;
> +
> +    if ( !offset )

I think you want offset < PCI_CFG_SPACE_SIZE here?

> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&vpci->lock);
> +    rm = vpci_get_register(vpci, offset, 4);
> +    if ( !rm )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    header = (uint32_t)(uintptr_t)rm->private;
> +    if ( offset == PCI_CFG_SPACE_SIZE )
> +    {
> +        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
> +            rm->private = (void *)(uintptr_t)0;
> +        else
> +            /*
> +             * If this case removes target capability of position 0x100U, then
> +             * it needs to move the next capability to be in position 0x100U,
> +             * that would cause the offset of next capability in vpci different
> +             * from the hardware, then cause error accesses, so here chooses to
> +             * set the capability ID part to be zero.

/*
 * The first extended capability (0x100) cannot be removed from the linked
 * list, so instead mask its capability ID to return 0 and force OSes
 * to skip it.
 */

Is simpler IMO and conveys the same message.

> +             */
> +            rm->private = (void *)(uintptr_t)(header &
> +                                              ~PCI_EXT_CAP_ID(header));
> +
> +        spin_unlock(&vpci->lock);
> +        return 0;
> +    }
> +
> +    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
> +    if ( !prev_r )
> +    {
> +        spin_unlock(&vpci->lock);
> +        return -ENODEV;
> +    }
> +
> +    pre_header = (uint32_t)(uintptr_t)prev_r->private;
> +    prev_r->private = (void *)(uintptr_t)((pre_header &
> +                                           ~PCI_EXT_CAP_NEXT_MASK) |
> +                                          (header & PCI_EXT_CAP_NEXT_MASK));

No strong opinion (and your code is correct), but it might be easier
to read as:

pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
prev_r->private = (void *)(uintptr_t)pre_header;

It's still tree lines of code at the end.  I would also add a newline
to separate from the removal of rm.

> +    list_del(&rm->node);
> +    spin_unlock(&vpci->lock);
> +    xfree(rm);

Newline before the return preferably.

> +    return 0;
> +}
> +
>  static int vpci_init_capabilities(struct pci_dev *pdev)
>  {
>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> @@ -209,11 +301,11 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>                     pdev->domain, &pdev->sbdf,
>                     is_ext ? "extended" : "legacy", cap);
>              if ( !is_ext )
> -            {
>                  rc = vpci_capability_hide(pdev, cap);
> -                if ( rc )
> -                    return rc;
> -            }
> +            else
> +                rc = vpci_ext_capability_hide(pdev, cap);
> +            if ( rc )
> +                return rc;

Could the code in the previous patch be:

if ( !is_ext )
    rc = vpci_capability_hide(pdev, cap);

if ( rc )
    return rc;

So that your introduction here is simpler?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-26  9:45 ` [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-06-05 15:00   ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 15:00 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui, Anthony PERARD

On Mon, May 26, 2025 at 05:45:56PM +0800, Jiqian Chen wrote:
> vpci_remove_register() only supports removing a register in a time,
> but the follow-on changes need to remove all registers within a range.
> So, refactor it to support removing all matched registers in a calling

"So, refactor it to support removing all registers in a given region."

> time.
> 
> And it is no matter to remove a non exist register, so remove the

s/no matter/no issue/

> __must_check prefix.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-05-26  9:45 ` [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
@ 2025-06-05 15:14   ` Roger Pau Monné
  2025-06-06  8:32     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 15:14 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:57PM +0800, Jiqian Chen wrote:
> When init_rebar() fails, current logic return fail and free Rebar-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide Rebar capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of Rebar.
> 
> To do that, implement cleanup function for Rebar.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

LGTM, just one nit about a bounds check.

> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v4->v5 changes:
> * Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.
> 
> v3->v4 changes:
> * Change function name from fini_rebar() to cleanup_rebar().
> * Change the error number to be E2BIG and ENXIO in init_rebar().
> 
> v2->v3 changes:
> * Use fini_rebar() to remove all register instead of in the failure path of init_rebar();
> 
> v1->v2 changes:
> * Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/rebar.c | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> index 9cafd80ca2c9..4b1892fab3d6 100644
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>      bar->guest_addr = bar->addr;
>  }
>  
> +static int cf_check cleanup_rebar(struct pci_dev *pdev)
> +{
> +    uint32_t ctrl;
> +    unsigned int nbars;
> +    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
> +                                                        PCI_EXT_CAP_ID_REBAR);
> +
> +    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )

I think you could check rebar_offset < PCI_CFG_SPACE_SIZE to be more
accurate?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-26  9:45 ` [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-06-05 15:19   ` Roger Pau Monné
  2025-06-06  8:38     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 15:19 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:58PM +0800, Jiqian Chen wrote:
> When init_msi() fails, current logic return fail and free MSI-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSI capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSI.
> 
> To do that, implement cleanup function for MSI.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v4->v5 changes:
> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi" since cleanup hook is changed to be int.
> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
> 
> v3->v4 changes:
> * Change function name from fini_msi() to cleanup_msi().
> * Remove unnecessary comment.
> * Change to use XFREE to free vpci->msi.
> 
> v2->v3 changes:
> * Remove all fail path, and use fini_msi() hook instead.
> * Change the method to calculating the size of msi registers.
> 
> v1->v2 changes:
> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/msi.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 2d45c7867de7..4e106c39efae 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -193,6 +193,33 @@ static void cf_check mask_write(
>      msi->mask = val;
>  }
>  
> +static int cf_check cleanup_msi(struct pci_dev *pdev)
> +{
> +    int rc;
> +    unsigned int end, size;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msi_pos = pdev->msi_pos;
> +    const unsigned int ctrl = msi_control_reg(msi_pos);
> +
> +    if ( !msi_pos || !vpci->msi )

Possibly same request as the previous patch, msi_pos should be after
the PCI standard header.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails
  2025-05-26  9:45 ` [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
@ 2025-06-05 15:34   ` Roger Pau Monné
  2025-06-06  8:44     ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-05 15:34 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Mon, May 26, 2025 at 05:45:59PM +0800, Jiqian Chen wrote:
> When init_msix() fails, current logic return fail and free MSIX-related
> resources in vpci_deassign_device(). But the previous new changes will
> hide MSIX capability and return success, it can't reach
> vpci_deassign_device() to remove resources if hiding success, so those
> resources must be removed in cleanup function of MSIX.
> 
> To do that, implement cleanup function for MSIX.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>



> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v4->v5 changes:
> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix" since cleanup hook is changed to be int.
> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
> 
> v3->v4 changes:
> * Change function name from fini_msix() to cleanup_msix().
> * Change to use XFREE to free vpci->msix.
> * In cleanup function, change the sequence of check and remove action according to init_msix().
> 
> v2->v3 changes:
> * Remove unnecessary clean operations in fini_msix().
> 
> v1->v2 changes:
> new patch.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/msix.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 674815ead025..cf79320d3b6f 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -655,6 +655,33 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> +static int cf_check cleanup_msix(struct pci_dev *pdev)
> +{
> +    int rc;
> +    struct vpci *vpci = pdev->vpci;
> +    const unsigned int msix_pos = pdev->msix_pos;
> +
> +    if ( !msix_pos )
> +        return 0;
> +
> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
> +    if ( rc )
> +        return rc;
> +
> +    if ( !vpci->msix )
> +        return 0;

You cannot short-circuit here, as it then prevents adding the dummy
handlers done in the last return.

> +
> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> +        if ( vpci->msix->table[i] )
> +            iounmap(vpci->msix->table[i]);
> +
> +    list_del(&vpci->msix->next);
> +    XFREE(vpci->msix);
> +
> +    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL,
> +                             msix_control_reg(msix_pos), 2, NULL);

The above needs to be added even if !vpci->msix.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 03/10] vpci/header: Emulate extended capability list for dom0
  2025-06-05 11:24   ` Roger Pau Monné
@ 2025-06-06  4:05     ` Chen, Jiqian
  0 siblings, 0 replies; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  4:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/6/5 19:24, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:52PM +0800, Jiqian Chen wrote:
>> Add a new function to emulate extended capability list for dom0,
>> and call it in init_header(). So that it will be easy to hide a
>> extended capability whose initialization fails.
>>
>> As for the extended capability list of domU, just move the logic
>> into above function and keep hiding it for domU.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v4->v5 changes:
>> * Add check: if capability list of hardware has a overlap, print warning and return 0.
>>
>> v3->v4 changes:
>> * Add check "if ( !header )   return 0;" to avoid adding handler for
>>   device that has no extended capabilities.
>>
>> v2->v3 changes:
>> * In vpci_init_ext_capability_list(), when domain is domU, directly return after adding a handler(hiding all extended capability for domU).
>> * In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )" instead of "while ( pos && ttl-- )".
>> * Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.
>>
>> v1->v2 changes:
>> new patch
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/header.c | 47 ++++++++++++++++++++++++++++++++-------
>>  xen/drivers/vpci/vpci.c   |  6 +++++
>>  xen/include/xen/vpci.h    |  2 ++
>>  3 files changed, 47 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index d26cbba08ee1..4b2f761c9c24 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -836,6 +836,42 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>                                    PCI_STATUS_RSVDZ_MASK);
>>  }
>>  
>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>> +{
>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
> 
> I think you can drop the ttl variable, as said by Jan in a previous
> review, the purpose of that counter is to detect overlaps in the PCIe
> config space, but for the context here the call to vpci_add_register()
> already serves that purpose by returning -EEXIST.
Will do in next version.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-05 12:50   ` Roger Pau Monné
  2025-06-05 12:59     ` Jan Beulich
@ 2025-06-06  6:29     ` Chen, Jiqian
  2025-06-06  7:05       ` Jan Beulich
  1 sibling, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  6:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/6/5 20:50, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>> +  }; \
>> +  static vpci_capability_t *const finit##_entry  \
>> +               __used_section(".data.vpci") = &finit##_t
> 
> IMO this should better use .rodata instead of .data. 
Is below change correct?

+    static const vpci_capability_t *const finit##_entry  \
+        __used_section(".rodata") = &finit##_t

> Not that it matters much in practice, as we place it in .rodata anyway.  Note
> however you will have to move the placement of the VPCI_ARRAY in the
> linker script ahead of *(.rodata.*), otherwise that section match will
> consume the vPCI data.
I am sorry, how to move it ahead of *(.rodata.*) ?
Is below change correct?

diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..3817642135aa 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.rodata)             \
        __end_vpci_array = .;

> 
>> +
>> +#define REGISTER_VPCI_CAP(cap, finit, fclean) \
>> +                REGISTER_PCI_CAPABILITY(cap, finit, fclean, false)
>> +#define REGISTER_VPCI_EXTCAP(cap, finit, fclean) \
>> +                REGISTER_PCI_CAPABILITY(cap, finit, fclean, true)
> 
> Since you are modifying those, can use 4 spaces as indentation?
> There's no need to have such padding.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-06-05 13:35   ` Roger Pau Monné
@ 2025-06-06  7:03     ` Chen, Jiqian
  2025-06-06  9:16       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  7:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/6/5 21:35, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:54PM +0800, Jiqian Chen wrote:
>> When vpci fails to initialize a legacy capability of device, it just
>> returns an error and vPCI gets disabled for the whole device.  That
>> most likely renders the device unusable, plus possibly causing issues
>> to Xen itself if guest attempts to program the native MSI or MSI-X
>> capabilities if present.
>>
>> So, add new function to hide legacy capability when initialization
>> fails. And remove the failed legacy capability from the vpci emulated
>> legacy capability list.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v4->v5 changes:
>> * Modify vpci_get_register() to delete some unnecessary check, so that I don't need to move function vpci_register_cmp().
>> * Rename vpci_capability_mask() to vpci_capability_hide().
>>
>> v3->v4 changes:
>> * Modify the commit message.
>> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
>> * Modify vpci_capability_mask() to return error instead of using ASSERT.
>> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
>> * Add check "if ( !offset )" in vpci_capability_mask().
>>
>> v2->v3 changes:
>> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
>> * Whole implementation changed because last version is wrong.
>>   This version adds a new helper function vpci_get_register() and uses it to get
>>   target handler and previous handler from vpci->handlers, then remove the target.
>>
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>>   remove failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/vpci.c | 117 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 113 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 2861557e06d2..60e7654ec377 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -83,6 +83,99 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static struct vpci_register *vpci_get_register(struct vpci *vpci,
>> +                                               unsigned int offset,
>> +                                               unsigned int size)
>> +{
>> +    struct vpci_register *rm;
> 
> Nit: I think you re-used part of the code from vpci_remove_register()
> that names the local variable rm (because it's for removal).  Here it
> would better to just name it 'r' (for register).
Will change.
> 
>> +
>> +    ASSERT(spin_is_locked(&vpci->lock));
>> +
>> +    list_for_each_entry ( rm, &vpci->handlers, node )
>> +    {
>> +        if ( rm->offset == offset && rm->size == size )
>> +            return rm;
>> +
>> +        if ( offset <= rm->offset )
>> +            break;
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static struct vpci_register *vpci_get_previous_cap_register(
>> +    struct vpci *vpci, unsigned int offset)
>> +{
>> +    uint32_t next;
>> +    struct vpci_register *r;
>> +
>> +    if ( offset < 0x40 )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return NULL;
>> +    }
>> +
>> +    r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1);
>> +    if ( !r )
>> +        return NULL;
>> +
>> +    next = (uint32_t)(uintptr_t)r->private;
>> +    while ( next >= 0x40 && next != offset )
>> +    {
>> +        r = vpci_get_register(vpci, next + PCI_CAP_LIST_NEXT, 1);
>> +        if ( !r )
>> +            return NULL;
>> +        next = (uint32_t)(uintptr_t)r->private;
>> +    }
>> +
>> +    if ( next < 0x40 )
>> +        return NULL;
> 
> The code below I think it's a bit simpler (and compact) by having a
> single return instead of multiple ones:
> 
> for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
>       r = next >= 0x40 ? vpci_get_register(vpci,
>                                            next + PCI_CAP_LIST_NEXT, 1)
>                        : NULL )
> {
>     next = (uint32_t)(uintptr_t)r->private;
>     ASSERT(next == (uintptr_t)r->private);
Why need this ASSERT here?

>     if ( next == offset )
>         break;
> }
> 
> return r;
> 
> I haven't tested it however.
Will change and test.

> 
>> +
>> +    return r;
>> +}
>> +
>> +static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
>> +{
>> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
>> +    struct vpci_register *prev_next_r, *next_r;
> 
> I think it might be clear to just name this prev_r, next_r,
> prev_next_r is IMO a bit confusing.  I understand it refers to the next
> capability register in the previous capability, and I think prev_r
> might be enough.
Will change.
> 
>> +    struct vpci *vpci = pdev->vpci;
>> +
>> +    if ( !offset )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return 0;
>> +    }
>> +
>> +    spin_lock(&vpci->lock);
>> +    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
>> +    if ( !next_r )
>> +    {
>> +        spin_unlock(&vpci->lock);
>> +        return -ENODEV;
>> +    }
>> +
>> +    prev_next_r = vpci_get_previous_cap_register(vpci, offset);
>> +    if ( !prev_next_r )
>> +    {
>> +        spin_unlock(&vpci->lock);
>> +        return -ENODEV;
>> +    }
> 
> You can join both the above into a single check IMO:
> 
> next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
> prev_r = vpci_get_previous_cap_register(vpci, offset);
> if ( !next_r || !prev_r )
> {
>     spin_unlock(&vpci->lock);
>     return -ENODEV;
> }
> 
> There's no benefit in using two equal error condition checks (just
> makes the code longer)
Will change.
> 
>> +
>> +    prev_next_r->private = next_r->private;
>> +    /*
>> +     * Not calling vpci_remove_register() here is to avoid redoing
>> +     * the register search
>> +     */
>> +    list_del(&next_r->node);
>> +    spin_unlock(&vpci->lock);
>> +    xfree(next_r);
>> +
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
>> +
>> +    return 0;
>> +}
>> +
>>  static int vpci_init_capabilities(struct pci_dev *pdev)
>>  {
>>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> @@ -91,7 +184,6 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>>          const unsigned int cap = capability->id;
>>          const bool is_ext = capability->is_ext;
>>          unsigned int pos;
>> -        int rc;
>>  
>>          if ( !is_ext )
>>              pos = pci_find_cap_offset(pdev->sbdf, cap);
>> @@ -103,9 +195,26 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>>          if ( !pos )
>>              continue;
>>  
>> -        rc = capability->init(pdev);
>> -        if ( rc )
>> -            return rc;
>> +        if ( capability->init(pdev) )
> 
> I think you want to store rc here to print it in the warning message
> below.
> 
>> +        {
>> +            int rc;
>> +
>> +            if ( capability->cleanup ) {
>> +                rc = capability->cleanup(pdev);
>> +                if ( rc )
>> +                    return rc;
> 
> Here where both init and cleanup failed, you simply don't print any
> message.
Got it , will print message when init, cleanup and hide fail.

> 
>> +            }
>> +
>> +            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail, mask it\n",
>> +                   pdev->domain, &pdev->sbdf,
>> +                   is_ext ? "extended" : "legacy", cap);
> 
> I think this needs to be done ahead of the cleanup(), and print the
> returned error code.  Overall we need messages printed when any of
> those fails, as that makes it easier to debug when things go wrong.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-06  6:29     ` Chen, Jiqian
@ 2025-06-06  7:05       ` Jan Beulich
  2025-06-06  9:14         ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2025-06-06  7:05 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Roger Pau Monné

On 06.06.2025 08:29, Chen, Jiqian wrote:
> On 2025/6/5 20:50, Roger Pau Monné wrote:
>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>> +  }; \
>>> +  static vpci_capability_t *const finit##_entry  \
>>> +               __used_section(".data.vpci") = &finit##_t
>>
>> IMO this should better use .rodata instead of .data. 
> Is below change correct?
> 
> +    static const vpci_capability_t *const finit##_entry  \
> +        __used_section(".rodata") = &finit##_t

No, specifically because ...

>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>> however you will have to move the placement of the VPCI_ARRAY in the
>> linker script ahead of *(.rodata.*), otherwise that section match will
>> consume the vPCI data.
> I am sorry, how to move it ahead of *(.rodata.*) ?
> Is below change correct?
> 
> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> index 793d0e11450c..3817642135aa 100644
> --- a/xen/include/xen/xen.lds.h
> +++ b/xen/include/xen/xen.lds.h
> @@ -188,7 +188,7 @@
>  #define VPCI_ARRAY               \
>         . = ALIGN(POINTER_ALIGN); \
>         __start_vpci_array = .;   \
> -       *(SORT(.data.vpci.*))     \
> +       *(.rodata)             \

... this isn't - you'd move _all_ of .rodata into here, which definitely
isn't what you want. What I understand Roger meant was a .rodata-like
section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 06/10] vpci: Hide extended capability when it fails to initialize
  2025-06-05 14:47   ` Roger Pau Monné
@ 2025-06-06  8:30     ` Chen, Jiqian
  2025-06-06  9:18       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  8:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/6/5 22:47, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:55PM +0800, Jiqian Chen wrote:
>> When vpci fails to initialize a extended capability of device, it
>> just returns an error and vPCI gets disabled for the whole device.
>>
>> So, add function to hide extended capability when initialization
>> fails. And remove the failed extended capability handler from vpci
>> extended capability list.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> cc: Anthony PERARD <anthony.perard@vates.tech>
>> cc: Michal Orzel <michal.orzel@amd.com>
>> cc: Jan Beulich <jbeulich@suse.com>
>> cc: Julien Grall <julien@xen.org>
>> cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> v4->v5 changes:
>> * Modify the hex digits of PCI_EXT_CAP_NEXT_MASK and PCI_EXT_CAP_NEXT to be low case.
>> * Rename vpci_ext_capability_mask to vpci_ext_capability_hide.
>>
>> v3->v4 changes:
>> * Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header) (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
>> * Modify the commit message.
>> * Change vpci_ext_capability_mask() to return error instead of using ASSERT.
>> * Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
>> * Add check "if ( !offset )" in vpci_ext_capability_mask().
>>
>> v2->v3 changes:
>> * Separated from the last version patch "vpci: Hide capability when it fails to initialize".
>> * Whole implementation changed because last version is wrong.
>>   This version gets target handler and previous handler from vpci->handlers, then remove the target.
>> * Note: a case in function vpci_ext_capability_mask() needs to be discussed,
>>   because it may change the offset of next capability when the offset of target
>>   capability is 0x100U(the first extended capability), my implementation is just to
>>   ignore and let hardware to handle the target capability.
>>
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
>>   remove failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/vpci.c    | 100 +++++++++++++++++++++++++++++++++++--
>>  xen/include/xen/pci_regs.h |   5 +-
>>  2 files changed, 100 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 60e7654ec377..2d4794ff3dea 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -176,6 +176,98 @@ static int vpci_capability_hide(struct pci_dev *pdev, unsigned int cap)
>>      return 0;
>>  }
>>  
>> +static struct vpci_register *vpci_get_previous_ext_cap_register(
>> +    struct vpci *vpci, unsigned int offset)
>> +{
>> +    uint32_t header;
>> +    unsigned int pos = PCI_CFG_SPACE_SIZE;
>> +    struct vpci_register *r;
>> +
>> +    if ( offset <= PCI_CFG_SPACE_SIZE )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return NULL;
>> +    }
>> +
>> +    r = vpci_get_register(vpci, pos, 4);
>> +    if ( !r )
>> +        return NULL;
>> +
>> +    header = (uint32_t)(uintptr_t)r->private;
>> +    pos = PCI_EXT_CAP_NEXT(header);
>> +    while ( pos > PCI_CFG_SPACE_SIZE && pos != offset )
>> +    {
>> +        r = vpci_get_register(vpci, pos, 4);
>> +        if ( !r )
>> +            return NULL;
>> +        header = (uint32_t)(uintptr_t)r->private;
>> +        pos = PCI_EXT_CAP_NEXT(header);
>> +    }
>> +
>> +    if ( pos <= PCI_CFG_SPACE_SIZE )
>> +        return NULL;
> 
> Same comment as in the previous patch, I think the proposed for loop
> there can also be used here to reduce a bit the code size (and unify
> the return paths).
> 
>> +
>> +    return r;
>> +}
>> +
>> +static int vpci_ext_capability_hide(struct pci_dev *pdev, unsigned int cap)
>> +{
>> +    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
>> +    struct vpci_register *rm, *prev_r;
> 
> s/rm/r/
> 
>> +    struct vpci *vpci = pdev->vpci;
>> +    uint32_t header, pre_header;
>> +
>> +    if ( !offset )
> 
> I think you want offset < PCI_CFG_SPACE_SIZE here?
> 
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        return 0;
>> +    }
>> +
>> +    spin_lock(&vpci->lock);
>> +    rm = vpci_get_register(vpci, offset, 4);
>> +    if ( !rm )
>> +    {
>> +        spin_unlock(&vpci->lock);
>> +        return -ENODEV;
>> +    }
>> +
>> +    header = (uint32_t)(uintptr_t)rm->private;
>> +    if ( offset == PCI_CFG_SPACE_SIZE )
>> +    {
>> +        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
>> +            rm->private = (void *)(uintptr_t)0;
>> +        else
>> +            /*
>> +             * If this case removes target capability of position 0x100U, then
>> +             * it needs to move the next capability to be in position 0x100U,
>> +             * that would cause the offset of next capability in vpci different
>> +             * from the hardware, then cause error accesses, so here chooses to
>> +             * set the capability ID part to be zero.
> 
> /*
>  * The first extended capability (0x100) cannot be removed from the linked
>  * list, so instead mask its capability ID to return 0 and force OSes
>  * to skip it.
>  */
> 
> Is simpler IMO and conveys the same message.
> 
>> +             */
>> +            rm->private = (void *)(uintptr_t)(header &
>> +                                              ~PCI_EXT_CAP_ID(header));
>> +
>> +        spin_unlock(&vpci->lock);
>> +        return 0;
>> +    }
>> +
>> +    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
>> +    if ( !prev_r )
>> +    {
>> +        spin_unlock(&vpci->lock);
>> +        return -ENODEV;
>> +    }
>> +
>> +    pre_header = (uint32_t)(uintptr_t)prev_r->private;
>> +    prev_r->private = (void *)(uintptr_t)((pre_header &
>> +                                           ~PCI_EXT_CAP_NEXT_MASK) |
>> +                                          (header & PCI_EXT_CAP_NEXT_MASK));
> 
> No strong opinion (and your code is correct), but it might be easier
> to read as:
> 
> pre_header &= ~PCI_EXT_CAP_NEXT_MASK;
> pre_header |= header & PCI_EXT_CAP_NEXT_MASK;
> prev_r->private = (void *)(uintptr_t)pre_header;
> 
> It's still tree lines of code at the end.  I would also add a newline
> to separate from the removal of rm.
> 
>> +    list_del(&rm->node);
>> +    spin_unlock(&vpci->lock);
>> +    xfree(rm);
> 
> Newline before the return preferably.
Will change my patch according to all your comments.

> 
>> +    return 0;
>> +}
>> +
>>  static int vpci_init_capabilities(struct pci_dev *pdev)
>>  {
>>      for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> @@ -209,11 +301,11 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
>>                     pdev->domain, &pdev->sbdf,
>>                     is_ext ? "extended" : "legacy", cap);
>>              if ( !is_ext )
>> -            {
>>                  rc = vpci_capability_hide(pdev, cap);
>> -                if ( rc )
>> -                    return rc;
>> -            }
>> +            else
>> +                rc = vpci_ext_capability_hide(pdev, cap);
>> +            if ( rc )
>> +                return rc;
> 
> Could the code in the previous patch be:
> 
> if ( !is_ext )
>     rc = vpci_capability_hide(pdev, cap);
> 
> if ( rc )
>     return rc;
> 
> So that your introduction here is simpler?
OK, but the logic of the previous patch will become a little strange.
Anyway, the strange will disappear after applying this patch.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-06-05 15:14   ` Roger Pau Monné
@ 2025-06-06  8:32     ` Chen, Jiqian
  2025-06-06  9:20       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  8:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/6/5 23:14, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:57PM +0800, Jiqian Chen wrote:
>> When init_rebar() fails, current logic return fail and free Rebar-related
>> resources in vpci_deassign_device(). But the previous new changes will
>> hide Rebar capability and return success, it can't reach
>> vpci_deassign_device() to remove resources if hiding success, so those
>> resources must be removed in cleanup function of Rebar.
>>
>> To do that, implement cleanup function for Rebar.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> 
> LGTM, just one nit about a bounds check.
> 
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v4->v5 changes:
>> * Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.
>>
>> v3->v4 changes:
>> * Change function name from fini_rebar() to cleanup_rebar().
>> * Change the error number to be E2BIG and ENXIO in init_rebar().
>>
>> v2->v3 changes:
>> * Use fini_rebar() to remove all register instead of in the failure path of init_rebar();
>>
>> v1->v2 changes:
>> * Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/rebar.c | 35 ++++++++++++++++++++++++-----------
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
>> index 9cafd80ca2c9..4b1892fab3d6 100644
>> --- a/xen/drivers/vpci/rebar.c
>> +++ b/xen/drivers/vpci/rebar.c
>> @@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>      bar->guest_addr = bar->addr;
>>  }
>>  
>> +static int cf_check cleanup_rebar(struct pci_dev *pdev)
>> +{
>> +    uint32_t ctrl;
>> +    unsigned int nbars;
>> +    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
>> +                                                        PCI_EXT_CAP_ID_REBAR);
>> +
>> +    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
> 
> I think you could check rebar_offset < PCI_CFG_SPACE_SIZE to be more
> accurate?
OK.
Do I need to change in init_rebar()?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-06-05 15:19   ` Roger Pau Monné
@ 2025-06-06  8:38     ` Chen, Jiqian
  2025-06-06  9:21       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  8:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/6/5 23:19, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:58PM +0800, Jiqian Chen wrote:
>> When init_msi() fails, current logic return fail and free MSI-related
>> resources in vpci_deassign_device(). But the previous new changes will
>> hide MSI capability and return success, it can't reach
>> vpci_deassign_device() to remove resources if hiding success, so those
>> resources must be removed in cleanup function of MSI.
>>
>> To do that, implement cleanup function for MSI.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v4->v5 changes:
>> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi" since cleanup hook is changed to be int.
>> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
>>
>> v3->v4 changes:
>> * Change function name from fini_msi() to cleanup_msi().
>> * Remove unnecessary comment.
>> * Change to use XFREE to free vpci->msi.
>>
>> v2->v3 changes:
>> * Remove all fail path, and use fini_msi() hook instead.
>> * Change the method to calculating the size of msi registers.
>>
>> v1->v2 changes:
>> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/msi.c | 29 ++++++++++++++++++++++++++++-
>>  1 file changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 2d45c7867de7..4e106c39efae 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -193,6 +193,33 @@ static void cf_check mask_write(
>>      msi->mask = val;
>>  }
>>  
>> +static int cf_check cleanup_msi(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    unsigned int end, size;
>> +    struct vpci *vpci = pdev->vpci;
>> +    const unsigned int msi_pos = pdev->msi_pos;
>> +    const unsigned int ctrl = msi_control_reg(msi_pos);
>> +
>> +    if ( !msi_pos || !vpci->msi )
> 
> Possibly same request as the previous patch, msi_pos should be after
> the PCI standard header.
msi_pos <= PCI_CAPABILITY_LIST ?
Or msi_pos < 0x40 ?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails
  2025-06-05 15:34   ` Roger Pau Monné
@ 2025-06-06  8:44     ` Chen, Jiqian
  2025-06-06  9:23       ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-06  8:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/6/5 23:34, Roger Pau Monné wrote:
> On Mon, May 26, 2025 at 05:45:59PM +0800, Jiqian Chen wrote:
>> When init_msix() fails, current logic return fail and free MSIX-related
>> resources in vpci_deassign_device(). But the previous new changes will
>> hide MSIX capability and return success, it can't reach
>> vpci_deassign_device() to remove resources if hiding success, so those
>> resources must be removed in cleanup function of MSIX.
>>
>> To do that, implement cleanup function for MSIX.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> 
> 
> 
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v4->v5 changes:
>> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix" since cleanup hook is changed to be int.
>> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
>>
>> v3->v4 changes:
>> * Change function name from fini_msix() to cleanup_msix().
>> * Change to use XFREE to free vpci->msix.
>> * In cleanup function, change the sequence of check and remove action according to init_msix().
>>
>> v2->v3 changes:
>> * Remove unnecessary clean operations in fini_msix().
>>
>> v1->v2 changes:
>> new patch.
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/msix.c | 29 ++++++++++++++++++++++++++++-
>>  1 file changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 674815ead025..cf79320d3b6f 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -655,6 +655,33 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
>>      return 0;
>>  }
>>  
>> +static int cf_check cleanup_msix(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    struct vpci *vpci = pdev->vpci;
>> +    const unsigned int msix_pos = pdev->msix_pos;
>> +
>> +    if ( !msix_pos )
Need to change this check as previous patch, I think.

>> +        return 0;
>> +
>> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    if ( !vpci->msix )
>> +        return 0;
> 
> You cannot short-circuit here, as it then prevents adding the dummy
> handlers done in the last return.
> 
>> +
>> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
>> +        if ( vpci->msix->table[i] )
>> +            iounmap(vpci->msix->table[i]);
>> +
>> +    list_del(&vpci->msix->next);
>> +    XFREE(vpci->msix);
>> +
>> +    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL,
>> +                             msix_control_reg(msix_pos), 2, NULL);
> 
> The above needs to be added even if !vpci->msix.
Oh, right.
I will change to

    if ( vpci->msix )
    {
        for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
            if ( vpci->msix->table[i] )
                iounmap(vpci->msix->table[i]);

        list_del(&vpci->msix->next);
        XFREE(vpci->msix);
    }

    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL,
                             msix_control_reg(msix_pos), 2, NULL);


> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-06  7:05       ` Jan Beulich
@ 2025-06-06  9:14         ` Roger Pau Monné
  2025-06-09  7:50           ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Chen, Jiqian, xen-devel@lists.xenproject.org, Huang, Ray,
	Andrew Cooper, Anthony PERARD, Orzel, Michal, Julien Grall,
	Stefano Stabellini

On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
> On 06.06.2025 08:29, Chen, Jiqian wrote:
> > On 2025/6/5 20:50, Roger Pau Monné wrote:
> >> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
> >>> +  }; \
> >>> +  static vpci_capability_t *const finit##_entry  \
> >>> +               __used_section(".data.vpci") = &finit##_t
> >>
> >> IMO this should better use .rodata instead of .data. 
> > Is below change correct?
> > 
> > +    static const vpci_capability_t *const finit##_entry  \
> > +        __used_section(".rodata") = &finit##_t
> 
> No, specifically because ...
> 
> >> Not that it matters much in practice, as we place it in .rodata anyway.  Note
> >> however you will have to move the placement of the VPCI_ARRAY in the
> >> linker script ahead of *(.rodata.*), otherwise that section match will
> >> consume the vPCI data.
> > I am sorry, how to move it ahead of *(.rodata.*) ?
> > Is below change correct?
> > 
> > diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> > index 793d0e11450c..3817642135aa 100644
> > --- a/xen/include/xen/xen.lds.h
> > +++ b/xen/include/xen/xen.lds.h
> > @@ -188,7 +188,7 @@
> >  #define VPCI_ARRAY               \
> >         . = ALIGN(POINTER_ALIGN); \
> >         __start_vpci_array = .;   \
> > -       *(SORT(.data.vpci.*))     \
> > +       *(.rodata)             \
> 
> ... this isn't - you'd move _all_ of .rodata into here, which definitely
> isn't what you want. What I understand Roger meant was a .rodata-like
> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).

Indeed, my suggestion was merely to use .rodata instead of .data, as
that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
same section change for the __used_section() attribute.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-06-06  7:03     ` Chen, Jiqian
@ 2025-06-06  9:16       ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:16 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Fri, Jun 06, 2025 at 07:03:02AM +0000, Chen, Jiqian wrote:
> On 2025/6/5 21:35, Roger Pau Monné wrote:
> > On Mon, May 26, 2025 at 05:45:54PM +0800, Jiqian Chen wrote:
> >> When vpci fails to initialize a legacy capability of device, it just
> >> returns an error and vPCI gets disabled for the whole device.  That
> >> most likely renders the device unusable, plus possibly causing issues
> >> to Xen itself if guest attempts to program the native MSI or MSI-X
> >> capabilities if present.
> >>
> >> So, add new function to hide legacy capability when initialization
> >> fails. And remove the failed legacy capability from the vpci emulated
> >> legacy capability list.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v4->v5 changes:
> >> * Modify vpci_get_register() to delete some unnecessary check, so that I don't need to move function vpci_register_cmp().
> >> * Rename vpci_capability_mask() to vpci_capability_hide().
> >>
> >> v3->v4 changes:
> >> * Modify the commit message.
> >> * In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
> >> * Modify vpci_capability_mask() to return error instead of using ASSERT.
> >> * Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
> >> * Add check "if ( !offset )" in vpci_capability_mask().
> >>
> >> v2->v3 changes:
> >> * Separated from the last version patch "vpci: Hide capability when it fails to initialize"
> >> * Whole implementation changed because last version is wrong.
> >>   This version adds a new helper function vpci_get_register() and uses it to get
> >>   target handler and previous handler from vpci->handlers, then remove the target.
> >>
> >> v1->v2 changes:
> >> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> >> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
> >>   remove failed capability from list.
> >> * Called vpci_make_msix_hole() in the end of init_msix().
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/drivers/vpci/vpci.c | 117 ++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 113 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >> index 2861557e06d2..60e7654ec377 100644
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -83,6 +83,99 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
> >>  
> >>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> >>  
> >> +static struct vpci_register *vpci_get_register(struct vpci *vpci,
> >> +                                               unsigned int offset,
> >> +                                               unsigned int size)
> >> +{
> >> +    struct vpci_register *rm;
> > 
> > Nit: I think you re-used part of the code from vpci_remove_register()
> > that names the local variable rm (because it's for removal).  Here it
> > would better to just name it 'r' (for register).
> Will change.
> > 
> >> +
> >> +    ASSERT(spin_is_locked(&vpci->lock));
> >> +
> >> +    list_for_each_entry ( rm, &vpci->handlers, node )
> >> +    {
> >> +        if ( rm->offset == offset && rm->size == size )
> >> +            return rm;
> >> +
> >> +        if ( offset <= rm->offset )
> >> +            break;
> >> +    }
> >> +
> >> +    return NULL;
> >> +}
> >> +
> >> +static struct vpci_register *vpci_get_previous_cap_register(
> >> +    struct vpci *vpci, unsigned int offset)
> >> +{
> >> +    uint32_t next;
> >> +    struct vpci_register *r;
> >> +
> >> +    if ( offset < 0x40 )
> >> +    {
> >> +        ASSERT_UNREACHABLE();
> >> +        return NULL;
> >> +    }
> >> +
> >> +    r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1);
> >> +    if ( !r )
> >> +        return NULL;
> >> +
> >> +    next = (uint32_t)(uintptr_t)r->private;
> >> +    while ( next >= 0x40 && next != offset )
> >> +    {
> >> +        r = vpci_get_register(vpci, next + PCI_CAP_LIST_NEXT, 1);
> >> +        if ( !r )
> >> +            return NULL;
> >> +        next = (uint32_t)(uintptr_t)r->private;
> >> +    }
> >> +
> >> +    if ( next < 0x40 )
> >> +        return NULL;
> > 
> > The code below I think it's a bit simpler (and compact) by having a
> > single return instead of multiple ones:
> > 
> > for ( r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1); r;
> >       r = next >= 0x40 ? vpci_get_register(vpci,
> >                                            next + PCI_CAP_LIST_NEXT, 1)
> >                        : NULL )
> > {
> >     next = (uint32_t)(uintptr_t)r->private;
> >     ASSERT(next == (uintptr_t)r->private);
> Why need this ASSERT here?

Extra safety to ensure the value is not truncated (which will indicate
something is off).  I would not argue strongly for it to be added if
you don't think it's needed.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 06/10] vpci: Hide extended capability when it fails to initialize
  2025-06-06  8:30     ` Chen, Jiqian
@ 2025-06-06  9:18       ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:18 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini

On Fri, Jun 06, 2025 at 08:30:42AM +0000, Chen, Jiqian wrote:
> On 2025/6/5 22:47, Roger Pau Monné wrote:
> > On Mon, May 26, 2025 at 05:45:55PM +0800, Jiqian Chen wrote:
> >> @@ -209,11 +301,11 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
> >>                     pdev->domain, &pdev->sbdf,
> >>                     is_ext ? "extended" : "legacy", cap);
> >>              if ( !is_ext )
> >> -            {
> >>                  rc = vpci_capability_hide(pdev, cap);
> >> -                if ( rc )
> >> -                    return rc;
> >> -            }
> >> +            else
> >> +                rc = vpci_ext_capability_hide(pdev, cap);
> >> +            if ( rc )
> >> +                return rc;
> > 
> > Could the code in the previous patch be:
> > 
> > if ( !is_ext )
> >     rc = vpci_capability_hide(pdev, cap);
> > 
> > if ( rc )
> >     return rc;
> > 
> > So that your introduction here is simpler?
> OK, but the logic of the previous patch will become a little strange.
> Anyway, the strange will disappear after applying this patch.

No strong opinion really, was mostly a recommendation to avoid extra
changes here.  In a series it's best if you try to arrange the code so
that it's only modified once (if possible, obviously).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails
  2025-06-06  8:32     ` Chen, Jiqian
@ 2025-06-06  9:20       ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:20 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Fri, Jun 06, 2025 at 08:32:35AM +0000, Chen, Jiqian wrote:
> On 2025/6/5 23:14, Roger Pau Monné wrote:
> > On Mon, May 26, 2025 at 05:45:57PM +0800, Jiqian Chen wrote:
> >> When init_rebar() fails, current logic return fail and free Rebar-related
> >> resources in vpci_deassign_device(). But the previous new changes will
> >> hide Rebar capability and return success, it can't reach
> >> vpci_deassign_device() to remove resources if hiding success, so those
> >> resources must be removed in cleanup function of Rebar.
> >>
> >> To do that, implement cleanup function for Rebar.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> > 
> > LGTM, just one nit about a bounds check.
> > 
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v4->v5 changes:
> >> * Change definition "static void cleanup_rebar" to "static int cf_check cleanup_rebar" since cleanup hook is changed to be int.
> >>
> >> v3->v4 changes:
> >> * Change function name from fini_rebar() to cleanup_rebar().
> >> * Change the error number to be E2BIG and ENXIO in init_rebar().
> >>
> >> v2->v3 changes:
> >> * Use fini_rebar() to remove all register instead of in the failure path of init_rebar();
> >>
> >> v1->v2 changes:
> >> * Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/drivers/vpci/rebar.c | 35 ++++++++++++++++++++++++-----------
> >>  1 file changed, 24 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> >> index 9cafd80ca2c9..4b1892fab3d6 100644
> >> --- a/xen/drivers/vpci/rebar.c
> >> +++ b/xen/drivers/vpci/rebar.c
> >> @@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
> >>      bar->guest_addr = bar->addr;
> >>  }
> >>  
> >> +static int cf_check cleanup_rebar(struct pci_dev *pdev)
> >> +{
> >> +    uint32_t ctrl;
> >> +    unsigned int nbars;
> >> +    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
> >> +                                                        PCI_EXT_CAP_ID_REBAR);
> >> +
> >> +    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
> > 
> > I think you could check rebar_offset < PCI_CFG_SPACE_SIZE to be more
> > accurate?
> OK.
> Do I need to change in init_rebar()?

Hm, pci_find_ext_capability() will never return a value <
PCI_CFG_SPACE_SIZE different than 0, so maybe none of this is needed.
Just leave it like you have, and please ignore the other requests to
change the checks, sorry for the fuss.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-06-06  8:38     ` Chen, Jiqian
@ 2025-06-06  9:21       ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:21 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Fri, Jun 06, 2025 at 08:38:49AM +0000, Chen, Jiqian wrote:
> On 2025/6/5 23:19, Roger Pau Monné wrote:
> > On Mon, May 26, 2025 at 05:45:58PM +0800, Jiqian Chen wrote:
> >> When init_msi() fails, current logic return fail and free MSI-related
> >> resources in vpci_deassign_device(). But the previous new changes will
> >> hide MSI capability and return success, it can't reach
> >> vpci_deassign_device() to remove resources if hiding success, so those
> >> resources must be removed in cleanup function of MSI.
> >>
> >> To do that, implement cleanup function for MSI.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v4->v5 changes:
> >> * Change definition "static void cleanup_msi" to "static int cf_check cleanup_msi" since cleanup hook is changed to be int.
> >> * Add a read-only register for MSI Control Register in the end of cleanup_msi.
> >>
> >> v3->v4 changes:
> >> * Change function name from fini_msi() to cleanup_msi().
> >> * Remove unnecessary comment.
> >> * Change to use XFREE to free vpci->msi.
> >>
> >> v2->v3 changes:
> >> * Remove all fail path, and use fini_msi() hook instead.
> >> * Change the method to calculating the size of msi registers.
> >>
> >> v1->v2 changes:
> >> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/drivers/vpci/msi.c | 29 ++++++++++++++++++++++++++++-
> >>  1 file changed, 28 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> >> index 2d45c7867de7..4e106c39efae 100644
> >> --- a/xen/drivers/vpci/msi.c
> >> +++ b/xen/drivers/vpci/msi.c
> >> @@ -193,6 +193,33 @@ static void cf_check mask_write(
> >>      msi->mask = val;
> >>  }
> >>  
> >> +static int cf_check cleanup_msi(struct pci_dev *pdev)
> >> +{
> >> +    int rc;
> >> +    unsigned int end, size;
> >> +    struct vpci *vpci = pdev->vpci;
> >> +    const unsigned int msi_pos = pdev->msi_pos;
> >> +    const unsigned int ctrl = msi_control_reg(msi_pos);
> >> +
> >> +    if ( !msi_pos || !vpci->msi )
> > 
> > Possibly same request as the previous patch, msi_pos should be after
> > the PCI standard header.
> msi_pos <= PCI_CAPABILITY_LIST ?
> Or msi_pos < 0x40 ?

Hm, no, ignore this, sorry.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails
  2025-06-06  8:44     ` Chen, Jiqian
@ 2025-06-06  9:23       ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-06  9:23 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Fri, Jun 06, 2025 at 08:44:02AM +0000, Chen, Jiqian wrote:
> On 2025/6/5 23:34, Roger Pau Monné wrote:
> > On Mon, May 26, 2025 at 05:45:59PM +0800, Jiqian Chen wrote:
> >> When init_msix() fails, current logic return fail and free MSIX-related
> >> resources in vpci_deassign_device(). But the previous new changes will
> >> hide MSIX capability and return success, it can't reach
> >> vpci_deassign_device() to remove resources if hiding success, so those
> >> resources must be removed in cleanup function of MSIX.
> >>
> >> To do that, implement cleanup function for MSIX.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> > 
> > 
> > 
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v4->v5 changes:
> >> * Change definition "static void cleanup_msix" to "static int cf_check cleanup_msix" since cleanup hook is changed to be int.
> >> * Add a read-only register for MSIX Control Register in the end of cleanup_msix().
> >>
> >> v3->v4 changes:
> >> * Change function name from fini_msix() to cleanup_msix().
> >> * Change to use XFREE to free vpci->msix.
> >> * In cleanup function, change the sequence of check and remove action according to init_msix().
> >>
> >> v2->v3 changes:
> >> * Remove unnecessary clean operations in fini_msix().
> >>
> >> v1->v2 changes:
> >> new patch.
> >>
> >> Best regards,
> >> Jiqian Chen.
> >> ---
> >>  xen/drivers/vpci/msix.c | 29 ++++++++++++++++++++++++++++-
> >>  1 file changed, 28 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> >> index 674815ead025..cf79320d3b6f 100644
> >> --- a/xen/drivers/vpci/msix.c
> >> +++ b/xen/drivers/vpci/msix.c
> >> @@ -655,6 +655,33 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
> >>      return 0;
> >>  }
> >>  
> >> +static int cf_check cleanup_msix(struct pci_dev *pdev)
> >> +{
> >> +    int rc;
> >> +    struct vpci *vpci = pdev->vpci;
> >> +    const unsigned int msix_pos = pdev->msix_pos;
> >> +
> >> +    if ( !msix_pos )
> Need to change this check as previous patch, I think.

No, please ignore those previous requests, sorry.

> >> +        return 0;
> >> +
> >> +    rc = vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
> >> +    if ( rc )
> >> +        return rc;
> >> +
> >> +    if ( !vpci->msix )
> >> +        return 0;
> > 
> > You cannot short-circuit here, as it then prevents adding the dummy
> > handlers done in the last return.
> > 
> >> +
> >> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> >> +        if ( vpci->msix->table[i] )
> >> +            iounmap(vpci->msix->table[i]);
> >> +
> >> +    list_del(&vpci->msix->next);
> >> +    XFREE(vpci->msix);
> >> +
> >> +    return vpci_add_register(pdev->vpci, vpci_hw_read16, NULL,
> >> +                             msix_control_reg(msix_pos), 2, NULL);
> > 
> > The above needs to be added even if !vpci->msix.
> Oh, right.
> I will change to
> 
>     if ( vpci->msix )
>     {
>         for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
>             if ( vpci->msix->table[i] )
>                 iounmap(vpci->msix->table[i]);
> 
>         list_del(&vpci->msix->next);
>         XFREE(vpci->msix);
>     }

LGTM, thanks.

Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-06  9:14         ` Roger Pau Monné
@ 2025-06-09  7:50           ` Chen, Jiqian
  2025-06-09  9:29             ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-09  7:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray, Chen, Jiqian

On 2025/6/6 17:14, Roger Pau Monné wrote:
> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>> +  }; \
>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>
>>>> IMO this should better use .rodata instead of .data. 
>>> Is below change correct?
>>>
>>> +    static const vpci_capability_t *const finit##_entry  \
>>> +        __used_section(".rodata") = &finit##_t
>>
>> No, specifically because ...
>>
>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>> consume the vPCI data.
>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>> Is below change correct?
>>>
>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>> index 793d0e11450c..3817642135aa 100644
>>> --- a/xen/include/xen/xen.lds.h
>>> +++ b/xen/include/xen/xen.lds.h
>>> @@ -188,7 +188,7 @@
>>>  #define VPCI_ARRAY               \
>>>         . = ALIGN(POINTER_ALIGN); \
>>>         __start_vpci_array = .;   \
>>> -       *(SORT(.data.vpci.*))     \
>>> +       *(.rodata)             \
>>
>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>> isn't what you want. What I understand Roger meant was a .rodata-like
>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
> 
> Indeed, my suggestion was merely to use .rodata instead of .data, as
> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
> same section change for the __used_section() attribute.

If I understand correctly, the next version will be:

+    static const vpci_capability_t *const finit##_entry  \
+        __used_section(".rodata.vpci") = &finit##_t
+

and

 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.rodata.vpci)           \
        __end_vpci_array = .;

But, that encountered an warning when compiling.
" {standard input}: Assembler messages:
{standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
{standard input}: Assembler messages:
{standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
{standard input}: Assembler messages:
{standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "

And, during booting Xen, all value of __start_vpci_array is incorrect.
Do I miss anything?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-09  7:50           ` Chen, Jiqian
@ 2025-06-09  9:29             ` Roger Pau Monné
  2025-06-09 10:18               ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-09  9:29 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray

On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
> On 2025/6/6 17:14, Roger Pau Monné wrote:
> > On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
> >> On 06.06.2025 08:29, Chen, Jiqian wrote:
> >>> On 2025/6/5 20:50, Roger Pau Monné wrote:
> >>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
> >>>>> +  }; \
> >>>>> +  static vpci_capability_t *const finit##_entry  \
> >>>>> +               __used_section(".data.vpci") = &finit##_t
> >>>>
> >>>> IMO this should better use .rodata instead of .data. 
> >>> Is below change correct?
> >>>
> >>> +    static const vpci_capability_t *const finit##_entry  \
> >>> +        __used_section(".rodata") = &finit##_t
> >>
> >> No, specifically because ...
> >>
> >>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
> >>>> however you will have to move the placement of the VPCI_ARRAY in the
> >>>> linker script ahead of *(.rodata.*), otherwise that section match will
> >>>> consume the vPCI data.
> >>> I am sorry, how to move it ahead of *(.rodata.*) ?
> >>> Is below change correct?
> >>>
> >>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> >>> index 793d0e11450c..3817642135aa 100644
> >>> --- a/xen/include/xen/xen.lds.h
> >>> +++ b/xen/include/xen/xen.lds.h
> >>> @@ -188,7 +188,7 @@
> >>>  #define VPCI_ARRAY               \
> >>>         . = ALIGN(POINTER_ALIGN); \
> >>>         __start_vpci_array = .;   \
> >>> -       *(SORT(.data.vpci.*))     \
> >>> +       *(.rodata)             \
> >>
> >> ... this isn't - you'd move _all_ of .rodata into here, which definitely
> >> isn't what you want. What I understand Roger meant was a .rodata-like
> >> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
> > 
> > Indeed, my suggestion was merely to use .rodata instead of .data, as
> > that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
> > same section change for the __used_section() attribute.
> 
> If I understand correctly, the next version will be:
> 
> +    static const vpci_capability_t *const finit##_entry  \
> +        __used_section(".rodata.vpci") = &finit##_t
> +
> 
> and
> 
>  #define VPCI_ARRAY               \
>         . = ALIGN(POINTER_ALIGN); \
>         __start_vpci_array = .;   \
> -       *(SORT(.data.vpci.*))     \
> +       *(.rodata.vpci)           \
>         __end_vpci_array = .;

Did you also move the instances of VPCI_ARRAY in the linker scripts so
it's before the catch-all *(.rodata.*)?

> 
> But, that encountered an warning when compiling.
> " {standard input}: Assembler messages:
> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
> {standard input}: Assembler messages:
> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
> {standard input}: Assembler messages:
> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "

What are the attributes for .rodata.vpci in the object files?  You can
get those using objdump or readelf, for example:

$ objdump -h xen/drivers/vpci/msi.o
[...]
 17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA

It should be READONLY, otherwise you will get those messages.

> And, during booting Xen, all value of __start_vpci_array is incorrect.
> Do I miss anything?

I think that's likely because you haven't moved the instance of
VPCI_ARRAY in the linker script?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-09  9:29             ` Roger Pau Monné
@ 2025-06-09 10:18               ` Chen, Jiqian
  2025-06-09 10:40                 ` Roger Pau Monné
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-09 10:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray, Chen, Jiqian

On 2025/6/9 17:29, Roger Pau Monné wrote:
> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
>> On 2025/6/6 17:14, Roger Pau Monné wrote:
>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>>>> +  }; \
>>>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>>>
>>>>>> IMO this should better use .rodata instead of .data. 
>>>>> Is below change correct?
>>>>>
>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>> +        __used_section(".rodata") = &finit##_t
>>>>
>>>> No, specifically because ...
>>>>
>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>>>> consume the vPCI data.
>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>>>> Is below change correct?
>>>>>
>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>>>> index 793d0e11450c..3817642135aa 100644
>>>>> --- a/xen/include/xen/xen.lds.h
>>>>> +++ b/xen/include/xen/xen.lds.h
>>>>> @@ -188,7 +188,7 @@
>>>>>  #define VPCI_ARRAY               \
>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>         __start_vpci_array = .;   \
>>>>> -       *(SORT(.data.vpci.*))     \
>>>>> +       *(.rodata)             \
>>>>
>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>>>> isn't what you want. What I understand Roger meant was a .rodata-like
>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
>>>
>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
>>> same section change for the __used_section() attribute.
>>
>> If I understand correctly, the next version will be:
>>
>> +    static const vpci_capability_t *const finit##_entry  \
>> +        __used_section(".rodata.vpci") = &finit##_t
>> +
>>
>> and
>>
>>  #define VPCI_ARRAY               \
>>         . = ALIGN(POINTER_ALIGN); \
>>         __start_vpci_array = .;   \
>> -       *(SORT(.data.vpci.*))     \
>> +       *(.rodata.vpci)           \
>>         __end_vpci_array = .;
> 
> Did you also move the instances of VPCI_ARRAY in the linker scripts so
> it's before the catch-all *(.rodata.*)?
> 
>>
>> But, that encountered an warning when compiling.
>> " {standard input}: Assembler messages:
>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
>> {standard input}: Assembler messages:
>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
>> {standard input}: Assembler messages:
>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
> 
> What are the attributes for .rodata.vpci in the object files?  You can
> get those using objdump or readelf, for example:
> 
> $ objdump -h xen/drivers/vpci/msi.o
> [...]
>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
> 
> It should be READONLY, otherwise you will get those messages.
> 
>> And, during booting Xen, all value of __start_vpci_array is incorrect.
>> Do I miss anything?
> 
> I think that's likely because you haven't moved the instance of
> VPCI_ARRAY in the linker script?
Oh, right. Sorry, I forgot to move it.
After changing this, it works now.

diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index bf956b6c5fc0..c88fd62f4f0d 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -134,6 +134,7 @@ SECTIONS
        BUGFRAMES

        *(.rodata)
+       VPCI_ARRAY
        *(.rodata.*)
        *(.data.rel.ro)
        *(.data.rel.ro.*)
@@ -148,7 +149,6 @@ SECTIONS
        *(.note.gnu.build-id)
        __note_gnu_build_id_end = .;
 #endif
-       VPCI_ARRAY
   } PHDR(text)

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-09 10:18               ` Chen, Jiqian
@ 2025-06-09 10:40                 ` Roger Pau Monné
  2025-06-10  3:52                   ` Chen, Jiqian
  0 siblings, 1 reply; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-09 10:40 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray

On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
> On 2025/6/9 17:29, Roger Pau Monné wrote:
> > On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
> >> On 2025/6/6 17:14, Roger Pau Monné wrote:
> >>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
> >>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
> >>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
> >>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
> >>>>>>> +  }; \
> >>>>>>> +  static vpci_capability_t *const finit##_entry  \
> >>>>>>> +               __used_section(".data.vpci") = &finit##_t
> >>>>>>
> >>>>>> IMO this should better use .rodata instead of .data. 
> >>>>> Is below change correct?
> >>>>>
> >>>>> +    static const vpci_capability_t *const finit##_entry  \
> >>>>> +        __used_section(".rodata") = &finit##_t
> >>>>
> >>>> No, specifically because ...
> >>>>
> >>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
> >>>>>> however you will have to move the placement of the VPCI_ARRAY in the
> >>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
> >>>>>> consume the vPCI data.
> >>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
> >>>>> Is below change correct?
> >>>>>
> >>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> >>>>> index 793d0e11450c..3817642135aa 100644
> >>>>> --- a/xen/include/xen/xen.lds.h
> >>>>> +++ b/xen/include/xen/xen.lds.h
> >>>>> @@ -188,7 +188,7 @@
> >>>>>  #define VPCI_ARRAY               \
> >>>>>         . = ALIGN(POINTER_ALIGN); \
> >>>>>         __start_vpci_array = .;   \
> >>>>> -       *(SORT(.data.vpci.*))     \
> >>>>> +       *(.rodata)             \
> >>>>
> >>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
> >>>> isn't what you want. What I understand Roger meant was a .rodata-like
> >>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
> >>>
> >>> Indeed, my suggestion was merely to use .rodata instead of .data, as
> >>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
> >>> same section change for the __used_section() attribute.
> >>
> >> If I understand correctly, the next version will be:
> >>
> >> +    static const vpci_capability_t *const finit##_entry  \
> >> +        __used_section(".rodata.vpci") = &finit##_t
> >> +
> >>
> >> and
> >>
> >>  #define VPCI_ARRAY               \
> >>         . = ALIGN(POINTER_ALIGN); \
> >>         __start_vpci_array = .;   \
> >> -       *(SORT(.data.vpci.*))     \
> >> +       *(.rodata.vpci)           \
> >>         __end_vpci_array = .;
> > 
> > Did you also move the instances of VPCI_ARRAY in the linker scripts so
> > it's before the catch-all *(.rodata.*)?
> > 
> >>
> >> But, that encountered an warning when compiling.
> >> " {standard input}: Assembler messages:
> >> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
> >> {standard input}: Assembler messages:
> >> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
> >> {standard input}: Assembler messages:
> >> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
> > 
> > What are the attributes for .rodata.vpci in the object files?  You can
> > get those using objdump or readelf, for example:
> > 
> > $ objdump -h xen/drivers/vpci/msi.o
> > [...]
> >  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
> >                   CONTENTS, ALLOC, LOAD, RELOC, DATA
> > 
> > It should be READONLY, otherwise you will get those messages.
> > 
> >> And, during booting Xen, all value of __start_vpci_array is incorrect.
> >> Do I miss anything?
> > 
> > I think that's likely because you haven't moved the instance of
> > VPCI_ARRAY in the linker script?
> Oh, right. Sorry, I forgot to move it.
> After changing this, it works now.
> 
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index bf956b6c5fc0..c88fd62f4f0d 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -134,6 +134,7 @@ SECTIONS
>         BUGFRAMES
> 
>         *(.rodata)
> +       VPCI_ARRAY
>         *(.rodata.*)
>         *(.data.rel.ro)
>         *(.data.rel.ro.*)
> @@ -148,7 +149,6 @@ SECTIONS
>         *(.note.gnu.build-id)
>         __note_gnu_build_id_end = .;
>  #endif
> -       VPCI_ARRAY
>    } PHDR(text)

FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
linker script for all the other arches, not just x86.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-09 10:40                 ` Roger Pau Monné
@ 2025-06-10  3:52                   ` Chen, Jiqian
  2025-06-10  7:08                     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-10  3:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray, Chen, Jiqian

[-- Attachment #1: Type: text/plain, Size: 5100 bytes --]

On 2025/6/9 18:40, Roger Pau Monné wrote:
> On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
>> On 2025/6/9 17:29, Roger Pau Monné wrote:
>>> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
>>>> On 2025/6/6 17:14, Roger Pau Monné wrote:
>>>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>>>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>>>>>> +  }; \
>>>>>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>>>>>
>>>>>>>> IMO this should better use .rodata instead of .data. 
>>>>>>> Is below change correct?
>>>>>>>
>>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>>> +        __used_section(".rodata") = &finit##_t
>>>>>>
>>>>>> No, specifically because ...
>>>>>>
>>>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>>>>>> consume the vPCI data.
>>>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>>>>>> Is below change correct?
>>>>>>>
>>>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>>>>>> index 793d0e11450c..3817642135aa 100644
>>>>>>> --- a/xen/include/xen/xen.lds.h
>>>>>>> +++ b/xen/include/xen/xen.lds.h
>>>>>>> @@ -188,7 +188,7 @@
>>>>>>>  #define VPCI_ARRAY               \
>>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>>         __start_vpci_array = .;   \
>>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>>> +       *(.rodata)             \
>>>>>>
>>>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>>>>>> isn't what you want. What I understand Roger meant was a .rodata-like
>>>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
>>>>>
>>>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
>>>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
>>>>> same section change for the __used_section() attribute.
>>>>
>>>> If I understand correctly, the next version will be:
>>>>
>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>> +        __used_section(".rodata.vpci") = &finit##_t
>>>> +
>>>>
>>>> and
>>>>
>>>>  #define VPCI_ARRAY               \
>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>         __start_vpci_array = .;   \
>>>> -       *(SORT(.data.vpci.*))     \
>>>> +       *(.rodata.vpci)           \
>>>>         __end_vpci_array = .;
>>>
>>> Did you also move the instances of VPCI_ARRAY in the linker scripts so
>>> it's before the catch-all *(.rodata.*)?
>>>
>>>>
>>>> But, that encountered an warning when compiling.
>>>> " {standard input}: Assembler messages:
>>>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
>>>> {standard input}: Assembler messages:
>>>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
>>>> {standard input}: Assembler messages:
>>>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
>>>
>>> What are the attributes for .rodata.vpci in the object files?  You can
>>> get those using objdump or readelf, for example:
>>>
>>> $ objdump -h xen/drivers/vpci/msi.o
>>> [...]
>>>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
>>>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
>>>
>>> It should be READONLY, otherwise you will get those messages.
>>>
>>>> And, during booting Xen, all value of __start_vpci_array is incorrect.
>>>> Do I miss anything?
>>>
>>> I think that's likely because you haven't moved the instance of
>>> VPCI_ARRAY in the linker script?
>> Oh, right. Sorry, I forgot to move it.
>> After changing this, it works now.
>>
>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>> index bf956b6c5fc0..c88fd62f4f0d 100644
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -134,6 +134,7 @@ SECTIONS
>>         BUGFRAMES
>>
>>         *(.rodata)
>> +       VPCI_ARRAY
>>         *(.rodata.*)
>>         *(.data.rel.ro)
>>         *(.data.rel.ro.*)
>> @@ -148,7 +149,6 @@ SECTIONS
>>         *(.note.gnu.build-id)
>>         __note_gnu_build_id_end = .;
>>  #endif
>> -       VPCI_ARRAY
>>    } PHDR(text)
> 
> FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
> linker script for all the other arches, not just x86.

Whether before *(.rodata.*) or before *(.rodata), there still is the warning " Warning: setting incorrect section attributes for .rodata.vpci "
And the objdump shows "rodata.vpci" has no "readonly" word.
But starting Xen dom0 is OK.
I attached the new this patch and the result of "objdump -h xen/drivers/vpci/msi.o"

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

[-- Attachment #2: objdump_before_rodata.txt --]
[-- Type: text/plain, Size: 5787 bytes --]


./xen/drivers/vpci/msi.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000040  2**0
                  ALLOC
  3 .text._can_read_lock 00000052  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  4 .text.address_read 0000000b  0000000000000000  0000000000000000  00000092  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  5 .text.address_hi_read 00000010  0000000000000000  0000000000000000  0000009d  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  6 .text.data_read 0000000d  0000000000000000  0000000000000000  000000ad  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  7 .text.mask_read 0000000c  0000000000000000  0000000000000000  000000ba  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  8 .text.mask_write 00000078  0000000000000000  0000000000000000  000000c6  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  9 .text.address_hi_write 00000027  0000000000000000  0000000000000000  0000013e  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 10 .text.control_read 0000004e  0000000000000000  0000000000000000  00000165  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 11 .text.control_write 00000144  0000000000000000  0000000000000000  000001b3  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 12 .altinstructions 00000070  0000000000000000  0000000000000000  000002f7  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 13 .discard      00000010  0000000000000000  0000000000000000  00000367  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .altinstr_replacement 00000000  0000000000000000  0000000000000000  00000377  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .text._read_trylock 00000090  0000000000000000  0000000000000000  00000377  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 16 .text._read_unlock 00000036  0000000000000000  0000000000000000  00000407  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 17 .text.cleanup_msi 000000dc  0000000000000000  0000000000000000  0000043d  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 18 .rodata.init_msi.str1.1 0000002a  0000000000000000  0000000000000000  00000519  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 19 .text.init_msi 00000270  0000000000000000  0000000000000000  00000543  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 20 .bug_frames.3 00000020  0000000000000000  0000000000000000  000007b4  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 21 .text.address_write 00000039  0000000000000000  0000000000000000  000007d4  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 22 .text.data_write 00000028  0000000000000000  0000000000000000  0000080d  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 23 .rodata.vpci_dump_msi.str1.1 0000009a  0000000000000000  0000000000000000  00000835  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 24 .rodata.vpci_dump_msi.str1.8 0000007f  0000000000000000  0000000000000000  000008d0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 25 .text.vpci_dump_msi 000002af  0000000000000000  0000000000000000  0000094f  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 26 .rodata.vpci  00000008  0000000000000000  0000000000000000  00000c00  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 27 .data.rel.ro.local.init_msi_t 00000018  0000000000000000  0000000000000000  00000c10  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 28 .debug_info   0000ae83  0000000000000000  0000000000000000  00000c28  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS
 29 .debug_abbrev 00000892  0000000000000000  0000000000000000  0000baab  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 30 .debug_loclists 00000ffc  0000000000000000  0000000000000000  0000c33d  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS
 31 .debug_aranges 00000120  0000000000000000  0000000000000000  0000d339  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS
 32 .debug_rnglists 00000212  0000000000000000  0000000000000000  0000d459  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS
 33 .debug_line   00000e30  0000000000000000  0000000000000000  0000d66b  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS
 34 .debug_str    0000502b  0000000000000000  0000000000000000  0000e49b  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 35 .debug_line_str 00000470  0000000000000000  0000000000000000  000134c6  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 36 .comment      0000002c  0000000000000000  0000000000000000  00013936  2**0
                  CONTENTS, READONLY
 37 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00013962  2**0
                  CONTENTS, READONLY
 38 .note.gnu.property 00000020  0000000000000000  0000000000000000  00013968  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 39 .debug_frame  000002d0  0000000000000000  0000000000000000  00013988  2**3
                  CONTENTS, RELOC, READONLY, DEBUGGING, OCTETS

[-- Attachment #3: 0003-vpci-Refactor-REGISTER_VPCI_INIT.patch --]
[-- Type: application/octet-stream, Size: 11360 bytes --]

From ff2d43c1b61e9fc1bbba0a4028ab278aa1c21a7e Mon Sep 17 00:00:00 2001
From: Jiqian Chen <Jiqian.Chen@amd.com>
Date: Wed, 16 Apr 2025 16:38:07 +0800
Subject: [PATCH 3/9] vpci: Refactor REGISTER_VPCI_INIT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refactor REGISTER_VPCI_INIT to contain more capability specific
information, this will benefit further follow-on changes to hide
capability when initialization fails.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

After refactor, the "priority" of initializing capabilities isn't
needed anymore, so delete its related codes.

Note:
Call vpci_make_msix_hole() in the end of init_msix() since the change
of sequence of init_header() and init_msix().

The cleanup hook is also added in this change, even if it's still
unused. Further changes will make use of it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v5->v6 changes:
* Rename REGISTER_PCI_CAPABILITY to REGISTER_VPCI_CAPABILITY.
* Move vpci_capability_t entry from ".data.vpci" to ".rodata.vpci" and
  move the instances of VPCI_ARRAY in the linker scripts before *(.rodata).
* Change _start/end_vpci_array[] to be const pointer array.

v4->v5 changes:
* Rename REGISTER_VPCI_CAP to REGISTER_PCI_CAPABILITY, rename REGISTER_VPCI_LEGACY_CAP to
  REGISTER_VPCI_CAP, rename REGISTER_VPCI_EXTENDED_CAP to REGISTER_VPCI_EXTCAP.
* Change cleanup hook of vpci_capability_t from void to int.

v3->v4 changes
* Delete the useless trailing dot of section ".data.vpci".
* Add description about priority since this patch removes the initializing priority of
  capabilities and priority is not needed anymore.
* Change the hook name from fini to cleanup.
* Change the name x and y to be finit and fclean.
* Remove the unnecessary check "!capability->init"

v2->v3 changes:
* This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
* Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
* Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove
  failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/arch/arm/xen.lds.S    |  3 +--
 xen/arch/ppc/xen.lds.S    |  3 +--
 xen/arch/riscv/xen.lds.S  |  3 +--
 xen/arch/x86/xen.lds.S    |  2 +-
 xen/drivers/vpci/header.c |  3 +--
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   |  8 +++++--
 xen/drivers/vpci/rebar.c  |  2 +-
 xen/drivers/vpci/vpci.c   | 44 ++++++++++++++++++++++++++++++---------
 xen/include/xen/vpci.h    | 30 +++++++++++++++++++-------
 xen/include/xen/xen.lds.h |  2 +-
 11 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 5bfbe1e92c1e..7d9508ed0ff5 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -55,6 +55,7 @@ SECTIONS
 
         BUGFRAMES
 
+       VPCI_ARRAY
        *(.rodata)
        *(.rodata.*)
        *(.data.rel.ro)
@@ -64,8 +65,6 @@ SECTIONS
        __proc_info_start = .;
        *(.proc.info)
        __proc_info_end = .;
-
-       VPCI_ARRAY
   } :text
 
 #if defined(BUILD_ID)
diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S
index 1366e2819eed..0295c921f735 100644
--- a/xen/arch/ppc/xen.lds.S
+++ b/xen/arch/ppc/xen.lds.S
@@ -49,13 +49,12 @@ SECTIONS
 
         BUGFRAMES
 
+        VPCI_ARRAY
         *(.rodata)
         *(.rodata.*)
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
index 8c3c06de01f6..6df4cb92a6b2 100644
--- a/xen/arch/riscv/xen.lds.S
+++ b/xen/arch/riscv/xen.lds.S
@@ -44,13 +44,12 @@ SECTIONS
 
         BUGFRAMES
 
+        VPCI_ARRAY
         *(.rodata)
         *(.rodata.*)
         *(.data.rel.ro)
         *(.data.rel.ro.*)
 
-        VPCI_ARRAY
-
         . = ALIGN(POINTER_ALIGN);
     } :text
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index bf956b6c5fc0..ba035d6a673f 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -133,6 +133,7 @@ SECTIONS
 
        BUGFRAMES
 
+       VPCI_ARRAY
        *(.rodata)
        *(.rodata.*)
        *(.data.rel.ro)
@@ -148,7 +149,6 @@ SECTIONS
        *(.note.gnu.build-id)
        __note_gnu_build_id_end = .;
 #endif
-       VPCI_ARRAY
   } PHDR(text)
 
 #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index a39bf2b12585..2cf54ce60297 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -872,7 +872,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1068,7 +1068,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..2d45c7867de7 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 74211301ba10..674815ead025 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -703,9 +703,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..9cafd80ca2c9 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTCAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 8474c0e3b995..6e768bb32447 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,8 +36,8 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern const vpci_capability_t *const __start_vpci_array[];
+extern const vpci_capability_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +83,32 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = __start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        int rc;
+        unsigned int pos = 0;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else if ( is_hardware_domain(pdev->domain) )
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +154,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,14 +184,13 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    rc = vpci_init_capabilities(pdev);
 
- out: __maybe_unused;
+ out:
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 61d16cc8b897..6ea6fcf578f1 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (*init)(struct pci_dev *pdev);
+    int (*cleanup)(struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +30,22 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_VPCI_CAPABILITY(cap, finit, fclean, ext) \
+    static const vpci_capability_t finit##_t = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+    }; \
+    static const vpci_capability_t *const finit##_entry  \
+        __used_section(".rodata.vpci") = &finit##_t
+
+#define REGISTER_VPCI_CAP(cap, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(cap, finit, fclean, false)
+#define REGISTER_VPCI_EXTCAP(cap, finit, fclean) \
+    REGISTER_VPCI_CAPABILITY(cap, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..ac359c3757a2 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.rodata.vpci)           \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-10  3:52                   ` Chen, Jiqian
@ 2025-06-10  7:08                     ` Jan Beulich
  2025-06-10  9:03                       ` Chen, Jiqian
  2025-06-11 10:19                       ` Chen, Jiqian
  0 siblings, 2 replies; 44+ messages in thread
From: Jan Beulich @ 2025-06-10  7:08 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Anthony PERARD,
	Orzel, Michal, Julien Grall, Stefano Stabellini, Huang, Ray,
	Roger Pau Monné

On 10.06.2025 05:52, Chen, Jiqian wrote:
> On 2025/6/9 18:40, Roger Pau Monné wrote:
>> On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
>>> On 2025/6/9 17:29, Roger Pau Monné wrote:
>>>> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
>>>>> On 2025/6/6 17:14, Roger Pau Monné wrote:
>>>>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>>>>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>>>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>>>>>>> +  }; \
>>>>>>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>>>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>>>>>>
>>>>>>>>> IMO this should better use .rodata instead of .data. 
>>>>>>>> Is below change correct?
>>>>>>>>
>>>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>>>> +        __used_section(".rodata") = &finit##_t
>>>>>>>
>>>>>>> No, specifically because ...
>>>>>>>
>>>>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>>>>>>> consume the vPCI data.
>>>>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>>>>>>> Is below change correct?
>>>>>>>>
>>>>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>>>>>>> index 793d0e11450c..3817642135aa 100644
>>>>>>>> --- a/xen/include/xen/xen.lds.h
>>>>>>>> +++ b/xen/include/xen/xen.lds.h
>>>>>>>> @@ -188,7 +188,7 @@
>>>>>>>>  #define VPCI_ARRAY               \
>>>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>>>         __start_vpci_array = .;   \
>>>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>>>> +       *(.rodata)             \
>>>>>>>
>>>>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>>>>>>> isn't what you want. What I understand Roger meant was a .rodata-like
>>>>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
>>>>>>
>>>>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
>>>>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
>>>>>> same section change for the __used_section() attribute.
>>>>>
>>>>> If I understand correctly, the next version will be:
>>>>>
>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>> +        __used_section(".rodata.vpci") = &finit##_t
>>>>> +
>>>>>
>>>>> and
>>>>>
>>>>>  #define VPCI_ARRAY               \
>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>         __start_vpci_array = .;   \
>>>>> -       *(SORT(.data.vpci.*))     \
>>>>> +       *(.rodata.vpci)           \
>>>>>         __end_vpci_array = .;
>>>>
>>>> Did you also move the instances of VPCI_ARRAY in the linker scripts so
>>>> it's before the catch-all *(.rodata.*)?
>>>>
>>>>>
>>>>> But, that encountered an warning when compiling.
>>>>> " {standard input}: Assembler messages:
>>>>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
>>>>> {standard input}: Assembler messages:
>>>>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
>>>>> {standard input}: Assembler messages:
>>>>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
>>>>
>>>> What are the attributes for .rodata.vpci in the object files?  You can
>>>> get those using objdump or readelf, for example:
>>>>
>>>> $ objdump -h xen/drivers/vpci/msi.o
>>>> [...]
>>>>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
>>>>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
>>>>
>>>> It should be READONLY, otherwise you will get those messages.
>>>>
>>>>> And, during booting Xen, all value of __start_vpci_array is incorrect.
>>>>> Do I miss anything?
>>>>
>>>> I think that's likely because you haven't moved the instance of
>>>> VPCI_ARRAY in the linker script?
>>> Oh, right. Sorry, I forgot to move it.
>>> After changing this, it works now.
>>>
>>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>>> index bf956b6c5fc0..c88fd62f4f0d 100644
>>> --- a/xen/arch/x86/xen.lds.S
>>> +++ b/xen/arch/x86/xen.lds.S
>>> @@ -134,6 +134,7 @@ SECTIONS
>>>         BUGFRAMES
>>>
>>>         *(.rodata)
>>> +       VPCI_ARRAY
>>>         *(.rodata.*)
>>>         *(.data.rel.ro)
>>>         *(.data.rel.ro.*)
>>> @@ -148,7 +149,6 @@ SECTIONS
>>>         *(.note.gnu.build-id)
>>>         __note_gnu_build_id_end = .;
>>>  #endif
>>> -       VPCI_ARRAY
>>>    } PHDR(text)
>>
>> FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
>> linker script for all the other arches, not just x86.
> 
> Whether before *(.rodata.*) or before *(.rodata), there still is the warning " Warning: setting incorrect section attributes for .rodata.vpci "
> And the objdump shows "rodata.vpci" has no "readonly" word.

Did you check what gcc emits? It may be requesting "aw" instead of the
wanted "a" in the section attributes. Since there are relocations here,
".rodata." may not be the correct prefix to use; it may instead need to
be ".data.rel.ro.".

Jan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-10  7:08                     ` Jan Beulich
@ 2025-06-10  9:03                       ` Chen, Jiqian
  2025-06-11 10:19                       ` Chen, Jiqian
  1 sibling, 0 replies; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-10  9:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Anthony PERARD,
	Orzel, Michal, Julien Grall, Stefano Stabellini, Huang, Ray,
	Roger Pau Monné, Chen, Jiqian

On 2025/6/10 15:08, Jan Beulich wrote:
> On 10.06.2025 05:52, Chen, Jiqian wrote:
>> On 2025/6/9 18:40, Roger Pau Monné wrote:
>>> On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
>>>> On 2025/6/9 17:29, Roger Pau Monné wrote:
>>>>> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
>>>>>> On 2025/6/6 17:14, Roger Pau Monné wrote:
>>>>>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>>>>>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>>>>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>>>>>>>> +  }; \
>>>>>>>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>>>>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>>>>>>>
>>>>>>>>>> IMO this should better use .rodata instead of .data. 
>>>>>>>>> Is below change correct?
>>>>>>>>>
>>>>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>>>>> +        __used_section(".rodata") = &finit##_t
>>>>>>>>
>>>>>>>> No, specifically because ...
>>>>>>>>
>>>>>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>>>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>>>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>>>>>>>> consume the vPCI data.
>>>>>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>>>>>>>> Is below change correct?
>>>>>>>>>
>>>>>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>>>>>>>> index 793d0e11450c..3817642135aa 100644
>>>>>>>>> --- a/xen/include/xen/xen.lds.h
>>>>>>>>> +++ b/xen/include/xen/xen.lds.h
>>>>>>>>> @@ -188,7 +188,7 @@
>>>>>>>>>  #define VPCI_ARRAY               \
>>>>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>>>>         __start_vpci_array = .;   \
>>>>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>>>>> +       *(.rodata)             \
>>>>>>>>
>>>>>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>>>>>>>> isn't what you want. What I understand Roger meant was a .rodata-like
>>>>>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
>>>>>>>
>>>>>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
>>>>>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
>>>>>>> same section change for the __used_section() attribute.
>>>>>>
>>>>>> If I understand correctly, the next version will be:
>>>>>>
>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>> +        __used_section(".rodata.vpci") = &finit##_t
>>>>>> +
>>>>>>
>>>>>> and
>>>>>>
>>>>>>  #define VPCI_ARRAY               \
>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>         __start_vpci_array = .;   \
>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>> +       *(.rodata.vpci)           \
>>>>>>         __end_vpci_array = .;
>>>>>
>>>>> Did you also move the instances of VPCI_ARRAY in the linker scripts so
>>>>> it's before the catch-all *(.rodata.*)?
>>>>>
>>>>>>
>>>>>> But, that encountered an warning when compiling.
>>>>>> " {standard input}: Assembler messages:
>>>>>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
>>>>>> {standard input}: Assembler messages:
>>>>>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
>>>>>> {standard input}: Assembler messages:
>>>>>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
>>>>>
>>>>> What are the attributes for .rodata.vpci in the object files?  You can
>>>>> get those using objdump or readelf, for example:
>>>>>
>>>>> $ objdump -h xen/drivers/vpci/msi.o
>>>>> [...]
>>>>>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
>>>>>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
>>>>>
>>>>> It should be READONLY, otherwise you will get those messages.
>>>>>
>>>>>> And, during booting Xen, all value of __start_vpci_array is incorrect.
>>>>>> Do I miss anything?
>>>>>
>>>>> I think that's likely because you haven't moved the instance of
>>>>> VPCI_ARRAY in the linker script?
>>>> Oh, right. Sorry, I forgot to move it.
>>>> After changing this, it works now.
>>>>
>>>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>>>> index bf956b6c5fc0..c88fd62f4f0d 100644
>>>> --- a/xen/arch/x86/xen.lds.S
>>>> +++ b/xen/arch/x86/xen.lds.S
>>>> @@ -134,6 +134,7 @@ SECTIONS
>>>>         BUGFRAMES
>>>>
>>>>         *(.rodata)
>>>> +       VPCI_ARRAY
>>>>         *(.rodata.*)
>>>>         *(.data.rel.ro)
>>>>         *(.data.rel.ro.*)
>>>> @@ -148,7 +149,6 @@ SECTIONS
>>>>         *(.note.gnu.build-id)
>>>>         __note_gnu_build_id_end = .;
>>>>  #endif
>>>> -       VPCI_ARRAY
>>>>    } PHDR(text)
>>>
>>> FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
>>> linker script for all the other arches, not just x86.
>>
>> Whether before *(.rodata.*) or before *(.rodata), there still is the warning " Warning: setting incorrect section attributes for .rodata.vpci "
>> And the objdump shows "rodata.vpci" has no "readonly" word.
> 
> Did you check what gcc emits? 
I just saw above warning.
> It may be requesting "aw" instead of the
> wanted "a" in the section attributes. Since there are relocations here,
> ".rodata." may not be the correct prefix to use; it may instead need to
> be ".data.rel.ro.".
You may right.
From "objdump -r xen/drivers/vpci/msi.o"
I can get

RELOCATION RECORDS FOR [.rodata.vpci]:
OFFSET           TYPE              VALUE
0000000000000000 R_X86_64_64       .data.rel.ro.local.init_msi_t


RELOCATION RECORDS FOR [.data.rel.ro.local.init_msi_t]:
OFFSET           TYPE              VALUE
0000000000000008 R_X86_64_64       .text.init_msi
0000000000000010 R_X86_64_64       .text.cleanup_msi

And from "objdump -D xen/drivers/vpci/msi.o"
I can get

Disassembly of section .rodata.vpci:

0000000000000000 <init_msi_entry>:
        ...

Disassembly of section .data.rel.ro.local.init_msi_t:

0000000000000000 <init_msi_t>:
   0:   05 00 00 00 00          add    $0x0,%eax
        ...

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-10  7:08                     ` Jan Beulich
  2025-06-10  9:03                       ` Chen, Jiqian
@ 2025-06-11 10:19                       ` Chen, Jiqian
  2025-06-11 10:57                         ` Roger Pau Monné
  1 sibling, 1 reply; 44+ messages in thread
From: Chen, Jiqian @ 2025-06-11 10:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray, Chen, Jiqian

Hi Roger,

On 2025/6/10 15:08, Jan Beulich wrote:
> On 10.06.2025 05:52, Chen, Jiqian wrote:
>> On 2025/6/9 18:40, Roger Pau Monné wrote:
>>> On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
>>>> On 2025/6/9 17:29, Roger Pau Monné wrote:
>>>>> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
>>>>>> On 2025/6/6 17:14, Roger Pau Monné wrote:
>>>>>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
>>>>>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
>>>>>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
>>>>>>>>>>> +  }; \
>>>>>>>>>>> +  static vpci_capability_t *const finit##_entry  \
>>>>>>>>>>> +               __used_section(".data.vpci") = &finit##_t
>>>>>>>>>>
>>>>>>>>>> IMO this should better use .rodata instead of .data. 
>>>>>>>>> Is below change correct?
>>>>>>>>>
>>>>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>>>>> +        __used_section(".rodata") = &finit##_t
>>>>>>>>
>>>>>>>> No, specifically because ...
>>>>>>>>
>>>>>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
>>>>>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
>>>>>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
>>>>>>>>>> consume the vPCI data.
>>>>>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
>>>>>>>>> Is below change correct?
>>>>>>>>>
>>>>>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
>>>>>>>>> index 793d0e11450c..3817642135aa 100644
>>>>>>>>> --- a/xen/include/xen/xen.lds.h
>>>>>>>>> +++ b/xen/include/xen/xen.lds.h
>>>>>>>>> @@ -188,7 +188,7 @@
>>>>>>>>>  #define VPCI_ARRAY               \
>>>>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>>>>         __start_vpci_array = .;   \
>>>>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>>>>> +       *(.rodata)             \
>>>>>>>>
>>>>>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
>>>>>>>> isn't what you want. What I understand Roger meant was a .rodata-like
>>>>>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
>>>>>>>
>>>>>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
>>>>>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
>>>>>>> same section change for the __used_section() attribute.
>>>>>>
>>>>>> If I understand correctly, the next version will be:
>>>>>>
>>>>>> +    static const vpci_capability_t *const finit##_entry  \
>>>>>> +        __used_section(".rodata.vpci") = &finit##_t
>>>>>> +
>>>>>>
>>>>>> and
>>>>>>
>>>>>>  #define VPCI_ARRAY               \
>>>>>>         . = ALIGN(POINTER_ALIGN); \
>>>>>>         __start_vpci_array = .;   \
>>>>>> -       *(SORT(.data.vpci.*))     \
>>>>>> +       *(.rodata.vpci)           \
>>>>>>         __end_vpci_array = .;
>>>>>
>>>>> Did you also move the instances of VPCI_ARRAY in the linker scripts so
>>>>> it's before the catch-all *(.rodata.*)?
>>>>>
>>>>>>
>>>>>> But, that encountered an warning when compiling.
>>>>>> " {standard input}: Assembler messages:
>>>>>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
>>>>>> {standard input}: Assembler messages:
>>>>>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
>>>>>> {standard input}: Assembler messages:
>>>>>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
>>>>>
>>>>> What are the attributes for .rodata.vpci in the object files?  You can
>>>>> get those using objdump or readelf, for example:
>>>>>
>>>>> $ objdump -h xen/drivers/vpci/msi.o
>>>>> [...]
>>>>>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
>>>>>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
>>>>>
>>>>> It should be READONLY, otherwise you will get those messages.
>>>>>
>>>>>> And, during booting Xen, all value of __start_vpci_array is incorrect.
>>>>>> Do I miss anything?
>>>>>
>>>>> I think that's likely because you haven't moved the instance of
>>>>> VPCI_ARRAY in the linker script?
>>>> Oh, right. Sorry, I forgot to move it.
>>>> After changing this, it works now.
>>>>
>>>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>>>> index bf956b6c5fc0..c88fd62f4f0d 100644
>>>> --- a/xen/arch/x86/xen.lds.S
>>>> +++ b/xen/arch/x86/xen.lds.S
>>>> @@ -134,6 +134,7 @@ SECTIONS
>>>>         BUGFRAMES
>>>>
>>>>         *(.rodata)
>>>> +       VPCI_ARRAY
>>>>         *(.rodata.*)
>>>>         *(.data.rel.ro)
>>>>         *(.data.rel.ro.*)
>>>> @@ -148,7 +149,6 @@ SECTIONS
>>>>         *(.note.gnu.build-id)
>>>>         __note_gnu_build_id_end = .;
>>>>  #endif
>>>> -       VPCI_ARRAY
>>>>    } PHDR(text)
>>>
>>> FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
>>> linker script for all the other arches, not just x86.
>>
>> Whether before *(.rodata.*) or before *(.rodata), there still is the warning " Warning: setting incorrect section attributes for .rodata.vpci "
>> And the objdump shows "rodata.vpci" has no "readonly" word.
> 
> Did you check what gcc emits? It may be requesting "aw" instead of the
> wanted "a" in the section attributes. Since there are relocations here,
> ".rodata." may not be the correct prefix to use; it may instead need to
> be ".data.rel.ro.".

What's your opinion about changing to ".data.rel.ro.vpci" ?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-06-11 10:19                       ` Chen, Jiqian
@ 2025-06-11 10:57                         ` Roger Pau Monné
  0 siblings, 0 replies; 44+ messages in thread
From: Roger Pau Monné @ 2025-06-11 10:57 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Julien Grall, Stefano Stabellini,
	Huang, Ray

On Wed, Jun 11, 2025 at 10:19:34AM +0000, Chen, Jiqian wrote:
> Hi Roger,
> 
> On 2025/6/10 15:08, Jan Beulich wrote:
> > On 10.06.2025 05:52, Chen, Jiqian wrote:
> >> On 2025/6/9 18:40, Roger Pau Monné wrote:
> >>> On Mon, Jun 09, 2025 at 10:18:42AM +0000, Chen, Jiqian wrote:
> >>>> On 2025/6/9 17:29, Roger Pau Monné wrote:
> >>>>> On Mon, Jun 09, 2025 at 07:50:21AM +0000, Chen, Jiqian wrote:
> >>>>>> On 2025/6/6 17:14, Roger Pau Monné wrote:
> >>>>>>> On Fri, Jun 06, 2025 at 09:05:48AM +0200, Jan Beulich wrote:
> >>>>>>>> On 06.06.2025 08:29, Chen, Jiqian wrote:
> >>>>>>>>> On 2025/6/5 20:50, Roger Pau Monné wrote:
> >>>>>>>>>> On Mon, May 26, 2025 at 05:45:53PM +0800, Jiqian Chen wrote: 
> >>>>>>>>>>> +  }; \
> >>>>>>>>>>> +  static vpci_capability_t *const finit##_entry  \
> >>>>>>>>>>> +               __used_section(".data.vpci") = &finit##_t
> >>>>>>>>>>
> >>>>>>>>>> IMO this should better use .rodata instead of .data. 
> >>>>>>>>> Is below change correct?
> >>>>>>>>>
> >>>>>>>>> +    static const vpci_capability_t *const finit##_entry  \
> >>>>>>>>> +        __used_section(".rodata") = &finit##_t
> >>>>>>>>
> >>>>>>>> No, specifically because ...
> >>>>>>>>
> >>>>>>>>>> Not that it matters much in practice, as we place it in .rodata anyway.  Note
> >>>>>>>>>> however you will have to move the placement of the VPCI_ARRAY in the
> >>>>>>>>>> linker script ahead of *(.rodata.*), otherwise that section match will
> >>>>>>>>>> consume the vPCI data.
> >>>>>>>>> I am sorry, how to move it ahead of *(.rodata.*) ?
> >>>>>>>>> Is below change correct?
> >>>>>>>>>
> >>>>>>>>> diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
> >>>>>>>>> index 793d0e11450c..3817642135aa 100644
> >>>>>>>>> --- a/xen/include/xen/xen.lds.h
> >>>>>>>>> +++ b/xen/include/xen/xen.lds.h
> >>>>>>>>> @@ -188,7 +188,7 @@
> >>>>>>>>>  #define VPCI_ARRAY               \
> >>>>>>>>>         . = ALIGN(POINTER_ALIGN); \
> >>>>>>>>>         __start_vpci_array = .;   \
> >>>>>>>>> -       *(SORT(.data.vpci.*))     \
> >>>>>>>>> +       *(.rodata)             \
> >>>>>>>>
> >>>>>>>> ... this isn't - you'd move _all_ of .rodata into here, which definitely
> >>>>>>>> isn't what you want. What I understand Roger meant was a .rodata-like
> >>>>>>>> section, e.g. .rodata.vpci.* (much like it was .data.vpci.* before).
> >>>>>>>
> >>>>>>> Indeed, my suggestion was merely to use .rodata instead of .data, as
> >>>>>>> that's more accurate IMO.  I think it should be *(.rodata.vpci) (and
> >>>>>>> same section change for the __used_section() attribute.
> >>>>>>
> >>>>>> If I understand correctly, the next version will be:
> >>>>>>
> >>>>>> +    static const vpci_capability_t *const finit##_entry  \
> >>>>>> +        __used_section(".rodata.vpci") = &finit##_t
> >>>>>> +
> >>>>>>
> >>>>>> and
> >>>>>>
> >>>>>>  #define VPCI_ARRAY               \
> >>>>>>         . = ALIGN(POINTER_ALIGN); \
> >>>>>>         __start_vpci_array = .;   \
> >>>>>> -       *(SORT(.data.vpci.*))     \
> >>>>>> +       *(.rodata.vpci)           \
> >>>>>>         __end_vpci_array = .;
> >>>>>
> >>>>> Did you also move the instances of VPCI_ARRAY in the linker scripts so
> >>>>> it's before the catch-all *(.rodata.*)?
> >>>>>
> >>>>>>
> >>>>>> But, that encountered an warning when compiling.
> >>>>>> " {standard input}: Assembler messages:
> >>>>>> {standard input}:1160: Warning: setting incorrect section attributes for .rodata.vpci
> >>>>>> {standard input}: Assembler messages:
> >>>>>> {standard input}:3034: Warning: setting incorrect section attributes for .rodata.vpci
> >>>>>> {standard input}: Assembler messages:
> >>>>>> {standard input}:6686: Warning: setting incorrect section attributes for .rodata.vpci "
> >>>>>
> >>>>> What are the attributes for .rodata.vpci in the object files?  You can
> >>>>> get those using objdump or readelf, for example:
> >>>>>
> >>>>> $ objdump -h xen/drivers/vpci/msi.o
> >>>>> [...]
> >>>>>  17 .data.vpci.9  00000008  0000000000000000  0000000000000000  00000a50  2**3
> >>>>>                   CONTENTS, ALLOC, LOAD, RELOC, DATA
> >>>>>
> >>>>> It should be READONLY, otherwise you will get those messages.
> >>>>>
> >>>>>> And, during booting Xen, all value of __start_vpci_array is incorrect.
> >>>>>> Do I miss anything?
> >>>>>
> >>>>> I think that's likely because you haven't moved the instance of
> >>>>> VPCI_ARRAY in the linker script?
> >>>> Oh, right. Sorry, I forgot to move it.
> >>>> After changing this, it works now.
> >>>>
> >>>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> >>>> index bf956b6c5fc0..c88fd62f4f0d 100644
> >>>> --- a/xen/arch/x86/xen.lds.S
> >>>> +++ b/xen/arch/x86/xen.lds.S
> >>>> @@ -134,6 +134,7 @@ SECTIONS
> >>>>         BUGFRAMES
> >>>>
> >>>>         *(.rodata)
> >>>> +       VPCI_ARRAY
> >>>>         *(.rodata.*)
> >>>>         *(.data.rel.ro)
> >>>>         *(.data.rel.ro.*)
> >>>> @@ -148,7 +149,6 @@ SECTIONS
> >>>>         *(.note.gnu.build-id)
> >>>>         __note_gnu_build_id_end = .;
> >>>>  #endif
> >>>> -       VPCI_ARRAY
> >>>>    } PHDR(text)
> >>>
> >>> FWIW, I would put it ahead of *(.rodata).  Remember to also modify the
> >>> linker script for all the other arches, not just x86.
> >>
> >> Whether before *(.rodata.*) or before *(.rodata), there still is the warning " Warning: setting incorrect section attributes for .rodata.vpci "
> >> And the objdump shows "rodata.vpci" has no "readonly" word.
> > 
> > Did you check what gcc emits? It may be requesting "aw" instead of the
> > wanted "a" in the section attributes. Since there are relocations here,
> > ".rodata." may not be the correct prefix to use; it may instead need to
> > be ".data.rel.ro.".
> 
> What's your opinion about changing to ".data.rel.ro.vpci" ?

Yeah, it's longer but it's the correct section prefix to use.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-06-11 10:58 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-26  9:45 [PATCH v5 00/10] Support hiding capability when its initialization fails Jiqian Chen
2025-05-26  9:45 ` [PATCH v5 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
2025-05-26  9:45 ` [PATCH v5 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
2025-06-05 10:54   ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 03/10] vpci/header: Emulate extended " Jiqian Chen
2025-06-05 11:24   ` Roger Pau Monné
2025-06-06  4:05     ` Chen, Jiqian
2025-05-26  9:45 ` [PATCH v5 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
2025-06-05 12:50   ` Roger Pau Monné
2025-06-05 12:59     ` Jan Beulich
2025-06-06  6:29     ` Chen, Jiqian
2025-06-06  7:05       ` Jan Beulich
2025-06-06  9:14         ` Roger Pau Monné
2025-06-09  7:50           ` Chen, Jiqian
2025-06-09  9:29             ` Roger Pau Monné
2025-06-09 10:18               ` Chen, Jiqian
2025-06-09 10:40                 ` Roger Pau Monné
2025-06-10  3:52                   ` Chen, Jiqian
2025-06-10  7:08                     ` Jan Beulich
2025-06-10  9:03                       ` Chen, Jiqian
2025-06-11 10:19                       ` Chen, Jiqian
2025-06-11 10:57                         ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
2025-06-05 13:35   ` Roger Pau Monné
2025-06-06  7:03     ` Chen, Jiqian
2025-06-06  9:16       ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 06/10] vpci: Hide extended " Jiqian Chen
2025-06-05 14:47   ` Roger Pau Monné
2025-06-06  8:30     ` Chen, Jiqian
2025-06-06  9:18       ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
2025-06-05 15:00   ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 08/10] vpci/rebar: Free Rebar resources when init_rebar() fails Jiqian Chen
2025-06-05 15:14   ` Roger Pau Monné
2025-06-06  8:32     ` Chen, Jiqian
2025-06-06  9:20       ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
2025-06-05 15:19   ` Roger Pau Monné
2025-06-06  8:38     ` Chen, Jiqian
2025-06-06  9:21       ` Roger Pau Monné
2025-05-26  9:45 ` [PATCH v5 10/10] vpci/msix: Free MSIX resources when init_msix() fails Jiqian Chen
2025-06-05 15:34   ` Roger Pau Monné
2025-06-06  8:44     ` Chen, Jiqian
2025-06-06  9:23       ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.