All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 00/10] Support hiding capability when its initialization fails
@ 2025-05-09  9:05 Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Hi,

This series is to
emulate legacy and extended capability list for dom0, including patch #1, #2, #3.
hide legacy and extended capability when its initialization fails, including patch #4, #5, #6.
remove all related registers and other resources when initializing capability fails, including patch #7, #8, #9, #10.

Best regards,
Jiqian Chen.
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
Jiqian Chen (10):
  vpci/header: Move emulating cap list logic into new function
  vpci/header: Emulate legacy capability list for dom0
  vpci/header: Emulate extended capability list for dom0
  vpci: Refactor REGISTER_VPCI_INIT
  vpci: Hide legacy capability when it fails to initialize
  vpci: Hide extended capability when it fails to initialize
  vpci: Refactor vpci_remove_register to remove matched registers
  vpci/rebar: Remove registers when init_rebar() fails
  vpci/msi: Free MSI resources when init_msi() fails
  vpci/msix: Add function to clean MSIX resources

 tools/tests/vpci/main.c    |   4 +-
 xen/drivers/vpci/header.c  | 187 +++++++++++++--------
 xen/drivers/vpci/msi.c     |  22 ++-
 xen/drivers/vpci/msix.c    |  29 +++-
 xen/drivers/vpci/rebar.c   |  35 ++--
 xen/drivers/vpci/vpci.c    | 325 ++++++++++++++++++++++++++++++++-----
 xen/include/xen/pci_regs.h |   5 +-
 xen/include/xen/vpci.h     |  38 +++--
 xen/include/xen/xen.lds.h  |   2 +-
 9 files changed, 505 insertions(+), 142 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4 01/10] vpci/header: Move emulating cap list logic into new function
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

No functional changes.
Follow-on changes will benifit from this.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Add Acked-by of Roger.

v2->v3 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 138 ++++++++++++++++++++------------------
 1 file changed, 73 insertions(+), 65 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ef6c965c081c..3e9b44454b43 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -745,6 +745,75 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
     return !bar->mem ? -ENOMEM : 0;
 }
 
+static int vpci_init_capability_list(struct pci_dev *pdev)
+{
+    int rc;
+    bool mask_cap_list = false;
+
+    if ( !is_hardware_domain(pdev->domain) &&
+         pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    {
+        /* Only expose capabilities to the guest that vPCI can handle. */
+        unsigned int next, ttl = 48;
+        static const unsigned int supported_caps[] = {
+            PCI_CAP_ID_MSI,
+            PCI_CAP_ID_MSIX,
+        };
+
+        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                     supported_caps,
+                                     ARRAY_SIZE(supported_caps), &ttl);
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               PCI_CAPABILITY_LIST, 1,
+                               (void *)(uintptr_t)next);
+        if ( rc )
+            return rc;
+
+        next &= ~3;
+
+        if ( !next )
+            /*
+             * If we don't have any supported capabilities to expose to the
+             * guest, mask the PCI_STATUS_CAP_LIST bit in the status
+             * register.
+             */
+            mask_cap_list = true;
+
+        while ( next && ttl )
+        {
+            unsigned int pos = next;
+
+            next = pci_find_next_cap_ttl(pdev->sbdf,
+                                         pos + PCI_CAP_LIST_NEXT,
+                                         supported_caps,
+                                         ARRAY_SIZE(supported_caps), &ttl);
+
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                   pos + PCI_CAP_LIST_ID, 1, NULL);
+            if ( rc )
+                return rc;
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   pos + PCI_CAP_LIST_NEXT, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            next &= ~3;
+        }
+    }
+
+    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
+    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                  PCI_STATUS, 2, NULL,
+                                  PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                  PCI_STATUS_RW1C_MASK,
+                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
+                                  PCI_STATUS_RSVDZ_MASK);
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -753,7 +822,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
-    bool mask_cap_list = false;
     bool is_hwdom = is_hardware_domain(pdev->domain);
 
     ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
@@ -794,61 +862,12 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    rc = vpci_init_capability_list(pdev);
+    if ( rc )
+        return rc;
+
     if ( !is_hwdom )
     {
-        if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
-        {
-            /* Only expose capabilities to the guest that vPCI can handle. */
-            unsigned int next, ttl = 48;
-            static const unsigned int supported_caps[] = {
-                PCI_CAP_ID_MSI,
-                PCI_CAP_ID_MSIX,
-            };
-
-            next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                         supported_caps,
-                                         ARRAY_SIZE(supported_caps), &ttl);
-
-            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                   PCI_CAPABILITY_LIST, 1,
-                                   (void *)(uintptr_t)next);
-            if ( rc )
-                return rc;
-
-            next &= ~3;
-
-            if ( !next )
-                /*
-                 * If we don't have any supported capabilities to expose to the
-                 * guest, mask the PCI_STATUS_CAP_LIST bit in the status
-                 * register.
-                 */
-                mask_cap_list = true;
-
-            while ( next && ttl )
-            {
-                unsigned int pos = next;
-
-                next = pci_find_next_cap_ttl(pdev->sbdf,
-                                             pos + PCI_CAP_LIST_NEXT,
-                                             supported_caps,
-                                             ARRAY_SIZE(supported_caps), &ttl);
-
-                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                       pos + PCI_CAP_LIST_ID, 1, NULL);
-                if ( rc )
-                    return rc;
-
-                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                       pos + PCI_CAP_LIST_NEXT, 1,
-                                       (void *)(uintptr_t)next);
-                if ( rc )
-                    return rc;
-
-                next &= ~3;
-            }
-        }
-
         /* Extended capabilities read as zero, write ignore */
         rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
                                (void *)0);
@@ -856,17 +875,6 @@ static int cf_check init_header(struct pci_dev *pdev)
             return rc;
     }
 
-    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
-    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
-                                PCI_STATUS, 2, NULL,
-                                PCI_STATUS_RO_MASK &
-                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
-                                PCI_STATUS_RW1C_MASK,
-                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
-                                PCI_STATUS_RSVDZ_MASK);
-    if ( rc )
-        return rc;
-
     if ( pdev->ignore_bars )
         return 0;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-15 16:29   ` Roger Pau Monné
  2025-05-09  9:05 ` [PATCH v4 03/10] vpci/header: Emulate extended " Jiqian Chen
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Current logic of emulating legacy capability list is only for domU.
So, expand it to emulate for dom0 too. Then it will be easy to hide
a capability whose initialization fails in a function.

And restrict adding PCI_STATUS register only for domU since dom0
has no limitation to access that register.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Also pass supported_caps to pci_find_next_cap_ttl() for dom0 since the n is zero when dom0,
  and add a comment to explain it.
* Restrict adding PCI_STATUS register only for domU since dom0 has no limitation to access that register.
* For dom0 register handler, set vpci_hw_write8 to it instead of NULL.

v2->v3 changes:
* Not to add handler of PCI_CAP_LIST_ID when domain is dom0.

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 53 ++++++++++++++++++++++++---------------
 xen/drivers/vpci/vpci.c   |  6 +++++
 xen/include/xen/vpci.h    |  2 ++
 3 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 3e9b44454b43..a06c518c506c 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -749,9 +749,9 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 {
     int rc;
     bool mask_cap_list = false;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
-    if ( !is_hardware_domain(pdev->domain) &&
-         pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
     {
         /* Only expose capabilities to the guest that vPCI can handle. */
         unsigned int next, ttl = 48;
@@ -759,12 +759,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
             PCI_CAP_ID_MSI,
             PCI_CAP_ID_MSIX,
         };
+        /*
+         * For dom0, we should expose all capabilities instead of a fixed
+         * capabilities array, so setting n to 0 here is to get the next
+         * capability position directly in pci_find_next_cap_ttl.
+         */
+        const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(supported_caps);
 
         next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                     supported_caps,
-                                     ARRAY_SIZE(supported_caps), &ttl);
+                                     supported_caps, n, &ttl);
 
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+        rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                               is_hwdom ? vpci_hw_write8 : NULL,
                                PCI_CAPABILITY_LIST, 1,
                                (void *)(uintptr_t)next);
         if ( rc )
@@ -772,7 +778,7 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 
         next &= ~3;
 
-        if ( !next )
+        if ( !next && !is_hwdom )
             /*
              * If we don't have any supported capabilities to expose to the
              * guest, mask the PCI_STATUS_CAP_LIST bit in the status
@@ -786,15 +792,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
 
             next = pci_find_next_cap_ttl(pdev->sbdf,
                                          pos + PCI_CAP_LIST_NEXT,
-                                         supported_caps,
-                                         ARRAY_SIZE(supported_caps), &ttl);
+                                         supported_caps, n, &ttl);
 
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                   pos + PCI_CAP_LIST_ID, 1, NULL);
-            if ( rc )
-                return rc;
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                       pos + PCI_CAP_LIST_ID, 1, NULL);
+                if ( rc )
+                    return rc;
+            }
 
-            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+            rc = vpci_add_register(pdev->vpci, vpci_read_val,
+                                   is_hwdom ? vpci_hw_write8 : NULL,
                                    pos + PCI_CAP_LIST_NEXT, 1,
                                    (void *)(uintptr_t)next);
             if ( rc )
@@ -805,13 +814,17 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
     }
 
     /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
-    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
-                                  PCI_STATUS, 2, NULL,
-                                  PCI_STATUS_RO_MASK &
-                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
-                                  PCI_STATUS_RW1C_MASK,
-                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
-                                  PCI_STATUS_RSVDZ_MASK);
+    return is_hwdom ? 0 : vpci_add_register_mask(pdev->vpci, vpci_hw_read16,
+                                                 vpci_hw_write16, PCI_STATUS,
+                                                 2, NULL,
+                                                 PCI_STATUS_RO_MASK &
+                                                    ~(mask_cap_list ?
+                                                        PCI_STATUS_CAP_LIST :
+                                                        0),
+                                                 PCI_STATUS_RW1C_MASK,
+                                                 mask_cap_list ?
+                                                    PCI_STATUS_CAP_LIST : 0,
+                                                 PCI_STATUS_RSVDZ_MASK);
 }
 
 static int cf_check init_header(struct pci_dev *pdev)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..cf3326a966d0 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -226,6 +226,12 @@ uint32_t cf_check vpci_hw_read32(
     return pci_conf_read32(pdev->sbdf, reg);
 }
 
+void cf_check vpci_hw_write8(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write8(pdev->sbdf, reg, val);
+}
+
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 475981cb8155..fc8d5b470b0b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -76,6 +76,8 @@ uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(
     const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_hw_write8(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-18 14:20   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Add a new function to emulate extended capability list for dom0,
and call it in init_header(). So that it will be easy to hide a
extended capability whose initialization fails.

As for the extended capability list of domU, just move the logic
into above function and keep hiding it for domU.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Add check "if ( !header )   return 0;" to avoid adding handler for
  device that has no extended capabilities.

v2->v3 changes:
* In vpci_init_ext_capability_list(), when domain is domU, directly return after adding a handler(hiding all extended capability for domU).
* In vpci_init_ext_capability_list(), change condition to be "while ( pos >= 0x100U && ttl-- )" instead of "while ( pos && ttl-- )".
* Add new function vpci_hw_write32, and pass it to extended capability handler for dom0.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 39 +++++++++++++++++++++++++++++++--------
 xen/drivers/vpci/vpci.c   |  6 ++++++
 xen/include/xen/vpci.h    |  2 ++
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index a06c518c506c..2915c801adeb 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
                                                  PCI_STATUS_RSVDZ_MASK);
 }
 
+static int vpci_init_ext_capability_list(struct pci_dev *pdev)
+{
+    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
+
+    if ( !is_hardware_domain(pdev->domain) )
+        /* Extended capabilities read as zero, write ignore for guest */
+        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                 pos, 4, (void *)0);
+
+    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
+    {
+        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
+        int rc;
+
+        if ( !header )
+            return 0;
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc )
+            return rc;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
+
+    return 0;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -879,14 +907,9 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    if ( !is_hwdom )
-    {
-        /* Extended capabilities read as zero, write ignore */
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
-                               (void *)0);
-        if ( rc )
-            return rc;
-    }
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
 
     if ( pdev->ignore_bars )
         return 0;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index cf3326a966d0..2022b61ea7b6 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -238,6 +238,12 @@ void cf_check vpci_hw_write16(
     pci_conf_write16(pdev->sbdf, reg, val);
 }
 
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write32(pdev->sbdf, reg, val);
+}
+
 int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
                            vpci_write_t *write_handler, unsigned int offset,
                            unsigned int size, void *data, uint32_t ro_mask,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index fc8d5b470b0b..61d16cc8b897 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -80,6 +80,8 @@ void cf_check vpci_hw_write8(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 void cf_check vpci_hw_write16(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
+void cf_check vpci_hw_write32(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (2 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 03/10] vpci/header: Emulate extended " Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-18 14:34   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

Refactor REGISTER_VPCI_INIT to contain more capability specific
information, this is benefit for follow-on changes to hide capability
when initialization fails.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

After refactor, the "priority" of initializing capabilities isn't
needed anymore, so delete its related codes.

Note:
Call vpci_make_msix_hole() in the end of init_msix() since the change
of sequence of init_header() and init_msix().

The cleanup hook is also added in this change, even if it's still
unused. Further changes will make use of it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v3->v4 changes
* Delete the useless trailing dot of section ".data.vpci".
* Add description about priority since this patch removes the initializing priority of capabilities and priority is not needed anymore.
* Change the hook name from fini to cleanup.
* Change the name x and y to be finit and fclean.
* Remove the unnecessary check "!capability->init"

v2->v3 changes:
* This is separated from patch "vpci: Hide capability when it fails to initialize" of v2.
* Delete __maybe_unused attribute of "out" in function vpci_assign_devic().
* Rename REGISTER_VPCI_EXTEND_CAP to REGISTER_VPCI_EXTENDED_CAP.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c |  3 +--
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   |  8 +++++--
 xen/drivers/vpci/rebar.c  |  2 +-
 xen/drivers/vpci/vpci.c   | 47 ++++++++++++++++++++++++++++++---------
 xen/include/xen/vpci.h    | 30 ++++++++++++++++++-------
 xen/include/xen/xen.lds.h |  2 +-
 7 files changed, 69 insertions(+), 25 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 2915c801adeb..9373d1b8be0d 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -855,7 +855,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1051,7 +1051,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..ea7dc0c060ea 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 74211301ba10..f8ce89b8b32f 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -703,9 +703,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..026f8f7972d9 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 2022b61ea7b6..f03e1a8eebc0 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,8 +36,8 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern vpci_capability_t *const __start_vpci_array[];
+extern vpci_capability_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +83,35 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static int vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = __start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos;
+        int rc;
+
+        if ( !is_hardware_domain(pdev->domain) && is_ext )
+            continue;
+
+        if ( !is_ext )
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+        else
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +157,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,14 +187,13 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    rc = vpci_init_capabilities(pdev);
 
- out: __maybe_unused;
+ out:
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 61d16cc8b897..7d4274e178ee 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,12 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (*init)(struct pci_dev *pdev);
+    void (*cleanup)(struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +30,22 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_VPCI_CAP(cap, finit, fclean, ext) \
+  static vpci_capability_t finit##_t = { \
+        .id = (cap), \
+        .init = (finit), \
+        .cleanup = (fclean), \
+        .is_ext = (ext), \
+  }; \
+  static vpci_capability_t *const finit##_entry  \
+               __used_section(".data.vpci") = &finit##_t
+
+#define REGISTER_VPCI_LEGACY_CAP(cap, finit, fclean) \
+                REGISTER_VPCI_CAP(cap, finit, fclean, false)
+#define REGISTER_VPCI_EXTENDED_CAP(cap, finit, fclean) \
+                REGISTER_VPCI_CAP(cap, finit, fclean, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 793d0e11450c..84ec506b00da 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -188,7 +188,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.data.vpci)             \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (3 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-18 14:44   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 06/10] vpci: Hide extended " Jiqian Chen
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When vpci fails to initialize a legacy capability of device, it just
returns an error and vPCI gets disabled for the whole device.  That
most likely renders the device unusable, plus possibly causing issues
to Xen itself if guest attempts to program the native MSI or MSI-X
capabilities if present.

So, add new function to hide legacy capability when initialization
fails. And remove the failed legacy capability from the vpci emulated
legacy capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Modify the commit message.
* In function vpci_get_previous_cap_register(), add an ASSERT_UNREACHABLE() if offset below 0x40.
* Modify vpci_capability_mask() to return error instead of using ASSERT.
* Use vpci_remove_register to remove PCI_CAP_LIST_ID register instead of open code.
* Add check "if ( !offset )" in vpci_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize"
* Whole implementation changed because last version is wrong.
  This version adds a new helper function vpci_get_register() and uses it to get
  target handler and previous handler from vpci->handlers, then remove the target.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c | 142 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 125 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f03e1a8eebc0..e1d4e9aa9b88 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,6 +35,22 @@ struct vpci_register {
     uint32_t rsvdz_mask;
 };
 
+static int vpci_register_cmp(const struct vpci_register *r1,
+                             const struct vpci_register *r2)
+{
+    /* Return 0 if registers overlap. */
+    if ( r1->offset < r2->offset + r2->size &&
+         r2->offset < r1->offset + r1->size )
+        return 0;
+    if ( r1->offset < r2->offset )
+        return -1;
+    if ( r1->offset > r2->offset )
+        return 1;
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
 #ifdef __XEN__
 extern vpci_capability_t *const __start_vpci_array[];
 extern vpci_capability_t *const __end_vpci_array[];
@@ -83,6 +99,100 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static struct vpci_register *vpci_get_register(struct vpci *vpci,
+                                               unsigned int offset,
+                                               unsigned int size)
+{
+    const struct vpci_register r = { .offset = offset, .size = size };
+    struct vpci_register *rm;
+
+    ASSERT(spin_is_locked(&vpci->lock));
+    list_for_each_entry ( rm, &vpci->handlers, node )
+    {
+        int cmp = vpci_register_cmp(&r, rm);
+
+        if ( !cmp && rm->offset == offset && rm->size == size )
+            return rm;
+        if ( cmp <= 0 )
+            break;
+    }
+
+    return NULL;
+}
+
+static struct vpci_register *vpci_get_previous_cap_register(
+    struct vpci *vpci, unsigned int offset)
+{
+    uint32_t next;
+    struct vpci_register *r;
+
+    if ( offset < 0x40 )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    r = vpci_get_register(vpci, PCI_CAPABILITY_LIST, 1);
+    if ( !r )
+        return NULL;
+
+    next = (uint32_t)(uintptr_t)r->private;
+    while ( next >= 0x40 && next != offset )
+    {
+        r = vpci_get_register(vpci, next + PCI_CAP_LIST_NEXT, 1);
+        if ( !r )
+            return NULL;
+        next = (uint32_t)(uintptr_t)r->private;
+    }
+
+    if ( next < 0x40 )
+        return NULL;
+
+    return r;
+}
+
+static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    struct vpci_register *prev_next_r, *next_r;
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    next_r = vpci_get_register(vpci, offset + PCI_CAP_LIST_NEXT, 1);
+    if ( !next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_next_r = vpci_get_previous_cap_register(vpci, offset);
+    if ( !prev_next_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    prev_next_r->private = next_r->private;
+    /*
+     * Not calling vpci_remove_register() here is to avoid redoing
+     * the register search
+     */
+    list_del(&next_r->node);
+    spin_unlock(&vpci->lock);
+    xfree(next_r);
+
+    if ( !is_hardware_domain(pdev->domain) )
+        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -106,7 +216,21 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
 
         rc = capability->init(pdev);
         if ( rc )
-            return rc;
+        {
+            if ( capability->cleanup )
+                capability->cleanup(pdev);
+
+            printk(XENLOG_WARNING
+                   "%pd %pp: %s cap %u init fail rc=%d, mask it\n",
+                   pdev->domain, &pdev->sbdf,
+                   is_ext ? "extended" : "legacy", cap, rc);
+            if ( !is_ext )
+            {
+                rc = vpci_capability_mask(pdev, cap);
+                if ( rc )
+                    return rc;
+            }
+        }
     }
 
     return 0;
@@ -201,22 +325,6 @@ int vpci_assign_device(struct pci_dev *pdev)
 }
 #endif /* __XEN__ */
 
-static int vpci_register_cmp(const struct vpci_register *r1,
-                             const struct vpci_register *r2)
-{
-    /* Return 0 if registers overlap. */
-    if ( r1->offset < r2->offset + r2->size &&
-         r2->offset < r1->offset + r1->size )
-        return 0;
-    if ( r1->offset < r2->offset )
-        return -1;
-    if ( r1->offset > r2->offset )
-        return 1;
-
-    ASSERT_UNREACHABLE();
-    return 0;
-}
-
 /* Dummy hooks, writes are ignored, reads return 1's */
 static uint32_t cf_check vpci_ignored_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (4 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-18 14:47   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When vpci fails to initialize a extended capability of device, it
just returns an error and vPCI gets disabled for the whole device.

So, add function to hide extended capability when initialization
fails. And remove the failed extended capability handler from vpci
extended capability list.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Change definition of PCI_EXT_CAP_NEXT to be "#define PCI_EXT_CAP_NEXT(header) (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)" to avoid redundancy.
* Modify the commit message.
* Change vpci_ext_capability_mask() to return error instead of using ASSERT.
* Set the capability ID part to be zero when we need to hide the capability of position 0x100U.
* Add check "if ( !offset )" in vpci_ext_capability_mask().

v2->v3 changes:
* Separated from the last version patch "vpci: Hide capability when it fails to initialize".
* Whole implementation changed because last version is wrong.
  This version gets target handler and previous handler from vpci->handlers, then remove the target.
* Note: a case in function vpci_ext_capability_mask() needs to be discussed,
  because it may change the offset of next capability when the offset of target
  capability is 0x100U(the first extended capability), my implementation is just to
  ignore and let hardware to handle the target capability.

v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to
  remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/vpci.c    | 100 +++++++++++++++++++++++++++++++++++--
 xen/include/xen/pci_regs.h |   5 +-
 2 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index e1d4e9aa9b88..76d663753e7b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -193,6 +193,98 @@ static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
     return 0;
 }
 
+static struct vpci_register *vpci_get_previous_ext_cap_register(
+    struct vpci *vpci, unsigned int offset)
+{
+    uint32_t header;
+    unsigned int pos = PCI_CFG_SPACE_SIZE;
+    struct vpci_register *r;
+
+    if ( offset <= PCI_CFG_SPACE_SIZE )
+    {
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    r = vpci_get_register(vpci, pos, 4);
+    if ( !r )
+        return NULL;
+
+    header = (uint32_t)(uintptr_t)r->private;
+    pos = PCI_EXT_CAP_NEXT(header);
+    while ( pos > PCI_CFG_SPACE_SIZE && pos != offset )
+    {
+        r = vpci_get_register(vpci, pos, 4);
+        if ( !r )
+            return NULL;
+        header = (uint32_t)(uintptr_t)r->private;
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
+
+    if ( pos <= PCI_CFG_SPACE_SIZE )
+        return NULL;
+
+    return r;
+}
+
+static int vpci_ext_capability_mask(struct pci_dev *pdev, unsigned int cap)
+{
+    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
+    struct vpci_register *rm, *prev_r;
+    struct vpci *vpci = pdev->vpci;
+    uint32_t header, pre_header;
+
+    if ( !offset )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&vpci->lock);
+    rm = vpci_get_register(vpci, offset, 4);
+    if ( !rm )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    header = (uint32_t)(uintptr_t)rm->private;
+    if ( offset == PCI_CFG_SPACE_SIZE )
+    {
+        if ( PCI_EXT_CAP_NEXT(header) <= PCI_CFG_SPACE_SIZE )
+            rm->private = (void *)(uintptr_t)0;
+        else
+            /*
+             * If this case removes target capability of position 0x100U, then
+             * it needs to move the next capability to be in position 0x100U,
+             * that would cause the offset of next capability in vpci different
+             * from the hardware, then cause error accesses, so here chooses to
+             * set the capability ID part to be zero.
+             */
+            rm->private = (void *)(uintptr_t)(header &
+                                              ~PCI_EXT_CAP_ID(header));
+
+        spin_unlock(&vpci->lock);
+        return 0;
+    }
+
+    prev_r = vpci_get_previous_ext_cap_register(vpci, offset);
+    if ( !prev_r )
+    {
+        spin_unlock(&vpci->lock);
+        return -ENODEV;
+    }
+
+    pre_header = (uint32_t)(uintptr_t)prev_r->private;
+    prev_r->private = (void *)(uintptr_t)((pre_header &
+                                           ~PCI_EXT_CAP_NEXT_MASK) |
+                                          (header & PCI_EXT_CAP_NEXT_MASK));
+    list_del(&rm->node);
+    spin_unlock(&vpci->lock);
+    xfree(rm);
+    return 0;
+}
+
 static int vpci_init_capabilities(struct pci_dev *pdev)
 {
     for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
@@ -225,11 +317,11 @@ static int vpci_init_capabilities(struct pci_dev *pdev)
                    pdev->domain, &pdev->sbdf,
                    is_ext ? "extended" : "legacy", cap, rc);
             if ( !is_ext )
-            {
                 rc = vpci_capability_mask(pdev, cap);
-                if ( rc )
-                    return rc;
-            }
+            else
+                rc = vpci_ext_capability_mask(pdev, cap);
+            if ( rc )
+                return rc;
         }
     }
 
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 27b4f44eedf3..e62bf72ab3d3 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -448,7 +448,10 @@
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
-#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_NEXT_MASK		0xFFF00000U
+/* Bottom two bits of next capability position are reserved. */
+#define PCI_EXT_CAP_NEXT(header) \
+    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)
 
 #define PCI_EXT_CAP_ID_ERR	1
 #define PCI_EXT_CAP_ID_VC	2
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (5 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 06/10] vpci: Hide extended " Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-19  6:30   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Anthony PERARD

vpci_remove_register() only supports removing a register in a time,
but the follow-on changes need to remove all registers within a range.
So, refactor it to support removing all matched registers in a calling
time.

And it is no matter to remove a non exist register, so remove the
__must_check prefix.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
---
v3->v4 changes:
* Use list_for_each_entry_safe instead of list_for_each_entry.
* Return ERANGE if overlap.

v2->v3 changes:
* Add new check to return error if registers overlap but not inside range.

v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 tools/tests/vpci/main.c |  4 ++--
 xen/drivers/vpci/vpci.c | 38 ++++++++++++++++++++------------------
 xen/include/xen/vpci.h  |  4 ++--
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index 33223db3eb77..ca72877d60cd 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
                                   rsvdz_mask))
 
 #define VPCI_REMOVE_REG(off, size)                                          \
-    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
 
 #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
-    assert(vpci_remove_register(test_pdev.vpci, off, size))
+    assert(vpci_remove_registers(test_pdev.vpci, off, size))
 
 /* Read a 32b register using all possible sizes. */
 void multiread4_check(unsigned int reg, uint32_t val)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 76d663753e7b..ae39a0436284 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -180,7 +180,7 @@ static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
 
     prev_next_r->private = next_r->private;
     /*
-     * Not calling vpci_remove_register() here is to avoid redoing
+     * Not calling vpci_remove_registers() here is to avoid redoing
      * the register search
      */
     list_del(&next_r->node);
@@ -188,7 +188,7 @@ static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
     xfree(next_r);
 
     if ( !is_hardware_domain(pdev->domain) )
-        return vpci_remove_register(vpci, offset + PCI_CAP_LIST_ID, 1);
+        return vpci_remove_registers(vpci, offset + PCI_CAP_LIST_ID, 1);
 
     return 0;
 }
@@ -532,34 +532,36 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
-int vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                         unsigned int size)
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size)
 {
-    const struct vpci_register r = { .offset = offset, .size = size };
-    struct vpci_register *rm;
+    struct vpci_register *rm, *tmp;
+    unsigned int end = start + size;
 
     spin_lock(&vpci->lock);
-    list_for_each_entry ( rm, &vpci->handlers, node )
+    list_for_each_entry_safe ( rm, tmp, &vpci->handlers, node )
     {
-        int cmp = vpci_register_cmp(&r, rm);
-
-        /*
-         * NB: do not use a switch so that we can use break to
-         * get out of the list loop earlier if required.
-         */
-        if ( !cmp && rm->offset == offset && rm->size == size )
+        /* Remove rm if rm is inside the range. */
+        if ( rm->offset >= start && rm->offset + rm->size <= end )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
-            return 0;
+            continue;
         }
-        if ( cmp <= 0 )
+
+        /* Return error if registers overlap but not inside. */
+        if ( rm->offset + rm->size > start && rm->offset < end )
+        {
+            spin_unlock(&vpci->lock);
+            return -ERANGE;
+        }
+
+        if ( start < rm->offset )
             break;
     }
     spin_unlock(&vpci->lock);
 
-    return -ENOENT;
+    return 0;
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 7d4274e178ee..346006438fe4 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -71,8 +71,8 @@ static inline int __must_check vpci_add_register(struct vpci *vpci,
                                   size, data, 0, 0, 0, 0);
 }
 
-int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                                      unsigned int size);
+int vpci_remove_registers(struct vpci *vpci, unsigned int start,
+                          unsigned int size);
 
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (6 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-19  6:54   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-05-09  9:05 ` [PATCH v4 10/10] vpci/msix: Add function to clean MSIX resources Jiqian Chen
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_rebar() fails, the previous new changes will hide Rebar
capability, it can't rely on vpci_deassign_device() to remove all
Rebar related registers anymore, those registers must be removed
in cleanup function of Rebar.

To do that, call vpci_remove_registers() to remove all possible
registered registers.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Change function name from fini_rebar() to cleanup_rebar().
* Change the error number to be E2BIG and ENXIO in init_rebar().

v2->v3 changes:
* Use fini_rebar() to remove all register instead of in the failure path of init_rebar();

v1->v2 changes:
* Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/rebar.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 026f8f7972d9..d2d8a8915afb 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
     bar->guest_addr = bar->addr;
 }
 
+static void cleanup_rebar(struct pci_dev *pdev)
+{
+    uint32_t ctrl;
+    unsigned int nbars;
+    unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
+                                                        PCI_EXT_CAP_ID_REBAR);
+
+    if ( !rebar_offset || !is_hardware_domain(pdev->domain) )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    ctrl = pci_conf_read32(pdev->sbdf, rebar_offset + PCI_REBAR_CTRL(0));
+    nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
+
+    vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
+                          PCI_REBAR_CTRL(nbars - 1));
+}
+
 static int cf_check init_rebar(struct pci_dev *pdev)
 {
     uint32_t ctrl;
@@ -80,7 +100,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -E2BIG;
         }
 
         bar = &pdev->vpci->header.bars[index];
@@ -88,7 +108,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            return -ENXIO;
         }
 
         rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rebar_ctrl_write,
@@ -97,14 +117,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u fail to add reg of REBAR_CTRL rc=%d\n",
                    pdev->domain, &pdev->sbdf, index, rc);
-            /*
-             * Ideally we would hide the ReBar capability on error, but code
-             * for doing so still needs to be written. Use continue instead
-             * to keep any already setup register hooks, as returning an
-             * error will cause the hardware domain to get unmediated access
-             * to all device registers.
-             */
-            continue;
+            return rc;
         }
 
         bar->resizable_sizes =
@@ -118,7 +131,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
+REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, cleanup_rebar);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (7 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  2025-05-20  6:40   ` Jan Beulich
  2025-05-09  9:05 ` [PATCH v4 10/10] vpci/msix: Add function to clean MSIX resources Jiqian Chen
  9 siblings, 1 reply; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msi() fails, the previous new changes will hide MSI
capability, it can't rely on vpci_deassign_device() to remove
all MSI related resources anymore, those resources must be
removed in cleanup function of MSI.

To do that, add a new function to free MSI resources.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Change function name from fini_msi() to cleanup_msi().
* Remove unnecessary comment.
* Change to use XFREE to free vpci->msi.

v2->v3 changes:
* Remove all fail path, and use fini_msi() hook instead.
* Change the method to calculating the size of msi registers.

v1->v2 changes:
* Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msi.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index ea7dc0c060ea..306da49bd3ec 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -193,6 +193,26 @@ static void cf_check mask_write(
     msi->mask = val;
 }
 
+static void cleanup_msi(struct pci_dev *pdev)
+{
+    unsigned int end, size;
+    struct vpci *vpci = pdev->vpci;
+    const unsigned int msi_pos = pdev->msi_pos;
+
+    if ( !msi_pos || !vpci->msi )
+        return;
+
+    if ( vpci->msi->masking )
+        end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
+    else
+        end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
+
+    size = end - msi_control_reg(msi_pos);
+
+    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
+    XFREE(vpci->msi);
+}
+
 static int cf_check init_msi(struct pci_dev *pdev)
 {
     unsigned int pos = pdev->msi_pos;
@@ -270,7 +290,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, cleanup_msi);
 
 void vpci_dump_msi(void)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 10/10] vpci/msix: Add function to clean MSIX resources
  2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
                   ` (8 preceding siblings ...)
  2025-05-09  9:05 ` [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-05-09  9:05 ` Jiqian Chen
  9 siblings, 0 replies; 45+ messages in thread
From: Jiqian Chen @ 2025-05-09  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msix() fails, it needs to clean all MSIX resources.
So, add a new function to do that.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v3->v4 changes:
* Change function name from fini_msix() to cleanup_msix().
* Change to use XFREE to free vpci->msix.
* In cleanup function, change the sequence of check and remove action according to init_msix().

v2->v3 changes:
* Remove unnecessary clean operations in fini_msix().

v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/msix.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index f8ce89b8b32f..b80491ed32bf 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -655,6 +655,27 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
     return 0;
 }
 
+static void cleanup_msix(struct pci_dev *pdev)
+{
+    struct vpci *vpci = pdev->vpci;
+    unsigned int msix_pos = pdev->msix_pos;
+
+    if ( !msix_pos )
+        return;
+
+    vpci_remove_registers(vpci, msix_control_reg(msix_pos), 2);
+
+    if ( !vpci->msix )
+        return;
+
+    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
+        if ( vpci->msix->table[i] )
+            iounmap(vpci->msix->table[i]);
+
+    list_del(&vpci->msix->next);
+    XFREE(vpci->msix);
+}
+
 static int cf_check init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
@@ -709,7 +730,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
 
     return rc;
 }
-REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix, NULL);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix, cleanup_msix);
 
 /*
  * Local variables:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0
  2025-05-09  9:05 ` [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
@ 2025-05-15 16:29   ` Roger Pau Monné
  2025-05-16  2:33     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-15 16:29 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Fri, May 09, 2025 at 05:05:34PM +0800, Jiqian Chen wrote:
> Current logic of emulating legacy capability list is only for domU.
> So, expand it to emulate for dom0 too. Then it will be easy to hide
> a capability whose initialization fails in a function.
> 
> And restrict adding PCI_STATUS register only for domU since dom0
> has no limitation to access that register.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v3->v4 changes:
> * Also pass supported_caps to pci_find_next_cap_ttl() for dom0 since the n is zero when dom0,
>   and add a comment to explain it.
> * Restrict adding PCI_STATUS register only for domU since dom0 has no limitation to access that register.
> * For dom0 register handler, set vpci_hw_write8 to it instead of NULL.
> 
> v2->v3 changes:
> * Not to add handler of PCI_CAP_LIST_ID when domain is dom0.
> 
> v1->v2 changes:
> new patch.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c | 53 ++++++++++++++++++++++++---------------
>  xen/drivers/vpci/vpci.c   |  6 +++++
>  xen/include/xen/vpci.h    |  2 ++
>  3 files changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 3e9b44454b43..a06c518c506c 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -749,9 +749,9 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>  {
>      int rc;
>      bool mask_cap_list = false;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>  
> -    if ( !is_hardware_domain(pdev->domain) &&
> -         pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
> +    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
>      {
>          /* Only expose capabilities to the guest that vPCI can handle. */
>          unsigned int next, ttl = 48;
> @@ -759,12 +759,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>              PCI_CAP_ID_MSI,
>              PCI_CAP_ID_MSIX,
>          };
> +        /*
> +         * For dom0, we should expose all capabilities instead of a fixed
> +         * capabilities array, so setting n to 0 here is to get the next
> +         * capability position directly in pci_find_next_cap_ttl.
> +         */
> +        const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(supported_caps);
>  
>          next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
> -                                     supported_caps,
> -                                     ARRAY_SIZE(supported_caps), &ttl);
> +                                     supported_caps, n, &ttl);
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val,
> +                               is_hwdom ? vpci_hw_write8 : NULL,
>                                 PCI_CAPABILITY_LIST, 1,
>                                 (void *)(uintptr_t)next);
>          if ( rc )
> @@ -772,7 +778,7 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>  
>          next &= ~3;
>  
> -        if ( !next )
> +        if ( !next && !is_hwdom )
>              /*
>               * If we don't have any supported capabilities to expose to the
>               * guest, mask the PCI_STATUS_CAP_LIST bit in the status
> @@ -786,15 +792,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>  
>              next = pci_find_next_cap_ttl(pdev->sbdf,
>                                           pos + PCI_CAP_LIST_NEXT,
> -                                         supported_caps,
> -                                         ARRAY_SIZE(supported_caps), &ttl);
> +                                         supported_caps, n, &ttl);
>  
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> -                                   pos + PCI_CAP_LIST_ID, 1, NULL);
> -            if ( rc )
> -                return rc;
> +            if ( !is_hwdom )
> +            {
> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> +                                       pos + PCI_CAP_LIST_ID, 1, NULL);
> +                if ( rc )
> +                    return rc;
> +            }
>  
> -            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +            rc = vpci_add_register(pdev->vpci, vpci_read_val,
> +                                   is_hwdom ? vpci_hw_write8 : NULL,
>                                     pos + PCI_CAP_LIST_NEXT, 1,
>                                     (void *)(uintptr_t)next);
>              if ( rc )
> @@ -805,13 +814,17 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>      }
>  
>      /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
> -    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
> -                                  PCI_STATUS, 2, NULL,
> -                                  PCI_STATUS_RO_MASK &
> -                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> -                                  PCI_STATUS_RW1C_MASK,
> -                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
> -                                  PCI_STATUS_RSVDZ_MASK);
> +    return is_hwdom ? 0 : vpci_add_register_mask(pdev->vpci, vpci_hw_read16,
> +                                                 vpci_hw_write16, PCI_STATUS,
> +                                                 2, NULL,
> +                                                 PCI_STATUS_RO_MASK &
> +                                                    ~(mask_cap_list ?
> +                                                        PCI_STATUS_CAP_LIST :
> +                                                        0),
> +                                                 PCI_STATUS_RW1C_MASK,
> +                                                 mask_cap_list ?
> +                                                    PCI_STATUS_CAP_LIST : 0,
> +                                                 PCI_STATUS_RSVDZ_MASK);

Wow, that's a bit too much indentation for my taste.  Do you think you
could do:

/* Return early for the hw domain, no masking of PCI_STATUS. */
if ( is_hwdom )
    return 0;
...

So that you don't have to touch the current return chunk?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0
  2025-05-15 16:29   ` Roger Pau Monné
@ 2025-05-16  2:33     ` Chen, Jiqian
  0 siblings, 0 replies; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-16  2:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/5/16 00:29, Roger Pau Monné wrote:
> On Fri, May 09, 2025 at 05:05:34PM +0800, Jiqian Chen wrote:
>> @@ -786,15 +792,18 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>  
>>              next = pci_find_next_cap_ttl(pdev->sbdf,
>>                                           pos + PCI_CAP_LIST_NEXT,
>> -                                         supported_caps,
>> -                                         ARRAY_SIZE(supported_caps), &ttl);
>> +                                         supported_caps, n, &ttl);
>>  
>> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
>> -                                   pos + PCI_CAP_LIST_ID, 1, NULL);
>> -            if ( rc )
>> -                return rc;
>> +            if ( !is_hwdom )
>> +            {
>> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
>> +                                       pos + PCI_CAP_LIST_ID, 1, NULL);
>> +                if ( rc )
>> +                    return rc;
>> +            }
>>  
>> -            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +            rc = vpci_add_register(pdev->vpci, vpci_read_val,
>> +                                   is_hwdom ? vpci_hw_write8 : NULL,
>>                                     pos + PCI_CAP_LIST_NEXT, 1,
>>                                     (void *)(uintptr_t)next);
>>              if ( rc )
>> @@ -805,13 +814,17 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>      }
>>  
>>      /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
>> -    return vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
>> -                                  PCI_STATUS, 2, NULL,
>> -                                  PCI_STATUS_RO_MASK &
>> -                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> -                                  PCI_STATUS_RW1C_MASK,
>> -                                  mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
>> -                                  PCI_STATUS_RSVDZ_MASK);
>> +    return is_hwdom ? 0 : vpci_add_register_mask(pdev->vpci, vpci_hw_read16,
>> +                                                 vpci_hw_write16, PCI_STATUS,
>> +                                                 2, NULL,
>> +                                                 PCI_STATUS_RO_MASK &
>> +                                                    ~(mask_cap_list ?
>> +                                                        PCI_STATUS_CAP_LIST :
>> +                                                        0),
>> +                                                 PCI_STATUS_RW1C_MASK,
>> +                                                 mask_cap_list ?
>> +                                                    PCI_STATUS_CAP_LIST : 0,
>> +                                                 PCI_STATUS_RSVDZ_MASK);
> 
> Wow, that's a bit too much indentation for my taste.  Do you think you
> could do:
> 
> /* Return early for the hw domain, no masking of PCI_STATUS. */
> if ( is_hwdom )
>     return 0;
> ...
> 
> So that you don't have to touch the current return chunk?
It seems better.
Will do in next version.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-09  9:05 ` [PATCH v4 03/10] vpci/header: Emulate extended " Jiqian Chen
@ 2025-05-18 14:20   ` Jan Beulich
  2025-05-19  6:43     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-18 14:20 UTC (permalink / raw)
  To: Jiqian Chen, Roger Pau Monné; +Cc: Huang Rui, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>                                                   PCI_STATUS_RSVDZ_MASK);
>  }
>  
> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> +{
> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;

The ttl value exists (in the function you took it from) to make sure
the loop below eventually ends. That is, to be able to kind of
gracefully deal with loops in the linked list. Such loops, however,
would ...

> +    if ( !is_hardware_domain(pdev->domain) )
> +        /* Extended capabilities read as zero, write ignore for guest */
> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                 pos, 4, (void *)0);
> +
> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
> +    {
> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
> +        int rc;
> +
> +        if ( !header )
> +            return 0;
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
> +                               pos, 4, (void *)(uintptr_t)header);

... mean we may invoke this twice for the same capability. Such
a secondary invocation would fail with -EEXIST, causing device init
to fail altogether. Which is kind of against our aim of exposing
(in a controlled manner) as much of the PCI hardware as possible.

Imo we ought to be using a bitmap to detect the situation earlier
and hence to be able to avoid redundant register addition. Thoughts?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-09  9:05 ` [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
@ 2025-05-18 14:34   ` Jan Beulich
  2025-05-19  6:56     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-18 14:34 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Huang Rui, Roger Pau Monné, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, NULL);

To keep identifier length bounded, how about REGISTER_VPCI_CAP() here
and ...

> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);

... and REGISTER_VPCI_EXTCAP() here?

> @@ -83,6 +83,35 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static int vpci_init_capabilities(struct pci_dev *pdev)
> +{
> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        const vpci_capability_t *capability = __start_vpci_array[i];
> +        const unsigned int cap = capability->id;
> +        const bool is_ext = capability->is_ext;
> +        unsigned int pos;
> +        int rc;
> +
> +        if ( !is_hardware_domain(pdev->domain) && is_ext )
> +            continue;

Fold this into ...

> +        if ( !is_ext )
> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
> +        else
> +            pos = pci_find_ext_capability(pdev->sbdf, cap);

... this by adding a middle "else if()"?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-09  9:05 ` [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
@ 2025-05-18 14:44   ` Jan Beulich
  2025-05-19  7:35     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-18 14:44 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> @@ -83,6 +99,100 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static struct vpci_register *vpci_get_register(struct vpci *vpci,
> +                                               unsigned int offset,
> +                                               unsigned int size)
> +{
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm;
> +
> +    ASSERT(spin_is_locked(&vpci->lock));
> +    list_for_each_entry ( rm, &vpci->handlers, node )
> +    {
> +        int cmp = vpci_register_cmp(&r, rm);
> +
> +        if ( !cmp && rm->offset == offset && rm->size == size )

What's the point of using vpci_register_cmp() when you need to do
the "exact match" check here anyway?

> +static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)

What's the word "mask" indicating here? The function doesn't return any
mask afaics. Do you perhaps mean "hide"?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-09  9:05 ` [PATCH v4 06/10] vpci: Hide extended " Jiqian Chen
@ 2025-05-18 14:47   ` Jan Beulich
  2025-05-19  7:41     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-18 14:47 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -448,7 +448,10 @@
>  /* Extended Capabilities (PCI-X 2.0 and Express) */
>  #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
>  #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
> -#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
> +#define PCI_EXT_CAP_NEXT_MASK		0xFFF00000U
> +/* Bottom two bits of next capability position are reserved. */
> +#define PCI_EXT_CAP_NEXT(header) \
> +    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)

Please can the hex digits all be / remain to be lower case, with just
the U suffixes be upper case?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-09  9:05 ` [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-05-19  6:30   ` Jan Beulich
  2025-05-19  7:44     ` Chen, Jiqian
  2025-05-19 10:24     ` Roger Pau Monné
  0 siblings, 2 replies; 45+ messages in thread
From: Jan Beulich @ 2025-05-19  6:30 UTC (permalink / raw)
  To: Jiqian Chen, Roger Pau Monné; +Cc: Huang Rui, Anthony PERARD, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> vpci_remove_register() only supports removing a register in a time,
> but the follow-on changes need to remove all registers within a range.
> So, refactor it to support removing all matched registers in a calling
> time.

Generally I'm a little wary of changing behavior for existing callers,
but I guess Roger already did signal he's okay taking that route?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-18 14:20   ` Jan Beulich
@ 2025-05-19  6:43     ` Chen, Jiqian
  2025-05-19  6:56       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  6:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/5/18 22:20, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>                                                   PCI_STATUS_RSVDZ_MASK);
>>  }
>>  
>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>> +{
>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
> 
> The ttl value exists (in the function you took it from) to make sure
> the loop below eventually ends. That is, to be able to kind of
> gracefully deal with loops in the linked list. Such loops, however,
> would ...
> 
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +        /* Extended capabilities read as zero, write ignore for guest */
>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                 pos, 4, (void *)0);
>> +
>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>> +    {
>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>> +        int rc;
>> +
>> +        if ( !header )
>> +            return 0;
>> +
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>> +                               pos, 4, (void *)(uintptr_t)header);
> 
> ... mean we may invoke this twice for the same capability. Such
> a secondary invocation would fail with -EEXIST, causing device init
> to fail altogether. Which is kind of against our aim of exposing
> (in a controlled manner) as much of the PCI hardware as possible.
May I know what situation that can make this twice for one capability when initialization?
Does hardware capability list have a cycle?

> 
> Imo we ought to be using a bitmap to detect the situation earlier
> and hence to be able to avoid redundant register addition. Thoughts?
Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails
  2025-05-09  9:05 ` [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
@ 2025-05-19  6:54   ` Jan Beulich
  2025-05-19  7:49     ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-19  6:54 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>      bar->guest_addr = bar->addr;
>  }
>  
> +static void cleanup_rebar(struct pci_dev *pdev)

Just to remind you that any hook functions need to be cf_check.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-19  6:43     ` Chen, Jiqian
@ 2025-05-19  6:56       ` Jan Beulich
  2025-05-19  7:13         ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-19  6:56 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray

On 19.05.2025 08:43, Chen, Jiqian wrote:
> On 2025/5/18 22:20, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>  }
>>>  
>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>> +{
>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>
>> The ttl value exists (in the function you took it from) to make sure
>> the loop below eventually ends. That is, to be able to kind of
>> gracefully deal with loops in the linked list. Such loops, however,
>> would ...
>>
>>> +    if ( !is_hardware_domain(pdev->domain) )
>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>> +                                 pos, 4, (void *)0);
>>> +
>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>> +    {
>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>> +        int rc;
>>> +
>>> +        if ( !header )
>>> +            return 0;
>>> +
>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>> +                               pos, 4, (void *)(uintptr_t)header);
>>
>> ... mean we may invoke this twice for the same capability. Such
>> a secondary invocation would fail with -EEXIST, causing device init
>> to fail altogether. Which is kind of against our aim of exposing
>> (in a controlled manner) as much of the PCI hardware as possible.
> May I know what situation that can make this twice for one capability when initialization?
> Does hardware capability list have a cycle?

Any of this is to work around flawed hardware, I suppose.

>> Imo we ought to be using a bitmap to detect the situation earlier
>> and hence to be able to avoid redundant register addition. Thoughts?
> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?

Possible, but feels wrong.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-18 14:34   ` Jan Beulich
@ 2025-05-19  6:56     ` Chen, Jiqian
  2025-05-19 13:07       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  6:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Andrew Cooper, Anthony PERARD,
	Orzel, Michal, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/5/18 22:34, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
> 
> To keep identifier length bounded, how about REGISTER_VPCI_CAP() here
> and ...
> 
>> --- a/xen/drivers/vpci/rebar.c
>> +++ b/xen/drivers/vpci/rebar.c
>> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
> 
> ... and REGISTER_VPCI_EXTCAP() here?

If so, I need to change the name of REGISTER_VPCI_CAP to be _REGISTER_VPCI_CAP ?

#define REGISTER_VPCI_CAP(cap, finit, fclean, ext) \
  static vpci_capability_t finit##_t = { \
        .id = (cap), \
        .init = (finit), \
        .cleanup = (fclean), \
        .is_ext = (ext), \
  }; \
  static vpci_capability_t *const finit##_entry  \
               __used_section(".data.vpci") = &finit##_t

> 
>> @@ -83,6 +83,35 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static int vpci_init_capabilities(struct pci_dev *pdev)
>> +{
>> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> +    {
>> +        const vpci_capability_t *capability = __start_vpci_array[i];
>> +        const unsigned int cap = capability->id;
>> +        const bool is_ext = capability->is_ext;
>> +        unsigned int pos;
>> +        int rc;
>> +
>> +        if ( !is_hardware_domain(pdev->domain) && is_ext )
>> +            continue;
> 
> Fold this into ...
> 
>> +        if ( !is_ext )
>> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
>> +        else
>> +            pos = pci_find_ext_capability(pdev->sbdf, cap);
> 
> ... this by adding a middle "else if()"?
It seems better, will do.
Thanks.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-19  6:56       ` Jan Beulich
@ 2025-05-19  7:13         ` Chen, Jiqian
  2025-05-19 13:10           ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  7:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/5/19 14:56, Jan Beulich wrote:
> On 19.05.2025 08:43, Chen, Jiqian wrote:
>> On 2025/5/18 22:20, Jan Beulich wrote:
>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>  }
>>>>  
>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>> +{
>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>
>>> The ttl value exists (in the function you took it from) to make sure
>>> the loop below eventually ends. That is, to be able to kind of
>>> gracefully deal with loops in the linked list. Such loops, however,
>>> would ...
>>>
>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>> +                                 pos, 4, (void *)0);
>>>> +
>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>> +    {
>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>> +        int rc;
>>>> +
>>>> +        if ( !header )
>>>> +            return 0;
>>>> +
>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>
>>> ... mean we may invoke this twice for the same capability. Such
>>> a secondary invocation would fail with -EEXIST, causing device init
>>> to fail altogether. Which is kind of against our aim of exposing
>>> (in a controlled manner) as much of the PCI hardware as possible.
>> May I know what situation that can make this twice for one capability when initialization?
>> Does hardware capability list have a cycle?
> 
> Any of this is to work around flawed hardware, I suppose.
> 
>>> Imo we ought to be using a bitmap to detect the situation earlier
>>> and hence to be able to avoid redundant register addition. Thoughts?
>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
> 
> Possible, but feels wrong.
How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-18 14:44   ` Jan Beulich
@ 2025-05-19  7:35     ` Chen, Jiqian
  2025-05-19 10:04       ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  7:35 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/5/18 22:44, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> @@ -83,6 +99,100 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static struct vpci_register *vpci_get_register(struct vpci *vpci,
>> +                                               unsigned int offset,
>> +                                               unsigned int size)
>> +{
>> +    const struct vpci_register r = { .offset = offset, .size = size };
>> +    struct vpci_register *rm;
>> +
>> +    ASSERT(spin_is_locked(&vpci->lock));
>> +    list_for_each_entry ( rm, &vpci->handlers, node )
>> +    {
>> +        int cmp = vpci_register_cmp(&r, rm);
>> +
>> +        if ( !cmp && rm->offset == offset && rm->size == size )
> 
> What's the point of using vpci_register_cmp() when you need to do
> the "exact match" check here anyway?
Oh, you are right.
Will remove "!cmp" here in next version.

> 
>> +static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
> 
> What's the word "mask" indicating here? The function doesn't return any
> mask afaics. Do you perhaps mean "hide"?
Yes, hide.
This is a question of naming preference.
I remember Roger suggested this name, but maybe I remember wrong.
For double confirmation, hi Roger, are you fine that I change this name from mask to hide?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-18 14:47   ` Jan Beulich
@ 2025-05-19  7:41     ` Chen, Jiqian
  2025-05-19 13:12       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  7:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Huang, Ray, Roger Pau Monné, xen-devel@lists.xenproject.org,
	Chen, Jiqian

On 2025/5/18 22:47, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> --- a/xen/include/xen/pci_regs.h
>> +++ b/xen/include/xen/pci_regs.h
>> @@ -448,7 +448,10 @@
>>  /* Extended Capabilities (PCI-X 2.0 and Express) */
>>  #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
>>  #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
>> -#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
>> +#define PCI_EXT_CAP_NEXT_MASK		0xFFF00000U
>> +/* Bottom two bits of next capability position are reserved. */
>> +#define PCI_EXT_CAP_NEXT(header) \
>> +    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)
> 
> Please can the hex digits all be / remain to be lower case, with just
> the U suffixes be upper case?
Including "0xFFF00000U" or just "0xFFCU" ?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-19  6:30   ` Jan Beulich
@ 2025-05-19  7:44     ` Chen, Jiqian
  2025-05-19 10:24     ` Roger Pau Monné
  1 sibling, 0 replies; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  7:44 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Anthony PERARD, Huang, Ray,
	Chen, Jiqian

On 2025/5/19 14:30, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> vpci_remove_register() only supports removing a register in a time,
>> but the follow-on changes need to remove all registers within a range.
>> So, refactor it to support removing all matched registers in a calling
>> time.
> 
> Generally I'm a little wary of changing behavior for existing callers,
> but I guess Roger already did signal he's okay taking that route?
Yes, this is suggested by Roger.
vpci_remove_register() is just used for test without my series.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails
  2025-05-19  6:54   ` Jan Beulich
@ 2025-05-19  7:49     ` Chen, Jiqian
  0 siblings, 0 replies; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-19  7:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Roger Pau Monné, Huang, Ray,
	Chen, Jiqian

On 2025/5/19 14:54, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/rebar.c
>> +++ b/xen/drivers/vpci/rebar.c
>> @@ -49,6 +49,26 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>      bar->guest_addr = bar->addr;
>>  }
>>  
>> +static void cleanup_rebar(struct pci_dev *pdev)
> 
> Just to remind you that any hook functions need to be cf_check.
Thank you for reminding me. This is my first time to know this.
I will add cf_check for all cleanup_x in my series.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize
  2025-05-19  7:35     ` Chen, Jiqian
@ 2025-05-19 10:04       ` Roger Pau Monné
  0 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-19 10:04 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: Jan Beulich, xen-devel@lists.xenproject.org, Huang, Ray

On Mon, May 19, 2025 at 07:35:49AM +0000, Chen, Jiqian wrote:
> On 2025/5/18 22:44, Jan Beulich wrote:
> > On 09.05.2025 11:05, Jiqian Chen wrote:
> >> +static int vpci_capability_mask(struct pci_dev *pdev, unsigned int cap)
> > 
> > What's the word "mask" indicating here? The function doesn't return any
> > mask afaics. Do you perhaps mean "hide"?
> Yes, hide.
> This is a question of naming preference.
> I remember Roger suggested this name, but maybe I remember wrong.
> For double confirmation, hi Roger, are you fine that I change this name from mask to hide?

Sure, it's fine to use hide, and it's possibly more inline with how
we use hide in other functions (like pci_hide_device()).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers
  2025-05-19  6:30   ` Jan Beulich
  2025-05-19  7:44     ` Chen, Jiqian
@ 2025-05-19 10:24     ` Roger Pau Monné
  1 sibling, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-19 10:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jiqian Chen, Huang Rui, Anthony PERARD, xen-devel

On Mon, May 19, 2025 at 08:30:22AM +0200, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
> > vpci_remove_register() only supports removing a register in a time,
> > but the follow-on changes need to remove all registers within a range.
> > So, refactor it to support removing all matched registers in a calling
> > time.
> 
> Generally I'm a little wary of changing behavior for existing callers,
> but I guess Roger already did signal he's okay taking that route?

The only users of that function currently are in tools/tests/vpci/,
so changing is not a problem IMO.

Regards, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT
  2025-05-19  6:56     ` Chen, Jiqian
@ 2025-05-19 13:07       ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2025-05-19 13:07 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, Andrew Cooper, Anthony PERARD,
	Orzel, Michal, Julien Grall, Stefano Stabellini,
	xen-devel@lists.xenproject.org, Huang, Ray

On 19.05.2025 08:56, Chen, Jiqian wrote:
> On 2025/5/18 22:34, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> --- a/xen/drivers/vpci/msi.c
>>> +++ b/xen/drivers/vpci/msi.c
>>> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>>  
>>>      return 0;
>>>  }
>>> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>>> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi, NULL);
>>
>> To keep identifier length bounded, how about REGISTER_VPCI_CAP() here
>> and ...
>>
>>> --- a/xen/drivers/vpci/rebar.c
>>> +++ b/xen/drivers/vpci/rebar.c
>>> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>>>  
>>>      return 0;
>>>  }
>>> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
>>> +REGISTER_VPCI_EXTENDED_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar, NULL);
>>
>> ... and REGISTER_VPCI_EXTCAP() here?
> 
> If so, I need to change the name of REGISTER_VPCI_CAP to be _REGISTER_VPCI_CAP ?
> 
> #define REGISTER_VPCI_CAP(cap, finit, fclean, ext) \
>   static vpci_capability_t finit##_t = { \
>         .id = (cap), \
>         .init = (finit), \
>         .cleanup = (fclean), \
>         .is_ext = (ext), \
>   }; \
>   static vpci_capability_t *const finit##_entry  \
>                __used_section(".data.vpci") = &finit##_t

That's a reserved name then. Since it's used only twice (to produce the
other two macros), REGISTER_PCI_CAPABILITY() maybe? Or one of
REGISTER_PCI__CAP() / REGISTER_PCI_CAP_()?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-19  7:13         ` Chen, Jiqian
@ 2025-05-19 13:10           ` Jan Beulich
  2025-05-19 13:21             ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-19 13:10 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray

On 19.05.2025 09:13, Chen, Jiqian wrote:
> On 2025/5/19 14:56, Jan Beulich wrote:
>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>>  }
>>>>>  
>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>> +{
>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>
>>>> The ttl value exists (in the function you took it from) to make sure
>>>> the loop below eventually ends. That is, to be able to kind of
>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>> would ...
>>>>
>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>> +                                 pos, 4, (void *)0);
>>>>> +
>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>> +    {
>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>> +        int rc;
>>>>> +
>>>>> +        if ( !header )
>>>>> +            return 0;
>>>>> +
>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>
>>>> ... mean we may invoke this twice for the same capability. Such
>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>> to fail altogether. Which is kind of against our aim of exposing
>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>> May I know what situation that can make this twice for one capability when initialization?
>>> Does hardware capability list have a cycle?
>>
>> Any of this is to work around flawed hardware, I suppose.
>>
>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
>>
>> Possible, but feels wrong.
> How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.

Hmm. Again an option, yet again I'm not certain. But that's perhaps just
me, and Roger may be fine with it. IOW we might as well start out this way,
and adjust if (ever) an issue with a real device is found.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 06/10] vpci: Hide extended capability when it fails to initialize
  2025-05-19  7:41     ` Chen, Jiqian
@ 2025-05-19 13:12       ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2025-05-19 13:12 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Huang, Ray, Roger Pau Monné, xen-devel@lists.xenproject.org

On 19.05.2025 09:41, Chen, Jiqian wrote:
> On 2025/5/18 22:47, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> --- a/xen/include/xen/pci_regs.h
>>> +++ b/xen/include/xen/pci_regs.h
>>> @@ -448,7 +448,10 @@
>>>  /* Extended Capabilities (PCI-X 2.0 and Express) */
>>>  #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
>>>  #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
>>> -#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
>>> +#define PCI_EXT_CAP_NEXT_MASK		0xFFF00000U
>>> +/* Bottom two bits of next capability position are reserved. */
>>> +#define PCI_EXT_CAP_NEXT(header) \
>>> +    (MASK_EXTR(header, PCI_EXT_CAP_NEXT_MASK) & 0xFFCU)
>>
>> Please can the hex digits all be / remain to be lower case, with just
>> the U suffixes be upper case?
> Including "0xFFF00000U" or just "0xFFCU" ?

Both. See also patch context here.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-19 13:10           ` Jan Beulich
@ 2025-05-19 13:21             ` Roger Pau Monné
  2025-05-21  6:08               ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-19 13:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Chen, Jiqian, xen-devel@lists.xenproject.org, Huang, Ray

On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
> On 19.05.2025 09:13, Chen, Jiqian wrote:
> > On 2025/5/19 14:56, Jan Beulich wrote:
> >> On 19.05.2025 08:43, Chen, Jiqian wrote:
> >>> On 2025/5/18 22:20, Jan Beulich wrote:
> >>>> On 09.05.2025 11:05, Jiqian Chen wrote:
> >>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
> >>>>>                                                   PCI_STATUS_RSVDZ_MASK);
> >>>>>  }
> >>>>>  
> >>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> >>>>> +{
> >>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
> >>>>
> >>>> The ttl value exists (in the function you took it from) to make sure
> >>>> the loop below eventually ends. That is, to be able to kind of
> >>>> gracefully deal with loops in the linked list. Such loops, however,
> >>>> would ...
> >>>>
> >>>>> +    if ( !is_hardware_domain(pdev->domain) )
> >>>>> +        /* Extended capabilities read as zero, write ignore for guest */
> >>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> >>>>> +                                 pos, 4, (void *)0);
> >>>>> +
> >>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
> >>>>> +    {
> >>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
> >>>>> +        int rc;
> >>>>> +
> >>>>> +        if ( !header )
> >>>>> +            return 0;
> >>>>> +
> >>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
> >>>>> +                               pos, 4, (void *)(uintptr_t)header);
> >>>>
> >>>> ... mean we may invoke this twice for the same capability. Such
> >>>> a secondary invocation would fail with -EEXIST, causing device init
> >>>> to fail altogether. Which is kind of against our aim of exposing
> >>>> (in a controlled manner) as much of the PCI hardware as possible.
> >>> May I know what situation that can make this twice for one capability when initialization?
> >>> Does hardware capability list have a cycle?
> >>
> >> Any of this is to work around flawed hardware, I suppose.
> >>
> >>>> Imo we ought to be using a bitmap to detect the situation earlier
> >>>> and hence to be able to avoid redundant register addition. Thoughts?
> >>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
> >>
> >> Possible, but feels wrong.
> > How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.
> 
> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
> me, and Roger may be fine with it. IOW we might as well start out this way,
> and adjust if (ever) an issue with a real device is found.

Returning -EEXIST might be fine, but at that point there's no further
capability to process.  There's a loop in the linked capability list,
and we should just exit.  There needs to be a warning in this case,
and since this is for the hardware domain only it shouldn't be fatal.

If it was for domUs we would possibly need to discuss whether
assigning the device should fail if a capability linked list loop is
found.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-09  9:05 ` [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-05-20  6:40   ` Jan Beulich
  2025-05-20  9:09     ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-20  6:40 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.05.2025 11:05, Jiqian Chen wrote:
> When init_msi() fails, the previous new changes will hide MSI
> capability, it can't rely on vpci_deassign_device() to remove
> all MSI related resources anymore, those resources must be
> removed in cleanup function of MSI.

That's because vpci_deassign_device() simply isn't called anymore?
Could do with wording along these lines then. But (also applicable
to the previous patch) - doesn't this need to come earlier? And is
it sufficient to simply remove the register intercepts? Don't you
need to put in place ones dropping all writes and making all reads
return either 0 or ~0 (covering in particular Dom0, while for DomU-s
this may already be the case by default behavior)?

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-20  6:40   ` Jan Beulich
@ 2025-05-20  9:09     ` Roger Pau Monné
  2025-05-20  9:14       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-20  9:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jiqian Chen, Huang Rui, xen-devel

On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
> > When init_msi() fails, the previous new changes will hide MSI
> > capability, it can't rely on vpci_deassign_device() to remove
> > all MSI related resources anymore, those resources must be
> > removed in cleanup function of MSI.
> 
> That's because vpci_deassign_device() simply isn't called anymore?
> Could do with wording along these lines then. But (also applicable
> to the previous patch) - doesn't this need to come earlier? And is
> it sufficient to simply remove the register intercepts? Don't you
> need to put in place ones dropping all writes and making all reads
> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> this may already be the case by default behavior)?

For domUs this is already the default behavior.

For dom0 I think it should be enough to hide the capability from the
linked list, but not hide all the capability related
registers.  IMO a well behaved dom0 won't try to access capabilities
disconnected from the linked list, and in general we allow dom0 access
to as much as possible.

For dom0 Xen could drop writes to the MSI(-X) enable bit, thus forcing
MSI(-X) to stay disabled.  I however don't see this as a mandatory
step for the work here.

Regards, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-20  9:09     ` Roger Pau Monné
@ 2025-05-20  9:14       ` Jan Beulich
  2025-05-20  9:43         ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-20  9:14 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Jiqian Chen, Huang Rui, xen-devel

On 20.05.2025 11:09, Roger Pau Monné wrote:
> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> When init_msi() fails, the previous new changes will hide MSI
>>> capability, it can't rely on vpci_deassign_device() to remove
>>> all MSI related resources anymore, those resources must be
>>> removed in cleanup function of MSI.
>>
>> That's because vpci_deassign_device() simply isn't called anymore?
>> Could do with wording along these lines then. But (also applicable
>> to the previous patch) - doesn't this need to come earlier? And is
>> it sufficient to simply remove the register intercepts? Don't you
>> need to put in place ones dropping all writes and making all reads
>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
>> this may already be the case by default behavior)?
> 
> For domUs this is already the default behavior.
> 
> For dom0 I think it should be enough to hide the capability from the
> linked list, but not hide all the capability related
> registers.  IMO a well behaved dom0 won't try to access capabilities
> disconnected from the linked list,

Just that I've seen drivers knowing where their device has certain
capabilities, thus not bothering to look up the respective
capability.

> and in general we allow dom0 access
> to as much as possible.
> 
> For dom0 Xen could drop writes to the MSI(-X) enable bit, thus forcing
> MSI(-X) to stay disabled.  I however don't see this as a mandatory
> step for the work here.

You're the maintainer, so you have the final say.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-20  9:14       ` Jan Beulich
@ 2025-05-20  9:43         ` Roger Pau Monné
  2025-05-21  7:00           ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-20  9:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jiqian Chen, Huang Rui, xen-devel

On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
> On 20.05.2025 11:09, Roger Pau Monné wrote:
> > On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> >> On 09.05.2025 11:05, Jiqian Chen wrote:
> >>> When init_msi() fails, the previous new changes will hide MSI
> >>> capability, it can't rely on vpci_deassign_device() to remove
> >>> all MSI related resources anymore, those resources must be
> >>> removed in cleanup function of MSI.
> >>
> >> That's because vpci_deassign_device() simply isn't called anymore?
> >> Could do with wording along these lines then. But (also applicable
> >> to the previous patch) - doesn't this need to come earlier? And is
> >> it sufficient to simply remove the register intercepts? Don't you
> >> need to put in place ones dropping all writes and making all reads
> >> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> >> this may already be the case by default behavior)?
> > 
> > For domUs this is already the default behavior.
> > 
> > For dom0 I think it should be enough to hide the capability from the
> > linked list, but not hide all the capability related
> > registers.  IMO a well behaved dom0 won't try to access capabilities
> > disconnected from the linked list,
> 
> Just that I've seen drivers knowing where their device has certain
> capabilities, thus not bothering to look up the respective
> capability.

OK, so let's make the control register read-only in case of failure.

If MSI(-X) is already enabled we should also make the entries
read-only, and while that's not very complicated for MSI, it does get
more convoluted for MSI-X.  I'm fine with just making the control
register read-only for the time being.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-19 13:21             ` Roger Pau Monné
@ 2025-05-21  6:08               ` Chen, Jiqian
  2025-05-21  6:25                 ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-21  6:08 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Chen, Jiqian, xen-devel@lists.xenproject.org, Huang, Ray

On 2025/5/19 21:21, Roger Pau Monné wrote:
> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>> On 2025/5/19 14:56, Jan Beulich wrote:
>>>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>>>>  }
>>>>>>>  
>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>>>
>>>>>> The ttl value exists (in the function you took it from) to make sure
>>>>>> the loop below eventually ends. That is, to be able to kind of
>>>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>>>> would ...
>>>>>>
>>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>>> +                                 pos, 4, (void *)0);
>>>>>>> +
>>>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>>>> +    {
>>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>>> +        int rc;
>>>>>>> +
>>>>>>> +        if ( !header )
>>>>>>> +            return 0;
>>>>>>> +
>>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>>
>>>>>> ... mean we may invoke this twice for the same capability. Such
>>>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>>>> to fail altogether. Which is kind of against our aim of exposing
>>>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>>>> May I know what situation that can make this twice for one capability when initialization?
>>>>> Does hardware capability list have a cycle?
>>>>
>>>> Any of this is to work around flawed hardware, I suppose.
>>>>
>>>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>>>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
>>>>
>>>> Possible, but feels wrong.
>>> How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.
>>
>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>> me, and Roger may be fine with it. IOW we might as well start out this way,
>> and adjust if (ever) an issue with a real device is found.
> 
> Returning -EEXIST might be fine, but at that point there's no further
> capability to process.  There's a loop in the linked capability list,
> and we should just exit.  There needs to be a warning in this case,
> and since this is for the hardware domain only it shouldn't be fatal.
> 
If I understand correctly, I need to add below in next version?

         rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
                                pos, 4, (void *)(uintptr_t)header);
+
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: there is a loop in the linked capability list\n",
+                   pdev->domain, &pdev->sbdf);
+            return 0;
+        }
+
         if ( rc )
             return rc;

> If it was for domUs we would possibly need to discuss whether
> assigning the device should fail if a capability linked list loop is
> found.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-21  6:08               ` Chen, Jiqian
@ 2025-05-21  6:25                 ` Jan Beulich
  2025-05-21  6:44                   ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2025-05-21  6:25 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Roger Pau Monné

On 21.05.2025 08:08, Chen, Jiqian wrote:
> On 2025/5/19 21:21, Roger Pau Monné wrote:
>> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>>> On 2025/5/19 14:56, Jan Beulich wrote:
>>>>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>>>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>>>>> +{
>>>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>>>>
>>>>>>> The ttl value exists (in the function you took it from) to make sure
>>>>>>> the loop below eventually ends. That is, to be able to kind of
>>>>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>>>>> would ...
>>>>>>>
>>>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>>>> +                                 pos, 4, (void *)0);
>>>>>>>> +
>>>>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>>>>> +    {
>>>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>>>> +        int rc;
>>>>>>>> +
>>>>>>>> +        if ( !header )
>>>>>>>> +            return 0;
>>>>>>>> +
>>>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>>>
>>>>>>> ... mean we may invoke this twice for the same capability. Such
>>>>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>>>>> to fail altogether. Which is kind of against our aim of exposing
>>>>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>>>>> May I know what situation that can make this twice for one capability when initialization?
>>>>>> Does hardware capability list have a cycle?
>>>>>
>>>>> Any of this is to work around flawed hardware, I suppose.
>>>>>
>>>>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>>>>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
>>>>>
>>>>> Possible, but feels wrong.
>>>> How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.
>>>
>>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>>> me, and Roger may be fine with it. IOW we might as well start out this way,
>>> and adjust if (ever) an issue with a real device is found.
>>
>> Returning -EEXIST might be fine, but at that point there's no further
>> capability to process.  There's a loop in the linked capability list,
>> and we should just exit.  There needs to be a warning in this case,
>> and since this is for the hardware domain only it shouldn't be fatal.
>>
> If I understand correctly, I need to add below in next version?
> 
>          rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>                                 pos, 4, (void *)(uintptr_t)header);
> +
> +        if ( rc == -EEXIST )
> +        {
> +            printk(XENLOG_WARNING
> +                   "%pd %pp: there is a loop in the linked capability list\n",

I think we shouldn't say "loop" unless we firmly know that's what the
issue is. Maybe use "overlap" instead? And then also log the offending
register range? (As a nit: "there is" and "linked" are not adding any
value to the log message; to keep them short [without losing
information], please try to avoid such.)

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0
  2025-05-21  6:25                 ` Jan Beulich
@ 2025-05-21  6:44                   ` Chen, Jiqian
  0 siblings, 0 replies; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-21  6:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Roger Pau Monné,
	Chen, Jiqian

On 2025/5/21 14:25, Jan Beulich wrote:
> On 21.05.2025 08:08, Chen, Jiqian wrote:
>> On 2025/5/19 21:21, Roger Pau Monné wrote:
>>> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>>>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>>>> On 2025/5/19 14:56, Jan Beulich wrote:
>>>>>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>>>>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>>>>>> +{
>>>>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>>>>>
>>>>>>>> The ttl value exists (in the function you took it from) to make sure
>>>>>>>> the loop below eventually ends. That is, to be able to kind of
>>>>>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>>>>>> would ...
>>>>>>>>
>>>>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>>>>> +                                 pos, 4, (void *)0);
>>>>>>>>> +
>>>>>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>>>>>> +    {
>>>>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>>>>> +        int rc;
>>>>>>>>> +
>>>>>>>>> +        if ( !header )
>>>>>>>>> +            return 0;
>>>>>>>>> +
>>>>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>>>>
>>>>>>>> ... mean we may invoke this twice for the same capability. Such
>>>>>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>>>>>> to fail altogether. Which is kind of against our aim of exposing
>>>>>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>>>>>> May I know what situation that can make this twice for one capability when initialization?
>>>>>>> Does hardware capability list have a cycle?
>>>>>>
>>>>>> Any of this is to work around flawed hardware, I suppose.
>>>>>>
>>>>>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>>>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>>>>>> Can we just let it go forward and continue to add register for next capability when rc == -EXIST, instead of returning error ?
>>>>>>
>>>>>> Possible, but feels wrong.
>>>>> How about when EXIST, setting the next bits of previous extended capability to be zero and return 0? Then we break the cycle.
>>>>
>>>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>>>> me, and Roger may be fine with it. IOW we might as well start out this way,
>>>> and adjust if (ever) an issue with a real device is found.
>>>
>>> Returning -EEXIST might be fine, but at that point there's no further
>>> capability to process.  There's a loop in the linked capability list,
>>> and we should just exit.  There needs to be a warning in this case,
>>> and since this is for the hardware domain only it shouldn't be fatal.
>>>
>> If I understand correctly, I need to add below in next version?
>>
>>          rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>                                 pos, 4, (void *)(uintptr_t)header);
>> +
>> +        if ( rc == -EEXIST )
>> +        {
>> +            printk(XENLOG_WARNING
>> +                   "%pd %pp: there is a loop in the linked capability list\n",
> 
> I think we shouldn't say "loop" unless we firmly know that's what the
> issue is. Maybe use "overlap" instead? And then also log the offending
> register range? (As a nit: "there is" and "linked" are not adding any
> value to the log message; to keep them short [without losing
> information], please try to avoid such.)
OK, below may be more in line with your opinion.

         rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
                                pos, 4, (void *)(uintptr_t)header);
+
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
         if ( rc )
             return rc;

> 
> Jan

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-20  9:43         ` Roger Pau Monné
@ 2025-05-21  7:00           ` Chen, Jiqian
  2025-05-21 11:23             ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-21  7:00 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Chen, Jiqian, Huang, Ray, xen-devel@lists.xenproject.org

On 2025/5/20 17:43, Roger Pau Monné wrote:
> On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
>> On 20.05.2025 11:09, Roger Pau Monné wrote:
>>> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>> When init_msi() fails, the previous new changes will hide MSI
>>>>> capability, it can't rely on vpci_deassign_device() to remove
>>>>> all MSI related resources anymore, those resources must be
>>>>> removed in cleanup function of MSI.
>>>>
>>>> That's because vpci_deassign_device() simply isn't called anymore?
>>>> Could do with wording along these lines then. But (also applicable
>>>> to the previous patch) - doesn't this need to come earlier? And is
>>>> it sufficient to simply remove the register intercepts? Don't you
>>>> need to put in place ones dropping all writes and making all reads
>>>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
>>>> this may already be the case by default behavior)?
>>>
>>> For domUs this is already the default behavior.
>>>
>>> For dom0 I think it should be enough to hide the capability from the
>>> linked list, but not hide all the capability related
>>> registers.  IMO a well behaved dom0 won't try to access capabilities
>>> disconnected from the linked list,
>>
>> Just that I've seen drivers knowing where their device has certain
>> capabilities, thus not bothering to look up the respective
>> capability.
> 
> OK, so let's make the control register read-only in case of failure.
> 
> If MSI(-X) is already enabled we should also make the entries
> read-only, and while that's not very complicated for MSI, it does get
> more convoluted for MSI-X.  I'm fine with just making the control
> register read-only for the time being.
If I understand correctly, I need to avoid control register being removed and set the write hook of control register to be vpci_ignored_write and avoid freeing vpci->msi?

"
     if ( !msi_pos || !vpci->msi )
         return;

+    spin_lock(&vpci->lock);
+    control = vpci_get_register(vpci, msi_control_reg(msi_pos), 2);
+    if ( control )
+        control->write = vpci_ignored_write;
+    spin_unlock(&vpci->lock);
+
     if ( vpci->msi->masking )
         end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
     else
         end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;

-    size = end - msi_control_reg(msi_pos);
+    start = msi_control_reg(msi_pos) + 2;
+    size = end - start;

-    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
-    XFREE(vpci->msi);
+    vpci_remove_registers(vpci, start, size);
"

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-21  7:00           ` Chen, Jiqian
@ 2025-05-21 11:23             ` Roger Pau Monné
  2025-05-22  2:21               ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-21 11:23 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: Jan Beulich, Huang, Ray, xen-devel@lists.xenproject.org

On Wed, May 21, 2025 at 07:00:37AM +0000, Chen, Jiqian wrote:
> On 2025/5/20 17:43, Roger Pau Monné wrote:
> > On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
> >> On 20.05.2025 11:09, Roger Pau Monné wrote:
> >>> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> >>>> On 09.05.2025 11:05, Jiqian Chen wrote:
> >>>>> When init_msi() fails, the previous new changes will hide MSI
> >>>>> capability, it can't rely on vpci_deassign_device() to remove
> >>>>> all MSI related resources anymore, those resources must be
> >>>>> removed in cleanup function of MSI.
> >>>>
> >>>> That's because vpci_deassign_device() simply isn't called anymore?
> >>>> Could do with wording along these lines then. But (also applicable
> >>>> to the previous patch) - doesn't this need to come earlier? And is
> >>>> it sufficient to simply remove the register intercepts? Don't you
> >>>> need to put in place ones dropping all writes and making all reads
> >>>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> >>>> this may already be the case by default behavior)?
> >>>
> >>> For domUs this is already the default behavior.
> >>>
> >>> For dom0 I think it should be enough to hide the capability from the
> >>> linked list, but not hide all the capability related
> >>> registers.  IMO a well behaved dom0 won't try to access capabilities
> >>> disconnected from the linked list,
> >>
> >> Just that I've seen drivers knowing where their device has certain
> >> capabilities, thus not bothering to look up the respective
> >> capability.
> > 
> > OK, so let's make the control register read-only in case of failure.
> > 
> > If MSI(-X) is already enabled we should also make the entries
> > read-only, and while that's not very complicated for MSI, it does get
> > more convoluted for MSI-X.  I'm fine with just making the control
> > register read-only for the time being.
> If I understand correctly, I need to avoid control register being removed and set the write hook of control register to be vpci_ignored_write and avoid freeing vpci->msi?
> 
> "
>      if ( !msi_pos || !vpci->msi )
>          return;
> 
> +    spin_lock(&vpci->lock);
> +    control = vpci_get_register(vpci, msi_control_reg(msi_pos), 2);
> +    if ( control )
> +        control->write = vpci_ignored_write;
> +    spin_unlock(&vpci->lock);
> +
>      if ( vpci->msi->masking )
>          end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>      else
>          end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
> 
> -    size = end - msi_control_reg(msi_pos);
> +    start = msi_control_reg(msi_pos) + 2;
> +    size = end - start;
> 
> -    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
> -    XFREE(vpci->msi);
> +    vpci_remove_registers(vpci, start, size);

I think you want to first purge all the MSI range, and then add the
control register, also you want to keep the XFREE(), and set the
register as:

vpci_add_register(vpci, vpci_hw_read16, NULL, msi_control_reg(msi_pos),
                  2, NULL);

So that you make it strictly hardware read-only, and not use the data
in vpci->msi.

Regards, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-21 11:23             ` Roger Pau Monné
@ 2025-05-22  2:21               ` Chen, Jiqian
  2025-05-22  7:12                 ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-22  2:21 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/5/21 19:23, Roger Pau Monné wrote:
> On Wed, May 21, 2025 at 07:00:37AM +0000, Chen, Jiqian wrote:
>> On 2025/5/20 17:43, Roger Pau Monné wrote:
>>> On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
>>>> On 20.05.2025 11:09, Roger Pau Monné wrote:
>>>>> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>> When init_msi() fails, the previous new changes will hide MSI
>>>>>>> capability, it can't rely on vpci_deassign_device() to remove
>>>>>>> all MSI related resources anymore, those resources must be
>>>>>>> removed in cleanup function of MSI.
>>>>>>
>>>>>> That's because vpci_deassign_device() simply isn't called anymore?
>>>>>> Could do with wording along these lines then. But (also applicable
>>>>>> to the previous patch) - doesn't this need to come earlier? And is
>>>>>> it sufficient to simply remove the register intercepts? Don't you
>>>>>> need to put in place ones dropping all writes and making all reads
>>>>>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
>>>>>> this may already be the case by default behavior)?
>>>>>
>>>>> For domUs this is already the default behavior.
>>>>>
>>>>> For dom0 I think it should be enough to hide the capability from the
>>>>> linked list, but not hide all the capability related
>>>>> registers.  IMO a well behaved dom0 won't try to access capabilities
>>>>> disconnected from the linked list,
>>>>
>>>> Just that I've seen drivers knowing where their device has certain
>>>> capabilities, thus not bothering to look up the respective
>>>> capability.
>>>
>>> OK, so let's make the control register read-only in case of failure.
>>>
>>> If MSI(-X) is already enabled we should also make the entries
>>> read-only, and while that's not very complicated for MSI, it does get
>>> more convoluted for MSI-X.  I'm fine with just making the control
>>> register read-only for the time being.
>> If I understand correctly, I need to avoid control register being removed and set the write hook of control register to be vpci_ignored_write and avoid freeing vpci->msi?
>>
>> "
>>      if ( !msi_pos || !vpci->msi )
>>          return;
>>
>> +    spin_lock(&vpci->lock);
>> +    control = vpci_get_register(vpci, msi_control_reg(msi_pos), 2);
>> +    if ( control )
>> +        control->write = vpci_ignored_write;
>> +    spin_unlock(&vpci->lock);
>> +
>>      if ( vpci->msi->masking )
>>          end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>>      else
>>          end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
>>
>> -    size = end - msi_control_reg(msi_pos);
>> +    start = msi_control_reg(msi_pos) + 2;
>> +    size = end - start;
>>
>> -    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
>> -    XFREE(vpci->msi);
>> +    vpci_remove_registers(vpci, start, size);
> 
> I think you want to first purge all the MSI range, and then add the
> control register, also you want to keep the XFREE(), and set the
> register as:
Understood.

> 
> vpci_add_register(vpci, vpci_hw_read16, NULL, msi_control_reg(msi_pos),
>                   2, NULL);
And one more question, how do I process return value of vpci_add_register since definition of cleanup hook is "void"?
Print a error message if fail?

> 
> So that you make it strictly hardware read-only, and not use the data
> in vpci->msi.
> 
> Regards, Roger.

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-22  2:21               ` Chen, Jiqian
@ 2025-05-22  7:12                 ` Roger Pau Monné
  2025-05-22  7:27                   ` Chen, Jiqian
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2025-05-22  7:12 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: Jan Beulich, xen-devel@lists.xenproject.org, Huang, Ray

On Thu, May 22, 2025 at 02:21:16AM +0000, Chen, Jiqian wrote:
> On 2025/5/21 19:23, Roger Pau Monné wrote:
> > On Wed, May 21, 2025 at 07:00:37AM +0000, Chen, Jiqian wrote:
> >> On 2025/5/20 17:43, Roger Pau Monné wrote:
> >>> On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
> >>>> On 20.05.2025 11:09, Roger Pau Monné wrote:
> >>>>> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> >>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
> >>>>>>> When init_msi() fails, the previous new changes will hide MSI
> >>>>>>> capability, it can't rely on vpci_deassign_device() to remove
> >>>>>>> all MSI related resources anymore, those resources must be
> >>>>>>> removed in cleanup function of MSI.
> >>>>>>
> >>>>>> That's because vpci_deassign_device() simply isn't called anymore?
> >>>>>> Could do with wording along these lines then. But (also applicable
> >>>>>> to the previous patch) - doesn't this need to come earlier? And is
> >>>>>> it sufficient to simply remove the register intercepts? Don't you
> >>>>>> need to put in place ones dropping all writes and making all reads
> >>>>>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> >>>>>> this may already be the case by default behavior)?
> >>>>>
> >>>>> For domUs this is already the default behavior.
> >>>>>
> >>>>> For dom0 I think it should be enough to hide the capability from the
> >>>>> linked list, but not hide all the capability related
> >>>>> registers.  IMO a well behaved dom0 won't try to access capabilities
> >>>>> disconnected from the linked list,
> >>>>
> >>>> Just that I've seen drivers knowing where their device has certain
> >>>> capabilities, thus not bothering to look up the respective
> >>>> capability.
> >>>
> >>> OK, so let's make the control register read-only in case of failure.
> >>>
> >>> If MSI(-X) is already enabled we should also make the entries
> >>> read-only, and while that's not very complicated for MSI, it does get
> >>> more convoluted for MSI-X.  I'm fine with just making the control
> >>> register read-only for the time being.
> >> If I understand correctly, I need to avoid control register being removed and set the write hook of control register to be vpci_ignored_write and avoid freeing vpci->msi?
> >>
> >> "
> >>      if ( !msi_pos || !vpci->msi )
> >>          return;
> >>
> >> +    spin_lock(&vpci->lock);
> >> +    control = vpci_get_register(vpci, msi_control_reg(msi_pos), 2);
> >> +    if ( control )
> >> +        control->write = vpci_ignored_write;
> >> +    spin_unlock(&vpci->lock);
> >> +
> >>      if ( vpci->msi->masking )
> >>          end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
> >>      else
> >>          end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
> >>
> >> -    size = end - msi_control_reg(msi_pos);
> >> +    start = msi_control_reg(msi_pos) + 2;
> >> +    size = end - start;
> >>
> >> -    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
> >> -    XFREE(vpci->msi);
> >> +    vpci_remove_registers(vpci, start, size);
> > 
> > I think you want to first purge all the MSI range, and then add the
> > control register, also you want to keep the XFREE(), and set the
> > register as:
> Understood.
> 
> > 
> > vpci_add_register(vpci, vpci_hw_read16, NULL, msi_control_reg(msi_pos),
> >                   2, NULL);
> And one more question, how do I process return value of vpci_add_register since definition of cleanup hook is "void"?
> Print a error message if fail?

Well, we should consider the cleanup function returning an error code.
vpci_remove_registers() can also fail, and the error is currently
ignored.  Both cases should result in failing to assign the device to
the domain IMO.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails
  2025-05-22  7:12                 ` Roger Pau Monné
@ 2025-05-22  7:27                   ` Chen, Jiqian
  0 siblings, 0 replies; 45+ messages in thread
From: Chen, Jiqian @ 2025-05-22  7:27 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/5/22 15:12, Roger Pau Monné wrote:
> On Thu, May 22, 2025 at 02:21:16AM +0000, Chen, Jiqian wrote:
>> On 2025/5/21 19:23, Roger Pau Monné wrote:
>>> On Wed, May 21, 2025 at 07:00:37AM +0000, Chen, Jiqian wrote:
>>>> On 2025/5/20 17:43, Roger Pau Monné wrote:
>>>>> On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
>>>>>> On 20.05.2025 11:09, Roger Pau Monné wrote:
>>>>>>> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
>>>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>>>> When init_msi() fails, the previous new changes will hide MSI
>>>>>>>>> capability, it can't rely on vpci_deassign_device() to remove
>>>>>>>>> all MSI related resources anymore, those resources must be
>>>>>>>>> removed in cleanup function of MSI.
>>>>>>>>
>>>>>>>> That's because vpci_deassign_device() simply isn't called anymore?
>>>>>>>> Could do with wording along these lines then. But (also applicable
>>>>>>>> to the previous patch) - doesn't this need to come earlier? And is
>>>>>>>> it sufficient to simply remove the register intercepts? Don't you
>>>>>>>> need to put in place ones dropping all writes and making all reads
>>>>>>>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
>>>>>>>> this may already be the case by default behavior)?
>>>>>>>
>>>>>>> For domUs this is already the default behavior.
>>>>>>>
>>>>>>> For dom0 I think it should be enough to hide the capability from the
>>>>>>> linked list, but not hide all the capability related
>>>>>>> registers.  IMO a well behaved dom0 won't try to access capabilities
>>>>>>> disconnected from the linked list,
>>>>>>
>>>>>> Just that I've seen drivers knowing where their device has certain
>>>>>> capabilities, thus not bothering to look up the respective
>>>>>> capability.
>>>>>
>>>>> OK, so let's make the control register read-only in case of failure.
>>>>>
>>>>> If MSI(-X) is already enabled we should also make the entries
>>>>> read-only, and while that's not very complicated for MSI, it does get
>>>>> more convoluted for MSI-X.  I'm fine with just making the control
>>>>> register read-only for the time being.
>>>> If I understand correctly, I need to avoid control register being removed and set the write hook of control register to be vpci_ignored_write and avoid freeing vpci->msi?
>>>>
>>>> "
>>>>      if ( !msi_pos || !vpci->msi )
>>>>          return;
>>>>
>>>> +    spin_lock(&vpci->lock);
>>>> +    control = vpci_get_register(vpci, msi_control_reg(msi_pos), 2);
>>>> +    if ( control )
>>>> +        control->write = vpci_ignored_write;
>>>> +    spin_unlock(&vpci->lock);
>>>> +
>>>>      if ( vpci->msi->masking )
>>>>          end = msi_pending_bits_reg(msi_pos, vpci->msi->address64);
>>>>      else
>>>>          end = msi_mask_bits_reg(msi_pos, vpci->msi->address64) - 2;
>>>>
>>>> -    size = end - msi_control_reg(msi_pos);
>>>> +    start = msi_control_reg(msi_pos) + 2;
>>>> +    size = end - start;
>>>>
>>>> -    vpci_remove_registers(vpci, msi_control_reg(msi_pos), size);
>>>> -    XFREE(vpci->msi);
>>>> +    vpci_remove_registers(vpci, start, size);
>>>
>>> I think you want to first purge all the MSI range, and then add the
>>> control register, also you want to keep the XFREE(), and set the
>>> register as:
>> Understood.
>>
>>>
>>> vpci_add_register(vpci, vpci_hw_read16, NULL, msi_control_reg(msi_pos),
>>>                   2, NULL);
>> And one more question, how do I process return value of vpci_add_register since definition of cleanup hook is "void"?
>> Print a error message if fail?
> 
> Well, we should consider the cleanup function returning an error code.
> vpci_remove_registers() can also fail, and the error is currently
> ignored.  Both cases should result in failing to assign the device to
> the domain IMO.
OK, will change in next version.
Thank you!

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2025-05-22  7:27 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09  9:05 [PATCH v4 00/10] Support hiding capability when its initialization fails Jiqian Chen
2025-05-09  9:05 ` [PATCH v4 01/10] vpci/header: Move emulating cap list logic into new function Jiqian Chen
2025-05-09  9:05 ` [PATCH v4 02/10] vpci/header: Emulate legacy capability list for dom0 Jiqian Chen
2025-05-15 16:29   ` Roger Pau Monné
2025-05-16  2:33     ` Chen, Jiqian
2025-05-09  9:05 ` [PATCH v4 03/10] vpci/header: Emulate extended " Jiqian Chen
2025-05-18 14:20   ` Jan Beulich
2025-05-19  6:43     ` Chen, Jiqian
2025-05-19  6:56       ` Jan Beulich
2025-05-19  7:13         ` Chen, Jiqian
2025-05-19 13:10           ` Jan Beulich
2025-05-19 13:21             ` Roger Pau Monné
2025-05-21  6:08               ` Chen, Jiqian
2025-05-21  6:25                 ` Jan Beulich
2025-05-21  6:44                   ` Chen, Jiqian
2025-05-09  9:05 ` [PATCH v4 04/10] vpci: Refactor REGISTER_VPCI_INIT Jiqian Chen
2025-05-18 14:34   ` Jan Beulich
2025-05-19  6:56     ` Chen, Jiqian
2025-05-19 13:07       ` Jan Beulich
2025-05-09  9:05 ` [PATCH v4 05/10] vpci: Hide legacy capability when it fails to initialize Jiqian Chen
2025-05-18 14:44   ` Jan Beulich
2025-05-19  7:35     ` Chen, Jiqian
2025-05-19 10:04       ` Roger Pau Monné
2025-05-09  9:05 ` [PATCH v4 06/10] vpci: Hide extended " Jiqian Chen
2025-05-18 14:47   ` Jan Beulich
2025-05-19  7:41     ` Chen, Jiqian
2025-05-19 13:12       ` Jan Beulich
2025-05-09  9:05 ` [PATCH v4 07/10] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
2025-05-19  6:30   ` Jan Beulich
2025-05-19  7:44     ` Chen, Jiqian
2025-05-19 10:24     ` Roger Pau Monné
2025-05-09  9:05 ` [PATCH v4 08/10] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
2025-05-19  6:54   ` Jan Beulich
2025-05-19  7:49     ` Chen, Jiqian
2025-05-09  9:05 ` [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
2025-05-20  6:40   ` Jan Beulich
2025-05-20  9:09     ` Roger Pau Monné
2025-05-20  9:14       ` Jan Beulich
2025-05-20  9:43         ` Roger Pau Monné
2025-05-21  7:00           ` Chen, Jiqian
2025-05-21 11:23             ` Roger Pau Monné
2025-05-22  2:21               ` Chen, Jiqian
2025-05-22  7:12                 ` Roger Pau Monné
2025-05-22  7:27                   ` Chen, Jiqian
2025-05-09  9:05 ` [PATCH v4 10/10] vpci/msix: Add function to clean MSIX resources Jiqian Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.