All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Support hiding capability when its initialization fails
@ 2025-04-09  6:45 Jiqian Chen
  2025-04-09  6:45 ` [PATCH v2 1/8] driver/pci: Get next capability without passing caps Jiqian Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Hi,

This series is to
emulate legacy and extended capability list for host, including patch #1, #2, #3,
hide capability when its initialization fails, including patch #4,
remove all related registers and other resources when failure, including patch #5, #6, #7, #8.

Best regards,
Jiqian Chen.
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
Jiqian Chen (8):
  driver/pci: Get next capability without passing caps
  vpci/header: Emulate legacy capability list for host
  vpci/header: Emulate extended capability list for host
  vpci: Hide capability when it fails to initialize
  vpci: Refactor vpci_remove_register to remove matched registers
  vpci/rebar: Remove registers when init_rebar() fails
  vpci/msi: Free MSI resources when init_msi() fails
  vpci/msix: Add function to clean MSIX resources

 tools/tests/vpci/main.c           |   4 +-
 xen/arch/x86/hvm/intercept.c      |  44 +++++++
 xen/arch/x86/include/asm/hvm/io.h |   3 +
 xen/drivers/pci/pci.c             |   6 +-
 xen/drivers/vpci/header.c         | 180 ++++++++++++++++-----------
 xen/drivers/vpci/msi.c            |  49 ++++++--
 xen/drivers/vpci/msix.c           |  67 +++++++++-
 xen/drivers/vpci/rebar.c          |  27 ++--
 xen/drivers/vpci/vpci.c           | 198 ++++++++++++++++++++++++------
 xen/include/xen/pci.h             |   2 +-
 xen/include/xen/pci_regs.h        |   1 +
 xen/include/xen/vpci.h            |  30 +++--
 xen/include/xen/xen.lds.h         |   2 +-
 13 files changed, 461 insertions(+), 152 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/8] driver/pci: Get next capability without passing caps
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-10 12:34   ` Jan Beulich
  2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

Modify function pci_find_next_cap_ttl to support returning position
of next capability when array "caps" is empty or size "n" is zero.

That can help caller to get next capability offset if caller just
has a information of current capability offset.

That will be used in a follow-on change.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/pci/pci.c | 6 +++++-
 xen/include/xen/pci.h | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index edf5b9f7ae9f..ec81d0db6133 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -40,7 +40,7 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
 }
 
 unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
-                                   const unsigned int caps[], unsigned int n,
+                                   const unsigned int *caps, unsigned int n,
                                    unsigned int *ttl)
 {
     while ( (*ttl)-- )
@@ -55,6 +55,10 @@ unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
 
         if ( id == 0xff )
             break;
+
+        if ( !caps || n == 0 )
+            return pos;
+
         for ( i = 0; i < n; i++ )
         {
             if ( id == caps[i] )
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index ef601966533e..cc84f2cbebfc 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -251,7 +251,7 @@ int pci_mmcfg_write(unsigned int seg, unsigned int bus,
                     unsigned int devfn, int reg, int len, u32 value);
 unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap);
 unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
-                                   const unsigned int caps[], unsigned int n,
+                                   const unsigned int *caps, unsigned int n,
                                    unsigned int *ttl);
 unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
                                unsigned int cap);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-04-09  6:45 ` [PATCH v2 1/8] driver/pci: Get next capability without passing caps Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-10 12:40   ` Jan Beulich
                     ` (2 more replies)
  2025-04-09  6:45 ` [PATCH v2 3/8] vpci/header: Emulate extended " Jiqian Chen
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Current logic of init_header() only emulates legacy capability list
for guest, expand it to emulate for host too. So that it will be
easy to hide a capability whose initialization fails and no need
to distinguish host or guest.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 139 ++++++++++++++++++++------------------
 1 file changed, 74 insertions(+), 65 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ef6c965c081c..0910eb940e23 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
     return !bar->mem ? -ENOMEM : 0;
 }
 
+/* These capabilities can be exposed to the guest, that vPCI can handle. */
+static const unsigned int guest_supported_caps[] = {
+    PCI_CAP_ID_MSI,
+    PCI_CAP_ID_MSIX,
+};
+
+static int vpci_init_capability_list(struct pci_dev *pdev)
+{
+    int rc;
+    bool mask_cap_list = false;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
+    const unsigned int *caps = is_hwdom ? NULL : guest_supported_caps;
+    const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(guest_supported_caps);
+
+    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
+    {
+        unsigned int next, ttl = 48;
+
+        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                     caps, n, &ttl);
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               PCI_CAPABILITY_LIST, 1,
+                               (void *)(uintptr_t)next);
+        if ( rc )
+            return rc;
+
+        next &= ~3;
+
+        if ( !next && !is_hwdom )
+            /*
+             * If we don't have any supported capabilities to expose to the
+             * guest, mask the PCI_STATUS_CAP_LIST bit in the status register.
+             */
+            mask_cap_list = true;
+
+        while ( next && ttl )
+        {
+            unsigned int pos = next;
+
+            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
+                                         caps, n, &ttl);
+
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                   pos + PCI_CAP_LIST_ID, 1, NULL);
+            if ( rc )
+                return rc;
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   pos + PCI_CAP_LIST_NEXT, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            next &= ~3;
+        }
+    }
+
+    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
+    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                PCI_STATUS, 2, NULL,
+                                PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                PCI_STATUS_RW1C_MASK,
+                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
+                                PCI_STATUS_RSVDZ_MASK);
+
+    return rc;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -753,7 +823,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
-    bool mask_cap_list = false;
     bool is_hwdom = is_hardware_domain(pdev->domain);
 
     ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
@@ -794,61 +863,12 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    rc = vpci_init_capability_list(pdev);
+    if ( rc )
+        return rc;
+
     if ( !is_hwdom )
     {
-        if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
-        {
-            /* Only expose capabilities to the guest that vPCI can handle. */
-            unsigned int next, ttl = 48;
-            static const unsigned int supported_caps[] = {
-                PCI_CAP_ID_MSI,
-                PCI_CAP_ID_MSIX,
-            };
-
-            next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
-                                         supported_caps,
-                                         ARRAY_SIZE(supported_caps), &ttl);
-
-            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                   PCI_CAPABILITY_LIST, 1,
-                                   (void *)(uintptr_t)next);
-            if ( rc )
-                return rc;
-
-            next &= ~3;
-
-            if ( !next )
-                /*
-                 * If we don't have any supported capabilities to expose to the
-                 * guest, mask the PCI_STATUS_CAP_LIST bit in the status
-                 * register.
-                 */
-                mask_cap_list = true;
-
-            while ( next && ttl )
-            {
-                unsigned int pos = next;
-
-                next = pci_find_next_cap_ttl(pdev->sbdf,
-                                             pos + PCI_CAP_LIST_NEXT,
-                                             supported_caps,
-                                             ARRAY_SIZE(supported_caps), &ttl);
-
-                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
-                                       pos + PCI_CAP_LIST_ID, 1, NULL);
-                if ( rc )
-                    return rc;
-
-                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
-                                       pos + PCI_CAP_LIST_NEXT, 1,
-                                       (void *)(uintptr_t)next);
-                if ( rc )
-                    return rc;
-
-                next &= ~3;
-            }
-        }
-
         /* Extended capabilities read as zero, write ignore */
         rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
                                (void *)0);
@@ -856,17 +876,6 @@ static int cf_check init_header(struct pci_dev *pdev)
             return rc;
     }
 
-    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
-    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
-                                PCI_STATUS, 2, NULL,
-                                PCI_STATUS_RO_MASK &
-                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
-                                PCI_STATUS_RW1C_MASK,
-                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
-                                PCI_STATUS_RSVDZ_MASK);
-    if ( rc )
-        return rc;
-
     if ( pdev->ignore_bars )
         return 0;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
  2025-04-09  6:45 ` [PATCH v2 1/8] driver/pci: Get next capability without passing caps Jiqian Chen
  2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15  9:42   ` Roger Pau Monné
  2025-04-15  9:49   ` Jan Beulich
  2025-04-09  6:45 ` [PATCH v2 4/8] vpci: Hide capability when it fails to initialize Jiqian Chen
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

Add a new function to emulate extended capability list for host,
and call it in init_header(). So that, it will be easy to hide
a capability whose initialization fails.

As for the extended capability list of guest, keep hiding it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c | 44 ++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0910eb940e23..6833d456566b 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
     return rc;
 }
 
+static int vpci_init_ext_capability_list(struct pci_dev *pdev)
+{
+    int rc;
+    u32 header;
+    unsigned int pos = 0x100U, ttl = 480;
+
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        /* Extended capabilities read as zero, write ignore */
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               pos, 4, (void *)0);
+        if ( rc )
+            return rc;
+    }
+
+    while ( pos && ttl-- )
+    {
+        header = pci_conf_read32(pdev->sbdf, pos);
+
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                               pos, 4, (void *)(uintptr_t)header);
+        if ( rc )
+            return rc;
+
+        if ( (header == 0) || (header == -1) )
+            return 0;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
+
+    return 0;
+}
+
 static int cf_check init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -867,14 +900,9 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    if ( !is_hwdom )
-    {
-        /* Extended capabilities read as zero, write ignore */
-        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
-                               (void *)0);
-        if ( rc )
-            return rc;
-    }
+    rc = vpci_init_ext_capability_list(pdev);
+    if ( rc )
+        return rc;
 
     if ( pdev->ignore_bars )
         return 0;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 4/8] vpci: Hide capability when it fails to initialize
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (2 preceding siblings ...)
  2025-04-09  6:45 ` [PATCH v2 3/8] vpci/header: Emulate extended " Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15 10:47   ` Roger Pau Monné
  2025-04-09  6:45 ` [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
	Stefano Stabellini

When vpci fails to initialize a capability of a device, it just
return error instead of catching and processing exception. That
makes the entire device unusable.

So, refactor REGISTER_VPCI_INIT to contain more capability specific
information, and use new functions to hide capability when
initialization fails in vpci_assign_device().

Those new functions remove the failed legacy/extended capability
from the emulated legacy/extended capability list.

What's more, change the definition of init_header() since it is
not a capability and it is needed for all devices' PCI config space.

Note: call vpci_make_msix_hole() in the end of init_msix() since the
change of sequence of init_header() and init_msix().

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
cc: Michal Orzel <michal.orzel@amd.com>
cc: Jan Beulich <jbeulich@suse.com>
cc: Julien Grall <julien@xen.org>
cc: Stefano Stabellini <sstabellini@kernel.org>
---
v1->v2 changes:
* Removed the "priorities" of initializing capabilities since it isn't used anymore.
* Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
* Called vpci_make_msix_hole() in the end of init_msix().

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/header.c  |   3 +-
 xen/drivers/vpci/msi.c     |   2 +-
 xen/drivers/vpci/msix.c    |   8 +-
 xen/drivers/vpci/rebar.c   |   2 +-
 xen/drivers/vpci/vpci.c    | 175 +++++++++++++++++++++++++++++++------
 xen/include/xen/pci_regs.h |   1 +
 xen/include/xen/vpci.h     |  26 ++++--
 xen/include/xen/xen.lds.h  |   2 +-
 8 files changed, 179 insertions(+), 40 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 6833d456566b..51a67d76ad8a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -848,7 +848,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
     return 0;
 }
 
-static int cf_check init_header(struct pci_dev *pdev)
+int vpci_init_header(struct pci_dev *pdev)
 {
     uint16_t cmd;
     uint64_t addr, size;
@@ -1044,7 +1044,6 @@ static int cf_check init_header(struct pci_dev *pdev)
     pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
     return rc;
 }
-REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 66e5a8a116be..ca89ae9b9c22 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 6bd8c55bb48e..6537374c79a0 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -751,9 +751,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     pdev->vpci->msix = msix;
     list_add(&msix->next, &d->arch.hvm.msix_tables);
 
-    return 0;
+    spin_lock(&pdev->vpci->lock);
+    rc = vpci_make_msix_hole(pdev);
+    spin_unlock(&pdev->vpci->lock);
+
+    return rc
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 793937449af7..79858e5dc92f 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_EXTEND_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..f1f125bfdab1 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,9 +35,25 @@ struct vpci_register {
     uint32_t rsvdz_mask;
 };
 
+static int vpci_register_cmp(const struct vpci_register *r1,
+                             const struct vpci_register *r2)
+{
+    /* Return 0 if registers overlap. */
+    if ( r1->offset < r2->offset + r2->size &&
+         r2->offset < r1->offset + r1->size )
+        return 0;
+    if ( r1->offset < r2->offset )
+        return -1;
+    if ( r1->offset > r2->offset )
+        return 1;
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern vpci_capability_t *const __start_vpci_array[];
+extern vpci_capability_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
@@ -83,6 +99,133 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
 
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
+static void vpci_capability_mask(struct pci_dev *pdev,
+                                 const unsigned int cap)
+{
+    const unsigned int size = 1;
+    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
+    const struct vpci_register r = { .offset = offset, .size = size };
+    struct vpci_register *rm;
+    struct vpci *vpci = pdev->vpci;
+
+    spin_lock(&vpci->lock);
+    list_for_each_entry ( rm, &vpci->handlers, node )
+    {
+        int cmp = vpci_register_cmp(&r, rm);
+
+        if ( !cmp && rm->offset == offset && rm->size == size )
+        {
+            struct vpci_register *pre = list_entry(rm->node.prev,
+                                                   struct vpci_register,
+                                                   node);
+            struct vpci_register *next = list_entry(rm->node.next,
+                                                    struct vpci_register,
+                                                    node);
+
+            pre->private = next->private;
+
+            /* PCI_CAP_LIST_ID register of current capability */
+            list_del(&rm->node);
+            /* PCI_CAP_LIST_NEXT register of current capability */
+            list_del(&next->node);
+            spin_unlock(&vpci->lock);
+
+            xfree(rm);
+            xfree(next);
+            return;
+        }
+        if ( cmp <= 0 )
+            break;
+    }
+    spin_unlock(&vpci->lock);
+}
+
+static void vpci_ext_capability_mask(struct pci_dev *pdev,
+                                     const unsigned int cap)
+{
+    const unsigned int size = 4;
+    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
+    const struct vpci_register r = { .offset = offset, .size = size };
+    struct vpci_register *rm;
+    struct vpci *vpci = pdev->vpci;
+
+    spin_lock(&vpci->lock);
+    list_for_each_entry ( rm, &vpci->handlers, node )
+    {
+        int cmp = vpci_register_cmp(&r, rm);
+
+        if ( !cmp && rm->offset == offset && rm->size == size )
+        {
+            struct vpci_register *pre;
+            u32 pre_header, header = (u32)(uintptr_t)rm->private;
+
+            if ( offset == 0x100U && PCI_EXT_CAP_NEXT(header) == 0 )
+            {
+                rm->private = (void *)(uintptr_t)0;
+                spin_unlock(&vpci->lock);
+                return;
+            }
+            else if ( offset == 0x100U )
+            {
+                pre = rm;
+                rm = list_entry(rm->node.next, struct vpci_register, node);
+                pre->private = rm->private;
+            }
+            else
+            {
+                pre = list_entry(rm->node.prev, struct vpci_register, node);
+                pre_header = (u32)(uintptr_t)pre->private;
+                pre->private =
+                    (void *)(uintptr_t)((pre_header & !PCI_EXT_CAP_NEXT_MASK) |
+                                        (header & PCI_EXT_CAP_NEXT_MASK));
+            }
+            list_del(&rm->node);
+            spin_unlock(&vpci->lock);
+            xfree(rm);
+            return;
+        }
+        if ( cmp <= 0 )
+            break;
+    }
+    spin_unlock(&vpci->lock);
+}
+
+static void vpci_init_capabilities(struct pci_dev *pdev)
+{
+    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        const vpci_capability_t *capability = __start_vpci_array[i];
+        const unsigned int cap = capability->id;
+        const bool is_ext = capability->is_ext;
+        unsigned int pos;
+        int rc;
+
+        if ( !is_hardware_domain(pdev->domain) && is_ext )
+            continue;
+
+        if ( is_ext )
+            pos = pci_find_ext_capability(pdev->sbdf, cap);
+        else
+            pos = pci_find_cap_offset(pdev->sbdf, cap);
+
+        if ( !pos )
+            continue;
+
+        rc = capability->init(pdev);
+
+        if ( rc )
+        {
+            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail rc=%d, mask it\n",
+                   pdev->domain, &pdev->sbdf,
+                   is_ext ? "extended" : "legacy", cap, rc);
+            if ( is_ext )
+                vpci_ext_capability_mask(pdev, cap);
+            else
+                vpci_capability_mask(pdev, cap);
+        }
+    }
+}
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -128,7 +271,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
 
 int vpci_assign_device(struct pci_dev *pdev)
 {
-    unsigned int i;
     const unsigned long *ro_map;
     int rc = 0;
 
@@ -159,12 +301,11 @@ int vpci_assign_device(struct pci_dev *pdev)
         goto out;
 #endif
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
+    rc = vpci_init_header(pdev);
+    if ( rc )
+        goto out;
+
+    vpci_init_capabilities(pdev);
 
  out: __maybe_unused;
     if ( rc )
@@ -174,22 +315,6 @@ int vpci_assign_device(struct pci_dev *pdev)
 }
 #endif /* __XEN__ */
 
-static int vpci_register_cmp(const struct vpci_register *r1,
-                             const struct vpci_register *r2)
-{
-    /* Return 0 if registers overlap. */
-    if ( r1->offset < r2->offset + r2->size &&
-         r2->offset < r1->offset + r1->size )
-        return 0;
-    if ( r1->offset < r2->offset )
-        return -1;
-    if ( r1->offset > r2->offset )
-        return 1;
-
-    ASSERT_UNREACHABLE();
-    return 0;
-}
-
 /* Dummy hooks, writes are ignored, reads return 1's */
 static uint32_t cf_check vpci_ignored_read(
     const struct pci_dev *pdev, unsigned int reg, void *data)
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 27b4f44eedf3..5fe6653fded4 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -449,6 +449,7 @@
 #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
 #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
 #define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_NEXT_MASK		0xFFC00000U
 
 #define PCI_EXT_CAP_ID_ERR	1
 #define PCI_EXT_CAP_ID_VC	2
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 807401b2eaa2..5016ded64d89 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,11 +13,11 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
-
-#define VPCI_PRIORITY_HIGH      "1"
-#define VPCI_PRIORITY_MIDDLE    "5"
-#define VPCI_PRIORITY_LOW       "9"
+typedef struct {
+    unsigned int id;
+    bool is_ext;
+    int (*init)(struct pci_dev *pdev);
+} vpci_capability_t;
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
@@ -29,9 +29,19 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
  */
 #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = (x)
+#define REGISTER_VPCI_CAP(cap, x, ext) \
+  static vpci_capability_t x##_t = { \
+        .id = (cap), \
+        .init = (x), \
+        .is_ext = (ext), \
+  }; \
+  static vpci_capability_t *const x##_entry  \
+               __used_section(".data.vpci.") = &(x##_t)
+
+#define REGISTER_VPCI_LEGACY_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, false)
+#define REGISTER_VPCI_EXTEND_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, true)
+
+int __must_check vpci_init_header(struct pci_dev *pdev);
 
 /* Assign vPCI to device by adding handlers. */
 int __must_check vpci_assign_device(struct pci_dev *pdev);
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
index 16a9b1ba03db..c73222112dd3 100644
--- a/xen/include/xen/xen.lds.h
+++ b/xen/include/xen/xen.lds.h
@@ -187,7 +187,7 @@
 #define VPCI_ARRAY               \
        . = ALIGN(POINTER_ALIGN); \
        __start_vpci_array = .;   \
-       *(SORT(.data.vpci.*))     \
+       *(.data.vpci.*)     \
        __end_vpci_array = .;
 #else
 #define VPCI_ARRAY
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (3 preceding siblings ...)
  2025-04-09  6:45 ` [PATCH v2 4/8] vpci: Hide capability when it fails to initialize Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15 11:06   ` Roger Pau Monné
  2025-04-09  6:45 ` [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné, Anthony PERARD

vpci_remove_register() only supports removing a register in a time,
but the follow-on changes need to remove all registers within a
range. And it is only used for test. So, refactor it to support
removing all matched registers in a calling time.

And it is no matter to remove a non exist register, so remove the
__must_check prefix.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
cc: Anthony PERARD <anthony.perard@vates.tech>
---
v1->v2 changes:
new patch

Best regards,
Jiqian Chen.
---
 tools/tests/vpci/main.c |  4 ++--
 xen/drivers/vpci/vpci.c | 23 ++++++++++++-----------
 xen/include/xen/vpci.h  |  4 ++--
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index 33223db3eb77..ca72877d60cd 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
                                   rsvdz_mask))
 
 #define VPCI_REMOVE_REG(off, size)                                          \
-    assert(!vpci_remove_register(test_pdev.vpci, off, size))
+    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
 
 #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
-    assert(vpci_remove_register(test_pdev.vpci, off, size))
+    assert(vpci_remove_registers(test_pdev.vpci, off, size))
 
 /* Read a 32b register using all possible sizes. */
 void multiread4_check(unsigned int reg, uint32_t val)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f1f125bfdab1..115d3c5f0c84 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -418,34 +418,35 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
-int vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                         unsigned int size)
+int vpci_remove_registers(struct vpci *vpci, unsigned int offset,
+                          unsigned int size)
 {
     const struct vpci_register r = { .offset = offset, .size = size };
     struct vpci_register *rm;
+    int rc = -ENOENT;
 
     spin_lock(&vpci->lock);
     list_for_each_entry ( rm, &vpci->handlers, node )
     {
         int cmp = vpci_register_cmp(&r, rm);
 
-        /*
-         * NB: do not use a switch so that we can use break to
-         * get out of the list loop earlier if required.
-         */
-        if ( !cmp && rm->offset == offset && rm->size == size )
+        if ( !cmp )
         {
+            struct vpci_register *prev =
+                list_entry(rm->node.prev, struct vpci_register, node);
+
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
-            return 0;
+            rm = prev;
+            rc = 0;
         }
-        if ( cmp <= 0 )
+
+        if ( cmp < 0 )
             break;
     }
     spin_unlock(&vpci->lock);
 
-    return -ENOENT;
+    return rc;
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 5016ded64d89..ee9163ca6a56 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -67,8 +67,8 @@ static inline int __must_check vpci_add_register(struct vpci *vpci,
                                   size, data, 0, 0, 0, 0);
 }
 
-int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
-                                      unsigned int size);
+int vpci_remove_registers(struct vpci *vpci, unsigned int offset,
+                          unsigned int size);
 
 /* Generic read/write handlers for the PCI config space. */
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (4 preceding siblings ...)
  2025-04-09  6:45 ` [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15 12:59   ` Roger Pau Monné
  2025-04-09  6:45 ` [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
  2025-04-09  6:45 ` [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources Jiqian Chen
  7 siblings, 1 reply; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_rebar() fails, the previous new changes will hide Rebar
capability, it can't rely on vpci_deassign_device() to remove all
Rebar related registers anymore, those registers must be removed
in failure path of init_rebar.

To do that, call vpci_remove_registers() to remove all possible
registered registers in the failure patch.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v1->v2 changes:
* Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.

Best regards,
Jiqian Chen.
---
 xen/drivers/vpci/rebar.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
index 79858e5dc92f..e030937956e3 100644
--- a/xen/drivers/vpci/rebar.c
+++ b/xen/drivers/vpci/rebar.c
@@ -51,6 +51,7 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
 
 static int cf_check init_rebar(struct pci_dev *pdev)
 {
+    int rc = 0;
     uint32_t ctrl;
     unsigned int nbars;
     unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
@@ -70,7 +71,6 @@ static int cf_check init_rebar(struct pci_dev *pdev)
     nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
     for ( unsigned int i = 0; i < nbars; i++ )
     {
-        int rc;
         struct vpci_bar *bar;
         unsigned int index;
 
@@ -80,7 +80,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            goto fail;
         }
 
         bar = &pdev->vpci->header.bars[index];
@@ -88,7 +88,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
                    pdev->domain, &pdev->sbdf, index);
-            continue;
+            goto fail;
         }
 
         rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rebar_ctrl_write,
@@ -97,14 +97,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
         {
             printk(XENLOG_ERR "%pd %pp: BAR%u fail to add reg of REBAR_CTRL rc=%d\n",
                    pdev->domain, &pdev->sbdf, index, rc);
-            /*
-             * Ideally we would hide the ReBar capability on error, but code
-             * for doing so still needs to be written. Use continue instead
-             * to keep any already setup register hooks, as returning an
-             * error will cause the hardware domain to get unmediated access
-             * to all device registers.
-             */
-            continue;
+            goto fail;
         }
 
         bar->resizable_sizes =
@@ -117,6 +110,16 @@ static int cf_check init_rebar(struct pci_dev *pdev)
     }
 
     return 0;
+
+ fail:
+    /*
+     * Remove all possible registered registers except header.
+     * Header register will be removed in mask function.
+     */
+    vpci_remove_registers(pdev->vpci, rebar_offset + PCI_REBAR_CAP(0),
+                          PCI_REBAR_CTRL(nbars - 1));
+
+    return rc;
 }
 REGISTER_VPCI_EXTEND_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (5 preceding siblings ...)
  2025-04-09  6:45 ` [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15 13:29   ` Roger Pau Monné
  2025-04-09  6:45 ` [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources Jiqian Chen
  7 siblings, 1 reply; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Huang Rui, Jiqian Chen, Roger Pau Monné

When init_msi() fails, the previous new changes will hide MSI
capability, it can't rely on vpci_deassign_device() to remove
all MSI related resources anymore, those resources must be
cleaned up in failure path of init_msi.

To do that, add a new function to free MSI resources.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v1->v2 changes:
* Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
---
 xen/drivers/vpci/msi.c | 47 +++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index ca89ae9b9c22..48a466dba0ef 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -193,6 +193,33 @@ static void cf_check mask_write(
     msi->mask = val;
 }
 
+/* 12 is size of MSI structure for 32-bit Message Address without PVM */
+#define MSI_STRUCTURE_SIZE_32 12
+
+static void fini_msi(struct pci_dev *pdev)
+{
+    unsigned int size = MSI_STRUCTURE_SIZE_32;
+
+    if ( !pdev->vpci->msi )
+        return;
+
+    if ( pdev->vpci->msi->address64 )
+        size += 4;
+    if ( pdev->vpci->msi->masking )
+        size += 4;
+
+    /*
+     * Remove all possible registered registers except capability ID
+     * register and next capability pointer register, which will be
+     * removed in mask function.
+     */
+    vpci_remove_registers(pdev->vpci,
+                          msi_control_reg(pdev->msi_pos),
+                          size - PCI_MSI_FLAGS);
+    xfree(pdev->vpci->msi);
+    pdev->vpci->msi = NULL;
+}
+
 static int cf_check init_msi(struct pci_dev *pdev)
 {
     unsigned int pos = pdev->msi_pos;
@@ -209,12 +236,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
     ret = vpci_add_register(pdev->vpci, control_read, control_write,
                             msi_control_reg(pos), 2, pdev->vpci->msi);
     if ( ret )
-        /*
-         * NB: there's no need to free the msi struct or remove the register
-         * handlers form the config space, the caller will take care of the
-         * cleanup.
-         */
-        return ret;
+        goto fail;
 
     /* Get the maximum number of vectors the device supports. */
     control = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
@@ -237,20 +259,20 @@ static int cf_check init_msi(struct pci_dev *pdev)
     ret = vpci_add_register(pdev->vpci, address_read, address_write,
                             msi_lower_address_reg(pos), 4, pdev->vpci->msi);
     if ( ret )
-        return ret;
+        goto fail;
 
     ret = vpci_add_register(pdev->vpci, data_read, data_write,
                             msi_data_reg(pos, pdev->vpci->msi->address64), 2,
                             pdev->vpci->msi);
     if ( ret )
-        return ret;
+        goto fail;
 
     if ( pdev->vpci->msi->address64 )
     {
         ret = vpci_add_register(pdev->vpci, address_hi_read, address_hi_write,
                                 msi_upper_address_reg(pos), 4, pdev->vpci->msi);
         if ( ret )
-            return ret;
+            goto fail;
     }
 
     if ( pdev->vpci->msi->masking )
@@ -260,7 +282,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
                                                   pdev->vpci->msi->address64),
                                 4, pdev->vpci->msi);
         if ( ret )
-            return ret;
+            goto fail;
         /*
          * FIXME: do not add any handler for the pending bits for the hardware
          * domain, which means direct access. This will be revisited when
@@ -269,6 +291,11 @@ static int cf_check init_msi(struct pci_dev *pdev)
     }
 
     return 0;
+
+ fail:
+    fini_msi(pdev);
+
+    return ret;
 }
 REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
                   ` (6 preceding siblings ...)
  2025-04-09  6:45 ` [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-04-09  6:45 ` Jiqian Chen
  2025-04-15 13:45   ` Roger Pau Monné
  2025-04-17  7:34   ` Jan Beulich
  7 siblings, 2 replies; 41+ messages in thread
From: Jiqian Chen @ 2025-04-09  6:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Huang Rui, Jiqian Chen, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

When init_msix() fails, it needs to clean all MSIX resources.
So, add a new function fini_msix() to do that.

And to unregister the mmio handler of vpci_msix_table_ops, add
a new function.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
cc: Jan Beulich <jbeulich@suse.com>
cc: Andrew Cooper <andrew.cooper3@citrix.com>
cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
v1->v2 changes:
new patch.

Best regards,
Jiqian Chen.
---
 xen/arch/x86/hvm/intercept.c      | 44 ++++++++++++++++++++++
 xen/arch/x86/include/asm/hvm/io.h |  3 ++
 xen/drivers/vpci/msix.c           | 61 ++++++++++++++++++++++++++++---
 3 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index da22c386763e..5eacf51d4d2c 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -276,6 +276,50 @@ void register_mmio_handler(struct domain *d,
     handler->mmio.ops = ops;
 }
 
+void unregister_mmio_handler(struct domain *d,
+                             const struct hvm_mmio_ops *ops)
+{
+    unsigned int i, count = d->arch.hvm.io_handler_count;
+
+    ASSERT(d->arch.hvm.io_handler);
+
+    if ( !count )
+        return;
+
+    for ( i = 0; i < count; i++ )
+        if ( d->arch.hvm.io_handler[i].type == IOREQ_TYPE_COPY &&
+             d->arch.hvm.io_handler[i].mmio.ops == ops )
+            break;
+
+    if ( i == count )
+        return;
+
+    for ( ; i < count - 1; i++ )
+    {
+        struct hvm_io_handler *curr = &d->arch.hvm.io_handler[i];
+        struct hvm_io_handler *next = &d->arch.hvm.io_handler[i + 1];
+
+        curr->type = next->type;
+        curr->ops = next->ops;
+        if ( next->type == IOREQ_TYPE_COPY )
+        {
+            curr->portio.port = 0;
+            curr->portio.size = 0;
+            curr->portio.action = 0;
+            curr->mmio.ops = next->mmio.ops;
+        }
+        else
+        {
+            curr->mmio.ops = 0;
+            curr->portio.port = next->portio.port;
+            curr->portio.size = next->portio.size;
+            curr->portio.action = next->portio.action;
+        }
+    }
+
+    d->arch.hvm.io_handler_count--;
+}
+
 void register_portio_handler(struct domain *d, unsigned int port,
                              unsigned int size, portio_action_t action)
 {
diff --git a/xen/arch/x86/include/asm/hvm/io.h b/xen/arch/x86/include/asm/hvm/io.h
index 565bad300d20..018d2745fd99 100644
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -75,6 +75,9 @@ bool hvm_mmio_internal(paddr_t gpa);
 void register_mmio_handler(struct domain *d,
                            const struct hvm_mmio_ops *ops);
 
+void unregister_mmio_handler(struct domain *d,
+                             const struct hvm_mmio_ops *ops);
+
 void register_portio_handler(
     struct domain *d, unsigned int port, unsigned int size,
     portio_action_t action);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 6537374c79a0..60654d4f6d0b 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -703,6 +703,54 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
     return 0;
 }
 
+static void fini_msix(struct pci_dev *pdev)
+{
+    struct vpci *vpci = pdev->vpci;
+
+    if ( !vpci->msix )
+        return;
+
+    list_del(&vpci->msix->next);
+    if ( list_empty(&pdev->domain->arch.hvm.msix_tables) )
+        unregister_mmio_handler(pdev->domain, &vpci_msix_table_ops);
+
+    /* Remove any MSIX regions if present. */
+    for ( unsigned int i = 0;
+          vpci->msix && i < ARRAY_SIZE(vpci->msix->tables);
+          i++ )
+    {
+        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
+        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
+                                     vmsix_table_size(pdev->vpci, i) - 1);
+
+        for ( unsigned int j = 0; j < ARRAY_SIZE(vpci->header.bars); j++ )
+        {
+            int rc;
+            const struct vpci_bar *bar = &vpci->header.bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                gprintk(XENLOG_WARNING,
+                       "%pp: failed to remove MSIX table [%lx, %lx]: %d\n",
+                        &pdev->sbdf, start, end, rc);
+                return;
+            }
+        }
+    }
+
+    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
+        if ( vpci->msix->table[i] )
+            iounmap(vpci->msix->table[i]);
+
+    vpci_remove_registers(vpci, msix_control_reg(pdev->msix_pos), 2);
+    xfree(vpci->msix);
+    vpci->msix = NULL;
+}
+
 static int cf_check init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
@@ -726,10 +774,7 @@ static int cf_check init_msix(struct pci_dev *pdev)
     rc = vpci_add_register(pdev->vpci, control_read, control_write,
                            msix_control_reg(msix_offset), 2, msix);
     if ( rc )
-    {
-        xfree(msix);
-        return rc;
-    }
+        goto fail;
 
     msix->max_entries = max_entries;
     msix->pdev = pdev;
@@ -755,7 +800,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
     rc = vpci_make_msix_hole(pdev);
     spin_unlock(&pdev->vpci->lock);
 
-    return rc
+    if ( !rc )
+        return 0;
+
+ fail:
+    fini_msix(pdev);
+
+    return rc;
 }
 REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/8] driver/pci: Get next capability without passing caps
  2025-04-09  6:45 ` [PATCH v2 1/8] driver/pci: Get next capability without passing caps Jiqian Chen
@ 2025-04-10 12:34   ` Jan Beulich
  2025-04-11  2:51     ` Chen, Jiqian
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-10 12:34 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel

On 09.04.2025 08:45, Jiqian Chen wrote:
> --- a/xen/drivers/pci/pci.c
> +++ b/xen/drivers/pci/pci.c
> @@ -40,7 +40,7 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>  }
>  
>  unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
> -                                   const unsigned int caps[], unsigned int n,
> +                                   const unsigned int *caps, unsigned int n,

I don't follow the need for this change.

> @@ -55,6 +55,10 @@ unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>  
>          if ( id == 0xff )
>              break;
> +
> +        if ( !caps || n == 0 )
> +            return pos;

Checking n to be zero ought to suffice here? In that case it doesn't matter
what caps is. Plus if n is non-zero, it clearly is an error if caps was NULL.

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
@ 2025-04-10 12:40   ` Jan Beulich
  2025-04-11  2:54     ` Chen, Jiqian
  2025-04-15  9:25   ` Roger Pau Monné
  2025-04-15 10:10   ` Roger Pau Monné
  2 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-10 12:40 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.04.2025 08:45, Jiqian Chen wrote:
> Current logic of init_header() only emulates legacy capability list
> for guest, expand it to emulate for host too. So that it will be
> easy to hide a capability whose initialization fails and no need
> to distinguish host or guest.

Before I (maybe) look at the patch as a hole: When you say "guest" you
mean "DomU" (which is a common pair of synonyms we use), whereas when
you say "host" you mean "Dom0"? The latter is not a common pair of
synonyms; "host" generally covers the entire system, including Xen and
all domains. See e.g. wording that we use in XSAs.

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/8] driver/pci: Get next capability without passing caps
  2025-04-10 12:34   ` Jan Beulich
@ 2025-04-11  2:51     ` Chen, Jiqian
  2025-04-11  8:00       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-11  2:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Anthony PERARD, Orzel, Michal, Julien Grall,
	Roger Pau Monné, Stefano Stabellini,
	xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/10 20:34, Jan Beulich wrote:
> On 09.04.2025 08:45, Jiqian Chen wrote:
>> --- a/xen/drivers/pci/pci.c
>> +++ b/xen/drivers/pci/pci.c
>> @@ -40,7 +40,7 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>>  }
>>  
>>  unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>> -                                   const unsigned int caps[], unsigned int n,
>> +                                   const unsigned int *caps, unsigned int n,
> 
> I don't follow the need for this change.
This changed is for my next patch "vpci/header: Emulate legacy capability list for host".
Currently, vpci only emulates capability list for domU, not for dom0.
For domU, vpci exposes a fixed capability array which calls "supported_caps".
My changes want to emulate capability list for dom0.
I think vpci should expose all devices capabilities to dom0.
When I emulate it, I need to iterate over all capabilities without another fixed array,
so I need this function to return the position of next capability directly when passing a zero length array to this function.

> 
>> @@ -55,6 +55,10 @@ unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>>  
>>          if ( id == 0xff )
>>              break;
>> +
>> +        if ( !caps || n == 0 )
>> +            return pos;
> 
> Checking n to be zero ought to suffice here? In that case it doesn't matter
> what caps is. Plus if n is non-zero, it clearly is an error if caps was NULL.
Two checking is to prevent null pointer errors.
But as you said, if checking n to be zero is enough, then I don't need to change the definition of this function.
I will change in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-10 12:40   ` Jan Beulich
@ 2025-04-11  2:54     ` Chen, Jiqian
  2025-04-11  8:01       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-11  2:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/4/10 20:40, Jan Beulich wrote:
> On 09.04.2025 08:45, Jiqian Chen wrote:
>> Current logic of init_header() only emulates legacy capability list
>> for guest, expand it to emulate for host too. So that it will be
>> easy to hide a capability whose initialization fails and no need
>> to distinguish host or guest.
> 
> Before I (maybe) look at the patch as a hole: When you say "guest" you
> mean "DomU" (which is a common pair of synonyms we use), whereas when
> you say "host" you mean "Dom0"? The latter is not a common pair of
> synonyms; "host" generally covers the entire system, including Xen and
> all domains. See e.g. wording that we use in XSAs.
Yes, guest means domU, host means dom0(or hardware domain?), I will change my word in next version
Thank you!

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/8] driver/pci: Get next capability without passing caps
  2025-04-11  2:51     ` Chen, Jiqian
@ 2025-04-11  8:00       ` Jan Beulich
  2025-04-11  8:08         ` Chen, Jiqian
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-11  8:00 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Anthony PERARD, Orzel, Michal, Julien Grall,
	Roger Pau Monné, Stefano Stabellini,
	xen-devel@lists.xenproject.org, Huang, Ray

On 11.04.2025 04:51, Chen, Jiqian wrote:
> On 2025/4/10 20:34, Jan Beulich wrote:
>> On 09.04.2025 08:45, Jiqian Chen wrote:
>>> --- a/xen/drivers/pci/pci.c
>>> +++ b/xen/drivers/pci/pci.c
>>> @@ -40,7 +40,7 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>>>  }
>>>  
>>>  unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>>> -                                   const unsigned int caps[], unsigned int n,
>>> +                                   const unsigned int *caps, unsigned int n,
>>
>> I don't follow the need for this change.
> This changed is for my next patch "vpci/header: Emulate legacy capability list for host".
> Currently, vpci only emulates capability list for domU, not for dom0.
> For domU, vpci exposes a fixed capability array which calls "supported_caps".
> My changes want to emulate capability list for dom0.
> I think vpci should expose all devices capabilities to dom0.
> When I emulate it, I need to iterate over all capabilities without another fixed array,
> so I need this function to return the position of next capability directly when passing a zero length array to this function.

This doesn't answer my question. The change you need for the next patch is
the hunk below, not the one above. Aiui at least.

>>> @@ -55,6 +55,10 @@ unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>>>  
>>>          if ( id == 0xff )
>>>              break;
>>> +
>>> +        if ( !caps || n == 0 )
>>> +            return pos;
>>
>> Checking n to be zero ought to suffice here? In that case it doesn't matter
>> what caps is. Plus if n is non-zero, it clearly is an error if caps was NULL.
> Two checking is to prevent null pointer errors.
> But as you said, if checking n to be zero is enough, then I don't need to change the definition of this function.
> I will change in next version.

If you really wanted to, you could add e.g. ASSERT(caps) after this if().

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-11  2:54     ` Chen, Jiqian
@ 2025-04-11  8:01       ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2025-04-11  8:01 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray

On 11.04.2025 04:54, Chen, Jiqian wrote:
> On 2025/4/10 20:40, Jan Beulich wrote:
>> On 09.04.2025 08:45, Jiqian Chen wrote:
>>> Current logic of init_header() only emulates legacy capability list
>>> for guest, expand it to emulate for host too. So that it will be
>>> easy to hide a capability whose initialization fails and no need
>>> to distinguish host or guest.
>>
>> Before I (maybe) look at the patch as a hole: When you say "guest" you
>> mean "DomU" (which is a common pair of synonyms we use), whereas when
>> you say "host" you mean "Dom0"? The latter is not a common pair of
>> synonyms; "host" generally covers the entire system, including Xen and
>> all domains. See e.g. wording that we use in XSAs.
> Yes, guest means domU, host means dom0(or hardware domain?),

We're less strict about that latter distinction - in most contexts
dom0 == hwdom is assumed.

> I will change my word in next version

Thanks.

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/8] driver/pci: Get next capability without passing caps
  2025-04-11  8:00       ` Jan Beulich
@ 2025-04-11  8:08         ` Chen, Jiqian
  0 siblings, 0 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-11  8:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Anthony PERARD, Orzel, Michal, Julien Grall,
	Roger Pau Monné, Stefano Stabellini,
	xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/11 16:00, Jan Beulich wrote:
> On 11.04.2025 04:51, Chen, Jiqian wrote:
>> On 2025/4/10 20:34, Jan Beulich wrote:
>>> On 09.04.2025 08:45, Jiqian Chen wrote:
>>>> --- a/xen/drivers/pci/pci.c
>>>> +++ b/xen/drivers/pci/pci.c
>>>> @@ -40,7 +40,7 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>>>>  }
>>>>  
>>>>  unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>>>> -                                   const unsigned int caps[], unsigned int n,
>>>> +                                   const unsigned int *caps, unsigned int n,
>>>
>>> I don't follow the need for this change.
>> This changed is for my next patch "vpci/header: Emulate legacy capability list for host".
>> Currently, vpci only emulates capability list for domU, not for dom0.
>> For domU, vpci exposes a fixed capability array which calls "supported_caps".
>> My changes want to emulate capability list for dom0.
>> I think vpci should expose all devices capabilities to dom0.
>> When I emulate it, I need to iterate over all capabilities without another fixed array,
>> so I need this function to return the position of next capability directly when passing a zero length array to this function.
> 
> This doesn't answer my question. The change you need for the next patch is
> the hunk below, not the one above. Aiui at least.
Ah, yes.
I will abandon changing this function definition in next version since it just needs to check "n == 0".

> 
>>>> @@ -55,6 +55,10 @@ unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>>>>  
>>>>          if ( id == 0xff )
>>>>              break;
>>>> +
>>>> +        if ( !caps || n == 0 )
>>>> +            return pos;
>>>
>>> Checking n to be zero ought to suffice here? In that case it doesn't matter
>>> what caps is. Plus if n is non-zero, it clearly is an error if caps was NULL.
>> Two checking is to prevent null pointer errors.
>> But as you said, if checking n to be zero is enough, then I don't need to change the definition of this function.
>> I will change in next version.
> 
> If you really wanted to, you could add e.g. ASSERT(caps) after this if().
OK, will add in next version, too.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
  2025-04-10 12:40   ` Jan Beulich
@ 2025-04-15  9:25   ` Roger Pau Monné
  2025-04-15 10:07     ` Chen, Jiqian
  2025-04-15 10:10   ` Roger Pau Monné
  2 siblings, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15  9:25 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
> Current logic of init_header() only emulates legacy capability list
> for guest, expand it to emulate for host too. So that it will be
> easy to hide a capability whose initialization fails and no need
> to distinguish host or guest.

It might be best if the initial code movement of the logic in
init_header() into it's own separate function was done as a
non-functional change, and a later patch added support for dom0.

It's easier to then spot the differences that you are adding to
support dom0.

> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> new patch
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c | 139 ++++++++++++++++++++------------------
>  1 file changed, 74 insertions(+), 65 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ef6c965c081c..0910eb940e23 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
>      return !bar->mem ? -ENOMEM : 0;
>  }
>  
> +/* These capabilities can be exposed to the guest, that vPCI can handle. */
> +static const unsigned int guest_supported_caps[] = {
> +    PCI_CAP_ID_MSI,
> +    PCI_CAP_ID_MSIX,
> +};

Is there a reason this needs to be defined outside of the function
scope?  So far it's only used by vpci_init_capability_list().

> +
> +static int vpci_init_capability_list(struct pci_dev *pdev)
> +{
> +    int rc;
> +    bool mask_cap_list = false;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
> +    const unsigned int *caps = is_hwdom ? NULL : guest_supported_caps;
> +    const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(guest_supported_caps);
> +
> +    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
> +    {
> +        unsigned int next, ttl = 48;
> +
> +        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
> +                                     caps, n, &ttl);
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                               PCI_CAPABILITY_LIST, 1,
> +                               (void *)(uintptr_t)next);
> +        if ( rc )
> +            return rc;
> +
> +        next &= ~3;
> +
> +        if ( !next && !is_hwdom )
> +            /*
> +             * If we don't have any supported capabilities to expose to the
> +             * guest, mask the PCI_STATUS_CAP_LIST bit in the status register.
> +             */
> +            mask_cap_list = true;
> +
> +        while ( next && ttl )
> +        {
> +            unsigned int pos = next;
> +
> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
> +                                         caps, n, &ttl);
> +
> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);

There's no need to add this handler for the hardware domain, that's
already the default behavior in that case.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-09  6:45 ` [PATCH v2 3/8] vpci/header: Emulate extended " Jiqian Chen
@ 2025-04-15  9:42   ` Roger Pau Monné
  2025-04-15 10:11     ` Chen, Jiqian
  2025-04-15  9:49   ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15  9:42 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Wed, Apr 09, 2025 at 02:45:23PM +0800, Jiqian Chen wrote:
> Add a new function to emulate extended capability list for host,
> and call it in init_header(). So that, it will be easy to hide
> a capability whose initialization fails.
> 
> As for the extended capability list of guest, keep hiding it.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> new patch
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c | 44 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 0910eb940e23..6833d456566b 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>      return rc;
>  }
>  
> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> +{
> +    int rc;
> +    u32 header;

uint32_t would be preferred.

> +    unsigned int pos = 0x100U, ttl = 480;
> +
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        /* Extended capabilities read as zero, write ignore */
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                               pos, 4, (void *)0);
> +        if ( rc )
> +            return rc;

I think you want to unconditionally return here, otherwise you will
most likely add a duplicated handler over "pos" when going inside the
loop below?

Also for domU we don't want to expose any extended capabilities yet.

> +    }
> +
> +    while ( pos && ttl-- )
> +    {
> +        header = pci_conf_read32(pdev->sbdf, pos);
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,

You don't want to pass NULL here, as that would prevent dom0 from
writing to the register, you instead want to pass vpci_hw_write32 I
think.

> +                               pos, 4, (void *)(uintptr_t)header);
> +        if ( rc )
> +            return rc;
> +
> +        if ( (header == 0) || (header == -1) )
> +            return 0;
> +
> +        pos = PCI_EXT_CAP_NEXT(header);

Don't you need to check that pos >= 0x100?  Possibly done in the while
loop condition: while ( pos >= 0x100 && ... )

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-09  6:45 ` [PATCH v2 3/8] vpci/header: Emulate extended " Jiqian Chen
  2025-04-15  9:42   ` Roger Pau Monné
@ 2025-04-15  9:49   ` Jan Beulich
  2025-04-15 10:18     ` Chen, Jiqian
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-15  9:49 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Roger Pau Monné, xen-devel

On 09.04.2025 08:45, Jiqian Chen wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>      return rc;
>  }
>  
> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> +{
> +    int rc;
> +    u32 header;
> +    unsigned int pos = 0x100U, ttl = 480;
> +
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        /* Extended capabilities read as zero, write ignore */
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                               pos, 4, (void *)0);
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    while ( pos && ttl-- )
> +    {
> +        header = pci_conf_read32(pdev->sbdf, pos);
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                               pos, 4, (void *)(uintptr_t)header);
> +        if ( rc )
> +            return rc;
> +
> +        if ( (header == 0) || (header == -1) )
> +            return 0;

I realize pci_find_next_ext_capability() also has such a check, but even
there it's not really clear to me why compare not only against 0, but also
again -1 (aka ~0).

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-15  9:25   ` Roger Pau Monné
@ 2025-04-15 10:07     ` Chen, Jiqian
  2025-04-15 10:08       ` Jan Beulich
  2025-04-15 10:50       ` Roger Pau Monné
  0 siblings, 2 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-15 10:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/15 17:25, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
>> Current logic of init_header() only emulates legacy capability list
>> for guest, expand it to emulate for host too. So that it will be
>> easy to hide a capability whose initialization fails and no need
>> to distinguish host or guest.
> 
> It might be best if the initial code movement of the logic in
> init_header() into it's own separate function was done as a
> non-functional change, and a later patch added support for dom0.
> 
> It's easier to then spot the differences that you are adding to
> support dom0.
Got it, I will re-arrange my patch in next version.

> 
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v1->v2 changes:
>> new patch
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/header.c | 139 ++++++++++++++++++++------------------
>>  1 file changed, 74 insertions(+), 65 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ef6c965c081c..0910eb940e23 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
>>      return !bar->mem ? -ENOMEM : 0;
>>  }
>>  
>> +/* These capabilities can be exposed to the guest, that vPCI can handle. */
>> +static const unsigned int guest_supported_caps[] = {
>> +    PCI_CAP_ID_MSI,
>> +    PCI_CAP_ID_MSIX,
>> +};
> 
> Is there a reason this needs to be defined outside of the function
> scope?  So far it's only used by vpci_init_capability_list().
Because, for dom0 I don't need to pass this array, so I need to set below parameter "caps" to be NULL or guest_supported_caps according to the type of domain.
If inside the function, I can't to do that since "caps" is const, I think.
Do you have any suggestions?

> 
>> +
>> +static int vpci_init_capability_list(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    bool mask_cap_list = false;
>> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>> +    const unsigned int *caps = is_hwdom ? NULL : guest_supported_caps;
>> +    const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(guest_supported_caps);
>> +
>> +    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
>> +    {
>> +        unsigned int next, ttl = 48;
>> +
>> +        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
>> +                                     caps, n, &ttl);
>> +
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                               PCI_CAPABILITY_LIST, 1,
>> +                               (void *)(uintptr_t)next);
>> +        if ( rc )
>> +            return rc;
>> +
>> +        next &= ~3;
>> +
>> +        if ( !next && !is_hwdom )
>> +            /*
>> +             * If we don't have any supported capabilities to expose to the
>> +             * guest, mask the PCI_STATUS_CAP_LIST bit in the status register.
>> +             */
>> +            mask_cap_list = true;
>> +
>> +        while ( next && ttl )
>> +        {
>> +            unsigned int pos = next;
>> +
>> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
>> +                                         caps, n, &ttl);
>> +
>> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
>> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);
> 
> There's no need to add this handler for the hardware domain, that's
> already the default behavior in that case.
But if not, I have no handler to remove from capability list in next patch's hiding function vpci_capability_mask(),
then I can't success to hide it.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-15 10:07     ` Chen, Jiqian
@ 2025-04-15 10:08       ` Jan Beulich
  2025-04-15 10:23         ` Chen, Jiqian
  2025-04-15 10:50       ` Roger Pau Monné
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-15 10:08 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Roger Pau Monné

On 15.04.2025 12:07, Chen, Jiqian wrote:
> On 2025/4/15 17:25, Roger Pau Monné wrote:
>> On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
>>>      return !bar->mem ? -ENOMEM : 0;
>>>  }
>>>  
>>> +/* These capabilities can be exposed to the guest, that vPCI can handle. */
>>> +static const unsigned int guest_supported_caps[] = {
>>> +    PCI_CAP_ID_MSI,
>>> +    PCI_CAP_ID_MSIX,
>>> +};
>>
>> Is there a reason this needs to be defined outside of the function
>> scope?  So far it's only used by vpci_init_capability_list().
> Because, for dom0 I don't need to pass this array, so I need to set below parameter "caps" to be NULL or guest_supported_caps according to the type of domain.
> If inside the function, I can't to do that since "caps" is const, I think.

Why would the more narrow scope of the array affect how it can be used?

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
  2025-04-10 12:40   ` Jan Beulich
  2025-04-15  9:25   ` Roger Pau Monné
@ 2025-04-15 10:10   ` Roger Pau Monné
  2025-04-16  2:51     ` Chen, Jiqian
  2 siblings, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 10:10 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
> Current logic of init_header() only emulates legacy capability list
> for guest, expand it to emulate for host too. So that it will be
> easy to hide a capability whose initialization fails and no need
> to distinguish host or guest.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> new patch
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c | 139 ++++++++++++++++++++------------------
>  1 file changed, 74 insertions(+), 65 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ef6c965c081c..0910eb940e23 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
>      return !bar->mem ? -ENOMEM : 0;
>  }
>  
> +/* These capabilities can be exposed to the guest, that vPCI can handle. */
> +static const unsigned int guest_supported_caps[] = {
> +    PCI_CAP_ID_MSI,
> +    PCI_CAP_ID_MSIX,
> +};
> +
> +static int vpci_init_capability_list(struct pci_dev *pdev)
> +{
> +    int rc;
> +    bool mask_cap_list = false;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
> +    const unsigned int *caps = is_hwdom ? NULL : guest_supported_caps;
> +    const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(guest_supported_caps);
> +
> +    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
> +    {
> +        unsigned int next, ttl = 48;
> +
> +        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
> +                                     caps, n, &ttl);
> +
> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                               PCI_CAPABILITY_LIST, 1,
> +                               (void *)(uintptr_t)next);
> +        if ( rc )
> +            return rc;
> +
> +        next &= ~3;
> +
> +        if ( !next && !is_hwdom )
> +            /*
> +             * If we don't have any supported capabilities to expose to the
> +             * guest, mask the PCI_STATUS_CAP_LIST bit in the status register.
> +             */
> +            mask_cap_list = true;
> +
> +        while ( next && ttl )
> +        {
> +            unsigned int pos = next;
> +
> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
> +                                         caps, n, &ttl);
> +
> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);
> +            if ( rc )
> +                return rc;
> +
> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                   pos + PCI_CAP_LIST_NEXT, 1,
> +                                   (void *)(uintptr_t)next);
> +            if ( rc )
> +                return rc;
> +
> +            next &= ~3;
> +        }
> +    }
> +
> +    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
> +                                PCI_STATUS, 2, NULL,
> +                                PCI_STATUS_RO_MASK &
> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> +                                PCI_STATUS_RW1C_MASK,
> +                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
> +                                PCI_STATUS_RSVDZ_MASK);

One further remark I've forgot to make on the previous reply: I'm
unsure we want to do this for dom0, as we in general allow dom0
unfiltered write access (unless it's for accesses mediated by Xen).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-15  9:42   ` Roger Pau Monné
@ 2025-04-15 10:11     ` Chen, Jiqian
  0 siblings, 0 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-15 10:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/15 17:42, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:23PM +0800, Jiqian Chen wrote:
>> Add a new function to emulate extended capability list for host,
>> and call it in init_header(). So that, it will be easy to hide
>> a capability whose initialization fails.
>>
>> As for the extended capability list of guest, keep hiding it.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v1->v2 changes:
>> new patch
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/header.c | 44 ++++++++++++++++++++++++++++++++-------
>>  1 file changed, 36 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 0910eb940e23..6833d456566b 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>      return rc;
>>  }
>>  
>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    u32 header;
> 
> uint32_t would be preferred.
> 
>> +    unsigned int pos = 0x100U, ttl = 480;
>> +
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        /* Extended capabilities read as zero, write ignore */
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                               pos, 4, (void *)0);
>> +        if ( rc )
>> +            return rc;
> 
> I think you want to unconditionally return here, otherwise you will
> most likely add a duplicated handler over "pos" when going inside the
> loop below?
Oh, it's my bad.
I should return here for any rc.

> 
> Also for domU we don't want to expose any extended capabilities yet.
> 
>> +    }
>> +
>> +    while ( pos && ttl-- )
>> +    {
>> +        header = pci_conf_read32(pdev->sbdf, pos);
>> +
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> 
> You don't want to pass NULL here, as that would prevent dom0 from
> writing to the register, you instead want to pass vpci_hw_write32 I
> think.
Will change in next version.

> 
>> +                               pos, 4, (void *)(uintptr_t)header);
>> +        if ( rc )
>> +            return rc;
>> +
>> +        if ( (header == 0) || (header == -1) )
>> +            return 0;
>> +
>> +        pos = PCI_EXT_CAP_NEXT(header);
> 
> Don't you need to check that pos >= 0x100?  Possibly done in the while
> loop condition: while ( pos >= 0x100 && ... )
Yes, will change in next version.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-15  9:49   ` Jan Beulich
@ 2025-04-15 10:18     ` Chen, Jiqian
  2025-04-15 10:20       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-15 10:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray,
	Chen, Jiqian

On 2025/4/15 17:49, Jan Beulich wrote:
> On 09.04.2025 08:45, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>      return rc;
>>  }
>>  
>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +    u32 header;
>> +    unsigned int pos = 0x100U, ttl = 480;
>> +
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        /* Extended capabilities read as zero, write ignore */
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                               pos, 4, (void *)0);
>> +        if ( rc )
>> +            return rc;
>> +    }
>> +
>> +    while ( pos && ttl-- )
>> +    {
>> +        header = pci_conf_read32(pdev->sbdf, pos);
>> +
>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                               pos, 4, (void *)(uintptr_t)header);
>> +        if ( rc )
>> +            return rc;
>> +
>> +        if ( (header == 0) || (header == -1) )
>> +            return 0;
> 
> I realize pci_find_next_ext_capability() also has such a check, but even
> there it's not really clear to me why compare not only against 0, but also
> again -1 (aka ~0).
Thank you for raising this question.
When I coded this part, I also had this confuse since pci_find_next_ext_capability() has this check,
so I chose to keep the same check.
Do you think I need to remove this -1 check?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/8] vpci/header: Emulate extended capability list for host
  2025-04-15 10:18     ` Chen, Jiqian
@ 2025-04-15 10:20       ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2025-04-15 10:20 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org, Huang, Ray

On 15.04.2025 12:18, Chen, Jiqian wrote:
> On 2025/4/15 17:49, Jan Beulich wrote:
>> On 09.04.2025 08:45, Jiqian Chen wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -815,6 +815,39 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>      return rc;
>>>  }
>>>  
>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>> +{
>>> +    int rc;
>>> +    u32 header;
>>> +    unsigned int pos = 0x100U, ttl = 480;
>>> +
>>> +    if ( !is_hardware_domain(pdev->domain) )
>>> +    {
>>> +        /* Extended capabilities read as zero, write ignore */
>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>> +                               pos, 4, (void *)0);
>>> +        if ( rc )
>>> +            return rc;
>>> +    }
>>> +
>>> +    while ( pos && ttl-- )
>>> +    {
>>> +        header = pci_conf_read32(pdev->sbdf, pos);
>>> +
>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>> +                               pos, 4, (void *)(uintptr_t)header);
>>> +        if ( rc )
>>> +            return rc;
>>> +
>>> +        if ( (header == 0) || (header == -1) )
>>> +            return 0;
>>
>> I realize pci_find_next_ext_capability() also has such a check, but even
>> there it's not really clear to me why compare not only against 0, but also
>> again -1 (aka ~0).
> Thank you for raising this question.
> When I coded this part, I also had this confuse since pci_find_next_ext_capability() has this check,
> so I chose to keep the same check.
> Do you think I need to remove this -1 check?

Unless you can explain why it's needed (perhaps by figuring out why the other
one is there), I'd say - yes.

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-15 10:08       ` Jan Beulich
@ 2025-04-15 10:23         ` Chen, Jiqian
  0 siblings, 0 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-15 10:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Roger Pau Monné, Huang, Ray,
	Chen, Jiqian

On 2025/4/15 18:08, Jan Beulich wrote:
> On 15.04.2025 12:07, Chen, Jiqian wrote:
>> On 2025/4/15 17:25, Roger Pau Monné wrote:
>>> On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -745,6 +745,76 @@ static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
>>>>      return !bar->mem ? -ENOMEM : 0;
>>>>  }
>>>>  
>>>> +/* These capabilities can be exposed to the guest, that vPCI can handle. */
>>>> +static const unsigned int guest_supported_caps[] = {
>>>> +    PCI_CAP_ID_MSI,
>>>> +    PCI_CAP_ID_MSIX,
>>>> +};
>>>
>>> Is there a reason this needs to be defined outside of the function
>>> scope?  So far it's only used by vpci_init_capability_list().
>> Because, for dom0 I don't need to pass this array, so I need to set below parameter "caps" to be NULL or guest_supported_caps according to the type of domain.
>> If inside the function, I can't to do that since "caps" is const, I think.
> 
> Why would the more narrow scope of the array affect how it can be used?
Sorry, I have thought about it again and it can be put it inside the function. I will change in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/8] vpci: Hide capability when it fails to initialize
  2025-04-09  6:45 ` [PATCH v2 4/8] vpci: Hide capability when it fails to initialize Jiqian Chen
@ 2025-04-15 10:47   ` Roger Pau Monné
  2025-04-16  6:07     ` Chen, Jiqian
  0 siblings, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 10:47 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Huang Rui, Andrew Cooper, Anthony PERARD, Michal Orzel,
	Jan Beulich, Julien Grall, Stefano Stabellini

On Wed, Apr 09, 2025 at 02:45:24PM +0800, Jiqian Chen wrote:
> When vpci fails to initialize a capability of a device, it just
> return error instead of catching and processing exception. That
> makes the entire device unusable.
> 
> So, refactor REGISTER_VPCI_INIT to contain more capability specific
> information, and use new functions to hide capability when
> initialization fails in vpci_assign_device().
> 
> Those new functions remove the failed legacy/extended capability
> from the emulated legacy/extended capability list.

I think this needs to be at least 2 different changes.

First change adds the usage of REGISTER_VPCI_{LEGACY,EXTENDED}_CAP()
helpers, while second change introduces the masking of capabilities on
initialization failure.

Otherwise review is a bit complicated.

> What's more, change the definition of init_header() since it is
> not a capability and it is needed for all devices' PCI config space.
> 
> Note: call vpci_make_msix_hole() in the end of init_msix() since the
> change of sequence of init_header() and init_msix().
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> cc: Michal Orzel <michal.orzel@amd.com>
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Julien Grall <julien@xen.org>
> cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> v1->v2 changes:
> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
> * Called vpci_make_msix_hole() in the end of init_msix().
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/header.c  |   3 +-
>  xen/drivers/vpci/msi.c     |   2 +-
>  xen/drivers/vpci/msix.c    |   8 +-
>  xen/drivers/vpci/rebar.c   |   2 +-
>  xen/drivers/vpci/vpci.c    | 175 +++++++++++++++++++++++++++++++------
>  xen/include/xen/pci_regs.h |   1 +
>  xen/include/xen/vpci.h     |  26 ++++--
>  xen/include/xen/xen.lds.h  |   2 +-
>  8 files changed, 179 insertions(+), 40 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 6833d456566b..51a67d76ad8a 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -848,7 +848,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>      return 0;
>  }
>  
> -static int cf_check init_header(struct pci_dev *pdev)
> +int vpci_init_header(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
>      uint64_t addr, size;
> @@ -1044,7 +1044,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>      return rc;
>  }
> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 66e5a8a116be..ca89ae9b9c22 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);
>  
>  void vpci_dump_msi(void)
>  {
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 6bd8c55bb48e..6537374c79a0 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -751,9 +751,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      pdev->vpci->msix = msix;
>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>  
> -    return 0;
> +    spin_lock(&pdev->vpci->lock);
> +    rc = vpci_make_msix_hole(pdev);
> +    spin_unlock(&pdev->vpci->lock);
> +
> +    return rc
>  }
> -REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> index 793937449af7..79858e5dc92f 100644
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>  
>      return 0;
>  }
> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
> +REGISTER_VPCI_EXTEND_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar);
>  
>  /*
>   * Local variables:
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 1e6aa5d799b9..f1f125bfdab1 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -35,9 +35,25 @@ struct vpci_register {
>      uint32_t rsvdz_mask;
>  };
>  
> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Return 0 if registers overlap. */
> +    if ( r1->offset < r2->offset + r2->size &&
> +         r2->offset < r1->offset + r1->size )
> +        return 0;
> +    if ( r1->offset < r2->offset )
> +        return -1;
> +    if ( r1->offset > r2->offset )
> +        return 1;
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
>  #ifdef __XEN__
> -extern vpci_register_init_t *const __start_vpci_array[];
> -extern vpci_register_init_t *const __end_vpci_array[];
> +extern vpci_capability_t *const __start_vpci_array[];
> +extern vpci_capability_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> @@ -83,6 +99,133 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> +static void vpci_capability_mask(struct pci_dev *pdev,
> +                                 const unsigned int cap)
> +{
> +    const unsigned int size = 1;
> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm;
> +    struct vpci *vpci = pdev->vpci;
> +
> +    spin_lock(&vpci->lock);
> +    list_for_each_entry ( rm, &vpci->handlers, node )
> +    {
> +        int cmp = vpci_register_cmp(&r, rm);
> +
> +        if ( !cmp && rm->offset == offset && rm->size == size )
> +        {
> +            struct vpci_register *pre = list_entry(rm->node.prev,
> +                                                   struct vpci_register,
> +                                                   node);
> +            struct vpci_register *next = list_entry(rm->node.next,
> +                                                    struct vpci_register,
> +                                                    node);
> +
> +            pre->private = next->private;
> +
> +            /* PCI_CAP_LIST_ID register of current capability */
> +            list_del(&rm->node);
> +            /* PCI_CAP_LIST_NEXT register of current capability */
> +            list_del(&next->node);
> +            spin_unlock(&vpci->lock);

Are you sure this works as intended?  The list is sorted, so if there
further handlers in between the two capabilities, like when handling
MSI capability, the next handler in the list won't point to the next
capability list handler.

> +
> +            xfree(rm);
> +            xfree(next);
> +            return;
> +        }
> +        if ( cmp <= 0 )
> +            break;
> +    }
> +    spin_unlock(&vpci->lock);
> +}
> +
> +static void vpci_ext_capability_mask(struct pci_dev *pdev,
> +                                     const unsigned int cap)
> +{
> +    const unsigned int size = 4;
> +    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm;
> +    struct vpci *vpci = pdev->vpci;
> +
> +    spin_lock(&vpci->lock);
> +    list_for_each_entry ( rm, &vpci->handlers, node )
> +    {
> +        int cmp = vpci_register_cmp(&r, rm);
> +
> +        if ( !cmp && rm->offset == offset && rm->size == size )
> +        {
> +            struct vpci_register *pre;
> +            u32 pre_header, header = (u32)(uintptr_t)rm->private;
> +
> +            if ( offset == 0x100U && PCI_EXT_CAP_NEXT(header) == 0 )

It would be safer to check for next < 0x100 rather than explicitly
0.

> +            {
> +                rm->private = (void *)(uintptr_t)0;
> +                spin_unlock(&vpci->lock);
> +                return;
> +            }
> +            else if ( offset == 0x100U )

There's no need for the else branch, as the previous if has a return.

> +            {
> +                pre = rm;
> +                rm = list_entry(rm->node.next, struct vpci_register, node);
> +                pre->private = rm->private;
> +            }
> +            else
> +            {
> +                pre = list_entry(rm->node.prev, struct vpci_register, node);
> +                pre_header = (u32)(uintptr_t)pre->private;
> +                pre->private =
> +                    (void *)(uintptr_t)((pre_header & !PCI_EXT_CAP_NEXT_MASK) |

I think you want ~PCI_EXT_CAP_NEXT_MASK rather than !PCI_EXT_CAP_NEXT_MASK?

> +                                        (header & PCI_EXT_CAP_NEXT_MASK));
> +            }
> +            list_del(&rm->node);
> +            spin_unlock(&vpci->lock);
> +            xfree(rm);
> +            return;

Kind of the same complaint I had on the previous patch, this seems to
assume that capability handlers are always consecutive in the list of
handlers, which I don't think it's the case.

> +        }
> +        if ( cmp <= 0 )
> +            break;
> +    }
> +    spin_unlock(&vpci->lock);
> +}
> +
> +static void vpci_init_capabilities(struct pci_dev *pdev)
> +{
> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        const vpci_capability_t *capability = __start_vpci_array[i];
> +        const unsigned int cap = capability->id;
> +        const bool is_ext = capability->is_ext;
> +        unsigned int pos;
> +        int rc;
> +
> +        if ( !is_hardware_domain(pdev->domain) && is_ext )
> +            continue;
> +
> +        if ( is_ext )
> +            pos = pci_find_ext_capability(pdev->sbdf, cap);
> +        else
> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
> +
> +        if ( !pos )
> +            continue;
> +
> +        rc = capability->init(pdev);
> +
> +        if ( rc )
> +        {
> +            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail rc=%d, mask it\n",
> +                   pdev->domain, &pdev->sbdf,
> +                   is_ext ? "extended" : "legacy", cap, rc);
> +            if ( is_ext )
> +                vpci_ext_capability_mask(pdev, cap);
> +            else
> +                vpci_capability_mask(pdev, cap);
> +        }
> +    }
> +}
> +
>  void vpci_deassign_device(struct pci_dev *pdev)
>  {
>      unsigned int i;
> @@ -128,7 +271,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
>  
>  int vpci_assign_device(struct pci_dev *pdev)
>  {
> -    unsigned int i;
>      const unsigned long *ro_map;
>      int rc = 0;
>  
> @@ -159,12 +301,11 @@ int vpci_assign_device(struct pci_dev *pdev)
>          goto out;
>  #endif
>  
> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> -    {
> -        rc = __start_vpci_array[i](pdev);
> -        if ( rc )
> -            break;
> -    }
> +    rc = vpci_init_header(pdev);
> +    if ( rc )
> +        goto out;

If you use the out label here you can remove the __maybe_unused
attribute from it.

> +
> +    vpci_init_capabilities(pdev);
>  
>   out: __maybe_unused;
>      if ( rc )
> @@ -174,22 +315,6 @@ int vpci_assign_device(struct pci_dev *pdev)
>  }
>  #endif /* __XEN__ */
>  
> -static int vpci_register_cmp(const struct vpci_register *r1,
> -                             const struct vpci_register *r2)
> -{
> -    /* Return 0 if registers overlap. */
> -    if ( r1->offset < r2->offset + r2->size &&
> -         r2->offset < r1->offset + r1->size )
> -        return 0;
> -    if ( r1->offset < r2->offset )
> -        return -1;
> -    if ( r1->offset > r2->offset )
> -        return 1;
> -
> -    ASSERT_UNREACHABLE();
> -    return 0;
> -}
> -
>  /* Dummy hooks, writes are ignored, reads return 1's */
>  static uint32_t cf_check vpci_ignored_read(
>      const struct pci_dev *pdev, unsigned int reg, void *data)
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index 27b4f44eedf3..5fe6653fded4 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -449,6 +449,7 @@
>  #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
>  #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
>  #define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
> +#define PCI_EXT_CAP_NEXT_MASK		0xFFC00000U
>  
>  #define PCI_EXT_CAP_ID_ERR	1
>  #define PCI_EXT_CAP_ID_VC	2
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 807401b2eaa2..5016ded64d89 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -13,11 +13,11 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
>  typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
>                            uint32_t val, void *data);
>  
> -typedef int vpci_register_init_t(struct pci_dev *dev);
> -
> -#define VPCI_PRIORITY_HIGH      "1"
> -#define VPCI_PRIORITY_MIDDLE    "5"
> -#define VPCI_PRIORITY_LOW       "9"
> +typedef struct {
> +    unsigned int id;
> +    bool is_ext;
> +    int (*init)(struct pci_dev *pdev);
> +} vpci_capability_t;
>  
>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>  
> @@ -29,9 +29,19 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>   */
>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>  
> -#define REGISTER_VPCI_INIT(x, p)                \
> -  static vpci_register_init_t *const x##_entry  \
> -               __used_section(".data.vpci." p) = (x)
> +#define REGISTER_VPCI_CAP(cap, x, ext) \
> +  static vpci_capability_t x##_t = { \
> +        .id = (cap), \
> +        .init = (x), \
> +        .is_ext = (ext), \
> +  }; \
> +  static vpci_capability_t *const x##_entry  \
> +               __used_section(".data.vpci.") = &(x##_t)
> +
> +#define REGISTER_VPCI_LEGACY_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, false)
> +#define REGISTER_VPCI_EXTEND_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, true)

Nit: I would use EXTENDED here, there's no need to keep both defines
the same length.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-15 10:07     ` Chen, Jiqian
  2025-04-15 10:08       ` Jan Beulich
@ 2025-04-15 10:50       ` Roger Pau Monné
  1 sibling, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 10:50 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Tue, Apr 15, 2025 at 10:07:14AM +0000, Chen, Jiqian wrote:
> On 2025/4/15 17:25, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
> >> +static int vpci_init_capability_list(struct pci_dev *pdev)
> >> +{
> >> +    int rc;
> >> +    bool mask_cap_list = false;
> >> +    bool is_hwdom = is_hardware_domain(pdev->domain);
> >> +    const unsigned int *caps = is_hwdom ? NULL : guest_supported_caps;
> >> +    const unsigned int n = is_hwdom ? 0 : ARRAY_SIZE(guest_supported_caps);
> >> +
> >> +    if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
> >> +    {
> >> +        unsigned int next, ttl = 48;
> >> +
> >> +        next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
> >> +                                     caps, n, &ttl);
> >> +
> >> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> >> +                               PCI_CAPABILITY_LIST, 1,
> >> +                               (void *)(uintptr_t)next);
> >> +        if ( rc )
> >> +            return rc;
> >> +
> >> +        next &= ~3;
> >> +
> >> +        if ( !next && !is_hwdom )
> >> +            /*
> >> +             * If we don't have any supported capabilities to expose to the
> >> +             * guest, mask the PCI_STATUS_CAP_LIST bit in the status register.
> >> +             */
> >> +            mask_cap_list = true;
> >> +
> >> +        while ( next && ttl )
> >> +        {
> >> +            unsigned int pos = next;
> >> +
> >> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
> >> +                                         caps, n, &ttl);
> >> +
> >> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> >> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);
> > 
> > There's no need to add this handler for the hardware domain, that's
> > already the default behavior in that case.
> But if not, I have no handler to remove from capability list in next patch's hiding function vpci_capability_mask(),
> then I can't success to hide it.

Oh, I see, I have further comments on that approach, see the comments
on the followup patches.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers
  2025-04-09  6:45 ` [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
@ 2025-04-15 11:06   ` Roger Pau Monné
  0 siblings, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 11:06 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui, Anthony PERARD

On Wed, Apr 09, 2025 at 02:45:25PM +0800, Jiqian Chen wrote:
> vpci_remove_register() only supports removing a register in a time,
> but the follow-on changes need to remove all registers within a
> range. And it is only used for test. So, refactor it to support
> removing all matched registers in a calling time.
> 
> And it is no matter to remove a non exist register, so remove the
> __must_check prefix.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> cc: Anthony PERARD <anthony.perard@vates.tech>
> ---
> v1->v2 changes:
> new patch
> 
> Best regards,
> Jiqian Chen.
> ---
>  tools/tests/vpci/main.c |  4 ++--
>  xen/drivers/vpci/vpci.c | 23 ++++++++++++-----------
>  xen/include/xen/vpci.h  |  4 ++--
>  3 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> index 33223db3eb77..ca72877d60cd 100644
> --- a/tools/tests/vpci/main.c
> +++ b/tools/tests/vpci/main.c
> @@ -132,10 +132,10 @@ static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
>                                    rsvdz_mask))
>  
>  #define VPCI_REMOVE_REG(off, size)                                          \
> -    assert(!vpci_remove_register(test_pdev.vpci, off, size))
> +    assert(!vpci_remove_registers(test_pdev.vpci, off, size))
>  
>  #define VPCI_REMOVE_INVALID_REG(off, size)                                  \
> -    assert(vpci_remove_register(test_pdev.vpci, off, size))
> +    assert(vpci_remove_registers(test_pdev.vpci, off, size))
>  
>  /* Read a 32b register using all possible sizes. */
>  void multiread4_check(unsigned int reg, uint32_t val)
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index f1f125bfdab1..115d3c5f0c84 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -418,34 +418,35 @@ int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
>      return 0;
>  }
>  
> -int vpci_remove_register(struct vpci *vpci, unsigned int offset,
> -                         unsigned int size)
> +int vpci_remove_registers(struct vpci *vpci, unsigned int offset,
> +                          unsigned int size)
>  {
>      const struct vpci_register r = { .offset = offset, .size = size };
>      struct vpci_register *rm;
> +    int rc = -ENOENT;

Thinking about this, not sure returning ENOENT makes much sense now,
as the (new) purpose of the function is to zap all handlers from a
range, without possibly prior knowledge whether there are any
handlers in the range.

>  
>      spin_lock(&vpci->lock);
>      list_for_each_entry ( rm, &vpci->handlers, node )
>      {
>          int cmp = vpci_register_cmp(&r, rm);
>  
> -        /*
> -         * NB: do not use a switch so that we can use break to
> -         * get out of the list loop earlier if required.
> -         */
> -        if ( !cmp && rm->offset == offset && rm->size == size )
> +        if ( !cmp )
>          {
> +            struct vpci_register *prev =
> +                list_entry(rm->node.prev, struct vpci_register, node);
> +
>              list_del(&rm->node);
> -            spin_unlock(&vpci->lock);
>              xfree(rm);
> -            return 0;
> +            rm = prev;
> +            rc = 0;

I think you want some extra checks here to ensure you are not removing
partially overlapping registers.  IOW, you need to assert the register
in rm is fully contained inside the range to zap.

It might be best to simply not use vpci_register_cmp(), and instead
open code the comparison here to ensure "rm" is fuly contained inside
the [offset, offset + size - 1] range.  You will need to return an
error if there's a partially overlapping register.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails
  2025-04-09  6:45 ` [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
@ 2025-04-15 12:59   ` Roger Pau Monné
  0 siblings, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 12:59 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Wed, Apr 09, 2025 at 02:45:26PM +0800, Jiqian Chen wrote:
> When init_rebar() fails, the previous new changes will hide Rebar
> capability, it can't rely on vpci_deassign_device() to remove all
> Rebar related registers anymore, those registers must be removed
> in failure path of init_rebar.
> 
> To do that, call vpci_remove_registers() to remove all possible
> registered registers in the failure patch.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> * Called vpci_remove_registers() to remove all possible registered registers instead of using a array to record all registered register.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/drivers/vpci/rebar.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
> index 79858e5dc92f..e030937956e3 100644
> --- a/xen/drivers/vpci/rebar.c
> +++ b/xen/drivers/vpci/rebar.c
> @@ -51,6 +51,7 @@ static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>  
>  static int cf_check init_rebar(struct pci_dev *pdev)
>  {
> +    int rc = 0;
>      uint32_t ctrl;
>      unsigned int nbars;
>      unsigned int rebar_offset = pci_find_ext_capability(pdev->sbdf,
> @@ -70,7 +71,6 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>      nbars = MASK_EXTR(ctrl, PCI_REBAR_CTRL_NBAR_MASK);
>      for ( unsigned int i = 0; i < nbars; i++ )
>      {
> -        int rc;
>          struct vpci_bar *bar;
>          unsigned int index;
>  
> @@ -80,7 +80,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>          {
>              printk(XENLOG_ERR "%pd %pp: too big BAR number %u in REBAR_CTRL\n",
>                     pdev->domain, &pdev->sbdf, index);
> -            continue;
> +            goto fail;
>          }
>  
>          bar = &pdev->vpci->header.bars[index];
> @@ -88,7 +88,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>          {
>              printk(XENLOG_ERR "%pd %pp: BAR%u is not in memory space\n",
>                     pdev->domain, &pdev->sbdf, index);
> -            continue;
> +            goto fail;

I think you need to set rc to an error code for the two chunks above,
otherwise you jump into the fail label with rc == 0 AFAICT.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-04-09  6:45 ` [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
@ 2025-04-15 13:29   ` Roger Pau Monné
  2025-04-16  6:16     ` Chen, Jiqian
  0 siblings, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 13:29 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui

On Wed, Apr 09, 2025 at 02:45:27PM +0800, Jiqian Chen wrote:
> When init_msi() fails, the previous new changes will hide MSI
> capability, it can't rely on vpci_deassign_device() to remove
> all MSI related resources anymore, those resources must be
> cleaned up in failure path of init_msi.
> 
> To do that, add a new function to free MSI resources.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
> ---
>  xen/drivers/vpci/msi.c | 47 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index ca89ae9b9c22..48a466dba0ef 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -193,6 +193,33 @@ static void cf_check mask_write(
>      msi->mask = val;
>  }
>  
> +/* 12 is size of MSI structure for 32-bit Message Address without PVM */
> +#define MSI_STRUCTURE_SIZE_32 12

I'm confused by this.  The minimum size of the capability is 4 bytes
for the capability pointer, plus 4 bytes for the message address,
plus 2 bytes for the message data.  So that's 10 bytes in total.

I think it would be best if you calculate the size based on the
existing msi_ macros.

if ( masking )
    end = msi_pending_bits_reg(msi_pos, address64);
else
    end = msi_mask_bits_reg(msi_pos, address64) - 2;

size = end - msi_control_reg(msi_pos);

> +
> +static void fini_msi(struct pci_dev *pdev)
> +{
> +    unsigned int size = MSI_STRUCTURE_SIZE_32;
> +
> +    if ( !pdev->vpci->msi )
> +        return;
> +
> +    if ( pdev->vpci->msi->address64 )
> +        size += 4;
> +    if ( pdev->vpci->msi->masking )
> +        size += 4;
> +
> +    /*
> +     * Remove all possible registered registers except capability ID
> +     * register and next capability pointer register, which will be
> +     * removed in mask function.
> +     *msi_mask_bits_reg/
> +    vpci_remove_registers(pdev->vpci,
> +                          msi_control_reg(pdev->msi_pos),
> +                          size - PCI_MSI_FLAGS);
> +    xfree(pdev->vpci->msi);
> +    pdev->vpci->msi = NULL;
> +}
> +
>  static int cf_check init_msi(struct pci_dev *pdev)
>  {
>      unsigned int pos = pdev->msi_pos;
> @@ -209,12 +236,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>      ret = vpci_add_register(pdev->vpci, control_read, control_write,
>                              msi_control_reg(pos), 2, pdev->vpci->msi);
>      if ( ret )
> -        /*
> -         * NB: there's no need to free the msi struct or remove the register
> -         * handlers form the config space, the caller will take care of the
> -         * cleanup.
> -         */
> -        return ret;
> +        goto fail;
>  
>      /* Get the maximum number of vectors the device supports. */
>      control = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
> @@ -237,20 +259,20 @@ static int cf_check init_msi(struct pci_dev *pdev)
>      ret = vpci_add_register(pdev->vpci, address_read, address_write,
>                              msi_lower_address_reg(pos), 4, pdev->vpci->msi);
>      if ( ret )
> -        return ret;
> +        goto fail;
>  
>      ret = vpci_add_register(pdev->vpci, data_read, data_write,
>                              msi_data_reg(pos, pdev->vpci->msi->address64), 2,
>                              pdev->vpci->msi);
>      if ( ret )
> -        return ret;
> +        goto fail;
>  
>      if ( pdev->vpci->msi->address64 )
>      {
>          ret = vpci_add_register(pdev->vpci, address_hi_read, address_hi_write,
>                                  msi_upper_address_reg(pos), 4, pdev->vpci->msi);
>          if ( ret )
> -            return ret;
> +            goto fail;
>      }
>  
>      if ( pdev->vpci->msi->masking )
> @@ -260,7 +282,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>                                                    pdev->vpci->msi->address64),
>                                  4, pdev->vpci->msi);
>          if ( ret )
> -            return ret;
> +            goto fail;
>          /*
>           * FIXME: do not add any handler for the pending bits for the hardware
>           * domain, which means direct access. This will be revisited when
> @@ -269,6 +291,11 @@ static int cf_check init_msi(struct pci_dev *pdev)
>      }
>  
>      return 0;
> +
> + fail:
> +    fini_msi(pdev);
> +
> +    return ret;
>  }
>  REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);

I wonder if the fini_msi should be referenced in
REGISTER_VPCI_{EXTENDED,LEGACY}_CAP(), so that the caller of
init_msi() can call fini_msi() on failure and thus avoid all those
fail hooks and labels, as the caller can take care of the cleanup.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-09  6:45 ` [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources Jiqian Chen
@ 2025-04-15 13:45   ` Roger Pau Monné
  2025-04-16  6:20     ` Chen, Jiqian
  2025-04-17  7:34   ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-15 13:45 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: xen-devel, Huang Rui, Jan Beulich, Andrew Cooper

On Wed, Apr 09, 2025 at 02:45:28PM +0800, Jiqian Chen wrote:
> When init_msix() fails, it needs to clean all MSIX resources.
> So, add a new function fini_msix() to do that.
> 
> And to unregister the mmio handler of vpci_msix_table_ops, add
> a new function.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> cc: Jan Beulich <jbeulich@suse.com>
> cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
> v1->v2 changes:
> new patch.
> 
> Best regards,
> Jiqian Chen.
> ---
>  xen/arch/x86/hvm/intercept.c      | 44 ++++++++++++++++++++++
>  xen/arch/x86/include/asm/hvm/io.h |  3 ++
>  xen/drivers/vpci/msix.c           | 61 ++++++++++++++++++++++++++++---
>  3 files changed, 103 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
> index da22c386763e..5eacf51d4d2c 100644
> --- a/xen/arch/x86/hvm/intercept.c
> +++ b/xen/arch/x86/hvm/intercept.c
> @@ -276,6 +276,50 @@ void register_mmio_handler(struct domain *d,
>      handler->mmio.ops = ops;
>  }
>  
> +void unregister_mmio_handler(struct domain *d,
> +                             const struct hvm_mmio_ops *ops)
> +{
> +    unsigned int i, count = d->arch.hvm.io_handler_count;
> +
> +    ASSERT(d->arch.hvm.io_handler);
> +
> +    if ( !count )
> +        return;
> +
> +    for ( i = 0; i < count; i++ )
> +        if ( d->arch.hvm.io_handler[i].type == IOREQ_TYPE_COPY &&
> +             d->arch.hvm.io_handler[i].mmio.ops == ops )
> +            break;
> +
> +    if ( i == count )
> +        return;
> +
> +    for ( ; i < count - 1; i++ )
> +    {
> +        struct hvm_io_handler *curr = &d->arch.hvm.io_handler[i];
> +        struct hvm_io_handler *next = &d->arch.hvm.io_handler[i + 1];
> +
> +        curr->type = next->type;
> +        curr->ops = next->ops;
> +        if ( next->type == IOREQ_TYPE_COPY )
> +        {
> +            curr->portio.port = 0;
> +            curr->portio.size = 0;
> +            curr->portio.action = 0;
> +            curr->mmio.ops = next->mmio.ops;
> +        }
> +        else
> +        {
> +            curr->mmio.ops = 0;
> +            curr->portio.port = next->portio.port;
> +            curr->portio.size = next->portio.size;
> +            curr->portio.action = next->portio.action;
> +        }
> +    }

Can't you use memmove() instead of a for loop?

memmove(&d->arch.hvm.io_handler[i], &d->arch.hvm.io_handler[i + 1],
	sizeof(d->arch.hvm.io_handler[0]) * (count - i - 1));

> +
> +    d->arch.hvm.io_handler_count--;
> +}
> +
>  void register_portio_handler(struct domain *d, unsigned int port,
>                               unsigned int size, portio_action_t action)
>  {
> diff --git a/xen/arch/x86/include/asm/hvm/io.h b/xen/arch/x86/include/asm/hvm/io.h
> index 565bad300d20..018d2745fd99 100644
> --- a/xen/arch/x86/include/asm/hvm/io.h
> +++ b/xen/arch/x86/include/asm/hvm/io.h
> @@ -75,6 +75,9 @@ bool hvm_mmio_internal(paddr_t gpa);
>  void register_mmio_handler(struct domain *d,
>                             const struct hvm_mmio_ops *ops);
>  
> +void unregister_mmio_handler(struct domain *d,
> +                             const struct hvm_mmio_ops *ops);
> +
>  void register_portio_handler(
>      struct domain *d, unsigned int port, unsigned int size,
>      portio_action_t action);
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 6537374c79a0..60654d4f6d0b 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -703,6 +703,54 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
>      return 0;
>  }
>  
> +static void fini_msix(struct pci_dev *pdev)
> +{
> +    struct vpci *vpci = pdev->vpci;
> +
> +    if ( !vpci->msix )
> +        return;
> +
> +    list_del(&vpci->msix->next);
> +    if ( list_empty(&pdev->domain->arch.hvm.msix_tables) )
> +        unregister_mmio_handler(pdev->domain, &vpci_msix_table_ops);

At the point the MMIO handler is added the capability initialization
cannot fail, so arguably if the MSI-X handler is registered there will
always be at least one functional MSI-X capability that requires it.

IOW: you can likely drop the addition of unregister_mmio_handler() and
avoid the removal of the MMIO handler.  Worst case a domain will end
up with a dummy handler that does nothing, but it won't cause
malfunctions.

> +
> +    /* Remove any MSIX regions if present. */
> +    for ( unsigned int i = 0;
> +          vpci->msix && i < ARRAY_SIZE(vpci->msix->tables);
> +          i++ )
> +    {
> +        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
> +        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
> +                                     vmsix_table_size(pdev->vpci, i) - 1);
> +
> +        for ( unsigned int j = 0; j < ARRAY_SIZE(vpci->header.bars); j++ )
> +        {
> +            int rc;
> +            const struct vpci_bar *bar = &vpci->header.bars[j];
> +
> +            if ( rangeset_is_empty(bar->mem) )
> +                continue;
> +
> +            rc = rangeset_remove_range(bar->mem, start, end);
> +            if ( rc )
> +            {
> +                gprintk(XENLOG_WARNING,
> +                       "%pp: failed to remove MSIX table [%lx, %lx]: %d\n",
> +                        &pdev->sbdf, start, end, rc);
> +                return;
> +            }
> +        }
> +    }

There's no need to do any of this rangeset manipulation.  The BAR
rangesets are re-created for any map/unmap request, and hence should
be empty unless there's a concurrent operation going on (which won't
be the case when initializing the capabilities).

> +
> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> +        if ( vpci->msix->table[i] )
> +            iounmap(vpci->msix->table[i]);

The MSI-X init function never maps tables, so the code here (given
it's current usage) will also never unmap anything.  However I would
also like to use this code for vPCI deassing, at which point the logic
will get used.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-15 10:10   ` Roger Pau Monné
@ 2025-04-16  2:51     ` Chen, Jiqian
  2025-04-16  7:32       ` Roger Pau Monné
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-16  2:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/15 18:10, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
>> +        while ( next && ttl )
>> +        {
>> +            unsigned int pos = next;
>> +
>> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
>> +                                         caps, n, &ttl);
>> +
>> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
>> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);
>> +            if ( rc )
>> +                return rc;
>> +
>> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                   pos + PCI_CAP_LIST_NEXT, 1,
>> +                                   (void *)(uintptr_t)next);
>> +            if ( rc )
>> +                return rc;
>> +
>> +            next &= ~3;
>> +        }
>> +    }
>> +
>> +    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
>> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
>> +                                PCI_STATUS, 2, NULL,
>> +                                PCI_STATUS_RO_MASK &
>> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> +                                PCI_STATUS_RW1C_MASK,
>> +                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
>> +                                PCI_STATUS_RSVDZ_MASK);
> 
> One further remark I've forgot to make on the previous reply: I'm
> unsure we want to do this for dom0, as we in general allow dom0
> unfiltered write access (unless it's for accesses mediated by Xen).
This part is the original implementation before my patches.
If you want to restrict this only for domUs, not for dom0.
I will add a new first patch to do that.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/8] vpci: Hide capability when it fails to initialize
  2025-04-15 10:47   ` Roger Pau Monné
@ 2025-04-16  6:07     ` Chen, Jiqian
  0 siblings, 0 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-16  6:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Andrew Cooper,
	Anthony PERARD, Orzel, Michal, Jan Beulich, Julien Grall,
	Stefano Stabellini, Chen, Jiqian

On 2025/4/15 18:47, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:24PM +0800, Jiqian Chen wrote:
>> When vpci fails to initialize a capability of a device, it just
>> return error instead of catching and processing exception. That
>> makes the entire device unusable.
>>
>> So, refactor REGISTER_VPCI_INIT to contain more capability specific
>> information, and use new functions to hide capability when
>> initialization fails in vpci_assign_device().
>>
>> Those new functions remove the failed legacy/extended capability
>> from the emulated legacy/extended capability list.
> 
> I think this needs to be at least 2 different changes.
> 
> First change adds the usage of REGISTER_VPCI_{LEGACY,EXTENDED}_CAP()
> helpers, while second change introduces the masking of capabilities on
> initialization failure.
> 
> Otherwise review is a bit complicated.
> 
>> What's more, change the definition of init_header() since it is
>> not a capability and it is needed for all devices' PCI config space.
>>
>> Note: call vpci_make_msix_hole() in the end of init_msix() since the
>> change of sequence of init_header() and init_msix().
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> cc: Anthony PERARD <anthony.perard@vates.tech>
>> cc: Michal Orzel <michal.orzel@amd.com>
>> cc: Jan Beulich <jbeulich@suse.com>
>> cc: Julien Grall <julien@xen.org>
>> cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> v1->v2 changes:
>> * Removed the "priorities" of initializing capabilities since it isn't used anymore.
>> * Added new function vpci_capability_mask() and vpci_ext_capability_mask() to remove failed capability from list.
>> * Called vpci_make_msix_hole() in the end of init_msix().
>>
>> Best regards,
>> Jiqian Chen.
>> ---
>>  xen/drivers/vpci/header.c  |   3 +-
>>  xen/drivers/vpci/msi.c     |   2 +-
>>  xen/drivers/vpci/msix.c    |   8 +-
>>  xen/drivers/vpci/rebar.c   |   2 +-
>>  xen/drivers/vpci/vpci.c    | 175 +++++++++++++++++++++++++++++++------
>>  xen/include/xen/pci_regs.h |   1 +
>>  xen/include/xen/vpci.h     |  26 ++++--
>>  xen/include/xen/xen.lds.h  |   2 +-
>>  8 files changed, 179 insertions(+), 40 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 6833d456566b..51a67d76ad8a 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -848,7 +848,7 @@ static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>      return 0;
>>  }
>>  
>> -static int cf_check init_header(struct pci_dev *pdev)
>> +int vpci_init_header(struct pci_dev *pdev)
>>  {
>>      uint16_t cmd;
>>      uint64_t addr, size;
>> @@ -1044,7 +1044,6 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>      return rc;
>>  }
>> -REGISTER_VPCI_INIT(init_header, VPCI_PRIORITY_MIDDLE);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 66e5a8a116be..ca89ae9b9c22 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -270,7 +270,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);
>>  
>>  void vpci_dump_msi(void)
>>  {
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 6bd8c55bb48e..6537374c79a0 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -751,9 +751,13 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>      pdev->vpci->msix = msix;
>>      list_add(&msix->next, &d->arch.hvm.msix_tables);
>>  
>> -    return 0;
>> +    spin_lock(&pdev->vpci->lock);
>> +    rc = vpci_make_msix_hole(pdev);
>> +    spin_unlock(&pdev->vpci->lock);
>> +
>> +    return rc
>>  }
>> -REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
>> +REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSIX, init_msix);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/rebar.c b/xen/drivers/vpci/rebar.c
>> index 793937449af7..79858e5dc92f 100644
>> --- a/xen/drivers/vpci/rebar.c
>> +++ b/xen/drivers/vpci/rebar.c
>> @@ -118,7 +118,7 @@ static int cf_check init_rebar(struct pci_dev *pdev)
>>  
>>      return 0;
>>  }
>> -REGISTER_VPCI_INIT(init_rebar, VPCI_PRIORITY_LOW);
>> +REGISTER_VPCI_EXTEND_CAP(PCI_EXT_CAP_ID_REBAR, init_rebar);
>>  
>>  /*
>>   * Local variables:
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 1e6aa5d799b9..f1f125bfdab1 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -35,9 +35,25 @@ struct vpci_register {
>>      uint32_t rsvdz_mask;
>>  };
>>  
>> +static int vpci_register_cmp(const struct vpci_register *r1,
>> +                             const struct vpci_register *r2)
>> +{
>> +    /* Return 0 if registers overlap. */
>> +    if ( r1->offset < r2->offset + r2->size &&
>> +         r2->offset < r1->offset + r1->size )
>> +        return 0;
>> +    if ( r1->offset < r2->offset )
>> +        return -1;
>> +    if ( r1->offset > r2->offset )
>> +        return 1;
>> +
>> +    ASSERT_UNREACHABLE();
>> +    return 0;
>> +}
>> +
>>  #ifdef __XEN__
>> -extern vpci_register_init_t *const __start_vpci_array[];
>> -extern vpci_register_init_t *const __end_vpci_array[];
>> +extern vpci_capability_t *const __start_vpci_array[];
>> +extern vpci_capability_t *const __end_vpci_array[];
>>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>  
>>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> @@ -83,6 +99,133 @@ static int assign_virtual_sbdf(struct pci_dev *pdev)
>>  
>>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>  
>> +static void vpci_capability_mask(struct pci_dev *pdev,
>> +                                 const unsigned int cap)
>> +{
>> +    const unsigned int size = 1;
>> +    const unsigned int offset = pci_find_cap_offset(pdev->sbdf, cap);
>> +    const struct vpci_register r = { .offset = offset, .size = size };
>> +    struct vpci_register *rm;
>> +    struct vpci *vpci = pdev->vpci;
>> +
>> +    spin_lock(&vpci->lock);
>> +    list_for_each_entry ( rm, &vpci->handlers, node )
>> +    {
>> +        int cmp = vpci_register_cmp(&r, rm);
>> +
>> +        if ( !cmp && rm->offset == offset && rm->size == size )
>> +        {
>> +            struct vpci_register *pre = list_entry(rm->node.prev,
>> +                                                   struct vpci_register,
>> +                                                   node);
>> +            struct vpci_register *next = list_entry(rm->node.next,
>> +                                                    struct vpci_register,
>> +                                                    node);
>> +
>> +            pre->private = next->private;
>> +
>> +            /* PCI_CAP_LIST_ID register of current capability */
>> +            list_del(&rm->node);
>> +            /* PCI_CAP_LIST_NEXT register of current capability */
>> +            list_del(&next->node);
>> +            spin_unlock(&vpci->lock);
> 
> Are you sure this works as intended?  The list is sorted, so if there
> further handlers in between the two capabilities, like when handling
> MSI capability, the next handler in the list won't point to the next
> capability list handler.

Oh, I thought the capability list is also sorted in the hardware.
Since it is not that case. My method would not work as intended.
I will change to get the capability position and the previous capability position from the hardware,
and then according to the two position to get handlers from vpci handlers.

And I will change my patch according to your other comments in this email.
Thank you for detail review.

> 
>> +
>> +            xfree(rm);
>> +            xfree(next);
>> +            return;
>> +        }
>> +        if ( cmp <= 0 )
>> +            break;
>> +    }
>> +    spin_unlock(&vpci->lock);
>> +}
>> +
>> +static void vpci_ext_capability_mask(struct pci_dev *pdev,
>> +                                     const unsigned int cap)
>> +{
>> +    const unsigned int size = 4;
>> +    const unsigned int offset = pci_find_ext_capability(pdev->sbdf, cap);
>> +    const struct vpci_register r = { .offset = offset, .size = size };
>> +    struct vpci_register *rm;
>> +    struct vpci *vpci = pdev->vpci;
>> +
>> +    spin_lock(&vpci->lock);
>> +    list_for_each_entry ( rm, &vpci->handlers, node )
>> +    {
>> +        int cmp = vpci_register_cmp(&r, rm);
>> +
>> +        if ( !cmp && rm->offset == offset && rm->size == size )
>> +        {
>> +            struct vpci_register *pre;
>> +            u32 pre_header, header = (u32)(uintptr_t)rm->private;
>> +
>> +            if ( offset == 0x100U && PCI_EXT_CAP_NEXT(header) == 0 )
> 
> It would be safer to check for next < 0x100 rather than explicitly
> 0.
> 
>> +            {
>> +                rm->private = (void *)(uintptr_t)0;
>> +                spin_unlock(&vpci->lock);
>> +                return;
>> +            }
>> +            else if ( offset == 0x100U )
> 
> There's no need for the else branch, as the previous if has a return.
> 
>> +            {
>> +                pre = rm;
>> +                rm = list_entry(rm->node.next, struct vpci_register, node);
>> +                pre->private = rm->private;
>> +            }
>> +            else
>> +            {
>> +                pre = list_entry(rm->node.prev, struct vpci_register, node);
>> +                pre_header = (u32)(uintptr_t)pre->private;
>> +                pre->private =
>> +                    (void *)(uintptr_t)((pre_header & !PCI_EXT_CAP_NEXT_MASK) |
> 
> I think you want ~PCI_EXT_CAP_NEXT_MASK rather than !PCI_EXT_CAP_NEXT_MASK?
> 
>> +                                        (header & PCI_EXT_CAP_NEXT_MASK));
>> +            }
>> +            list_del(&rm->node);
>> +            spin_unlock(&vpci->lock);
>> +            xfree(rm);
>> +            return;
> 
> Kind of the same complaint I had on the previous patch, this seems to
> assume that capability handlers are always consecutive in the list of
> handlers, which I don't think it's the case.
> 
>> +        }
>> +        if ( cmp <= 0 )
>> +            break;
>> +    }
>> +    spin_unlock(&vpci->lock);
>> +}
>> +
>> +static void vpci_init_capabilities(struct pci_dev *pdev)
>> +{
>> +    for ( unsigned int i = 0; i < NUM_VPCI_INIT; i++ )
>> +    {
>> +        const vpci_capability_t *capability = __start_vpci_array[i];
>> +        const unsigned int cap = capability->id;
>> +        const bool is_ext = capability->is_ext;
>> +        unsigned int pos;
>> +        int rc;
>> +
>> +        if ( !is_hardware_domain(pdev->domain) && is_ext )
>> +            continue;
>> +
>> +        if ( is_ext )
>> +            pos = pci_find_ext_capability(pdev->sbdf, cap);
>> +        else
>> +            pos = pci_find_cap_offset(pdev->sbdf, cap);
>> +
>> +        if ( !pos )
>> +            continue;
>> +
>> +        rc = capability->init(pdev);
>> +
>> +        if ( rc )
>> +        {
>> +            printk(XENLOG_WARNING "%pd %pp: %s cap %u init fail rc=%d, mask it\n",
>> +                   pdev->domain, &pdev->sbdf,
>> +                   is_ext ? "extended" : "legacy", cap, rc);
>> +            if ( is_ext )
>> +                vpci_ext_capability_mask(pdev, cap);
>> +            else
>> +                vpci_capability_mask(pdev, cap);
>> +        }
>> +    }
>> +}
>> +
>>  void vpci_deassign_device(struct pci_dev *pdev)
>>  {
>>      unsigned int i;
>> @@ -128,7 +271,6 @@ void vpci_deassign_device(struct pci_dev *pdev)
>>  
>>  int vpci_assign_device(struct pci_dev *pdev)
>>  {
>> -    unsigned int i;
>>      const unsigned long *ro_map;
>>      int rc = 0;
>>  
>> @@ -159,12 +301,11 @@ int vpci_assign_device(struct pci_dev *pdev)
>>          goto out;
>>  #endif
>>  
>> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
>> -    {
>> -        rc = __start_vpci_array[i](pdev);
>> -        if ( rc )
>> -            break;
>> -    }
>> +    rc = vpci_init_header(pdev);
>> +    if ( rc )
>> +        goto out;
> 
> If you use the out label here you can remove the __maybe_unused
> attribute from it.
> 
>> +
>> +    vpci_init_capabilities(pdev);
>>  
>>   out: __maybe_unused;
>>      if ( rc )
>> @@ -174,22 +315,6 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  }
>>  #endif /* __XEN__ */
>>  
>> -static int vpci_register_cmp(const struct vpci_register *r1,
>> -                             const struct vpci_register *r2)
>> -{
>> -    /* Return 0 if registers overlap. */
>> -    if ( r1->offset < r2->offset + r2->size &&
>> -         r2->offset < r1->offset + r1->size )
>> -        return 0;
>> -    if ( r1->offset < r2->offset )
>> -        return -1;
>> -    if ( r1->offset > r2->offset )
>> -        return 1;
>> -
>> -    ASSERT_UNREACHABLE();
>> -    return 0;
>> -}
>> -
>>  /* Dummy hooks, writes are ignored, reads return 1's */
>>  static uint32_t cf_check vpci_ignored_read(
>>      const struct pci_dev *pdev, unsigned int reg, void *data)
>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
>> index 27b4f44eedf3..5fe6653fded4 100644
>> --- a/xen/include/xen/pci_regs.h
>> +++ b/xen/include/xen/pci_regs.h
>> @@ -449,6 +449,7 @@
>>  #define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
>>  #define PCI_EXT_CAP_VER(header)		(((header) >> 16) & 0xf)
>>  #define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
>> +#define PCI_EXT_CAP_NEXT_MASK		0xFFC00000U
>>  
>>  #define PCI_EXT_CAP_ID_ERR	1
>>  #define PCI_EXT_CAP_ID_VC	2
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 807401b2eaa2..5016ded64d89 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -13,11 +13,11 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
>>  typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
>>                            uint32_t val, void *data);
>>  
>> -typedef int vpci_register_init_t(struct pci_dev *dev);
>> -
>> -#define VPCI_PRIORITY_HIGH      "1"
>> -#define VPCI_PRIORITY_MIDDLE    "5"
>> -#define VPCI_PRIORITY_LOW       "9"
>> +typedef struct {
>> +    unsigned int id;
>> +    bool is_ext;
>> +    int (*init)(struct pci_dev *pdev);
>> +} vpci_capability_t;
>>  
>>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>>  
>> @@ -29,9 +29,19 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>>   */
>>  #define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>>  
>> -#define REGISTER_VPCI_INIT(x, p)                \
>> -  static vpci_register_init_t *const x##_entry  \
>> -               __used_section(".data.vpci." p) = (x)
>> +#define REGISTER_VPCI_CAP(cap, x, ext) \
>> +  static vpci_capability_t x##_t = { \
>> +        .id = (cap), \
>> +        .init = (x), \
>> +        .is_ext = (ext), \
>> +  }; \
>> +  static vpci_capability_t *const x##_entry  \
>> +               __used_section(".data.vpci.") = &(x##_t)
>> +
>> +#define REGISTER_VPCI_LEGACY_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, false)
>> +#define REGISTER_VPCI_EXTEND_CAP(cap, x) REGISTER_VPCI_CAP(cap, x, true)
> 
> Nit: I would use EXTENDED here, there's no need to keep both defines
> the same length.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-04-15 13:29   ` Roger Pau Monné
@ 2025-04-16  6:16     ` Chen, Jiqian
  2025-04-16  7:47       ` Roger Pau Monné
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-16  6:16 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Chen, Jiqian

On 2025/4/15 21:29, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:27PM +0800, Jiqian Chen wrote:
>> When init_msi() fails, the previous new changes will hide MSI
>> capability, it can't rely on vpci_deassign_device() to remove
>> all MSI related resources anymore, those resources must be
>> cleaned up in failure path of init_msi.
>>
>> To do that, add a new function to free MSI resources.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> ---
>> v1->v2 changes:
>> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
>> ---
>>  xen/drivers/vpci/msi.c | 47 +++++++++++++++++++++++++++++++++---------
>>  1 file changed, 37 insertions(+), 10 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index ca89ae9b9c22..48a466dba0ef 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -193,6 +193,33 @@ static void cf_check mask_write(
>>      msi->mask = val;
>>  }
>>  
>> +/* 12 is size of MSI structure for 32-bit Message Address without PVM */
>> +#define MSI_STRUCTURE_SIZE_32 12
> 
> I'm confused by this.  The minimum size of the capability is 4 bytes
> for the capability pointer, plus 4 bytes for the message address,
> plus 2 bytes for the message data.  So that's 10 bytes in total.
The remain 2 bytes is Extended Message Data, PCIe spec has this register's definition even though it is optional.

> 
> I think it would be best if you calculate the size based on the
> existing msi_ macros.
> 
> if ( masking )
>     end = msi_pending_bits_reg(msi_pos, address64);
> else
>     end = msi_mask_bits_reg(msi_pos, address64) - 2;
> 
> size = end - msi_control_reg(msi_pos);
Thanks, I will change to this in next version.

> 
>> +
>> +static void fini_msi(struct pci_dev *pdev)
>> +{
>> +    unsigned int size = MSI_STRUCTURE_SIZE_32;
>> +
>> +    if ( !pdev->vpci->msi )
>> +        return;
>> +
>> +    if ( pdev->vpci->msi->address64 )
>> +        size += 4;
>> +    if ( pdev->vpci->msi->masking )
>> +        size += 4;
>> +
>> +    /*
>> +     * Remove all possible registered registers except capability ID
>> +     * register and next capability pointer register, which will be
>> +     * removed in mask function.
>> +     *msi_mask_bits_reg/
>> +    vpci_remove_registers(pdev->vpci,
>> +                          msi_control_reg(pdev->msi_pos),
>> +                          size - PCI_MSI_FLAGS);
>> +    xfree(pdev->vpci->msi);
>> +    pdev->vpci->msi = NULL;
>> +}
>> +
>>  static int cf_check init_msi(struct pci_dev *pdev)
>>  {
>>      unsigned int pos = pdev->msi_pos;
>> @@ -209,12 +236,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>      ret = vpci_add_register(pdev->vpci, control_read, control_write,
>>                              msi_control_reg(pos), 2, pdev->vpci->msi);
>>      if ( ret )
>> -        /*
>> -         * NB: there's no need to free the msi struct or remove the register
>> -         * handlers form the config space, the caller will take care of the
>> -         * cleanup.
>> -         */
>> -        return ret;
>> +        goto fail;
>>  
>>      /* Get the maximum number of vectors the device supports. */
>>      control = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
>> @@ -237,20 +259,20 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>      ret = vpci_add_register(pdev->vpci, address_read, address_write,
>>                              msi_lower_address_reg(pos), 4, pdev->vpci->msi);
>>      if ( ret )
>> -        return ret;
>> +        goto fail;
>>  
>>      ret = vpci_add_register(pdev->vpci, data_read, data_write,
>>                              msi_data_reg(pos, pdev->vpci->msi->address64), 2,
>>                              pdev->vpci->msi);
>>      if ( ret )
>> -        return ret;
>> +        goto fail;
>>  
>>      if ( pdev->vpci->msi->address64 )
>>      {
>>          ret = vpci_add_register(pdev->vpci, address_hi_read, address_hi_write,
>>                                  msi_upper_address_reg(pos), 4, pdev->vpci->msi);
>>          if ( ret )
>> -            return ret;
>> +            goto fail;
>>      }
>>  
>>      if ( pdev->vpci->msi->masking )
>> @@ -260,7 +282,7 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>                                                    pdev->vpci->msi->address64),
>>                                  4, pdev->vpci->msi);
>>          if ( ret )
>> -            return ret;
>> +            goto fail;
>>          /*
>>           * FIXME: do not add any handler for the pending bits for the hardware
>>           * domain, which means direct access. This will be revisited when
>> @@ -269,6 +291,11 @@ static int cf_check init_msi(struct pci_dev *pdev)
>>      }
>>  
>>      return 0;
>> +
>> + fail:
>> +    fini_msi(pdev);
>> +
>> +    return ret;
>>  }
>>  REGISTER_VPCI_LEGACY_CAP(PCI_CAP_ID_MSI, init_msi);
> 
> I wonder if the fini_msi should be referenced in
> REGISTER_VPCI_{EXTENDED,LEGACY}_CAP(), so that the caller of
> init_msi() can call fini_msi() on failure and thus avoid all those
> fail hooks and labels, as the caller can take care of the cleanup.
Got it. I will add a hook for fini_x function.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-15 13:45   ` Roger Pau Monné
@ 2025-04-16  6:20     ` Chen, Jiqian
  2025-04-16  7:54       ` Roger Pau Monné
  0 siblings, 1 reply; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-16  6:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Jan Beulich,
	Andrew Cooper, Chen, Jiqian

On 2025/4/15 21:45, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:45:28PM +0800, Jiqian Chen wrote: 
>> +
>> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
>> +        if ( vpci->msix->table[i] )
>> +            iounmap(vpci->msix->table[i]);
> 
> The MSI-X init function never maps tables, so the code here (given
> it's current usage) will also never unmap anything.  However I would
> also like to use this code for vPCI deassing, at which point the logic
> will get used.
So, I still need to keep this part, right?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host
  2025-04-16  2:51     ` Chen, Jiqian
@ 2025-04-16  7:32       ` Roger Pau Monné
  0 siblings, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-16  7:32 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Wed, Apr 16, 2025 at 02:51:18AM +0000, Chen, Jiqian wrote:
> On 2025/4/15 18:10, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:45:22PM +0800, Jiqian Chen wrote:
> >> +        while ( next && ttl )
> >> +        {
> >> +            unsigned int pos = next;
> >> +
> >> +            next = pci_find_next_cap_ttl(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
> >> +                                         caps, n, &ttl);
> >> +
> >> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> >> +                                   pos + PCI_CAP_LIST_ID, 1, NULL);
> >> +            if ( rc )
> >> +                return rc;
> >> +
> >> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> >> +                                   pos + PCI_CAP_LIST_NEXT, 1,
> >> +                                   (void *)(uintptr_t)next);
> >> +            if ( rc )
> >> +                return rc;
> >> +
> >> +            next &= ~3;
> >> +        }
> >> +    }
> >> +
> >> +    /* Utilize rsvdp_mask to hide PCI_STATUS_CAP_LIST from the guest. */
> >> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
> >> +                                PCI_STATUS, 2, NULL,
> >> +                                PCI_STATUS_RO_MASK &
> >> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> >> +                                PCI_STATUS_RW1C_MASK,
> >> +                                mask_cap_list ? PCI_STATUS_CAP_LIST : 0,
> >> +                                PCI_STATUS_RSVDZ_MASK);
> > 
> > One further remark I've forgot to make on the previous reply: I'm
> > unsure we want to do this for dom0, as we in general allow dom0
> > unfiltered write access (unless it's for accesses mediated by Xen).
> This part is the original implementation before my patches.
> If you want to restrict this only for domUs, not for dom0.

Oh, indeed.  I was confused thinking it was inside the
!is_hardware_domain() conditional.  Leave it like that, your series is
already long enough.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails
  2025-04-16  6:16     ` Chen, Jiqian
@ 2025-04-16  7:47       ` Roger Pau Monné
  0 siblings, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-16  7:47 UTC (permalink / raw)
  To: Chen, Jiqian; +Cc: xen-devel@lists.xenproject.org, Huang, Ray

On Wed, Apr 16, 2025 at 06:16:36AM +0000, Chen, Jiqian wrote:
> On 2025/4/15 21:29, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:45:27PM +0800, Jiqian Chen wrote:
> >> When init_msi() fails, the previous new changes will hide MSI
> >> capability, it can't rely on vpci_deassign_device() to remove
> >> all MSI related resources anymore, those resources must be
> >> cleaned up in failure path of init_msi.
> >>
> >> To do that, add a new function to free MSI resources.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >> ---
> >> v1->v2 changes:
> >> * Added a new function fini_msi to free all MSI resources instead of using an array to record registered registers.
> >> ---
> >>  xen/drivers/vpci/msi.c | 47 +++++++++++++++++++++++++++++++++---------
> >>  1 file changed, 37 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> >> index ca89ae9b9c22..48a466dba0ef 100644
> >> --- a/xen/drivers/vpci/msi.c
> >> +++ b/xen/drivers/vpci/msi.c
> >> @@ -193,6 +193,33 @@ static void cf_check mask_write(
> >>      msi->mask = val;
> >>  }
> >>  
> >> +/* 12 is size of MSI structure for 32-bit Message Address without PVM */
> >> +#define MSI_STRUCTURE_SIZE_32 12
> > 
> > I'm confused by this.  The minimum size of the capability is 4 bytes
> > for the capability pointer, plus 4 bytes for the message address,
> > plus 2 bytes for the message data.  So that's 10 bytes in total.
> The remain 2 bytes is Extended Message Data, PCIe spec has this register's definition even though it is optional.

I was looking at the PCI Local Bus Specification rev 3.0 (because that
has nicer figures IMO), and such field is not listed there.  It's
listed in the PCI Express 6.0.1 specification.  I have to admit I've
completely forgot about such optional feature.

We never expose this bit to the guest in the control register, and
consequently we never explicitly handle accesses to what would be the
extended message data register, neither allow guests to enable it.

If/when we add support for such field we will likely need to adjust
fini_msi() to also zap it if present.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-16  6:20     ` Chen, Jiqian
@ 2025-04-16  7:54       ` Roger Pau Monné
  0 siblings, 0 replies; 41+ messages in thread
From: Roger Pau Monné @ 2025-04-16  7:54 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Huang, Ray, Jan Beulich,
	Andrew Cooper

On Wed, Apr 16, 2025 at 06:20:50AM +0000, Chen, Jiqian wrote:
> On 2025/4/15 21:45, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:45:28PM +0800, Jiqian Chen wrote: 
> >> +
> >> +    for ( unsigned int i = 0; i < ARRAY_SIZE(vpci->msix->table); i++ )
> >> +        if ( vpci->msix->table[i] )
> >> +            iounmap(vpci->msix->table[i]);
> > 
> > The MSI-X init function never maps tables, so the code here (given
> > it's current usage) will also never unmap anything.  However I would
> > also like to use this code for vPCI deassing, at which point the logic
> > will get used.
> So, I still need to keep this part, right?

Yes, I guess leave it for now, we can always rip it later.  I would
like to see how doing the cleanup in the caller in case of failure
looks.

Then we can likely extend such cleanup to be done in the deassign case
also.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-09  6:45 ` [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources Jiqian Chen
  2025-04-15 13:45   ` Roger Pau Monné
@ 2025-04-17  7:34   ` Jan Beulich
  2025-04-17  7:38     ` Chen, Jiqian
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2025-04-17  7:34 UTC (permalink / raw)
  To: Jiqian Chen; +Cc: Huang Rui, Andrew Cooper, Roger Pau Monné, xen-devel

On 09.04.2025 08:45, Jiqian Chen wrote:
> --- a/xen/arch/x86/hvm/intercept.c
> +++ b/xen/arch/x86/hvm/intercept.c
> @@ -276,6 +276,50 @@ void register_mmio_handler(struct domain *d,
>      handler->mmio.ops = ops;
>  }
>  
> +void unregister_mmio_handler(struct domain *d,
> +                             const struct hvm_mmio_ops *ops)
> +{
> +    unsigned int i, count = d->arch.hvm.io_handler_count;
> +
> +    ASSERT(d->arch.hvm.io_handler);
> +
> +    if ( !count )
> +        return;
> +
> +    for ( i = 0; i < count; i++ )
> +        if ( d->arch.hvm.io_handler[i].type == IOREQ_TYPE_COPY &&
> +             d->arch.hvm.io_handler[i].mmio.ops == ops )
> +            break;
> +
> +    if ( i == count )
> +        return;
> +
> +    for ( ; i < count - 1; i++ )
> +    {
> +        struct hvm_io_handler *curr = &d->arch.hvm.io_handler[i];
> +        struct hvm_io_handler *next = &d->arch.hvm.io_handler[i + 1];
> +
> +        curr->type = next->type;
> +        curr->ops = next->ops;
> +        if ( next->type == IOREQ_TYPE_COPY )
> +        {
> +            curr->portio.port = 0;
> +            curr->portio.size = 0;
> +            curr->portio.action = 0;
> +            curr->mmio.ops = next->mmio.ops;
> +        }
> +        else
> +        {
> +            curr->mmio.ops = 0;
> +            curr->portio.port = next->portio.port;
> +            curr->portio.size = next->portio.size;
> +            curr->portio.action = next->portio.action;
> +        }
> +    }
> +
> +    d->arch.hvm.io_handler_count--;
> +}

To add on what Roger said: This is inherently non-atomic, so the domain
would need pausing to do such removal in a race-free way. Hence why we
deliberately didn't have such a function so far, aiui. (The removal may
be safe in the specific case you use it helow, but there's nothing here
preventing it from being used elsewhere, without paying attention to
the raciness.)

Jan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources
  2025-04-17  7:34   ` Jan Beulich
@ 2025-04-17  7:38     ` Chen, Jiqian
  0 siblings, 0 replies; 41+ messages in thread
From: Chen, Jiqian @ 2025-04-17  7:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2025/4/17 15:34, Jan Beulich wrote:
> On 09.04.2025 08:45, Jiqian Chen wrote:
>> --- a/xen/arch/x86/hvm/intercept.c
>> +++ b/xen/arch/x86/hvm/intercept.c
>> @@ -276,6 +276,50 @@ void register_mmio_handler(struct domain *d,
>>      handler->mmio.ops = ops;
>>  }
>>  
>> +void unregister_mmio_handler(struct domain *d,
>> +                             const struct hvm_mmio_ops *ops)
>> +{
>> +    unsigned int i, count = d->arch.hvm.io_handler_count;
>> +
>> +    ASSERT(d->arch.hvm.io_handler);
>> +
>> +    if ( !count )
>> +        return;
>> +
>> +    for ( i = 0; i < count; i++ )
>> +        if ( d->arch.hvm.io_handler[i].type == IOREQ_TYPE_COPY &&
>> +             d->arch.hvm.io_handler[i].mmio.ops == ops )
>> +            break;
>> +
>> +    if ( i == count )
>> +        return;
>> +
>> +    for ( ; i < count - 1; i++ )
>> +    {
>> +        struct hvm_io_handler *curr = &d->arch.hvm.io_handler[i];
>> +        struct hvm_io_handler *next = &d->arch.hvm.io_handler[i + 1];
>> +
>> +        curr->type = next->type;
>> +        curr->ops = next->ops;
>> +        if ( next->type == IOREQ_TYPE_COPY )
>> +        {
>> +            curr->portio.port = 0;
>> +            curr->portio.size = 0;
>> +            curr->portio.action = 0;
>> +            curr->mmio.ops = next->mmio.ops;
>> +        }
>> +        else
>> +        {
>> +            curr->mmio.ops = 0;
>> +            curr->portio.port = next->portio.port;
>> +            curr->portio.size = next->portio.size;
>> +            curr->portio.action = next->portio.action;
>> +        }
>> +    }
>> +
>> +    d->arch.hvm.io_handler_count--;
>> +}
> 
> To add on what Roger said: This is inherently non-atomic, so the domain
> would need pausing to do such removal in a race-free way. Hence why we
> deliberately didn't have such a function so far, aiui. (The removal may
> be safe in the specific case you use it helow, but there's nothing here
> preventing it from being used elsewhere, without paying attention to
> the raciness.)
Make sense, thanks for your detailed inputs.
I will delete this part in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2025-04-17  7:38 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-09  6:45 [PATCH v2 0/8] Support hiding capability when its initialization fails Jiqian Chen
2025-04-09  6:45 ` [PATCH v2 1/8] driver/pci: Get next capability without passing caps Jiqian Chen
2025-04-10 12:34   ` Jan Beulich
2025-04-11  2:51     ` Chen, Jiqian
2025-04-11  8:00       ` Jan Beulich
2025-04-11  8:08         ` Chen, Jiqian
2025-04-09  6:45 ` [PATCH v2 2/8] vpci/header: Emulate legacy capability list for host Jiqian Chen
2025-04-10 12:40   ` Jan Beulich
2025-04-11  2:54     ` Chen, Jiqian
2025-04-11  8:01       ` Jan Beulich
2025-04-15  9:25   ` Roger Pau Monné
2025-04-15 10:07     ` Chen, Jiqian
2025-04-15 10:08       ` Jan Beulich
2025-04-15 10:23         ` Chen, Jiqian
2025-04-15 10:50       ` Roger Pau Monné
2025-04-15 10:10   ` Roger Pau Monné
2025-04-16  2:51     ` Chen, Jiqian
2025-04-16  7:32       ` Roger Pau Monné
2025-04-09  6:45 ` [PATCH v2 3/8] vpci/header: Emulate extended " Jiqian Chen
2025-04-15  9:42   ` Roger Pau Monné
2025-04-15 10:11     ` Chen, Jiqian
2025-04-15  9:49   ` Jan Beulich
2025-04-15 10:18     ` Chen, Jiqian
2025-04-15 10:20       ` Jan Beulich
2025-04-09  6:45 ` [PATCH v2 4/8] vpci: Hide capability when it fails to initialize Jiqian Chen
2025-04-15 10:47   ` Roger Pau Monné
2025-04-16  6:07     ` Chen, Jiqian
2025-04-09  6:45 ` [PATCH v2 5/8] vpci: Refactor vpci_remove_register to remove matched registers Jiqian Chen
2025-04-15 11:06   ` Roger Pau Monné
2025-04-09  6:45 ` [PATCH v2 6/8] vpci/rebar: Remove registers when init_rebar() fails Jiqian Chen
2025-04-15 12:59   ` Roger Pau Monné
2025-04-09  6:45 ` [PATCH v2 7/8] vpci/msi: Free MSI resources when init_msi() fails Jiqian Chen
2025-04-15 13:29   ` Roger Pau Monné
2025-04-16  6:16     ` Chen, Jiqian
2025-04-16  7:47       ` Roger Pau Monné
2025-04-09  6:45 ` [PATCH v2 8/8] vpci/msix: Add function to clean MSIX resources Jiqian Chen
2025-04-15 13:45   ` Roger Pau Monné
2025-04-16  6:20     ` Chen, Jiqian
2025-04-16  7:54       ` Roger Pau Monné
2025-04-17  7:34   ` Jan Beulich
2025-04-17  7:38     ` Chen, Jiqian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.