[XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen

All of lore.kernel.org
 help / color / mirror / Atom feed

* [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen
@ 2024-06-17  9:00 Jiqian Chen
  2024-06-17  9:00 ` [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui

Hi All,
This is v10 series to support passthrough when dom0 is PVH
v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style.
* patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call",
           indicating that there will be new codes on the kernel side that will call PHYSDEVOP_setup_gsi.
           Also added an explanation of why the interrupt of passthrough device does not work if gsi is not
           registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
           Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done.
           Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd
           instead of current->domain.
           In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and
           error handling for irq0 was added.
           Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission
           definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I am the author.
Signed-off-by: Huang Rui <ray.huang@amd.com> means Rui sent them to upstream firstly.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I take continue to upstream.

Best regards,
Jiqian Chen

v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());"
           from vpci_reset_device_state;
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to
           "d == current->domian".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since there is same checke in below.Although their return
           values are different, this difference is acceptable for the sake of code consistency
           if ( !is_hardware_domain(currd) )
		       return -ENOSYS;
           break;
* patch#5: Change the commit message to describe more why we need this new hypercall.
           Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain
           why we need this check.
           Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
           Add explicit padding to struct xen_domctl_gsi_permission.

v7->v8 changes:
* patch#2: Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq.
           That check was missed in the previous version.
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi by passing in the sbdf of pci device.
* patch#5: Remove the parameter "is_gsi", when there exist gsi, in pci_add_dm_done use a new function
           pci_device_set_gsi to do map_pirq and grant permission. That gets more intuitive code logic.

v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi from irq, instead of gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old function vpci_remove_device,
           vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to be understood, which to use
           gsi by default and be compatible with older kernel versions to continue to use irq

v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and
           just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on it. And add the handling of errno
           and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to grant irq permission in
           XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-Jiqian.Chen@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete pci_reset_device_state; add
           xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add description for
		   PHYSDEVOP_pci_device_state_reset;
* patch#2: du to changes in the implementation of the second patch on kernel side(that it will do setup_gsi and
           map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need
		   to support self mapping.
* patch#3: du to changes in the implementation of the second patch on kernel side(that adds a new sysfs for gsi
           instead of a new syscall), so read gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-Jiqian.Chen@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-Jiqian.Chen@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got lots of suggestions.
I will introduce all issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run \u201csudo xl pci-assignable-add <sbdf>\u201d to assign a device, pci_stub will call
pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()\u201d, the pci config
write will trigger an io interrupt to bar_write() in the xen, but the
bar->enabled was set before, the write is not allowed now, and then when 
bar->Qemu config the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, so the current cached state in
pdev->vpci is all out of date and is different from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add xen_reset_device_state
function) and the fist patch of xen(xen/vpci: Clear all vpci status of device) add a new hypercall to reset the
state stored in vPCI when the state of real device has changed.
Thank Roger for the suggestion of this v2, and it is different from
v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@amd.com/), v1 simply allow domU to write
pci bar, it does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call
into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will fail
at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The
second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allow PVH dom0 do PHYSDEVOP_map_pirq. This v2
patch is better than v1, v1 simply remove the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@amd.com/).

3. the gsi of a passthrough device doesn't be unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu config a passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has corresponding
mappings in dom0. But it didn\u2019t, so failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted, "current"
is PVH dom0 and it return irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH
are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching codes, I find "map_pirq" and "register_gsi" will be done in function
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0.
So the two problems can be concluded to that the gsi of a passthrough device doesn't be unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device in PVH dom0)
call the unmask_irq() when we assign a device to be passthrough. So that passthrough devices can have the mapping of
gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the
v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may cause
multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device\u2019s gsi to pirq in function
xen_pt_realize(). But failed.

Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass irq to it and
treat irq as gsi, it is got from file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in function xen_host_pci_device_get().
But actually the gsi number is not equal with irq. On PVH dom0, when it allocates irq for a gsi in
function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the principle of applying first, distributing
first. And if you debug the kernel codes(see function __irq_alloc_descs), you will find the irq number is allocated
from small to large by order, but the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38
get a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we can do a
translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations
in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall for userspace to get the gsi
from irq. The third patch of xen(tools: Add new function to get gsi from irq) add a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 patch is the
same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@amd.com/)

About the v2 patch of qemu, just change an included head file, other are similar to the
v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@amd.com/), just call
xc_physdev_gsi_from_irq() to get gsi from irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  tools: Add new function to get gsi from dev
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xen-sys/Linux/privcmd.h |   7 ++
 tools/include/xencall.h               |   2 +
 tools/include/xenctrl.h               |   7 ++
 tools/libs/call/core.c                |   5 ++
 tools/libs/call/libxencall.map        |   2 +
 tools/libs/call/linux.c               |  15 ++++
 tools/libs/call/private.h             |   9 +++
 tools/libs/ctrl/xc_domain.c           |  15 ++++
 tools/libs/ctrl/xc_physdev.c          |   4 +
 tools/libs/light/Makefile             |   2 +-
 tools/libs/light/libxl_pci.c          | 105 ++++++++++++++++++++++++--
 xen/arch/x86/domctl.c                 |  43 +++++++++++
 xen/arch/x86/hvm/hypercall.c          |   8 ++
 xen/arch/x86/include/asm/io_apic.h    |   2 +
 xen/arch/x86/io_apic.c                |  17 +++++
 xen/arch/x86/mpparse.c                |   3 +-
 xen/arch/x86/physdev.c                |  14 ++++
 xen/drivers/pci/physdev.c             |  43 +++++++++++
 xen/drivers/vpci/vpci.c               |   9 +++
 xen/include/public/domctl.h           |   8 ++
 xen/include/public/physdev.h          |   7 ++
 xen/include/xen/pci.h                 |  16 ++++
 xen/include/xen/vpci.h                |   6 ++
 xen/xsm/flask/hooks.c                 |   1 +
 24 files changed, 341 insertions(+), 9 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
@ 2024-06-17  9:00 ` Jiqian Chen
  2024-06-17 14:17   ` Jan Beulich
  2024-06-17  9:00 ` [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui, Stewart Hildebrand

When a device has been reset on dom0 side, the vpci on Xen
side won't get notification, so the cached state in vpci is
all out of date compare with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of device is reset on dom0 side,
dom0 can call this hypercall to notify vpci.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 43 ++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c      |  9 ++++++++
 xen/include/public/physdev.h |  7 ++++++
 xen/include/xen/pci.h        | 16 ++++++++++++++
 xen/include/xen/vpci.h       |  6 +++++
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/vpci.h>
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+                    pci_device_state_reset_methods[] = {
+    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset: {
+        struct pci_device_state_reset dev_reset;
+        struct physdev_pci_device *dev;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+        dev = &dev_reset.dev;
+        sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+        write_unlock(&pdev->domain->pci_lock);
+        if ( ret )
+            printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix          30
 #define PHYSDEVOP_release_msix          31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
     struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+    int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+    DEVICE_RESET_FLR,
+    DEVICE_RESET_COLD,
+    DEVICE_RESET_WARM,
+    DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+    struct physdev_pci_device dev;
+    enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(domain, pdev) \
     list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..b230fd374de5 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -282,6 +283,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+    return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-17  9:00 ` [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
@ 2024-06-17 14:17   ` Jan Beulich
  2024-06-18  6:25     ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-17 14:17 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 17.06.2024 11:00, Jiqian Chen wrote:
> --- a/xen/drivers/pci/physdev.c
> +++ b/xen/drivers/pci/physdev.c
> @@ -2,11 +2,17 @@
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
>  #include <xen/init.h>
> +#include <xen/vpci.h>
>  
>  #ifndef COMPAT
>  typedef long ret_t;
>  #endif
>  
> +static const struct pci_device_state_reset_method
> +                    pci_device_state_reset_methods[] = {
> +    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
> +};

What about the other three DEVICE_RESET_*? In particular ...

> @@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case PHYSDEVOP_pci_device_state_reset: {
> +        struct pci_device_state_reset dev_reset;
> +        struct physdev_pci_device *dev;
> +        struct pci_dev *pdev;
> +        pci_sbdf_t sbdf;
> +
> +        if ( !is_pci_passthrough_enabled() )
> +            return -EOPNOTSUPP;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> +            break;
> +        dev = &dev_reset.dev;
> +        sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
> +
> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +        if ( ret )
> +            break;
> +
> +        pcidevs_lock();
> +        pdev = pci_get_pdev(NULL, sbdf);
> +        if ( !pdev )
> +        {
> +            pcidevs_unlock();
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        write_lock(&pdev->domain->pci_lock);
> +        pcidevs_unlock();
> +        ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);

... you're setting this up for calling NULL. In fact there's also no bounds
check for the array index.

Also, nit (further up): Opening figure braces for a new scope go onto their
own line. Then again I notice that apparenly _all_ other instances in this
file are doing it the wrong way, too.

Finally, is the "dev" local variable really needed? It effectively hides that
PCI_SBDF() is invoked on the hypercall arguments.

> +        write_unlock(&pdev->domain->pci_lock);
> +        if ( ret )
> +            printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);

Maybe downgrade to dprintk()? The caller ought to handle the error anyway.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +int vpci_reset_device_state(struct pci_dev *pdev)

As a target of an indirect call this needs to be annotated cf_check (both
here and in the declaration, unlike __must_check, which is sufficient to
have on just the declaration).

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -156,6 +156,22 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +struct pci_device_state_reset_method {
> +    int (*reset_fn)(struct pci_dev *pdev);
> +};
> +
> +enum pci_device_state_reset_type {
> +    DEVICE_RESET_FLR,
> +    DEVICE_RESET_COLD,
> +    DEVICE_RESET_WARM,
> +    DEVICE_RESET_HOT,
> +};
> +
> +struct pci_device_state_reset {
> +    struct physdev_pci_device dev;
> +    enum pci_device_state_reset_type reset_type;
> +};

This is the struct to use as hypercall argument. How can it live outside of
any public header? Also, when moving it there, beware that you should not
use enum-s there. Only handles and fixed-width types are permitted.

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>  
>  /* Remove all handlers and free vpci related structures. */
>  void vpci_deassign_device(struct pci_dev *pdev);
> +int __must_check vpci_reset_device_state(struct pci_dev *pdev);

What's the purpose of this __must_check, when the sole caller is calling
this through a function pointer, which isn't similarly annotated?

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-17 14:17   ` Jan Beulich
@ 2024-06-18  6:25     ` Chen, Jiqian
  2024-06-18  8:33       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-18  6:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/17 22:17, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -2,11 +2,17 @@
>>  #include <xen/guest_access.h>
>>  #include <xen/hypercall.h>
>>  #include <xen/init.h>
>> +#include <xen/vpci.h>
>>  
>>  #ifndef COMPAT
>>  typedef long ret_t;
>>  #endif
>>  
>> +static const struct pci_device_state_reset_method
>> +                    pci_device_state_reset_methods[] = {
>> +    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>> +};
> 
> What about the other three DEVICE_RESET_*? In particular ...
I don't know how to implement the other three types of reset.
This is a design form so that corresponding processing functions can be added later if necessary. Do I need to set them to NULL pointers in this array?
Does this form conform to your previous suggestion of using only one hypercall to handle all types of resets?

> 
>> @@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case PHYSDEVOP_pci_device_state_reset: {
>> +        struct pci_device_state_reset dev_reset;
>> +        struct physdev_pci_device *dev;
>> +        struct pci_dev *pdev;
>> +        pci_sbdf_t sbdf;
>> +
>> +        if ( !is_pci_passthrough_enabled() )
>> +            return -EOPNOTSUPP;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +            break;
>> +        dev = &dev_reset.dev;
>> +        sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
>> +
>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +        if ( ret )
>> +            break;
>> +
>> +        pcidevs_lock();
>> +        pdev = pci_get_pdev(NULL, sbdf);
>> +        if ( !pdev )
>> +        {
>> +            pcidevs_unlock();
>> +            ret = -ENODEV;
>> +            break;
>> +        }
>> +
>> +        write_lock(&pdev->domain->pci_lock);
>> +        pcidevs_unlock();
>> +        ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
> 
> ... you're setting this up for calling NULL. In fact there's also no bounds
> check for the array index.
Oh, right. I will add checks next version.

> 
> Also, nit (further up): Opening figure braces for a new scope go onto their
OK, will change in next version.
> own line. Then again I notice that apparenly _all_ other instances in this
> file are doing it the wrong way, too.
Do I need to change them in this patch?
> 
> Finally, is the "dev" local variable really needed? It effectively hides that
> PCI_SBDF() is invoked on the hypercall arguments.
Will remove "dev" in next version.
> 
>> +        write_unlock(&pdev->domain->pci_lock);
>> +        if ( ret )
>> +            printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
> 
> Maybe downgrade to dprintk()? The caller ought to handle the error anyway.
Will downgrade in next version.
> 
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
> 
> As a target of an indirect call this needs to be annotated cf_check (both
> here and in the declaration, unlike __must_check, which is sufficient to
> have on just the declaration).
OK, will add cf_check in next version.
> 
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -156,6 +156,22 @@ struct pci_dev {
>>      struct vpci *vpci;
>>  };
>>  
>> +struct pci_device_state_reset_method {
>> +    int (*reset_fn)(struct pci_dev *pdev);
>> +};
>> +
>> +enum pci_device_state_reset_type {
>> +    DEVICE_RESET_FLR,
>> +    DEVICE_RESET_COLD,
>> +    DEVICE_RESET_WARM,
>> +    DEVICE_RESET_HOT,
>> +};
>> +
>> +struct pci_device_state_reset {
>> +    struct physdev_pci_device dev;
>> +    enum pci_device_state_reset_type reset_type;
>> +};
> 
> This is the struct to use as hypercall argument. How can it live outside of
> any public header? Also, when moving it there, beware that you should not
> use enum-s there. Only handles and fixed-width types are permitted.t
Yes, I put them there before, but enum is not permitted.
Then, do you have other suggested type to use to distinguish different types of resets, because enum can't work in the public header?

> 
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>  
>>  /* Remove all handlers and free vpci related structures. */
>>  void vpci_deassign_device(struct pci_dev *pdev);
>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev);
> 
> What's the purpose of this __must_check, when the sole caller is calling
> this through a function pointer, which isn't similarly annotated?
This is what I added before introducing function pointers, but after modifying the implementation, it was not taken into account.
I will remove __must_check and change to cf_check, according to your above comment.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-18  6:25     ` Chen, Jiqian
@ 2024-06-18  8:33       ` Jan Beulich
  2024-06-19  3:39         ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-18  8:33 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 18.06.2024 08:25, Chen, Jiqian wrote:
> On 2024/6/17 22:17, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> --- a/xen/drivers/pci/physdev.c
>>> +++ b/xen/drivers/pci/physdev.c
>>> @@ -2,11 +2,17 @@
>>>  #include <xen/guest_access.h>
>>>  #include <xen/hypercall.h>
>>>  #include <xen/init.h>
>>> +#include <xen/vpci.h>
>>>  
>>>  #ifndef COMPAT
>>>  typedef long ret_t;
>>>  #endif
>>>  
>>> +static const struct pci_device_state_reset_method
>>> +                    pci_device_state_reset_methods[] = {
>>> +    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>>> +};
>>
>> What about the other three DEVICE_RESET_*? In particular ...
> I don't know how to implement the other three types of reset.
> This is a design form so that corresponding processing functions can be added later if necessary. Do I need to set them to NULL pointers in this array?

No.

> Does this form conform to your previous suggestion of using only one hypercall to handle all types of resets?

Yes, at least in principle. Question here is: To be on the safe side,
wouldn't we better reset state for all forms of reset, leaving possible
relaxation of that for later? At which point the function wouldn't need
calling indirectly, and instead would be passed the reset type as an
argument.

>> Also, nit (further up): Opening figure braces for a new scope go onto their
> OK, will change in next version.
>> own line. Then again I notice that apparenly _all_ other instances in this
>> file are doing it the wrong way, too.
> Do I need to change them in this patch?

No.

>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>  
>>>      return rc;
>>>  }
>>> +
>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>
>> As a target of an indirect call this needs to be annotated cf_check (both
>> here and in the declaration, unlike __must_check, which is sufficient to
>> have on just the declaration).
> OK, will add cf_check in next version.

Which may not be necessary if you go the route suggested above.

>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -156,6 +156,22 @@ struct pci_dev {
>>>      struct vpci *vpci;
>>>  };
>>>  
>>> +struct pci_device_state_reset_method {
>>> +    int (*reset_fn)(struct pci_dev *pdev);
>>> +};
>>> +
>>> +enum pci_device_state_reset_type {
>>> +    DEVICE_RESET_FLR,
>>> +    DEVICE_RESET_COLD,
>>> +    DEVICE_RESET_WARM,
>>> +    DEVICE_RESET_HOT,
>>> +};
>>> +
>>> +struct pci_device_state_reset {
>>> +    struct physdev_pci_device dev;
>>> +    enum pci_device_state_reset_type reset_type;
>>> +};
>>
>> This is the struct to use as hypercall argument. How can it live outside of
>> any public header? Also, when moving it there, beware that you should not
>> use enum-s there. Only handles and fixed-width types are permitted.t
> Yes, I put them there before, but enum is not permitted.
> Then, do you have other suggested type to use to distinguish different types of resets, because enum can't work in the public header?

Do like we do everywhere else: Use a set of #define-s.

>>> --- a/xen/include/xen/vpci.h
>>> +++ b/xen/include/xen/vpci.h
>>> @@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>>  
>>>  /* Remove all handlers and free vpci related structures. */
>>>  void vpci_deassign_device(struct pci_dev *pdev);
>>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev);
>>
>> What's the purpose of this __must_check, when the sole caller is calling
>> this through a function pointer, which isn't similarly annotated?
> This is what I added before introducing function pointers, but after modifying the implementation, it was not taken into account.
> I will remove __must_check

Why remove? Is it relevant for the return value to be checked? Or if it
isn't, why would there be a return value?

Jan

> and change to cf_check, according to your above comment.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-18  8:33       ` Jan Beulich
@ 2024-06-19  3:39         ` Chen, Jiqian
  2024-06-19  7:02           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-19  3:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/18 16:33, Jan Beulich wrote:
> On 18.06.2024 08:25, Chen, Jiqian wrote:
>> On 2024/6/17 22:17, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> --- a/xen/drivers/pci/physdev.c
>>>> +++ b/xen/drivers/pci/physdev.c
>>>> @@ -2,11 +2,17 @@
>>>>  #include <xen/guest_access.h>
>>>>  #include <xen/hypercall.h>
>>>>  #include <xen/init.h>
>>>> +#include <xen/vpci.h>
>>>>  
>>>>  #ifndef COMPAT
>>>>  typedef long ret_t;
>>>>  #endif
>>>>  
>>>> +static const struct pci_device_state_reset_method
>>>> +                    pci_device_state_reset_methods[] = {
>>>> +    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>>>> +};
>>>
>>> What about the other three DEVICE_RESET_*? In particular ...
>> I don't know how to implement the other three types of reset.
>> This is a design form so that corresponding processing functions can be added later if necessary. Do I need to set them to NULL pointers in this array?
> 
> No.
> 
>> Does this form conform to your previous suggestion of using only one hypercall to handle all types of resets?
> 
> Yes, at least in principle. Question here is: To be on the safe side,
> wouldn't we better reset state for all forms of reset, leaving possible
> relaxation of that for later? At which point the function wouldn't need
> calling indirectly, and instead would be passed the reset type as an
> argument.
If I understood correctly, next version should be?
Use macros to represent different reset types.
Add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions.
Add reset_type as a function parameter to vpci_reset_device_state for possible future use.

+    case PHYSDEVOP_pci_device_state_reset:
+    {
+        struct pci_device_state_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.evfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        /* Implement FLR, other reset types may be implemented in future */
+        switch ( dev_reset.reset_type )
+        {
+        case PCI_DEVICE_STATE_RESET_COLD:
+        case PCI_DEVICE_STATE_RESET_WARM:
+        case PCI_DEVICE_STATE_RESET_HOT:
+        case PCI_DEVICE_STATE_RESET_FLR:
+            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+            break;
+        }
+        write_unlock(&pdev->domain->pci_lock);
+
+        if ( ret )
+            dprintk(XENLOG_ERR,
+                    "%pp: failed to reset vPCI device state\n", &sbdf);
+        break;
+    }

> 
>>> Also, nit (further up): Opening figure braces for a new scope go onto their
>> OK, will change in next version.
>>> own line. Then again I notice that apparenly _all_ other instances in this
>>> file are doing it the wrong way, too.
>> Do I need to change them in this patch?
> 
> No.
> 
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>      return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>>
>>> As a target of an indirect call this needs to be annotated cf_check (both
>>> here and in the declaration, unlike __must_check, which is sufficient to
>>> have on just the declaration).
>> OK, will add cf_check in next version.
> 
> Which may not be necessary if you go the route suggested above.
> 
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -156,6 +156,22 @@ struct pci_dev {
>>>>      struct vpci *vpci;
>>>>  };
>>>>  
>>>> +struct pci_device_state_reset_method {
>>>> +    int (*reset_fn)(struct pci_dev *pdev);
>>>> +};
>>>> +
>>>> +enum pci_device_state_reset_type {
>>>> +    DEVICE_RESET_FLR,
>>>> +    DEVICE_RESET_COLD,
>>>> +    DEVICE_RESET_WARM,
>>>> +    DEVICE_RESET_HOT,
>>>> +};
>>>> +
>>>> +struct pci_device_state_reset {
>>>> +    struct physdev_pci_device dev;
>>>> +    enum pci_device_state_reset_type reset_type;
>>>> +};
>>>
>>> This is the struct to use as hypercall argument. How can it live outside of
>>> any public header? Also, when moving it there, beware that you should not
>>> use enum-s there. Only handles and fixed-width types are permitted.t
>> Yes, I put them there before, but enum is not permitted.
>> Then, do you have other suggested type to use to distinguish different types of resets, because enum can't work in the public header?
> 
> Do like we do everywhere else: Use a set of #define-s.
> 
>>>> --- a/xen/include/xen/vpci.h
>>>> +++ b/xen/include/xen/vpci.h
>>>> @@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>>>  
>>>>  /* Remove all handlers and free vpci related structures. */
>>>>  void vpci_deassign_device(struct pci_dev *pdev);
>>>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev);
>>>
>>> What's the purpose of this __must_check, when the sole caller is calling
>>> this through a function pointer, which isn't similarly annotated?
>> This is what I added before introducing function pointers, but after modifying the implementation, it was not taken into account.
>> I will remove __must_check
> 
> Why remove? Is it relevant for the return value to be checked? Or if it
> isn't, why would there be a return value?
> 
> Jan
> 
>> and change to cf_check, according to your above comment.
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
  2024-06-19  3:39         ` Chen, Jiqian
@ 2024-06-19  7:02           ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2024-06-19  7:02 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 19.06.2024 05:39, Chen, Jiqian wrote:
> On 2024/6/18 16:33, Jan Beulich wrote:
>> On 18.06.2024 08:25, Chen, Jiqian wrote:
>>> On 2024/6/17 22:17, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> --- a/xen/drivers/pci/physdev.c
>>>>> +++ b/xen/drivers/pci/physdev.c
>>>>> @@ -2,11 +2,17 @@
>>>>>  #include <xen/guest_access.h>
>>>>>  #include <xen/hypercall.h>
>>>>>  #include <xen/init.h>
>>>>> +#include <xen/vpci.h>
>>>>>  
>>>>>  #ifndef COMPAT
>>>>>  typedef long ret_t;
>>>>>  #endif
>>>>>  
>>>>> +static const struct pci_device_state_reset_method
>>>>> +                    pci_device_state_reset_methods[] = {
>>>>> +    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>>>>> +};
>>>>
>>>> What about the other three DEVICE_RESET_*? In particular ...
>>> I don't know how to implement the other three types of reset.
>>> This is a design form so that corresponding processing functions can be added later if necessary. Do I need to set them to NULL pointers in this array?
>>
>> No.
>>
>>> Does this form conform to your previous suggestion of using only one hypercall to handle all types of resets?
>>
>> Yes, at least in principle. Question here is: To be on the safe side,
>> wouldn't we better reset state for all forms of reset, leaving possible
>> relaxation of that for later? At which point the function wouldn't need
>> calling indirectly, and instead would be passed the reset type as an
>> argument.
> If I understood correctly, next version should be?
> Use macros to represent different reset types.
> Add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions.
> Add reset_type as a function parameter to vpci_reset_device_state for possible future use.
> 
> +    case PHYSDEVOP_pci_device_state_reset:
> +    {
> +        struct pci_device_state_reset dev_reset;
> +        struct pci_dev *pdev;
> +        pci_sbdf_t sbdf;
> +
> +        if ( !is_pci_passthrough_enabled() )
> +            return -EOPNOTSUPP;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> +            break;
> +
> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
> +                        dev_reset.dev.bus,
> +                        dev_reset.dev.evfn);
> +
> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +        if ( ret )
> +            break;
> +
> +        pcidevs_lock();
> +        pdev = pci_get_pdev(NULL, sbdf);
> +        if ( !pdev )
> +        {
> +            pcidevs_unlock();
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        write_lock(&pdev->domain->pci_lock);
> +        pcidevs_unlock();
> +        /* Implement FLR, other reset types may be implemented in future */
> +        switch ( dev_reset.reset_type )
> +        {
> +        case PCI_DEVICE_STATE_RESET_COLD:
> +        case PCI_DEVICE_STATE_RESET_WARM:
> +        case PCI_DEVICE_STATE_RESET_HOT:
> +        case PCI_DEVICE_STATE_RESET_FLR:
> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> +            break;
> +        }

If you use a switch() here, then there wants to be a default case returning
e.g. -EOPNOTSUPP or -EINVAL. Else the switch wants dropping. I'm not sure
which one's better in this specific case, I'm only slightly tending towards
the former.

In any event the comment (if any) wants to reflect what the actual code does.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-06-17  9:00 ` [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
@ 2024-06-17  9:00 ` Jiqian Chen
  2024-06-17 14:45   ` Jan Beulich
  2024-06-17  9:00 ` [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui

If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
a passthrough device by using gsi, see qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq for the failed path to unmap pirq. And
add a new check to prevent self map when subject domain has no
PIRQ flag.

So that domU with PIRQ flag can success to map pirq for
passthrough devices even dom0 has no PIRQ flag.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  6 ++++++
 xen/arch/x86/physdev.c       | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+        /*
+        * Only being permitted for management of other domains.
+        * Further restrictions are enforced in do_physdev_op.
+        */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..f38cc22c872e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
         rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent self-unmap when currd has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_unmap_pirq(d, unmap.pirq);
 
         rcu_unlock_domain(d);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-06-17  9:00 ` [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-06-17 14:45   ` Jan Beulich
  2024-06-18  6:49     ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-17 14:45 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 17.06.2024 11:00, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
> 
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.

Why "failed path"? Isn't unmapping also part of normal device removal
from a guest?

> And
> add a new check to prevent self map when subject domain has no
> PIRQ flag.

You still talk of only self mapping, and the code also still does only
that. As pointed out before: Why would you allow mapping into a PVH
DomU? IOW what purpose do the "d == currd" checks have?

> So that domU with PIRQ flag can success to map pirq for
> passthrough devices even dom0 has no PIRQ flag.

There's still a description problem here. Much like the first sentence,
this last one also says that the guest would itself map the pIRQ. In
which case there would still not be any reason to expose the sub-
functions to Dom0.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-06-17 14:45   ` Jan Beulich
@ 2024-06-18  6:49     ` Chen, Jiqian
  2024-06-18  8:38       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-18  6:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/17 22:45, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
> 
> Why "failed path"? Isn't unmapping also part of normal device removal
> from a guest?
Yes, both. I will change to also "allow PHYSDEVOP_unmap_pirq for the device removal path to unmap pirq".

> 
>> And
>> add a new check to prevent self map when subject domain has no
>> PIRQ flag.
> 
> You still talk of only self mapping, and the code also still does only
> that. As pointed out before: Why would you allow mapping into a PVH
> DomU? IOW what purpose do the "d == currd" checks have?
The checking I added has two purpose, first is I need to allow this case:
Dom0(without PIRQ) + DomU(with PIRQ), because the original code just do (!has_pirq(currd)) will cause map_pirq fail in this case.
Second I need to disallow self-mapping:
DomU(without PIRQ) do map_pirq, the "d==currd" means the currd is the subject domain itself.

Emmm, I think I know what's your concern.
Do you mean I need to
" Prevent map_pirq when currd has no X86_EMU_USE_PIRQ flag "
instead of
" Prevent self-map when currd has no X86_EMU_USE_PIRQ flag ",
so I need to remove "d==currd", right?

> 
>> So that domU with PIRQ flag can success to map pirq for
>> passthrough devices even dom0 has no PIRQ flag.
> 
> There's still a description problem here. Much like the first sentence,
> this last one also says that the guest would itself map the pIRQ. In
> which case there would still not be any reason to expose the sub-
> functions to Dom0.
If change to " So that the interrupt of a passthrough device can success to be mapped to pirq for domU with PIRQ flag when dom0 is PVH.",
Is it OK?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-06-18  6:49     ` Chen, Jiqian
@ 2024-06-18  8:38       ` Jan Beulich
  2024-06-19  5:35         ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-18  8:38 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 18.06.2024 08:49, Chen, Jiqian wrote:
> On 2024/6/17 22:45, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>>> a passthrough device by using gsi, see qemu code
>>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>> is not allowed because currd is PVH dom0 and PVH has no
>>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>>
>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
>>
>> Why "failed path"? Isn't unmapping also part of normal device removal
>> from a guest?
> Yes, both. I will change to also "allow PHYSDEVOP_unmap_pirq for the device removal path to unmap pirq".
> 
>>
>>> And
>>> add a new check to prevent self map when subject domain has no
>>> PIRQ flag.
>>
>> You still talk of only self mapping, and the code also still does only
>> that. As pointed out before: Why would you allow mapping into a PVH
>> DomU? IOW what purpose do the "d == currd" checks have?
> The checking I added has two purpose, first is I need to allow this case:
> Dom0(without PIRQ) + DomU(with PIRQ), because the original code just do (!has_pirq(currd)) will cause map_pirq fail in this case.
> Second I need to disallow self-mapping:
> DomU(without PIRQ) do map_pirq, the "d==currd" means the currd is the subject domain itself.
> 
> Emmm, I think I know what's your concern.
> Do you mean I need to
> " Prevent map_pirq when currd has no X86_EMU_USE_PIRQ flag "
> instead of
> " Prevent self-map when currd has no X86_EMU_USE_PIRQ flag ",

No. What I mean is that I continue to fail to see why you mention "currd".
IOW it would be more like "prevent mapping when the subject domain has no
X86_EMU_USE_PIRQ" (which, as a specific sub-case, includes self-mapping
if the caller specifies DOMID_SELF for the subject domain).

> so I need to remove "d==currd", right?

Removing this check is what I'm after, yes. Yet that's not in sync with
either of the two quoted sentences above.

>>> So that domU with PIRQ flag can success to map pirq for
>>> passthrough devices even dom0 has no PIRQ flag.
>>
>> There's still a description problem here. Much like the first sentence,
>> this last one also says that the guest would itself map the pIRQ. In
>> which case there would still not be any reason to expose the sub-
>> functions to Dom0.
> If change to " So that the interrupt of a passthrough device can success to be mapped to pirq for domU with PIRQ flag when dom0 is PVH.",
> Is it OK?

Kind of, yes. "can be successfully mapped" is one of the various possibilities
of making this read a little more smoothly.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-06-18  8:38       ` Jan Beulich
@ 2024-06-19  5:35         ` Chen, Jiqian
  0 siblings, 0 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-19  5:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/18 16:38, Jan Beulich wrote:
> On 18.06.2024 08:49, Chen, Jiqian wrote:
>> On 2024/6/17 22:45, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>>>> a passthrough device by using gsi, see qemu code
>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>>>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>>>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>>> is not allowed because currd is PVH dom0 and PVH has no
>>>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>>>
>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
>>>
>>> Why "failed path"? Isn't unmapping also part of normal device removal
>>> from a guest?
>> Yes, both. I will change to also "allow PHYSDEVOP_unmap_pirq for the device removal path to unmap pirq".
>>
>>>
>>>> And
>>>> add a new check to prevent self map when subject domain has no
>>>> PIRQ flag.
>>>
>>> You still talk of only self mapping, and the code also still does only
>>> that. As pointed out before: Why would you allow mapping into a PVH
>>> DomU? IOW what purpose do the "d == currd" checks have?
>> The checking I added has two purpose, first is I need to allow this case:
>> Dom0(without PIRQ) + DomU(with PIRQ), because the original code just do (!has_pirq(currd)) will cause map_pirq fail in this case.
>> Second I need to disallow self-mapping:
>> DomU(without PIRQ) do map_pirq, the "d==currd" means the currd is the subject domain itself.
>>
>> Emmm, I think I know what's your concern.
>> Do you mean I need to
>> " Prevent map_pirq when currd has no X86_EMU_USE_PIRQ flag "
>> instead of
>> " Prevent self-map when currd has no X86_EMU_USE_PIRQ flag ",
> 
> No. What I mean is that I continue to fail to see why you mention "currd".
> IOW it would be more like "prevent mapping when the subject domain has no
> X86_EMU_USE_PIRQ" (which, as a specific sub-case, includes self-mapping
> if the caller specifies DOMID_SELF for the subject domain).
Oh, I see, not only to prevent self-mapping, but if the subject domain has no PIRQs, we should reject, self-mapping is just the one sub case.

> 
>> so I need to remove "d==currd", right?
> 
> Removing this check is what I'm after, yes. Yet that's not in sync with
> either of the two quoted sentences above.
> 
>>>> So that domU with PIRQ flag can success to map pirq for
>>>> passthrough devices even dom0 has no PIRQ flag.
>>>
>>> There's still a description problem here. Much like the first sentence,
>>> this last one also says that the guest would itself map the pIRQ. In
>>> which case there would still not be any reason to expose the sub-
>>> functions to Dom0.
>> If change to " So that the interrupt of a passthrough device can success to be mapped to pirq for domU with PIRQ flag when dom0 is PVH.",
>> Is it OK?
> 
> Kind of, yes. "can be successfully mapped" is one of the various possibilities
> of making this read a little more smoothly.
OK.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-06-17  9:00 ` [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
  2024-06-17  9:00 ` [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-06-17  9:00 ` Jiqian Chen
  2024-06-17 14:52   ` Jan Beulich
  2024-06-17  9:00 ` [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev Jiqian Chen
  2024-06-17  9:00 ` [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
  4 siblings, 1 reply; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui

The gsi of a passthrough device must be configured for it to be
able to be mapped into a hvm domU.
But When dom0 is PVH, the gsis don't get registered, it causes
the info of apic, pin and irq not be added into irq_2_pin list,
and the handler of irq_desc is not set, then when passthrough a
device, setting ioapic affinity and vector will fail.

To fix above problem, on Linux kernel side, a new code will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
purpose.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
The code link that will call this hypercall on linux kernel side is as follows:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-17  9:00 ` [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-06-17 14:52   ` Jan Beulich
  2024-06-18  6:57     ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-17 14:52 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 17.06.2024 11:00, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis don't get registered, it causes
> the info of apic, pin and irq not be added into irq_2_pin list,
> and the handler of irq_desc is not set, then when passthrough a
> device, setting ioapic affinity and vector will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> The code link that will call this hypercall on linux kernel side is as follows:
> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/

One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
one here:
"As to GSIs not being registered: If that's not a problem for Dom0's own
 operation, I think it'll also want/need explaining why what is sufficient for
 Dom0 alone isn't sufficient when pass-through comes into play."

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-17 14:52   ` Jan Beulich
@ 2024-06-18  6:57     ` Chen, Jiqian
  2024-06-18  8:55       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-18  6:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/17 22:52, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But When dom0 is PVH, the gsis don't get registered, it causes
>> the info of apic, pin and irq not be added into irq_2_pin list,
>> and the handler of irq_desc is not set, then when passthrough a
>> device, setting ioapic affinity and vector will fail.
>>
>> To fix above problem, on Linux kernel side, a new code will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>> purpose.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> The code link that will call this hypercall on linux kernel side is as follows:
>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
> 
> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
> one here:
> "As to GSIs not being registered: If that's not a problem for Dom0's own
>  operation, I think it'll also want/need explaining why what is sufficient for
>  Dom0 alone isn't sufficient when pass-through comes into play."
I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
" it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
What description do you want me to add?

> 
> Jan
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-18  6:57     ` Chen, Jiqian
@ 2024-06-18  8:55       ` Jan Beulich
  2024-06-19  7:53         ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-18  8:55 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 18.06.2024 08:57, Chen, Jiqian wrote:
> On 2024/6/17 22:52, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> The gsi of a passthrough device must be configured for it to be
>>> able to be mapped into a hvm domU.
>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>> and the handler of irq_desc is not set, then when passthrough a
>>> device, setting ioapic affinity and vector will fail.
>>>
>>> To fix above problem, on Linux kernel side, a new code will
>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>> register gsi when dom0 is PVH.
>>>
>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>> purpose.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>> The code link that will call this hypercall on linux kernel side is as follows:
>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>
>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>> one here:
>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>  operation, I think it'll also want/need explaining why what is sufficient for
>>  Dom0 alone isn't sufficient when pass-through comes into play."
> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
> What description do you want me to add?

What I'd first like to have clarification on (i.e. before putting it in
the description one way or another): How come Dom0 alone gets away fine
without making the call, yet for passthrough-to-DomU it's needed? Is it
perhaps that it just so happened that for Dom0 things have been working
on systems where it was tested, but the call should in principle have been
there in this case, too [1]? That (to me at least) would make quite a
difference for both this patch's description and us accepting it.

Jan

[1] Alternative e.g. being that because of other actions PVH Dom0 takes,
like the IO-APIC RTE programming it does for IRQs it wants to use for
itself, the necessary information is already suitably conveyed to Xen in
that case. In such a case imo it's relevant to mention in the description.
Not the least because iirc the pciback driver sets up a fake IRQ handler
in such cases, which ought to lead to similar IO-APIC RTE programming, at
which point the question would again arise why the hypercall needs
exposing.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-18  8:55       ` Jan Beulich
@ 2024-06-19  7:53         ` Chen, Jiqian
  2024-06-19  8:06           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-19  7:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> The gsi of a passthrough device must be configured for it to be
>>>> able to be mapped into a hvm domU.
>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>> and the handler of irq_desc is not set, then when passthrough a
>>>> device, setting ioapic affinity and vector will fail.
>>>>
>>>> To fix above problem, on Linux kernel side, a new code will
>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>> register gsi when dom0 is PVH.
>>>>
>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>> purpose.
>>>>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> ---
>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>> What description do you want me to add?
> 
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
Oh, I think I know what's your concern now. Thanks.
First question, why gsi of device can work on PVH dom0:
Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
Second question, why gsi of passthrough can't work on PVH dom0:
Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

> 
> Jan
> 
> [1] Alternative e.g. being that because of other actions PVH Dom0 takes,
> like the IO-APIC RTE programming it does for IRQs it wants to use for
> itself, the necessary information is already suitably conveyed to Xen in
> that case. In such a case imo it's relevant to mention in the description.
> Not the least because iirc the pciback driver sets up a fake IRQ handler
> in such cases, which ought to lead to similar IO-APIC RTE programming, at
> which point the question would again arise why the hypercall needs
> exposing.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-19  7:53         ` Chen, Jiqian
@ 2024-06-19  8:06           ` Jan Beulich
  2024-06-19  8:51             ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-19  8:06 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 19.06.2024 09:53, Chen, Jiqian wrote:
> On 2024/6/18 16:55, Jan Beulich wrote:
>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> The gsi of a passthrough device must be configured for it to be
>>>>> able to be mapped into a hvm domU.
>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>> device, setting ioapic affinity and vector will fail.
>>>>>
>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>> register gsi when dom0 is PVH.
>>>>>
>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>> purpose.
>>>>>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> ---
>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>
>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>> one here:
>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>> What description do you want me to add?
>>
>> What I'd first like to have clarification on (i.e. before putting it in
>> the description one way or another): How come Dom0 alone gets away fine
>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>> perhaps that it just so happened that for Dom0 things have been working
>> on systems where it was tested, but the call should in principle have been
>> there in this case, too [1]? That (to me at least) would make quite a
>> difference for both this patch's description and us accepting it.
> Oh, I think I know what's your concern now. Thanks.
> First question, why gsi of device can work on PVH dom0:
> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
> Second question, why gsi of passthrough can't work on PVH dom0:
> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

And why exactly would the fake IRQ handler not be set up by pciback? Its
setting up ought to lead to those same IO-APIC RTE writes that Xen
intercepts.

In any event, imo a summary of the above wants to be part of the patch
description.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-19  8:06           ` Jan Beulich
@ 2024-06-19  8:51             ` Chen, Jiqian
  2024-06-19  9:49               ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-19  8:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/19 16:06, Jan Beulich wrote:
> On 19.06.2024 09:53, Chen, Jiqian wrote:
>> On 2024/6/18 16:55, Jan Beulich wrote:
>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>> able to be mapped into a hvm domU.
>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>
>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>> register gsi when dom0 is PVH.
>>>>>>
>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>> purpose.
>>>>>>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> ---
>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>
>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>> one here:
>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>> What description do you want me to add?
>>>
>>> What I'd first like to have clarification on (i.e. before putting it in
>>> the description one way or another): How come Dom0 alone gets away fine
>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>> perhaps that it just so happened that for Dom0 things have been working
>>> on systems where it was tested, but the call should in principle have been
>>> there in this case, too [1]? That (to me at least) would make quite a
>>> difference for both this patch's description and us accepting it.
>> Oh, I think I know what's your concern now. Thanks.
>> First question, why gsi of device can work on PVH dom0:
>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>> Second question, why gsi of passthrough can't work on PVH dom0:
>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
> 
> And why exactly would the fake IRQ handler not be set up by pciback? Its
> setting up ought to lead to those same IO-APIC RTE writes that Xen
> intercepts.
Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

> 
> In any event, imo a summary of the above wants to be part of the patch
> description.
OK, will add into the commit message in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-19  8:51             ` Chen, Jiqian
@ 2024-06-19  9:49               ` Jan Beulich
  2024-06-19 10:10                 ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-19  9:49 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 19.06.2024 10:51, Chen, Jiqian wrote:
> On 2024/6/19 16:06, Jan Beulich wrote:
>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>> able to be mapped into a hvm domU.
>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>
>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>> register gsi when dom0 is PVH.
>>>>>>>
>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>> purpose.
>>>>>>>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> ---
>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>
>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>> one here:
>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>> What description do you want me to add?
>>>>
>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>> the description one way or another): How come Dom0 alone gets away fine
>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>> perhaps that it just so happened that for Dom0 things have been working
>>>> on systems where it was tested, but the call should in principle have been
>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>> difference for both this patch's description and us accepting it.
>>> Oh, I think I know what's your concern now. Thanks.
>>> First question, why gsi of device can work on PVH dom0:
>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>
>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>> intercepts.
> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.

I'm afraid I don't follow you here. Quoting from the function:

	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;

I.e. we bail if the request was to _disable_ and there is no ISR.

I can't exclude though that command_write()'s logic to set ->enable_intx
is insufficient. But in the common case one would surely expect at least
one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
IOW at some point I'd expect xen_pcibk_control_isr() to be called with
the second argument 0 and with ->enable_intx set.

> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

The sysfs interface is, according to my reading of the description
of the commit introducing it, merely for debugging / recovery purposes.
(It also looks to me as if this was partly broken: If one would use it,
thus clearing ->isr_on, a subsequent disable request would take exactly
that early bailing path quoted above, with nothing removing the IRQ
handler.)

That description also talks about both an edge vs level distinction in
behavior and one for shared vs non-shared, but neither in that commit
nor in present code I can spot any respective checks. Otherwise I could
understand that there are cases where the necessary information isn't
propagated to Xen.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-06-19  9:49               ` Jan Beulich
@ 2024-06-19 10:10                 ` Chen, Jiqian
  0 siblings, 0 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-19 10:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>>> able to be mapped into a hvm domU.
>>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>>
>>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>>> register gsi when dom0 is PVH.
>>>>>>>>
>>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>>> purpose.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> ---
>>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>>
>>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>>> one here:
>>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>>> What description do you want me to add?
>>>>>
>>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>>> the description one way or another): How come Dom0 alone gets away fine
>>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>>> perhaps that it just so happened that for Dom0 things have been working
>>>>> on systems where it was tested, but the call should in principle have been
>>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>>> difference for both this patch's description and us accepting it.
>>>> Oh, I think I know what's your concern now. Thanks.
>>>> First question, why gsi of device can work on PVH dom0:
>>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>>> intercepts.
>> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
> 
> I'm afraid I don't follow you here. Quoting from the function:
> 
> 	enable =  dev_data->enable_intx;
> 
> 	/* Asked to disable, but ISR isn't runnig */
> 	if (!enable && !dev_data->isr_on)
> 		return;
> 
> I.e. we bail if the request was to _disable_ and there is no ISR.
I mean, after debugging the pcistub_probe callstack:
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr(dev, 1 /*reset device*/)
and in xen_pcibk_control_isr code:
	if (reset) {
		dev_data->enable_intx = 0;
		dev_data->ack_intr = 0;
	}
	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;
"reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, so it return here.
It can't reach following code to install irq handle.

> 
> I can't exclude though that command_write()'s logic to set ->enable_intx
> is insufficient. But in the common case one would surely expect at least
> one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
> IOW at some point I'd expect xen_pcibk_control_isr() to be called with
> the second argument 0 and with ->enable_intx set.
I only see xen_pcibk_control_isr(dev, 0) is called in xen_pcibk_do_one_op, but xen_pcibk_do_one_op is not called during assigning a device to be passthrough.

> 
>> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.
> 
> The sysfs interface is, according to my reading of the description
> of the commit introducing it, merely for debugging / recovery purposes.
> (It also looks to me as if this was partly broken: If one would use it,
> thus clearing ->isr_on, a subsequent disable request would take exactly
> that early bailing path quoted above, with nothing removing the IRQ
> handler.)
> 
> That description also talks about both an edge vs level distinction in
> behavior and one for shared vs non-shared, but neither in that commit
> nor in present code I can spot any respective checks. Otherwise I could
> understand that there are cases where the necessary information isn't
> propagated to Xen.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (2 preceding siblings ...)
  2024-06-17  9:00 ` [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-06-17  9:00 ` Jiqian Chen
  2024-06-17 15:10   ` Jan Beulich
  2024-06-20 14:38   ` Anthony PERARD
  2024-06-17  9:00 ` [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
  4 siblings, 2 replies; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui

In PVH dom0, it uses the linux local interrupt mechanism,
when it allocs irq for a gsi, it is dynamic, and follow
the principle of applying first, distributing first. And
irq number is alloced from small to large, but the applying
gsi number is not, may gsi 38 comes before gsi 28, that
causes the irq number is not equal with the gsi number.
And when passthrough a device, QEMU will use its gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the gsi number is got from file
/sys/bus/pci/devices/<sbdf>/irq, so it will fail when mapping.
And in current codes, there is no method to get gsi for
userspace.

For above purpose, add new function to get gsi. And call this
function before xc_physdev_(un)map_pirq

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++++
 tools/include/xencall.h               |  2 ++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/call/core.c                |  5 ++++
 tools/libs/call/libxencall.map        |  2 ++
 tools/libs/call/linux.c               | 15 +++++++++++
 tools/libs/call/private.h             |  9 +++++++
 tools/libs/ctrl/xc_physdev.c          |  4 +++
 tools/libs/light/Makefile             |  2 +-
 tools/libs/light/libxl_pci.c          | 38 +++++++++++++++++++++++++++
 10 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
 	__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+	__u32 sbdf;
+	int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
 	_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
              uint64_t arg1, uint64_t arg2, uint64_t arg3,
              uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
                uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
     return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+    return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
 		xencall4;
 		xencall5;
 
+		xen_oscall_gsi_from_dev;
+
 		xencall_alloc_buffer;
 		xencall_free_buffer;
 		xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
     return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+    privcmd_gsi_from_dev_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = -1,
+    };
+
+    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+        PERROR("failed to get gsi from dev");
+        return -1;
+    }
+
+    return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
     void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..cd6eb5a3e66f 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
 
 long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
 
+#if defined(__linux__)
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf);
+#else
+static inline int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+    return -1;
+}
+#endif
+
 void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
 void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
 
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..c1458f3a38b5 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf)
+{
+    return xen_oscall_gsi_from_dev(xch->xcall, sbdf);
+}
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 37e4d1670986..6b616d5ee9b6 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -40,7 +40,7 @@ OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
 
 CFLAGS += -Wno-format-zero-length -Wmissing-declarations -Wformat-nonliteral
 
-CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ
+CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ -DCONFIG_X86
 
 OBJS-$(CONFIG_X86) += libxl_cpuid.o
 OBJS-$(CONFIG_X86) += libxl_x86.o
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..376f91759ac6 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
 #endif
 }
 
+#define PCI_DEVID(bus, devfn)\
+            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
+
+#define PCI_SBDF(seg, bus, devfn) \
+            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
+
 static void pci_add_dm_done(libxl__egc *egc,
                             pci_add_state *pas,
                             int rc)
@@ -1418,6 +1424,10 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
+#ifdef CONFIG_X86
+    int gsi;
+    uint32_t sbdf;
+#endif
     uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
@@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+#ifdef CONFIG_X86
+        sbdf = PCI_SBDF(pci->domain, pci->bus,
+                        (PCI_DEVFN(pci->dev, pci->func)));
+        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+        /*
+         * Old kernel version may not support this function,
+         * so if fail, keep using irq; if success, use gsi
+         */
+        if (gsi > 0) {
+            irq = gsi;
+        }
+#endif
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -2172,6 +2194,10 @@ static void pci_remove_detached(libxl__egc *egc,
     int  irq = 0, i, stubdomid = 0;
     const char *sysfs_path;
     FILE *f;
+#ifdef CONFIG_X86
+    int gsi;
+    uint32_t sbdf;
+#endif
     uint32_t domainid = prs->domid;
     bool isstubdom;
 
@@ -2239,6 +2265,18 @@ skip_bar:
     }
 
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+#ifdef CONFIG_X86
+        sbdf = PCI_SBDF(pci->domain, pci->bus,
+                        (PCI_DEVFN(pci->dev, pci->func)));
+        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+        /*
+         * Old kernel version may not support this function,
+         * so if fail, keep using irq; if success, use gsi
+         */
+        if (gsi > 0) {
+            irq = gsi;
+        }
+#endif
         rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
         if (rc < 0) {
             /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-17  9:00 ` [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-06-17 15:10   ` Jan Beulich
  2024-06-18  8:10     ` Chen, Jiqian
  2024-06-20 14:38   ` Anthony PERARD
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-17 15:10 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 17.06.2024 11:00, Jiqian Chen wrote:
> In PVH dom0, it uses the linux local interrupt mechanism,
> when it allocs irq for a gsi, it is dynamic, and follow
> the principle of applying first, distributing first. And
> irq number is alloced from small to large, but the applying
> gsi number is not, may gsi 38 comes before gsi 28, that
> causes the irq number is not equal with the gsi number.

Hmm, see my earlier explanations on patch 5: GSI and IRQ generally aren't
the same anyway. Therefore this part of the description, while not wrong,
is at least at risk of misleading people.

> --- a/tools/include/xen-sys/Linux/privcmd.h
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>  	__u64 addr;
>  } privcmd_mmap_resource_t;
>  
> +typedef struct privcmd_gsi_from_dev {
> +	__u32 sbdf;

That's PCI-centric, without struct and IOCTL names reflecting this fact.

> +	int gsi;

Is "int" legitimate to use here? Doesn't this want to similarly be __u32?

> --- a/tools/include/xencall.h
> +++ b/tools/include/xencall.h
> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>               uint64_t arg1, uint64_t arg2, uint64_t arg3,
>               uint64_t arg4, uint64_t arg5);
>  
> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);

Hmm, something (by name at least) OS-specific being in the public header
and ...

> --- a/tools/libs/call/libxencall.map
> +++ b/tools/libs/call/libxencall.map
> @@ -10,6 +10,8 @@ VERS_1.0 {
>  		xencall4;
>  		xencall5;
>  
> +		xen_oscall_gsi_from_dev;

... map file. I'm not sure things are intended to be this way.

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>  #endif
>  }
>  
> +#define PCI_DEVID(bus, devfn)\
> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
> +
> +#define PCI_SBDF(seg, bus, devfn) \
> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))

I'm not a maintainer of this file; if I were, I'd ask that for readability's
sake all excess parentheses be dropped from these.

> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>          goto out_no_irq;
>      }
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +#ifdef CONFIG_X86
> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
> +                        (PCI_DEVFN(pci->dev, pci->func)));
> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
> +        /*
> +         * Old kernel version may not support this function,

Just kernel?

> +         * so if fail, keep using irq; if success, use gsi
> +         */
> +        if (gsi > 0) {
> +            irq = gsi;

I'm still puzzled by this, when by now I think we've sufficiently clarified
that IRQs and GSIs use two distinct numbering spaces.

Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
the assumption that it'll fail. What if we decide to make the functionality
available there, too (if only for informational purposes, or for
consistency)? Suddenly you're fallback logic wouldn't work anymore, and
you'd call ...

> +        }
> +#endif
>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);

... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
you strictly want to avoid the call on PV Dom0.

Also for PVH Dom0: I don't think I've seen changes to the hypercall
handling, yet. How can that be when GSI and IRQ aren't the same, and hence
incoming GSI would need translating to IRQ somewhere? I can once again only
assume all your testing was done with IRQs whose numbers happened to match
their GSI numbers. (The difference, imo, would also need calling out in the
public header, where the respective interface struct(s) is/are defined.)

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-17 15:10   ` Jan Beulich
@ 2024-06-18  8:10     ` Chen, Jiqian
  2024-06-18  9:13       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-18  8:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/17 23:10, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> In PVH dom0, it uses the linux local interrupt mechanism,
>> when it allocs irq for a gsi, it is dynamic, and follow
>> the principle of applying first, distributing first. And
>> irq number is alloced from small to large, but the applying
>> gsi number is not, may gsi 38 comes before gsi 28, that
>> causes the irq number is not equal with the gsi number.
> 
> Hmm, see my earlier explanations on patch 5: GSI and IRQ generally aren't
> the same anyway. Therefore this part of the description, while not wrong,
> is at least at risk of misleading people.
OK, I wll change to say "irq is not the same as gsi".
> 
>> --- a/tools/include/xen-sys/Linux/privcmd.h
>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>  	__u64 addr;
>>  } privcmd_mmap_resource_t;
>>  
>> +typedef struct privcmd_gsi_from_dev {
>> +	__u32 sbdf;
> 
> That's PCI-centric, without struct and IOCTL names reflecting this fact.
So, change to privcmd_gsi_from_pcidev ?

> 
>> +	int gsi;
> 
> Is "int" legitimate to use here? Doesn't this want to similarly be __u32?
I want to set gsi to negative if there is no record of this translation.

> 
>> --- a/tools/include/xencall.h
>> +++ b/tools/include/xencall.h
>> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>               uint64_t arg1, uint64_t arg2, uint64_t arg3,
>>               uint64_t arg4, uint64_t arg5);
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
> 
> Hmm, something (by name at least) OS-specific being in the public header
> and ...
> 
>> --- a/tools/libs/call/libxencall.map
>> +++ b/tools/libs/call/libxencall.map
>> @@ -10,6 +10,8 @@ VERS_1.0 {
>>  		xencall4;
>>  		xencall5;
>>  
>> +		xen_oscall_gsi_from_dev;
> 
> ... map file. I'm not sure things are intended to be this way.
Let's see other maintainer's opinion.

> 
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>  #endif
>>  }
>>  
>> +#define PCI_DEVID(bus, devfn)\
>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>> +
>> +#define PCI_SBDF(seg, bus, devfn) \
>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
> 
> I'm not a maintainer of this file; if I were, I'd ask that for readability's
> sake all excess parentheses be dropped from these.
Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
It seems other files also do this. See tools/libs/light/libxl_internal.h

> 
>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>          goto out_no_irq;
>>      }
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +#ifdef CONFIG_X86
>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +        /*
>> +         * Old kernel version may not support this function,
> 
> Just kernel?
Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
> 
>> +         * so if fail, keep using irq; if success, use gsi
>> +         */
>> +        if (gsi > 0) {
>> +            irq = gsi;
> 
> I'm still puzzled by this, when by now I think we've sufficiently clarified
> that IRQs and GSIs use two distinct numbering spaces.
> 
> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
> the assumption that it'll fail. What if we decide to make the functionality
> available there, too (if only for informational purposes, or for
> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
> you'd call ...
> 
>> +        }
>> +#endif
>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
> 
> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
> you strictly want to avoid the call on PV Dom0.
> 
> Also for PVH Dom0: I don't think I've seen changes to the hypercall
> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
> incoming GSI would need translating to IRQ somewhere? I can once again only
> assume all your testing was done with IRQs whose numbers happened to match
> their GSI numbers. (The difference, imo, would also need calling out in the
> public header, where the respective interface struct(s) is/are defined.)
I feel like you missed out on many of the previous discussions.
Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
But not when using PVH, so passthrough failed.
With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 

And regarding to the implementation of ioctl IOCTL_PRIVCMD_GSI_FROM_DEV, it doesn't have any xen heypercall handling changes, all of its processing logic is on the kernel side.
I know, so you might want to say, "Then you shouldn't put this in xen's code." But this concern was discussed in previous versions, and since the pci maintainer disallowed to add
gsi sysfs on linux kernel side, I had to do so.
Roger, Stefano and Juergen may know more about this part.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-18  8:10     ` Chen, Jiqian
@ 2024-06-18  9:13       ` Jan Beulich
  2024-06-20  7:03         ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-18  9:13 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 18.06.2024 10:10, Chen, Jiqian wrote:
> On 2024/6/17 23:10, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> --- a/tools/include/xen-sys/Linux/privcmd.h
>>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>>  	__u64 addr;
>>>  } privcmd_mmap_resource_t;
>>>  
>>> +typedef struct privcmd_gsi_from_dev {
>>> +	__u32 sbdf;
>>
>> That's PCI-centric, without struct and IOCTL names reflecting this fact.
> So, change to privcmd_gsi_from_pcidev ?

That's what I'd suggest, yes. But remember that it's the kernel maintainers
who have the ultimate say here, as here you're only making a copy of what
the canonical header (in the kernel tree) is going to have.

>>> +	int gsi;
>>
>> Is "int" legitimate to use here? Doesn't this want to similarly be __u32?
> I want to set gsi to negative if there is no record of this translation.

There are surely more explicit ways to signal that case?

>>> --- a/tools/libs/light/libxl_pci.c
>>> +++ b/tools/libs/light/libxl_pci.c
>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>  #endif
>>>  }
>>>  
>>> +#define PCI_DEVID(bus, devfn)\
>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>> +
>>> +#define PCI_SBDF(seg, bus, devfn) \
>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>
>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>> sake all excess parentheses be dropped from these.
> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
> It seems other files also do this. See tools/libs/light/libxl_internal.h

As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
has its own CODING_STYLE, I can't spot anything towards excessive use of
parentheses there.

>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>          goto out_no_irq;
>>>      }
>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>> +#ifdef CONFIG_X86
>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>> +        /*
>>> +         * Old kernel version may not support this function,
>>
>> Just kernel?
> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.

Okay, and when the kernel supports it but the underlying hypervisor doesn't
support what the kernel wants to use in order to fulfill the request, all
is fine? (See also below for what may be needed in the hypervisor, even if
this IOCTL would be satisfied by the kernel without needing to interact with
the hypervisor.)

>>> +         * so if fail, keep using irq; if success, use gsi
>>> +         */
>>> +        if (gsi > 0) {
>>> +            irq = gsi;
>>
>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>> that IRQs and GSIs use two distinct numbering spaces.
>>
>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>> the assumption that it'll fail. What if we decide to make the functionality
>> available there, too (if only for informational purposes, or for
>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>> you'd call ...
>>
>>> +        }
>>> +#endif
>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>
>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>> you strictly want to avoid the call on PV Dom0.
>>
>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>> incoming GSI would need translating to IRQ somewhere? I can once again only
>> assume all your testing was done with IRQs whose numbers happened to match
>> their GSI numbers. (The difference, imo, would also need calling out in the
>> public header, where the respective interface struct(s) is/are defined.)
> I feel like you missed out on many of the previous discussions.
> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
> But not when using PVH, so passthrough failed.
> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 

I understand all of this, to a (I think) sufficient degree at least. Yet what
you're effectively proposing (without explicitly saying so) is that e.g.
struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
to make (simply because the caller may have no way to express things in pIRQ
terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
would be necessary. In fact that field is presently marked as "IN or OUT";
when re-purposed to take a GSI on input, it may end up being necessary to pass
back the pIRQ (in the subject domain's numbering space). Or alternatively it
may be necessary to add yet another sub-function so the GSI can be translated
to the corresponding pIRQ for the domain that's going to use the IRQ, for that
then to be passed into PHYSDEVOP_map_pirq.

> And regarding to the implementation of ioctl IOCTL_PRIVCMD_GSI_FROM_DEV, it doesn't have any xen heypercall handling changes, all of its processing logic is on the kernel side.
> I know, so you might want to say, "Then you shouldn't put this in xen's code." But this concern was discussed in previous versions, and since the pci maintainer disallowed to add
> gsi sysfs on linux kernel side, I had to do so.

Right, but this is a separate aspect (which we simply need to live with on
the Xen side).

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-18  9:13       ` Jan Beulich
@ 2024-06-20  7:03         ` Chen, Jiqian
  2024-06-20  7:43           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-20  7:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/18 17:13, Jan Beulich wrote:
> On 18.06.2024 10:10, Chen, Jiqian wrote:
>> On 2024/6/17 23:10, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> --- a/tools/include/xen-sys/Linux/privcmd.h
>>>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>>>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>>>  	__u64 addr;
>>>>  } privcmd_mmap_resource_t;
>>>>  
>>>> +typedef struct privcmd_gsi_from_dev {
>>>> +	__u32 sbdf;
>>>
>>> That's PCI-centric, without struct and IOCTL names reflecting this fact.
>> So, change to privcmd_gsi_from_pcidev ?
> 
> That's what I'd suggest, yes. But remember that it's the kernel maintainers
> who have the ultimate say here, as here you're only making a copy of what
> the canonical header (in the kernel tree) is going to have.
OK, then let's wait for the corresponding patch on kernel side to be accepted first.
> 
>>>> +	int gsi;
>>>
>>> Is "int" legitimate to use here? Doesn't this want to similarly be __u32?
>> I want to set gsi to negative if there is no record of this translation.
> 
> There are surely more explicit ways to signal that case?
Maybe, I will think about the implementation on kernel side again.
> 
>>>> --- a/tools/libs/light/libxl_pci.c
>>>> +++ b/tools/libs/light/libxl_pci.c
>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>  #endif
>>>>  }
>>>>  
>>>> +#define PCI_DEVID(bus, devfn)\
>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>> +
>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>
>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>> sake all excess parentheses be dropped from these.
>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>> It seems other files also do this. See tools/libs/light/libxl_internal.h
> 
> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
> has its own CODING_STYLE, I can't spot anything towards excessive use of
> parentheses there.
So, which parentheses do you think are excessive use?
> 
>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>          goto out_no_irq;
>>>>      }
>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>> +#ifdef CONFIG_X86
>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>> +        /*
>>>> +         * Old kernel version may not support this function,
>>>
>>> Just kernel?
>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
> 
> Okay, and when the kernel supports it but the underlying hypervisor doesn't
> support what the kernel wants to use in order to fulfill the request, all
I don't know what things you mentioned hypervisor doesn't support are,
because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
that relationship can be got only in dom0 instead of Xen hypervisor.

> is fine? (See also below for what may be needed in the hypervisor, even if
You mean xc_physdev_map_pirq needs gsi?

> this IOCTL would be satisfied by the kernel without needing to interact with
> the hypervisor.)
> 
>>>> +         * so if fail, keep using irq; if success, use gsi
>>>> +         */
>>>> +        if (gsi > 0) {
>>>> +            irq = gsi;
>>>
>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>> that IRQs and GSIs use two distinct numbering spaces.
>>>
>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>> the assumption that it'll fail. What if we decide to make the functionality
>>> available there, too (if only for informational purposes, or for
>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>> you'd call ...
>>>
>>>> +        }
>>>> +#endif
>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>
>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>> you strictly want to avoid the call on PV Dom0.
>>>
>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>> assume all your testing was done with IRQs whose numbers happened to match
>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>> public header, where the respective interface struct(s) is/are defined.)
>> I feel like you missed out on many of the previous discussions.
>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>> But not when using PVH, so passthrough failed.
>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
> 
> I understand all of this, to a (I think) sufficient degree at least. Yet what
> you're effectively proposing (without explicitly saying so) is that e.g.
> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
> to make (simply because the caller may have no way to express things in pIRQ
> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
> would be necessary. In fact that field is presently marked as "IN or OUT";
> when re-purposed to take a GSI on input, it may end up being necessary to pass
> back the pIRQ (in the subject domain's numbering space). Or alternatively it
> may be necessary to add yet another sub-function so the GSI can be translated
> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
> then to be passed into PHYSDEVOP_map_pirq.
If I understood correctly, your concerns about this patch are two:
First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.

> 
>> And regarding to the implementation of ioctl IOCTL_PRIVCMD_GSI_FROM_DEV, it doesn't have any xen heypercall handling changes, all of its processing logic is on the kernel side.
>> I know, so you might want to say, "Then you shouldn't put this in xen's code." But this concern was discussed in previous versions, and since the pci maintainer disallowed to add
>> gsi sysfs on linux kernel side, I had to do so.
> 
> Right, but this is a separate aspect (which we simply need to live with on
> the Xen side).
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-20  7:03         ` Chen, Jiqian
@ 2024-06-20  7:43           ` Jan Beulich
  2024-06-20 10:23             ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-20  7:43 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 20.06.2024 09:03, Chen, Jiqian wrote:
> On 2024/6/18 17:13, Jan Beulich wrote:
>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>  #endif
>>>>>  }
>>>>>  
>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>> +
>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>
>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>> sake all excess parentheses be dropped from these.
>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>
>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>> parentheses there.
> So, which parentheses do you think are excessive use?

#define PCI_DEVID(bus, devfn)\
            (((uint16_t)(bus) << 8) | ((devfn) & 0xff))

#define PCI_SBDF(seg, bus, devfn) \
            (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))

>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>          goto out_no_irq;
>>>>>      }
>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>> +#ifdef CONFIG_X86
>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>> +        /*
>>>>> +         * Old kernel version may not support this function,
>>>>
>>>> Just kernel?
>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>
>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>> support what the kernel wants to use in order to fulfill the request, all
> I don't know what things you mentioned hypervisor doesn't support are,
> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
> that relationship can be got only in dom0 instead of Xen hypervisor.
> 
>> is fine? (See also below for what may be needed in the hypervisor, even if
> You mean xc_physdev_map_pirq needs gsi?

I'd put it slightly differently: You arrange for that function to now take a
GSI when the caller is PVH. But yes, the function, when used with
MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).

>> this IOCTL would be satisfied by the kernel without needing to interact with
>> the hypervisor.)
>>
>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>> +         */
>>>>> +        if (gsi > 0) {
>>>>> +            irq = gsi;
>>>>
>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>
>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>> available there, too (if only for informational purposes, or for
>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>> you'd call ...
>>>>
>>>>> +        }
>>>>> +#endif
>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>
>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>> you strictly want to avoid the call on PV Dom0.
>>>>
>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>> public header, where the respective interface struct(s) is/are defined.)
>>> I feel like you missed out on many of the previous discussions.
>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>> But not when using PVH, so passthrough failed.
>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>
>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>> you're effectively proposing (without explicitly saying so) is that e.g.
>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>> to make (simply because the caller may have no way to express things in pIRQ
>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>> would be necessary. In fact that field is presently marked as "IN or OUT";
>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>> may be necessary to add yet another sub-function so the GSI can be translated
>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>> then to be passed into PHYSDEVOP_map_pirq.
> If I understood correctly, your concerns about this patch are two:
> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.

Yes.

> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.

I think so, yes. You also may need to record the output value, so you can later
use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
it wouldn't put a negative incoming *pirq into the .pirq field. I actually
wonder if that's even correct right now, i.e. independent of your change.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-20  7:43           ` Jan Beulich
@ 2024-06-20 10:23             ` Chen, Jiqian
  2024-06-20 10:37               ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-20 10:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/20 15:43, Jan Beulich wrote:
> On 20.06.2024 09:03, Chen, Jiqian wrote:
>> On 2024/6/18 17:13, Jan Beulich wrote:
>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>  #endif
>>>>>>  }
>>>>>>  
>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>> +
>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>
>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>> sake all excess parentheses be dropped from these.
>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>
>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>> parentheses there.
>> So, which parentheses do you think are excessive use?
> 
> #define PCI_DEVID(bus, devfn)\
>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
> 
> #define PCI_SBDF(seg, bus, devfn) \
>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
Thanks, will change in next version.

> 
>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>          goto out_no_irq;
>>>>>>      }
>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>> +#ifdef CONFIG_X86
>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>> +        /*
>>>>>> +         * Old kernel version may not support this function,
>>>>>
>>>>> Just kernel?
>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>>
>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>> support what the kernel wants to use in order to fulfill the request, all
>> I don't know what things you mentioned hypervisor doesn't support are,
>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>
>>> is fine? (See also below for what may be needed in the hypervisor, even if
>> You mean xc_physdev_map_pirq needs gsi?
> 
> I'd put it slightly differently: You arrange for that function to now take a
> GSI when the caller is PVH. But yes, the function, when used with
> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
> 
>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>> the hypervisor.)
>>>
>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>> +         */
>>>>>> +        if (gsi > 0) {
>>>>>> +            irq = gsi;
>>>>>
>>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>
>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>>> available there, too (if only for informational purposes, or for
>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>>> you'd call ...
>>>>>
>>>>>> +        }
>>>>>> +#endif
>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>
>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>
>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>>> public header, where the respective interface struct(s) is/are defined.)
>>>> I feel like you missed out on many of the previous discussions.
>>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>>> But not when using PVH, so passthrough failed.
>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>>
>>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>>> to make (simply because the caller may have no way to express things in pIRQ
>>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>>> would be necessary. In fact that field is presently marked as "IN or OUT";
>>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>>> may be necessary to add yet another sub-function so the GSI can be translated
>>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>>> then to be passed into PHYSDEVOP_map_pirq.
>> If I understood correctly, your concerns about this patch are two:
>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
> 
> Yes.
OK, I can change to do this.
But I still have a concern:
Although without my changes, passthrough can work now when dom0 is PV.
And you also agree that: for xc_physdev_map_pirq, when use with MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it should be used, since we have a function to get gsi now?

> 
>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.
> 
> I think so, yes. You also may need to record the output value, so you can later
> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
> it wouldn't put a negative incoming *pirq into the .pirq field. 
xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to index(gsi).
Is its logic right? If not how do we change it?

> I actually wonder if that's even correct right now, i.e. independent of your change.
Even without my changes, passthrough can work for PV dom0, not for PVH dom0.
According to the logic of hypercall PHYSDEVOP_map_pirq,
if pirq is -1, it calls physdev_map_pirq-> allocate_and_map_gsi_pirq-> allocate_pirq -> get_free_pirq to get pirq.
If pirq is set to positive before calling hypercall, it set pirq to its own value in allocate_pirq.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-20 10:23             ` Chen, Jiqian
@ 2024-06-20 10:37               ` Jan Beulich
  2024-06-21  8:15                 ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-20 10:37 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 20.06.2024 12:23, Chen, Jiqian wrote:
> On 2024/6/20 15:43, Jan Beulich wrote:
>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>  #endif
>>>>>>>  }
>>>>>>>  
>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>> +
>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>
>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>>> sake all excess parentheses be dropped from these.
>>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>>
>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>> parentheses there.
>>> So, which parentheses do you think are excessive use?
>>
>> #define PCI_DEVID(bus, devfn)\
>>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>
>> #define PCI_SBDF(seg, bus, devfn) \
>>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
> Thanks, will change in next version.
> 
>>
>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>          goto out_no_irq;
>>>>>>>      }
>>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>> +#ifdef CONFIG_X86
>>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>> +        /*
>>>>>>> +         * Old kernel version may not support this function,
>>>>>>
>>>>>> Just kernel?
>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>>>
>>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>>> support what the kernel wants to use in order to fulfill the request, all
>>> I don't know what things you mentioned hypervisor doesn't support are,
>>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
>>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>>
>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>> You mean xc_physdev_map_pirq needs gsi?
>>
>> I'd put it slightly differently: You arrange for that function to now take a
>> GSI when the caller is PVH. But yes, the function, when used with
>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>
>>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>>> the hypervisor.)
>>>>
>>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>>> +         */
>>>>>>> +        if (gsi > 0) {
>>>>>>> +            irq = gsi;
>>>>>>
>>>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>
>>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>>>> available there, too (if only for informational purposes, or for
>>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>>>> you'd call ...
>>>>>>
>>>>>>> +        }
>>>>>>> +#endif
>>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>>
>>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>>
>>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>>>> public header, where the respective interface struct(s) is/are defined.)
>>>>> I feel like you missed out on many of the previous discussions.
>>>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>>>> But not when using PVH, so passthrough failed.
>>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>>>
>>>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>>>> to make (simply because the caller may have no way to express things in pIRQ
>>>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>>>> would be necessary. In fact that field is presently marked as "IN or OUT";
>>>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>>>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>>>> may be necessary to add yet another sub-function so the GSI can be translated
>>>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>>>> then to be passed into PHYSDEVOP_map_pirq.
>>> If I understood correctly, your concerns about this patch are two:
>>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
>>
>> Yes.
> OK, I can change to do this.
> But I still have a concern:
> Although without my changes, passthrough can work now when dom0 is PV.
> And you also agree that: for xc_physdev_map_pirq, when use with MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
> Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it should be used, since we have a function to get gsi now?

Indeed this and ...

>>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.
>>
>> I think so, yes. You also may need to record the output value, so you can later
>> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
>> it wouldn't put a negative incoming *pirq into the .pirq field. 
> xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to index(gsi).
> Is its logic right? If not how do we change it?

... this matches ...

>> I actually wonder if that's even correct right now, i.e. independent of your change.

... the remark here.

> Even without my changes, passthrough can work for PV dom0, not for PVH dom0.

In the common case. I fear no-one ever tried for a device with an IRQ that
has a source override specified in ACPI.

> According to the logic of hypercall PHYSDEVOP_map_pirq,
> if pirq is -1, it calls physdev_map_pirq-> allocate_and_map_gsi_pirq-> allocate_pirq -> get_free_pirq to get pirq.
> If pirq is set to positive before calling hypercall, it set pirq to its own value in allocate_pirq.

Which is what looks wrong to me. Question is what it was done this way in the
first place.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-20 10:37               ` Jan Beulich
@ 2024-06-21  8:15                 ` Chen, Jiqian
  2024-06-24  8:13                   ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-21  8:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/20 18:37, Jan Beulich wrote:
> On 20.06.2024 12:23, Chen, Jiqian wrote:
>> On 2024/6/20 15:43, Jan Beulich wrote:
>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>  #endif
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>> +
>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>
>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>>>> sake all excess parentheses be dropped from these.
>>>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>>>
>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>> parentheses there.
>>>> So, which parentheses do you think are excessive use?
>>>
>>> #define PCI_DEVID(bus, devfn)\
>>>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>
>>> #define PCI_SBDF(seg, bus, devfn) \
>>>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>> Thanks, will change in next version.
>>
>>>
>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>          goto out_no_irq;
>>>>>>>>      }
>>>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>> +        /*
>>>>>>>> +         * Old kernel version may not support this function,
>>>>>>>
>>>>>>> Just kernel?
>>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>>>>
>>>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>>>> support what the kernel wants to use in order to fulfill the request, all
>>>> I don't know what things you mentioned hypervisor doesn't support are,
>>>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
>>>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>>>
>>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>>> You mean xc_physdev_map_pirq needs gsi?
>>>
>>> I'd put it slightly differently: You arrange for that function to now take a
>>> GSI when the caller is PVH. But yes, the function, when used with
>>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>>
>>>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>>>> the hypervisor.)
>>>>>
>>>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>>>> +         */
>>>>>>>> +        if (gsi > 0) {
>>>>>>>> +            irq = gsi;
>>>>>>>
>>>>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>>
>>>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>>>>> available there, too (if only for informational purposes, or for
>>>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>>>>> you'd call ...
>>>>>>>
>>>>>>>> +        }
>>>>>>>> +#endif
>>>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>>>
>>>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>>>
>>>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>>>>> public header, where the respective interface struct(s) is/are defined.)
>>>>>> I feel like you missed out on many of the previous discussions.
>>>>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>>>>> But not when using PVH, so passthrough failed.
>>>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>>>>
>>>>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>>>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>>>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>>>>> to make (simply because the caller may have no way to express things in pIRQ
>>>>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>>>>> would be necessary. In fact that field is presently marked as "IN or OUT";
>>>>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>>>>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>>>>> may be necessary to add yet another sub-function so the GSI can be translated
>>>>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>>>>> then to be passed into PHYSDEVOP_map_pirq.
>>>> If I understood correctly, your concerns about this patch are two:
>>>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
>>>
>>> Yes.
>> OK, I can change to do this.
>> But I still have a concern:
>> Although without my changes, passthrough can work now when dom0 is PV.
>> And you also agree that: for xc_physdev_map_pirq, when use with MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
>> Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it should be used, since we have a function to get gsi now?
> 
> Indeed this and ...
> 
>>>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.
>>>
>>> I think so, yes. You also may need to record the output value, so you can later
>>> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
>>> it wouldn't put a negative incoming *pirq into the .pirq field. 
>> xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to index(gsi).
>> Is its logic right? If not how do we change it?
> 
> ... this matches ...
> 
>>> I actually wonder if that's even correct right now, i.e. independent of your change.
> 
> ... the remark here.
So, what should I do next step?
If assume the logic of function xc_physdev_map_pirq and PHYSDEVOP_map_pirq is right,
I think what I did now is right, both PV and PVH dom0 should pass gsi into xc_physdev_map_pirq.
By the way, I found xc_physdev_map_pirq didn't support negative pirq is since your commit 934a5253d932b6f67fe40fc48975a2b0117e4cce, do you remember why?

> 
>> Even without my changes, passthrough can work for PV dom0, not for PVH dom0.
> 
> In the common case. I fear no-one ever tried for a device with an IRQ that
> has a source override specified in ACPI.
> 
>> According to the logic of hypercall PHYSDEVOP_map_pirq,
>> if pirq is -1, it calls physdev_map_pirq-> allocate_and_map_gsi_pirq-> allocate_pirq -> get_free_pirq to get pirq.
>> If pirq is set to positive before calling hypercall, it set pirq to its own value in allocate_pirq.
> 
> Which is what looks wrong to me. Question is what it was done this way in the
> first place.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-21  8:15                 ` Chen, Jiqian
@ 2024-06-24  8:13                   ` Jan Beulich
  2024-06-25  7:38                     ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-24  8:13 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 21.06.2024 10:15, Chen, Jiqian wrote:
> On 2024/6/20 18:37, Jan Beulich wrote:
>> On 20.06.2024 12:23, Chen, Jiqian wrote:
>>> On 2024/6/20 15:43, Jan Beulich wrote:
>>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>>  #endif
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>>> +
>>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>>
>>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>>>>> sake all excess parentheses be dropped from these.
>>>>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>>>>
>>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>>> parentheses there.
>>>>> So, which parentheses do you think are excessive use?
>>>>
>>>> #define PCI_DEVID(bus, devfn)\
>>>>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>>
>>>> #define PCI_SBDF(seg, bus, devfn) \
>>>>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>>> Thanks, will change in next version.
>>>
>>>>
>>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>>          goto out_no_irq;
>>>>>>>>>      }
>>>>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>>> +        /*
>>>>>>>>> +         * Old kernel version may not support this function,
>>>>>>>>
>>>>>>>> Just kernel?
>>>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>>>>>
>>>>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>>>>> support what the kernel wants to use in order to fulfill the request, all
>>>>> I don't know what things you mentioned hypervisor doesn't support are,
>>>>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
>>>>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>>>>
>>>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>>>> You mean xc_physdev_map_pirq needs gsi?
>>>>
>>>> I'd put it slightly differently: You arrange for that function to now take a
>>>> GSI when the caller is PVH. But yes, the function, when used with
>>>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>>>
>>>>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>>>>> the hypervisor.)
>>>>>>
>>>>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>>>>> +         */
>>>>>>>>> +        if (gsi > 0) {
>>>>>>>>> +            irq = gsi;
>>>>>>>>
>>>>>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>>>
>>>>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>>>>>> available there, too (if only for informational purposes, or for
>>>>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>>>>>> you'd call ...
>>>>>>>>
>>>>>>>>> +        }
>>>>>>>>> +#endif
>>>>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>>>>
>>>>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>>>>
>>>>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>>>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>>>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>>>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>>>>>> public header, where the respective interface struct(s) is/are defined.)
>>>>>>> I feel like you missed out on many of the previous discussions.
>>>>>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>>>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>>>>>> But not when using PVH, so passthrough failed.
>>>>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>>>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>>>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>>>>>
>>>>>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>>>>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>>>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>>>>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>>>>>> to make (simply because the caller may have no way to express things in pIRQ
>>>>>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>>>>>> would be necessary. In fact that field is presently marked as "IN or OUT";
>>>>>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>>>>>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>>>>>> may be necessary to add yet another sub-function so the GSI can be translated
>>>>>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>>>>>> then to be passed into PHYSDEVOP_map_pirq.
>>>>> If I understood correctly, your concerns about this patch are two:
>>>>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
>>>>
>>>> Yes.
>>> OK, I can change to do this.
>>> But I still have a concern:
>>> Although without my changes, passthrough can work now when dom0 is PV.
>>> And you also agree that: for xc_physdev_map_pirq, when use with MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
>>> Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it should be used, since we have a function to get gsi now?
>>
>> Indeed this and ...
>>
>>>>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.
>>>>
>>>> I think so, yes. You also may need to record the output value, so you can later
>>>> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
>>>> it wouldn't put a negative incoming *pirq into the .pirq field. 
>>> xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to index(gsi).
>>> Is its logic right? If not how do we change it?
>>
>> ... this matches ...
>>
>>>> I actually wonder if that's even correct right now, i.e. independent of your change.
>>
>> ... the remark here.
> So, what should I do next step?
> If assume the logic of function xc_physdev_map_pirq and PHYSDEVOP_map_pirq is right,
> I think what I did now is right, both PV and PVH dom0 should pass gsi into xc_physdev_map_pirq.

It may sound unfriendly, and I'm willing to accept other maintainers
disagreeing with me, but I think before we think of any extensions of
what we have here, we want to address any issues with what we have.
Since you're working in the area, and since getting the additions right
requires properly understanding how things work (and where things may
not work correctly right now), I view you as being in the best position
to see about doing that (imo) prereq step.

> By the way, I found xc_physdev_map_pirq didn't support negative pirq is since your commit 934a5253d932b6f67fe40fc48975a2b0117e4cce, do you remember why?

Counter question: What is a negative pIRQ?

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-24  8:13                   ` Jan Beulich
@ 2024-06-25  7:38                     ` Chen, Jiqian
  0 siblings, 0 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-25  7:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/24 16:13, Jan Beulich wrote:
> On 21.06.2024 10:15, Chen, Jiqian wrote:
>> On 2024/6/20 18:37, Jan Beulich wrote:
>>> On 20.06.2024 12:23, Chen, Jiqian wrote:
>>>> On 2024/6/20 15:43, Jan Beulich wrote:
>>>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>>>  #endif
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>>>> +
>>>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>>>
>>>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>>>>>> sake all excess parentheses be dropped from these.
>>>>>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>>>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>>>>>
>>>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>>>> parentheses there.
>>>>>> So, which parentheses do you think are excessive use?
>>>>>
>>>>> #define PCI_DEVID(bus, devfn)\
>>>>>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>>>
>>>>> #define PCI_SBDF(seg, bus, devfn) \
>>>>>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>>>> Thanks, will change in next version.
>>>>
>>>>>
>>>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>>>          goto out_no_irq;
>>>>>>>>>>      }
>>>>>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Old kernel version may not support this function,
>>>>>>>>>
>>>>>>>>> Just kernel?
>>>>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on linux kernel side.
>>>>>>>
>>>>>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>>>>>> support what the kernel wants to use in order to fulfill the request, all
>>>>>> I don't know what things you mentioned hypervisor doesn't support are,
>>>>>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf information,
>>>>>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>>>>>
>>>>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>>>>> You mean xc_physdev_map_pirq needs gsi?
>>>>>
>>>>> I'd put it slightly differently: You arrange for that function to now take a
>>>>> GSI when the caller is PVH. But yes, the function, when used with
>>>>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>>>>
>>>>>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>>>>>> the hypervisor.)
>>>>>>>
>>>>>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>>>>>> +         */
>>>>>>>>>> +        if (gsi > 0) {
>>>>>>>>>> +            irq = gsi;
>>>>>>>>>
>>>>>>>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>>>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>>>>
>>>>>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>>>>>> the assumption that it'll fail. What if we decide to make the functionality
>>>>>>>>> available there, too (if only for informational purposes, or for
>>>>>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
>>>>>>>>> you'd call ...
>>>>>>>>>
>>>>>>>>>> +        }
>>>>>>>>>> +#endif
>>>>>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>>>>>
>>>>>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>>>>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>>>>>
>>>>>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
>>>>>>>>> incoming GSI would need translating to IRQ somewhere? I can once again only
>>>>>>>>> assume all your testing was done with IRQs whose numbers happened to match
>>>>>>>>> their GSI numbers. (The difference, imo, would also need calling out in the
>>>>>>>>> public header, where the respective interface struct(s) is/are defined.)
>>>>>>>> I feel like you missed out on many of the previous discussions.
>>>>>>>> Without my changes, the original codes use irq (read from file /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>>>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we need to use gsi whether dom0 is PV or PVH, so for the original codes, they are wrong.
>>>>>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 is equal to the gsi value, so there was no problem with the original pv passthrough.
>>>>>>>> But not when using PVH, so passthrough failed.
>>>>>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev firstly, and to be compatible with old kernel versions(if the ioctl
>>>>>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use the irq number, so I need to check the result
>>>>>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I can use the right one gsi, otherwise keep using wrong one irq. 
>>>>>>>
>>>>>>> I understand all of this, to a (I think) sufficient degree at least. Yet what
>>>>>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>>>>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>>>>>> number, but (when the caller is PVH) a GSI. This may be a necessary adjustment
>>>>>>> to make (simply because the caller may have no way to express things in pIRQ
>>>>>>> terms), but then suitable adjustments to the handling of PHYSDEVOP_map_pirq
>>>>>>> would be necessary. In fact that field is presently marked as "IN or OUT";
>>>>>>> when re-purposed to take a GSI on input, it may end up being necessary to pass
>>>>>>> back the pIRQ (in the subject domain's numbering space). Or alternatively it
>>>>>>> may be necessary to add yet another sub-function so the GSI can be translated
>>>>>>> to the corresponding pIRQ for the domain that's going to use the IRQ, for that
>>>>>>> then to be passed into PHYSDEVOP_map_pirq.
>>>>>> If I understood correctly, your concerns about this patch are two:
>>>>>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get gsi to do xc_physdev_map_pirq, I should keep the original code that use irq.
>>>>>
>>>>> Yes.
>>>> OK, I can change to do this.
>>>> But I still have a concern:
>>>> Although without my changes, passthrough can work now when dom0 is PV.
>>>> And you also agree that: for xc_physdev_map_pirq, when use with MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
>>>> Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it should be used, since we have a function to get gsi now?
>>>
>>> Indeed this and ...
>>>
>>>>>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as the fourth parameter of xc_physdev_map_pirq, I should create a new local parameter pirq=-1, and pass it in.
>>>>>
>>>>> I think so, yes. You also may need to record the output value, so you can later
>>>>> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right now
>>>>> it wouldn't put a negative incoming *pirq into the .pirq field. 
>>>> xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to index(gsi).
>>>> Is its logic right? If not how do we change it?
>>>
>>> ... this matches ...
>>>
>>>>> I actually wonder if that's even correct right now, i.e. independent of your change.
>>>
>>> ... the remark here.
>> So, what should I do next step?
>> If assume the logic of function xc_physdev_map_pirq and PHYSDEVOP_map_pirq is right,
>> I think what I did now is right, both PV and PVH dom0 should pass gsi into xc_physdev_map_pirq.
> 
> It may sound unfriendly, and I'm willing to accept other maintainers
> disagreeing with me, but I think before we think of any extensions of
> what we have here, we want to address any issues with what we have.
> Since you're working in the area, and since getting the additions right
> requires properly understanding how things work (and where things may
> not work correctly right now), I view you as being in the best position
> to see about doing that (imo) prereq step.
Make sense.
OK, I think there may be two issues in the callstack of function xc_physdev_map_pirq
(xc_physdev_map_pirq-> physdev_map_pirq-> allocate_and_map_gsi_pirq-> allocate_pirq).
For function xc_physdev_map_pirq, I think it should have two use cases:
First, when caller pass a pirq that ">=0", they want to map gsi to a specific pirq. In this case, when it reaches allocate_pirq,
if the gsi already has a mapped pirq(current_pirq), and current_pirq is not equal with specific pirq, it fails due to EEXIST, if current_pirq
is equal with specific pirq or the gsi doesn't have a current_pirq, allocate_pirq return the specific pirq. It successes.
Second, when caller pass a pirq that "<0", they want to get a free pirq for gsi. In this case, when it reaches allocate_pirq,
if the gsi already has a mapped pirq(current_pirq), we should return current_pirq, otherwise, it allocate a free pirq through get_free_pirqs
and then return the free pirq, it successes.
Roadmap is below:

													caller pass gsi and pirq into xc_physdev_map_pirq
															(xc_physdev_map_pirq)
															/		\
														*pirq>=0		*pirq<0 [issue 1]
													(map gsi to a specific pirq)	(map gsi to a free pirq)
				 _									/						\
				|							gsi already has a mapped pirq				gsi already has a mapped pirq
				|								(current_pirq)						(current_pirq)
				|							/			\					/		\
				|							yes			no					yes		no
allocate_pirq-------------------------|						/					\			/				\
				|					current_pirq == pirq			return specific pirq	return current_pirq [issue2]		return free pirq
				|				/				\
				|				yes				no
				|			/						\
				|_		return specific pirq					fail -EEXIST

But for current code,
[issue 1]: when *pirq<0, in xc_physdev_map_pirq, it re-sets *pirq to index by default. " map.pirq = *pirq < 0 ? index : *pirq; ", so that it can't allocate a free pirq for gsi(above second case)
[issue 2]: when *pirq<0 and gsi already has mapped pirq and has no need to allocate a new free pirq, in allocate_pirq, it returns *pirq(<0) directly, it means allocate_pirq fail. Here should return the already mapped pirq for that gsi and mean successful.

> 
>> By the way, I found xc_physdev_map_pirq didn't support negative pirq is since your commit 934a5253d932b6f67fe40fc48975a2b0117e4cce, do you remember why?
> 
> Counter question: What is a negative pIRQ?
I mean when caller pass a pirq that "<0" in.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-17  9:00 ` [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev Jiqian Chen
  2024-06-17 15:10   ` Jan Beulich
@ 2024-06-20 14:38   ` Anthony PERARD
  2024-06-21  8:34     ` Chen, Jiqian
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony PERARD @ 2024-06-20 14:38 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui

On Mon, Jun 17, 2024 at 05:00:34PM +0800, Jiqian Chen wrote:
> diff --git a/tools/include/xencall.h b/tools/include/xencall.h
> index fc95ed0fe58e..750aab070323 100644
> --- a/tools/include/xencall.h
> +++ b/tools/include/xencall.h
> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>               uint64_t arg1, uint64_t arg2, uint64_t arg3,
>               uint64_t arg4, uint64_t arg5);
>  
> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);

I don't think that an appropriate library for this new feature.
libxencall is a generic lib to make hypercall.

>  /* Variant(s) of the above, as needed, returning "long" instead of "int". */
>  long xencall2L(xencall_handle *xcall, unsigned int op,
>                 uint64_t arg1, uint64_t arg2);
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index 9ceca0cffc2f..a0381f74d24b 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>                            uint32_t domid,
>                            int pirq);
>  
> +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
> +
>  /*
>   *  LOGGING AND ERROR REPORTING
>   */
> diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
> index 02c4f8e1aefa..6dae50c9a6ba 100644
> --- a/tools/libs/call/core.c
> +++ b/tools/libs/call/core.c
> @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>      return osdep_hypercall(xcall, &call);
>  }
>  
> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
> +{
> +    return osdep_oscall(xcall, sbdf);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
> index d18a3174e9dc..b92a0b5dc12c 100644
> --- a/tools/libs/call/libxencall.map
> +++ b/tools/libs/call/libxencall.map
> @@ -10,6 +10,8 @@ VERS_1.0 {
>  		xencall4;
>  		xencall5;
>  
> +		xen_oscall_gsi_from_dev;

FYI, never change already released version of a library, this would add
a new function to libxencall.1.0. Instead, when adding a new function
to a library that is supposed to be stable (they have a *.map file in
xen case), add it to a new section, that woud be VERS_1.4 in this case.
But libxencall isn't approriate for this new function, so just for
future reference.

>  		xencall_alloc_buffer;
>  		xencall_free_buffer;
>  		xencall_alloc_buffer_pages;
> diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
> index 6d588e6bea8f..92c740e176f2 100644
> --- a/tools/libs/call/linux.c
> +++ b/tools/libs/call/linux.c
> @@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
>      return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
>  }
>  
> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
> +{
> +    privcmd_gsi_from_dev_t dev_gsi = {
> +        .sbdf = sbdf,
> +        .gsi = -1,
> +    };
> +
> +    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {

Looks like libxencall is only for hypercall, and so I don't think
it's the right place to introducing another ioctl() call.

> +        PERROR("failed to get gsi from dev");
> +        return -1;
> +    }
> +
> +    return dev_gsi.gsi;
> +}
> +
>  static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
>  {
>      void *p;
> diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
> index 9c3aa432efe2..cd6eb5a3e66f 100644
> --- a/tools/libs/call/private.h
> +++ b/tools/libs/call/private.h
> @@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
>  
>  long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
>  
> +#if defined(__linux__)
> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf);
> +#else
> +static inline int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
> +{
> +    return -1;
> +}
> +#endif
> +
>  void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
>  void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
>  
> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
> index 460a8e779ce8..c1458f3a38b5 100644
> --- a/tools/libs/ctrl/xc_physdev.c
> +++ b/tools/libs/ctrl/xc_physdev.c
> @@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf)
> +{

I'm not sure if this is the best place for this new function, but I
can't find another one, so that will do.

> +    return xen_oscall_gsi_from_dev(xch->xcall, sbdf);
> +}
> diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
> index 37e4d1670986..6b616d5ee9b6 100644
> --- a/tools/libs/light/Makefile
> +++ b/tools/libs/light/Makefile
> @@ -40,7 +40,7 @@ OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
>  
>  CFLAGS += -Wno-format-zero-length -Wmissing-declarations -Wformat-nonliteral
>  
> -CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ
> +CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ -DCONFIG_X86
>  
>  OBJS-$(CONFIG_X86) += libxl_cpuid.o
>  OBJS-$(CONFIG_X86) += libxl_x86.o
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 96cb4da0794e..376f91759ac6 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>  #endif
>  }
>  
> +#define PCI_DEVID(bus, devfn)\
> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
> +
> +#define PCI_SBDF(seg, bus, devfn) \
> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
> +
>  static void pci_add_dm_done(libxl__egc *egc,
>                              pci_add_state *pas,
>                              int rc)
> @@ -1418,6 +1424,10 @@ static void pci_add_dm_done(libxl__egc *egc,
>      unsigned long long start, end, flags, size;
>      int irq, i;
>      int r;
> +#ifdef CONFIG_X86
> +    int gsi;
> +    uint32_t sbdf;
> +#endif
>      uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>      uint32_t domainid = domid;
>      bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>          goto out_no_irq;
>      }
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +#ifdef CONFIG_X86

Could you avoid these #ifdef, and move the new arch specific code (and
maybe existing code) into libxl_x86.c ? There's already examples of arch
specific code.

> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
> +                        (PCI_DEVFN(pci->dev, pci->func)));
> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
> +        /*
> +         * Old kernel version may not support this function,
> +         * so if fail, keep using irq; if success, use gsi
> +         */
> +        if (gsi > 0) {
> +            irq = gsi;
> +        }
> +#endif
>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>          if (r < 0) {
>              LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
> @@ -2172,6 +2194,10 @@ static void pci_remove_detached(libxl__egc *egc,
>      int  irq = 0, i, stubdomid = 0;
>      const char *sysfs_path;
>      FILE *f;
> +#ifdef CONFIG_X86
> +    int gsi;
> +    uint32_t sbdf;
> +#endif
>      uint32_t domainid = prs->domid;
>      bool isstubdom;
>  
> @@ -2239,6 +2265,18 @@ skip_bar:
>      }
>  
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +#ifdef CONFIG_X86
> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
> +                        (PCI_DEVFN(pci->dev, pci->func)));
> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
> +        /*
> +         * Old kernel version may not support this function,
> +         * so if fail, keep using irq; if success, use gsi
> +         */
> +        if (gsi > 0) {
> +            irq = gsi;
> +        }
> +#endif
>          rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
>          if (rc < 0) {
>              /*

Thanks,

-- 


Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-20 14:38   ` Anthony PERARD
@ 2024-06-21  8:34     ` Chen, Jiqian
  2024-06-24 12:08       ` Anthony PERARD
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-21  8:34 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/6/20 22:38, Anthony PERARD wrote:
> On Mon, Jun 17, 2024 at 05:00:34PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/include/xencall.h b/tools/include/xencall.h
>> index fc95ed0fe58e..750aab070323 100644
>> --- a/tools/include/xencall.h
>> +++ b/tools/include/xencall.h
>> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>               uint64_t arg1, uint64_t arg2, uint64_t arg3,
>>               uint64_t arg4, uint64_t arg5);
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
> 
> I don't think that an appropriate library for this new feature.
> libxencall is a generic lib to make hypercall.
Do you have a suggested place to put this new function?
This new function is to get gsi of a pci device, and only depend on the dom0 kernel, doesn't need to interact with hypervisor.

> 
>>  /* Variant(s) of the above, as needed, returning "long" instead of "int". */
>>  long xencall2L(xencall_handle *xcall, unsigned int op,
>>                 uint64_t arg1, uint64_t arg2);
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index 9ceca0cffc2f..a0381f74d24b 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>                            uint32_t domid,
>>                            int pirq);
>>  
>> +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
>> +
>>  /*
>>   *  LOGGING AND ERROR REPORTING
>>   */
>> diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
>> index 02c4f8e1aefa..6dae50c9a6ba 100644
>> --- a/tools/libs/call/core.c
>> +++ b/tools/libs/call/core.c
>> @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>      return osdep_hypercall(xcall, &call);
>>  }
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +    return osdep_oscall(xcall, sbdf);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
>> index d18a3174e9dc..b92a0b5dc12c 100644
>> --- a/tools/libs/call/libxencall.map
>> +++ b/tools/libs/call/libxencall.map
>> @@ -10,6 +10,8 @@ VERS_1.0 {
>>  		xencall4;
>>  		xencall5;
>>  
>> +		xen_oscall_gsi_from_dev;
> 
> FYI, never change already released version of a library, this would add
> a new function to libxencall.1.0. Instead, when adding a new function
> to a library that is supposed to be stable (they have a *.map file in
> xen case), add it to a new section, that woud be VERS_1.4 in this case.
> But libxencall isn't approriate for this new function, so just for
> future reference.
> 
>>  		xencall_alloc_buffer;
>>  		xencall_free_buffer;
>>  		xencall_alloc_buffer_pages;
>> diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
>> index 6d588e6bea8f..92c740e176f2 100644
>> --- a/tools/libs/call/linux.c
>> +++ b/tools/libs/call/linux.c
>> @@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
>>      return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
>>  }
>>  
>> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +    privcmd_gsi_from_dev_t dev_gsi = {
>> +        .sbdf = sbdf,
>> +        .gsi = -1,
>> +    };
>> +
>> +    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
> 
> Looks like libxencall is only for hypercall, and so I don't think
> it's the right place to introducing another ioctl() call.
It seems IOCTL_PRIVCMD_HYPERCALL is for hypercall.
What I do here is to introduce new call into privcmd fd.
Maybe I can open "/dev/xen/privcmd" directly, so that I don't have to add the *_oscal function.

> 
>> +        PERROR("failed to get gsi from dev");
>> +        return -1;
>> +    }
>> +
>> +    return dev_gsi.gsi;
>> +}
>> +
>>  static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
>>  {
>>      void *p;
>> diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
>> index 9c3aa432efe2..cd6eb5a3e66f 100644
>> --- a/tools/libs/call/private.h
>> +++ b/tools/libs/call/private.h
>> @@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
>>  
>>  long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
>>  
>> +#if defined(__linux__)
>> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf);
>> +#else
>> +static inline int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +    return -1;
>> +}
>> +#endif
>> +
>>  void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
>>  void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
>>  
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index 460a8e779ce8..c1458f3a38b5 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>      return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf)
>> +{
> 
> I'm not sure if this is the best place for this new function, but I
> can't find another one, so that will do.
Thanks.

> 
>> +    return xen_oscall_gsi_from_dev(xch->xcall, sbdf);
>> +}
>> diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
>> index 37e4d1670986..6b616d5ee9b6 100644
>> --- a/tools/libs/light/Makefile
>> +++ b/tools/libs/light/Makefile
>> @@ -40,7 +40,7 @@ OBJS-$(CONFIG_X86) += $(ACPI_OBJS)
>>  
>>  CFLAGS += -Wno-format-zero-length -Wmissing-declarations -Wformat-nonliteral
>>  
>> -CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ
>> +CFLAGS-$(CONFIG_X86) += -DCONFIG_PCI_SUPP_LEGACY_IRQ -DCONFIG_X86
>>  
>>  OBJS-$(CONFIG_X86) += libxl_cpuid.o
>>  OBJS-$(CONFIG_X86) += libxl_x86.o
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..376f91759ac6 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>  #endif
>>  }
>>  
>> +#define PCI_DEVID(bus, devfn)\
>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>> +
>> +#define PCI_SBDF(seg, bus, devfn) \
>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>> +
>>  static void pci_add_dm_done(libxl__egc *egc,
>>                              pci_add_state *pas,
>>                              int rc)
>> @@ -1418,6 +1424,10 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      unsigned long long start, end, flags, size;
>>      int irq, i;
>>      int r;
>> +#ifdef CONFIG_X86
>> +    int gsi;
>> +    uint32_t sbdf;
>> +#endif
>>      uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>      uint32_t domainid = domid;
>>      bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>          goto out_no_irq;
>>      }
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +#ifdef CONFIG_X86
> 
> Could you avoid these #ifdef, and move the new arch specific code (and
> maybe existing code) into libxl_x86.c ? There's already examples of arch
> specific code.
OK, will do in next version.

> 
>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +        /*
>> +         * Old kernel version may not support this function,
>> +         * so if fail, keep using irq; if success, use gsi
>> +         */
>> +        if (gsi > 0) {
>> +            irq = gsi;
>> +        }
>> +#endif
>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>          if (r < 0) {
>>              LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
>> @@ -2172,6 +2194,10 @@ static void pci_remove_detached(libxl__egc *egc,
>>      int  irq = 0, i, stubdomid = 0;
>>      const char *sysfs_path;
>>      FILE *f;
>> +#ifdef CONFIG_X86
>> +    int gsi;
>> +    uint32_t sbdf;
>> +#endif
>>      uint32_t domainid = prs->domid;
>>      bool isstubdom;
>>  
>> @@ -2239,6 +2265,18 @@ skip_bar:
>>      }
>>  
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +#ifdef CONFIG_X86
>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +        /*
>> +         * Old kernel version may not support this function,
>> +         * so if fail, keep using irq; if success, use gsi
>> +         */
>> +        if (gsi > 0) {
>> +            irq = gsi;
>> +        }
>> +#endif
>>          rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
>>          if (rc < 0) {
>>              /*
> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
  2024-06-21  8:34     ` Chen, Jiqian
@ 2024-06-24 12:08       ` Anthony PERARD
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony PERARD @ 2024-06-24 12:08 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray

On Fri, Jun 21, 2024 at 08:34:11AM +0000, Chen, Jiqian wrote:
> On 2024/6/20 22:38, Anthony PERARD wrote:
> > On Mon, Jun 17, 2024 at 05:00:34PM +0800, Jiqian Chen wrote:
> >> diff --git a/tools/include/xencall.h b/tools/include/xencall.h
> >> index fc95ed0fe58e..750aab070323 100644
> >> --- a/tools/include/xencall.h
> >> +++ b/tools/include/xencall.h
> >> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
> >>               uint64_t arg1, uint64_t arg2, uint64_t arg3,
> >>               uint64_t arg4, uint64_t arg5);
> >>  
> >> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
> > 
> > I don't think that an appropriate library for this new feature.
> > libxencall is a generic lib to make hypercall.
> Do you have a suggested place to put this new function?

This is an internal function, which doesn't need to be exposed in a
public interface, but the implementation is used by another function. So
that can be moved to libxenctrl.


-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (3 preceding siblings ...)
  2024-06-17  9:00 ` [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-06-17  9:00 ` Jiqian Chen
  2024-06-17  9:15   ` Chen, Jiqian
  2024-06-17 15:32   ` Jan Beulich
  4 siblings, 2 replies; 48+ messages in thread
From: Jiqian Chen @ 2024-06-17  9:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Jiqian Chen, Huang Rui

Some type of domain don't have PIRQs, like PVH, it doesn't do
PHYSDEVOP_map_pirq for each gsi. When passthrough a device
to guest base on PVH dom0, callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on Xen side.
What's more, current hypercall XEN_DOMCTL_irq_permission requires
passing in pirq, it is not suitable for dom0 that doesn't have
PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
permission of irq(translate from gsi) to dumU when dom0 has no
PIRQs.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
---
 tools/include/xenctrl.h            |  5 +++
 tools/libs/ctrl/xc_domain.c        | 15 +++++++
 tools/libs/light/libxl_pci.c       | 67 +++++++++++++++++++++++++++---
 xen/arch/x86/domctl.c              | 43 +++++++++++++++++++
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c             | 17 ++++++++
 xen/arch/x86/mpparse.c             |  3 +-
 xen/include/public/domctl.h        |  8 ++++
 xen/xsm/flask/hooks.c              |  1 +
 9 files changed, 153 insertions(+), 8 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.allow_access = allow_access,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 376f91759ac6..f027f22c0028 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
     uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+#ifdef CONFIG_X86
+    xc_domaininfo_t info;
+#endif
 
     /* Convenience aliases */
     bool starting = pas->starting;
@@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
             rc = ERROR_FAIL;
             goto out;
         }
-        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+#ifdef CONFIG_X86
+        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
+        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
         if (r < 0) {
-            LOGED(ERROR, domainid,
-                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+            LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
             fclose(f);
             rc = ERROR_FAIL;
             goto out;
         }
+        if (info.flags & XEN_DOMINF_hvm_guest &&
+            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
+            gsi > 0) {
+            r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+            if (r < 0) {
+                LOGED(ERROR, domainid,
+                    "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+                fclose(f);
+                rc = ERROR_FAIL;
+                goto out;
+            }
+        }
+        else
+#endif
+        {
+            r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+            if (r < 0) {
+                LOGED(ERROR, domainid,
+                    "xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
+                fclose(f);
+                rc = ERROR_FAIL;
+                goto out;
+            }
+        }
     }
     fclose(f);
 
@@ -2200,6 +2228,10 @@ static void pci_remove_detached(libxl__egc *egc,
 #endif
     uint32_t domainid = prs->domid;
     bool isstubdom;
+#ifdef CONFIG_X86
+    int r;
+    xc_domaininfo_t info;
+#endif
 
     /* Convenience aliases */
     libxl_device_pci *const pci = &prs->pci;
@@ -2287,9 +2319,32 @@ skip_bar:
              */
             LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
         }
-        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
-        if (rc < 0) {
-            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
+#ifdef CONFIG_X86
+        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
+        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
+        if (r < 0) {
+            LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
+            fclose(f);
+            rc = ERROR_FAIL;
+            goto skip_legacy_irq;
+        }
+        if (info.flags & XEN_DOMINF_hvm_guest &&
+            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
+            gsi > 0) {
+            r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 0);
+            if (r < 0) {
+                LOGED(ERROR, domid,
+                    "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+                rc = ERROR_FAIL;
+            }
+        }
+        else
+#endif
+        {
+            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
+            if (rc < 0) {
+                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
+            }
         }
     }
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 335aedf46d03..6b465bbc6ec0 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include <asm/xstate.h>
 #include <asm/psr.h>
 #include <asm/cpu-policy.h>
+#include <asm/io_apic.h>
 
 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,48 @@ long arch_do_domctl(
         break;
     }
 
+    case XEN_DOMCTL_gsi_permission:
+    {
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        int irq;
+        bool allow = domctl->u.gsi_permission.allow_access;
+
+        /* Check all pads are zero */
+        ret = -EINVAL;
+        for ( i = 0;
+              i < sizeof(domctl->u.gsi_permission.pad) /
+                  sizeof(domctl->u.gsi_permission.pad[0]);
+              ++i )
+            if ( domctl->u.gsi_permission.pad[i] )
+                goto out;
+
+        /*
+         * If current domain is PV or it has PIRQ flag, it has a mapping
+         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
+         * to grant irq permission.
+         */
+        ret = -EOPNOTSUPP;
+        if ( is_pv_domain(currd) || has_pirq(currd) )
+            goto out;
+
+        ret = -EINVAL;
+        if ( gsi >= nr_irqs_gsi || (irq = gsi_2_irq(gsi)) < 0 )
+            goto out;
+
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
+            goto out;
+
+        if ( allow )
+            ret = irq_permit_access(d, irq);
+        else
+            ret = irq_deny_access(d, irq);
+
+    out:
+        break;
+    }
+
     case XEN_DOMCTL_getpageframeinfo3:
     {
         unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index b48a64246548..23845c8cb11f 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
     return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+    int ioapic, pin, irq;
+
+    ioapic = mp_find_ioapic(gsi);
+    if ( ioapic < 0 )
+        return -EINVAL;
+
+    pin = gsi - io_apic_gsi_base(ioapic);
+
+    irq = apic_pin_2_gsi_irq(ioapic, pin);
+    if ( irq <= 0 )
+        return -EINVAL;
+
+    return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
     int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..c95da0de5770 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-	int			gsi)
+int mp_find_ioapic(int gsi)
 {
 	unsigned int		i;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..f7ae8b19d27d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
     uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+    uint32_t gsi;
+    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
+    uint8_t pad[3];
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1312,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
 #define XEN_DOMCTL_dt_overlay                    87
+#define XEN_DOMCTL_gsi_permission                88
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1328,6 +1335,7 @@ struct xen_domctl {
         struct xen_domctl_setdomainhandle   setdomainhandle;
         struct xen_domctl_setdebugging      setdebugging;
         struct xen_domctl_irq_permission    irq_permission;
+        struct xen_domctl_gsi_permission    gsi_permission;
         struct xen_domctl_iomem_permission  iomem_permission;
         struct xen_domctl_ioport_permission ioport_permission;
         struct xen_domctl_hypercall_init    hypercall_init;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 5e88c71b8e22..a5b134c91101 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_shadow_op:
     case XEN_DOMCTL_ioport_permission:
     case XEN_DOMCTL_ioport_mapping:
+    case XEN_DOMCTL_gsi_permission:
 #endif
 #ifdef CONFIG_HAS_PASSTHROUGH
     /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-17  9:00 ` [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
@ 2024-06-17  9:15   ` Chen, Jiqian
  2024-06-17 15:32   ` Jan Beulich
  1 sibling, 0 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-17  9:15 UTC (permalink / raw)
  To: Daniel P . Smith
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

Hi Daniel,

On 2024/6/17 17:00, Jiqian Chen wrote:
> Some type of domain don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq, it is not suitable for dom0 that doesn't have
> PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
> permission of irq(translate from gsi) to dumU when dom0 has no
> PIRQs.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
> ---
>  tools/include/xenctrl.h            |  5 +++
>  tools/libs/ctrl/xc_domain.c        | 15 +++++++
>  tools/libs/light/libxl_pci.c       | 67 +++++++++++++++++++++++++++---
>  xen/arch/x86/domctl.c              | 43 +++++++++++++++++++
>  xen/arch/x86/include/asm/io_apic.h |  2 +
>  xen/arch/x86/io_apic.c             | 17 ++++++++
>  xen/arch/x86/mpparse.c             |  3 +-
>  xen/include/public/domctl.h        |  8 ++++
>  xen/xsm/flask/hooks.c              |  1 +
>  9 files changed, 153 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index a0381f74d24b..f3feb6848e25 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
>                               uint32_t pirq,
>                               bool allow_access);
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> +                             uint32_t domid,
> +                             uint32_t gsi,
> +                             bool allow_access);
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long first_mfn,
> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
> index f2d9d14b4d9f..8540e84fda93 100644
> --- a/tools/libs/ctrl/xc_domain.c
> +++ b/tools/libs/ctrl/xc_domain.c
> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> +                             uint32_t domid,
> +                             uint32_t gsi,
> +                             bool allow_access)
> +{
> +    struct xen_domctl domctl = {
> +        .cmd = XEN_DOMCTL_gsi_permission,
> +        .domain = domid,
> +        .u.gsi_permission.gsi = gsi,
> +        .u.gsi_permission.allow_access = allow_access,
> +    };
> +
> +    return do_domctl(xch, &domctl);
> +}
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long first_mfn,
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 376f91759ac6..f027f22c0028 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
>      uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>      uint32_t domainid = domid;
>      bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
> +#ifdef CONFIG_X86
> +    xc_domaininfo_t info;
> +#endif
>  
>      /* Convenience aliases */
>      bool starting = pas->starting;
> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>              rc = ERROR_FAIL;
>              goto out;
>          }
> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +#ifdef CONFIG_X86
> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>          if (r < 0) {
> -            LOGED(ERROR, domainid,
> -                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> +            LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
>              fclose(f);
>              rc = ERROR_FAIL;
>              goto out;
>          }
> +        if (info.flags & XEN_DOMINF_hvm_guest &&
> +            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
> +            gsi > 0) {
> +            r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
> +            if (r < 0) {
> +                LOGED(ERROR, domainid,
> +                    "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
> +                fclose(f);
> +                rc = ERROR_FAIL;
> +                goto out;
> +            }
> +        }
> +        else
> +#endif
> +        {
> +            r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +            if (r < 0) {
> +                LOGED(ERROR, domainid,
> +                    "xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
> +                fclose(f);
> +                rc = ERROR_FAIL;
> +                goto out;
> +            }
> +        }
>      }
>      fclose(f);
>  
> @@ -2200,6 +2228,10 @@ static void pci_remove_detached(libxl__egc *egc,
>  #endif
>      uint32_t domainid = prs->domid;
>      bool isstubdom;
> +#ifdef CONFIG_X86
> +    int r;
> +    xc_domaininfo_t info;
> +#endif
>  
>      /* Convenience aliases */
>      libxl_device_pci *const pci = &prs->pci;
> @@ -2287,9 +2319,32 @@ skip_bar:
>               */
>              LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
>          }
> -        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
> -        if (rc < 0) {
> -            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
> +#ifdef CONFIG_X86
> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
> +        if (r < 0) {
> +            LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
> +            fclose(f);
> +            rc = ERROR_FAIL;
> +            goto skip_legacy_irq;
> +        }
> +        if (info.flags & XEN_DOMINF_hvm_guest &&
> +            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
> +            gsi > 0) {
> +            r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 0);
> +            if (r < 0) {
> +                LOGED(ERROR, domid,
> +                    "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
> +                rc = ERROR_FAIL;
> +            }
> +        }
> +        else
> +#endif
> +        {
> +            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
> +            if (rc < 0) {
> +                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
> +            }
>          }
>      }
>  
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 335aedf46d03..6b465bbc6ec0 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -36,6 +36,7 @@
>  #include <asm/xstate.h>
>  #include <asm/psr.h>
>  #include <asm/cpu-policy.h>
> +#include <asm/io_apic.h>
>  
>  static int update_domain_cpu_policy(struct domain *d,
>                                      xen_domctl_cpu_policy_t *xdpc)
> @@ -237,6 +238,48 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        int irq;
> +        bool allow = domctl->u.gsi_permission.allow_access;
> +
> +        /* Check all pads are zero */
> +        ret = -EINVAL;
> +        for ( i = 0;
> +              i < sizeof(domctl->u.gsi_permission.pad) /
> +                  sizeof(domctl->u.gsi_permission.pad[0]);
> +              ++i )
> +            if ( domctl->u.gsi_permission.pad[i] )
> +                goto out;
> +
> +        /*
> +         * If current domain is PV or it has PIRQ flag, it has a mapping
> +         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
> +         * to grant irq permission.
> +         */
> +        ret = -EOPNOTSUPP;
> +        if ( is_pv_domain(currd) || has_pirq(currd) )
> +            goto out;
> +
> +        ret = -EINVAL;
> +        if ( gsi >= nr_irqs_gsi || (irq = gsi_2_irq(gsi)) < 0 )
> +            goto out;
> +
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
Copy the question of Jan from the previous version to here:
Is it okay to issue the XSM check using the translated value, not the one that was originally passed into the hypercall?

> +            goto out;
> +
> +        if ( allow )
> +            ret = irq_permit_access(d, irq);
> +        else
> +            ret = irq_deny_access(d, irq);
> +
> +    out:
> +        break;
> +    }
> +
>      case XEN_DOMCTL_getpageframeinfo3:
>      {
>          unsigned int num = domctl->u.getpageframeinfo3.num;
> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
> index 78268ea8f666..7e86d8337758 100644
> --- a/xen/arch/x86/include/asm/io_apic.h
> +++ b/xen/arch/x86/include/asm/io_apic.h
> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>  
>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
> +int mp_find_ioapic(int gsi);
> +int gsi_2_irq(int gsi);
>  
>  #endif
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index b48a64246548..23845c8cb11f 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>      return irq;
>  }
>  
> +int gsi_2_irq(int gsi)
> +{
> +    int ioapic, pin, irq;
> +
> +    ioapic = mp_find_ioapic(gsi);
> +    if ( ioapic < 0 )
> +        return -EINVAL;
> +
> +    pin = gsi - io_apic_gsi_base(ioapic);
> +
> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
> +    if ( irq <= 0 )
> +        return -EINVAL;
> +
> +    return irq;
> +}
> +
>  static inline int IO_APIC_irq_trigger(int irq)
>  {
>      int apic, idx, pin;
> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
> index d8ccab2449c6..c95da0de5770 100644
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
>  } mp_ioapic_routing[MAX_IO_APICS];
>  
>  
> -static int mp_find_ioapic (
> -	int			gsi)
> +int mp_find_ioapic(int gsi)
>  {
>  	unsigned int		i;
>  
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 2a49fe46ce25..f7ae8b19d27d 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
>      uint8_t pad[3];
>  };
>  
> +/* XEN_DOMCTL_gsi_permission */
> +struct xen_domctl_gsi_permission {
> +    uint32_t gsi;
> +    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
> +    uint8_t pad[3];
> +};
>  
>  /* XEN_DOMCTL_iomem_permission */
>  struct xen_domctl_iomem_permission {
> @@ -1306,6 +1312,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_get_paging_mempool_size       85
>  #define XEN_DOMCTL_set_paging_mempool_size       86
>  #define XEN_DOMCTL_dt_overlay                    87
> +#define XEN_DOMCTL_gsi_permission                88
>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> @@ -1328,6 +1335,7 @@ struct xen_domctl {
>          struct xen_domctl_setdomainhandle   setdomainhandle;
>          struct xen_domctl_setdebugging      setdebugging;
>          struct xen_domctl_irq_permission    irq_permission;
> +        struct xen_domctl_gsi_permission    gsi_permission;
>          struct xen_domctl_iomem_permission  iomem_permission;
>          struct xen_domctl_ioport_permission ioport_permission;
>          struct xen_domctl_hypercall_init    hypercall_init;
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 5e88c71b8e22..a5b134c91101 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_shadow_op:
>      case XEN_DOMCTL_ioport_permission:
>      case XEN_DOMCTL_ioport_mapping:
> +    case XEN_DOMCTL_gsi_permission:
>  #endif
>  #ifdef CONFIG_HAS_PASSTHROUGH
>      /*

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-17  9:00 ` [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
  2024-06-17  9:15   ` Chen, Jiqian
@ 2024-06-17 15:32   ` Jan Beulich
  2024-06-18  8:23     ` Chen, Jiqian
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-17 15:32 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 17.06.2024 11:00, Jiqian Chen wrote:
> Some type of domain don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq, it is not suitable for dom0 that doesn't have
> PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
> permission of irq(translate from gsi) to dumU when dom0 has no
> PIRQs.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
> ---
>  tools/include/xenctrl.h            |  5 +++
>  tools/libs/ctrl/xc_domain.c        | 15 +++++++
>  tools/libs/light/libxl_pci.c       | 67 +++++++++++++++++++++++++++---
>  xen/arch/x86/domctl.c              | 43 +++++++++++++++++++
>  xen/arch/x86/include/asm/io_apic.h |  2 +
>  xen/arch/x86/io_apic.c             | 17 ++++++++
>  xen/arch/x86/mpparse.c             |  3 +-
>  xen/include/public/domctl.h        |  8 ++++
>  xen/xsm/flask/hooks.c              |  1 +
>  9 files changed, 153 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index a0381f74d24b..f3feb6848e25 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
>                               uint32_t pirq,
>                               bool allow_access);
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> +                             uint32_t domid,
> +                             uint32_t gsi,
> +                             bool allow_access);
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long first_mfn,
> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
> index f2d9d14b4d9f..8540e84fda93 100644
> --- a/tools/libs/ctrl/xc_domain.c
> +++ b/tools/libs/ctrl/xc_domain.c
> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> +                             uint32_t domid,
> +                             uint32_t gsi,
> +                             bool allow_access)
> +{
> +    struct xen_domctl domctl = {
> +        .cmd = XEN_DOMCTL_gsi_permission,
> +        .domain = domid,
> +        .u.gsi_permission.gsi = gsi,
> +        .u.gsi_permission.allow_access = allow_access,
> +    };
> +
> +    return do_domctl(xch, &domctl);
> +}
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long first_mfn,
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 376f91759ac6..f027f22c0028 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
>      uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>      uint32_t domainid = domid;
>      bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
> +#ifdef CONFIG_X86
> +    xc_domaininfo_t info;
> +#endif
>  
>      /* Convenience aliases */
>      bool starting = pas->starting;
> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>              rc = ERROR_FAIL;
>              goto out;
>          }
> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +#ifdef CONFIG_X86
> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);

Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
I didn't check if that can be used with the underlying hypercall(s). Otherwise
you want to pass the actual domid of the local domain here.

>          if (r < 0) {
> -            LOGED(ERROR, domainid,
> -                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> +            LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
>              fclose(f);
>              rc = ERROR_FAIL;
>              goto out;
>          }
> +        if (info.flags & XEN_DOMINF_hvm_guest &&

You want to parenthesize the & here.

> +            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
> +            gsi > 0) {

So if gsi < 0 failure of xc_domain_getinfo_single() would needlessly result
in failure of this function?

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -36,6 +36,7 @@
>  #include <asm/xstate.h>
>  #include <asm/psr.h>
>  #include <asm/cpu-policy.h>
> +#include <asm/io_apic.h>
>  
>  static int update_domain_cpu_policy(struct domain *d,
>                                      xen_domctl_cpu_policy_t *xdpc)
> @@ -237,6 +238,48 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        int irq;
> +        bool allow = domctl->u.gsi_permission.allow_access;

See my earlier comments on this conversion of 8 bits into just one.

> +        /* Check all pads are zero */
> +        ret = -EINVAL;
> +        for ( i = 0;
> +              i < sizeof(domctl->u.gsi_permission.pad) /
> +                  sizeof(domctl->u.gsi_permission.pad[0]);

Please don't open-code ARRAY_SIZE().

> +              ++i )
> +            if ( domctl->u.gsi_permission.pad[i] )
> +                goto out;
> +
> +        /*
> +         * If current domain is PV or it has PIRQ flag, it has a mapping
> +         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
> +         * to grant irq permission.
> +         */
> +        ret = -EOPNOTSUPP;
> +        if ( is_pv_domain(currd) || has_pirq(currd) )
> +            goto out;

I'm curious what other x86 maintainers think: I for one would not impose such
an artificial restriction.

> +        ret = -EINVAL;
> +        if ( gsi >= nr_irqs_gsi || (irq = gsi_2_irq(gsi)) < 0 )
> +            goto out;
> +
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
> +            goto out;
> +
> +        if ( allow )
> +            ret = irq_permit_access(d, irq);
> +        else
> +            ret = irq_deny_access(d, irq);
> +
> +    out:

Please use a less generic name for such a label local to just one case
block. However, with ..

> +        break;

.. this being all that's done here: Why have a label in the first place?

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-17 15:32   ` Jan Beulich
@ 2024-06-18  8:23     ` Chen, Jiqian
  2024-06-18  9:23       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-18  8:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/17 23:32, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> Some type of domain don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq, it is not suitable for dom0 that doesn't have
>> PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
>> permission of irq(translate from gsi) to dumU when dom0 has no
>> PIRQs.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
>> ---
>>  tools/include/xenctrl.h            |  5 +++
>>  tools/libs/ctrl/xc_domain.c        | 15 +++++++
>>  tools/libs/light/libxl_pci.c       | 67 +++++++++++++++++++++++++++---
>>  xen/arch/x86/domctl.c              | 43 +++++++++++++++++++
>>  xen/arch/x86/include/asm/io_apic.h |  2 +
>>  xen/arch/x86/io_apic.c             | 17 ++++++++
>>  xen/arch/x86/mpparse.c             |  3 +-
>>  xen/include/public/domctl.h        |  8 ++++
>>  xen/xsm/flask/hooks.c              |  1 +
>>  9 files changed, 153 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index a0381f74d24b..f3feb6848e25 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
>>                               uint32_t pirq,
>>                               bool allow_access);
>>  
>> +int xc_domain_gsi_permission(xc_interface *xch,
>> +                             uint32_t domid,
>> +                             uint32_t gsi,
>> +                             bool allow_access);
>> +
>>  int xc_domain_iomem_permission(xc_interface *xch,
>>                                 uint32_t domid,
>>                                 unsigned long first_mfn,
>> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
>> index f2d9d14b4d9f..8540e84fda93 100644
>> --- a/tools/libs/ctrl/xc_domain.c
>> +++ b/tools/libs/ctrl/xc_domain.c
>> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>>      return do_domctl(xch, &domctl);
>>  }
>>  
>> +int xc_domain_gsi_permission(xc_interface *xch,
>> +                             uint32_t domid,
>> +                             uint32_t gsi,
>> +                             bool allow_access)
>> +{
>> +    struct xen_domctl domctl = {
>> +        .cmd = XEN_DOMCTL_gsi_permission,
>> +        .domain = domid,
>> +        .u.gsi_permission.gsi = gsi,
>> +        .u.gsi_permission.allow_access = allow_access,
>> +    };
>> +
>> +    return do_domctl(xch, &domctl);
>> +}
>> +
>>  int xc_domain_iomem_permission(xc_interface *xch,
>>                                 uint32_t domid,
>>                                 unsigned long first_mfn,
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 376f91759ac6..f027f22c0028 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>      uint32_t domainid = domid;
>>      bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>> +#ifdef CONFIG_X86
>> +    xc_domaininfo_t info;
>> +#endif
>>  
>>      /* Convenience aliases */
>>      bool starting = pas->starting;
>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>              rc = ERROR_FAIL;
>>              goto out;
>>          }
>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> +#ifdef CONFIG_X86
>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
> 
> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
> I didn't check if that can be used with the underlying hypercall(s). Otherwise
> you want to pass the actual domid of the local domain here.
But the action of granting permission is from dom0 to domU, what I need to get is the infomation of dom0,
The actual domid here is domU's id I think, it is not useful.

> 
>>          if (r < 0) {
>> -            LOGED(ERROR, domainid,
>> -                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
>> +            LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
>>              fclose(f);
>>              rc = ERROR_FAIL;
>>              goto out;
>>          }
>> +        if (info.flags & XEN_DOMINF_hvm_guest &&
> 
> You want to parenthesize the & here.
Will change in next version.

> 
>> +            !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
>> +            gsi > 0) {
> 
> So if gsi < 0 failure of xc_domain_getinfo_single() would needlessly result
> in failure of this function?
In next version, the judgment of gsi>0 will be placed in the next layer of if, and then the error will be handled.

> 
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include <asm/xstate.h>
>>  #include <asm/psr.h>
>>  #include <asm/cpu-policy.h>
>> +#include <asm/io_apic.h>
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>                                      xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,48 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        int irq;
>> +        bool allow = domctl->u.gsi_permission.allow_access;
> 
> See my earlier comments on this conversion of 8 bits into just one.
Do you mean that I need to check allow_access is >= 0?
But allow_access is u8, it can't be negative.

> 
>> +        /* Check all pads are zero */
>> +        ret = -EINVAL;
>> +        for ( i = 0;
>> +              i < sizeof(domctl->u.gsi_permission.pad) /
>> +                  sizeof(domctl->u.gsi_permission.pad[0]);
> 
> Please don't open-code ARRAY_SIZE().
Will change in next version.

> 
>> +              ++i )
>> +            if ( domctl->u.gsi_permission.pad[i] )
>> +                goto out;
>> +
>> +        /*
>> +         * If current domain is PV or it has PIRQ flag, it has a mapping
>> +         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
>> +         * to grant irq permission.
>> +         */
>> +        ret = -EOPNOTSUPP;
>> +        if ( is_pv_domain(currd) || has_pirq(currd) )
>> +            goto out;
> 
> I'm curious what other x86 maintainers think: I for one would not impose such
> an artificial restriction.
Emmm, so I need to remove this check.

> 
>> +        ret = -EINVAL;
>> +        if ( gsi >= nr_irqs_gsi || (irq = gsi_2_irq(gsi)) < 0 )
>> +            goto out;
>> +
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
>> +            goto out;
>> +
>> +        if ( allow )
>> +            ret = irq_permit_access(d, irq);
>> +        else
>> +            ret = irq_deny_access(d, irq);
>> +
>> +    out:
> 
> Please use a less generic name for such a label local to just one case
> block. However, with ..
Ok, will change in next version.

> 
>> +        break;
> 
> .. this being all that's done here: Why have a label in the first place?
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-18  8:23     ` Chen, Jiqian
@ 2024-06-18  9:23       ` Jan Beulich
  2024-06-20  9:40         ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-18  9:23 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 18.06.2024 10:23, Chen, Jiqian wrote:
> On 2024/6/17 23:32, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>              rc = ERROR_FAIL;
>>>              goto out;
>>>          }
>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>> +#ifdef CONFIG_X86
>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>
>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>> you want to pass the actual domid of the local domain here.
> But the action of granting permission is from dom0 to domU, what I need to get is the infomation of dom0,
> The actual domid here is domU's id I think, it is not useful.

Note how I said DOMID_SELF and "local domain". There's no talk of using the
DomU's domid. But what you apparently neglect is the fact that the hardware
domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
While benign in most cases, this is relevant when it comes to referencing
the hardware domain by domid. And it is the hardware domain which is going
to drive the device re-assignment, as that domain is who's in possession of
all the devices not yet assigned to any DomU.

>>> @@ -237,6 +238,48 @@ long arch_do_domctl(
>>>          break;
>>>      }
>>>  
>>> +    case XEN_DOMCTL_gsi_permission:
>>> +    {
>>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>>> +        int irq;
>>> +        bool allow = domctl->u.gsi_permission.allow_access;
>>
>> See my earlier comments on this conversion of 8 bits into just one.
> Do you mean that I need to check allow_access is >= 0?
> But allow_access is u8, it can't be negative.

Right. What I can only re-iterate from earlier commenting is that you
want to check for 0 or 1 (can be viewed as looking at just the low bit),
rejecting everything else. It is only this way that down the road we
could assign meaning to the other bits, without risking to break existing
callers. That's the same as the requirement to check padding fields to be
zero.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-18  9:23       ` Jan Beulich
@ 2024-06-20  9:40         ` Chen, Jiqian
  2024-06-20 10:42           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-20  9:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/18 17:23, Jan Beulich wrote:
> On 18.06.2024 10:23, Chen, Jiqian wrote:
>> On 2024/6/17 23:32, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>              rc = ERROR_FAIL;
>>>>              goto out;
>>>>          }
>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>> +#ifdef CONFIG_X86
>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>
>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.

>>> you want to pass the actual domid of the local domain here.
What is the local domain here?
What is method for me to get its domid?

>> But the action of granting permission is from dom0 to domU, what I need to get is the infomation of dom0,
>> The actual domid here is domU's id I think, it is not useful.
> 
> Note how I said DOMID_SELF and "local domain". There's no talk of using the
> DomU's domid. But what you apparently neglect is the fact that the hardware
> domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
> While benign in most cases, this is relevant when it comes to referencing
> the hardware domain by domid. And it is the hardware domain which is going
> to drive the device re-assignment, as that domain is who's in possession of
> all the devices not yet assigned to any DomU.
OK, I need to get the information of hardware domain here?

> 
>>>> @@ -237,6 +238,48 @@ long arch_do_domctl(
>>>>          break;
>>>>      }
>>>>  
>>>> +    case XEN_DOMCTL_gsi_permission:
>>>> +    {
>>>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>>>> +        int irq;
>>>> +        bool allow = domctl->u.gsi_permission.allow_access;
>>>
>>> See my earlier comments on this conversion of 8 bits into just one.
>> Do you mean that I need to check allow_access is >= 0?
>> But allow_access is u8, it can't be negative.
> 
> Right. What I can only re-iterate from earlier commenting is that you
> want to check for 0 or 1 (can be viewed as looking at just the low bit),
> rejecting everything else. It is only this way that down the road we
> could assign meaning to the other bits, without risking to break existing
> callers. That's the same as the requirement to check padding fields to be
> zero.
OK, I will add check the other bit is zero except the lowest one bit.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-20  9:40         ` Chen, Jiqian
@ 2024-06-20 10:42           ` Jan Beulich
  2024-06-21  8:20             ` Chen, Jiqian
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-20 10:42 UTC (permalink / raw)
  To: Chen, Jiqian, Anthony PERARD
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org

On 20.06.2024 11:40, Chen, Jiqian wrote:
> On 2024/6/18 17:23, Jan Beulich wrote:
>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>              rc = ERROR_FAIL;
>>>>>              goto out;
>>>>>          }
>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>> +#ifdef CONFIG_X86
>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>>
>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
> 
>>>> you want to pass the actual domid of the local domain here.
> What is the local domain here?

The domain your code is running in.

> What is method for me to get its domid?

I hope there's an available function in one of the libraries to do that.
But I wouldn't even know what to look for; that's a question to (primarily)
Anthony then, who sadly continues to be our only tool stack maintainer.

Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
DOMID_SELF.

>>> But the action of granting permission is from dom0 to domU, what I need to get is the infomation of dom0,
>>> The actual domid here is domU's id I think, it is not useful.
>>
>> Note how I said DOMID_SELF and "local domain". There's no talk of using the
>> DomU's domid. But what you apparently neglect is the fact that the hardware
>> domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
>> While benign in most cases, this is relevant when it comes to referencing
>> the hardware domain by domid. And it is the hardware domain which is going
>> to drive the device re-assignment, as that domain is who's in possession of
>> all the devices not yet assigned to any DomU.
> OK, I need to get the information of hardware domain here?

Right, with (for this purpose) "hardware domain" == "local domain".

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-20 10:42           ` Jan Beulich
@ 2024-06-21  8:20             ` Chen, Jiqian
  2024-06-24  8:17               ` Jan Beulich
  2024-06-24 12:33               ` Anthony PERARD
  0 siblings, 2 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-21  8:20 UTC (permalink / raw)
  To: Anthony PERARD, Jan Beulich, Keir Fraser
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian,
	xen-devel@lists.xenproject.org

On 2024/6/20 18:42, Jan Beulich wrote:
> On 20.06.2024 11:40, Chen, Jiqian wrote:
>> On 2024/6/18 17:23, Jan Beulich wrote:
>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>              rc = ERROR_FAIL;
>>>>>>              goto out;
>>>>>>          }
>>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>> +#ifdef CONFIG_X86
>>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>>>
>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
>>
>>>>> you want to pass the actual domid of the local domain here.
>> What is the local domain here?
> 
> The domain your code is running in.
> 
>> What is method for me to get its domid?
> 
> I hope there's an available function in one of the libraries to do that.
I didn't find relate function.
Hi Anthony, do you know?

> But I wouldn't even know what to look for; that's a question to (primarily)
> Anthony then, who sadly continues to be our only tool stack maintainer.
> 
> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
> DOMID_SELF.
It didn't permit DOMID_SELF since below commit. Does it still have the same problem if permit DOMID_SELF?

commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
Date:   Tue Aug 14 09:56:46 2007 +0100

    xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
    This was screwing up callers that loop on getdomaininfo(), if there
    was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
    They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
    0 of course, and then start their domain-finding loop all over again!
    Found by Kouya Shimura <kouya@jp.fujitsu.com>. Thanks!
    Signed-off-by: Keir Fraser <keir@xensource.com>

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 09a1e84d98e0..5d29667b7c3d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -463,19 +463,13 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_getdomaininfo:
     {
         struct domain *d;
-        domid_t dom;
-
-        dom = op->domain;
-        if ( dom == DOMID_SELF )
-            dom = current->domain->domain_id;
+        domid_t dom = op->domain;

         rcu_read_lock(&domlist_read_lock);

         for_each_domain ( d )
-        {
             if ( d->domain_id >= dom )
                 break;
-        }

         if ( d == NULL )
         {

> 
>>>> But the action of granting permission is from dom0 to domU, what I need to get is the infomation of dom0,
>>>> The actual domid here is domU's id I think, it is not useful.
>>>
>>> Note how I said DOMID_SELF and "local domain". There's no talk of using the
>>> DomU's domid. But what you apparently neglect is the fact that the hardware
>>> domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
>>> While benign in most cases, this is relevant when it comes to referencing
>>> the hardware domain by domid. And it is the hardware domain which is going
>>> to drive the device re-assignment, as that domain is who's in possession of
>>> all the devices not yet assigned to any DomU.
>> OK, I need to get the information of hardware domain here?
> 
> Right, with (for this purpose) "hardware domain" == "local domain".
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-21  8:20             ` Chen, Jiqian
@ 2024-06-24  8:17               ` Jan Beulich
  2024-06-25  7:44                 ` Chen, Jiqian
  2024-06-24 12:33               ` Anthony PERARD
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2024-06-24  8:17 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Anthony PERARD

On 21.06.2024 10:20, Chen, Jiqian wrote:
> On 2024/6/20 18:42, Jan Beulich wrote:
>> On 20.06.2024 11:40, Chen, Jiqian wrote:
>>> On 2024/6/18 17:23, Jan Beulich wrote:
>>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>              rc = ERROR_FAIL;
>>>>>>>              goto out;
>>>>>>>          }
>>>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>>> +#ifdef CONFIG_X86
>>>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>>>>
>>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>>> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>>> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
>>>
>>>>>> you want to pass the actual domid of the local domain here.
>>> What is the local domain here?
>>
>> The domain your code is running in.
>>
>>> What is method for me to get its domid?
>>
>> I hope there's an available function in one of the libraries to do that.
> I didn't find relate function.
> Hi Anthony, do you know?
> 
>> But I wouldn't even know what to look for; that's a question to (primarily)
>> Anthony then, who sadly continues to be our only tool stack maintainer.
>>
>> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
>> DOMID_SELF.
> It didn't permit DOMID_SELF since below commit. Does it still have the same problem if permit DOMID_SELF?

To answer this, all respective callers would need auditing. However, ...

> commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
> Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
> Date:   Tue Aug 14 09:56:46 2007 +0100
> 
>     xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
>     This was screwing up callers that loop on getdomaininfo(), if there
>     was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
>     They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
>     0 of course, and then start their domain-finding loop all over again!
>     Found by Kouya Shimura <kouya@jp.fujitsu.com>. Thanks!
>     Signed-off-by: Keir Fraser <keir@xensource.com>

... I view this as a pretty odd justification for the change, when imo the
bogus loops should instead have been adjusted.

Jan

> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 09a1e84d98e0..5d29667b7c3d 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -463,19 +463,13 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domctl_t) u_domctl)
>      case XEN_DOMCTL_getdomaininfo:
>      {
>          struct domain *d;
> -        domid_t dom;
> -
> -        dom = op->domain;
> -        if ( dom == DOMID_SELF )
> -            dom = current->domain->domain_id;
> +        domid_t dom = op->domain;
> 
>          rcu_read_lock(&domlist_read_lock);
> 
>          for_each_domain ( d )
> -        {
>              if ( d->domain_id >= dom )
>                  break;
> -        }
> 
>          if ( d == NULL )
>          {



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-24  8:17               ` Jan Beulich
@ 2024-06-25  7:44                 ` Chen, Jiqian
  2024-06-25  7:48                   ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-25  7:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Anthony PERARD, Chen, Jiqian

On 2024/6/24 16:17, Jan Beulich wrote:
> On 21.06.2024 10:20, Chen, Jiqian wrote:
>> On 2024/6/20 18:42, Jan Beulich wrote:
>>> On 20.06.2024 11:40, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:23, Jan Beulich wrote:
>>>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>              rc = ERROR_FAIL;
>>>>>>>>              goto out;
>>>>>>>>          }
>>>>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>>>>>
>>>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>>>> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>>>> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
>>>>
>>>>>>> you want to pass the actual domid of the local domain here.
>>>> What is the local domain here?
>>>
>>> The domain your code is running in.
>>>
>>>> What is method for me to get its domid?
>>>
>>> I hope there's an available function in one of the libraries to do that.
>> I didn't find relate function.
>> Hi Anthony, do you know?
>>
>>> But I wouldn't even know what to look for; that's a question to (primarily)
>>> Anthony then, who sadly continues to be our only tool stack maintainer.
>>>
>>> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
>>> DOMID_SELF.
>> It didn't permit DOMID_SELF since below commit. Does it still have the same problem if permit DOMID_SELF?
> 
> To answer this, all respective callers would need auditing. However, ...
> 
>> commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
>> Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
>> Date:   Tue Aug 14 09:56:46 2007 +0100
>>
>>     xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
>>     This was screwing up callers that loop on getdomaininfo(), if there
>>     was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
>>     They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
>>     0 of course, and then start their domain-finding loop all over again!
>>     Found by Kouya Shimura <kouya@jp.fujitsu.com>. Thanks!
>>     Signed-off-by: Keir Fraser <keir@xensource.com>
> 
> ... I view this as a pretty odd justification for the change, when imo the
> bogus loops should instead have been adjusted.
Yes, you are right.
And Anthony suggested to use LIBXL_TOOLSTACK_DOMID to replace 0 domid.
It seems there is no need to change hypercall DOMCTL_getdomaininfo for now?

> 
> Jan
> 
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 09a1e84d98e0..5d29667b7c3d 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -463,19 +463,13 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domctl_t) u_domctl)
>>      case XEN_DOMCTL_getdomaininfo:
>>      {
>>          struct domain *d;
>> -        domid_t dom;
>> -
>> -        dom = op->domain;
>> -        if ( dom == DOMID_SELF )
>> -            dom = current->domain->domain_id;
>> +        domid_t dom = op->domain;
>>
>>          rcu_read_lock(&domlist_read_lock);
>>
>>          for_each_domain ( d )
>> -        {
>>              if ( d->domain_id >= dom )
>>                  break;
>> -        }
>>
>>          if ( d == NULL )
>>          {
> 

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-25  7:44                 ` Chen, Jiqian
@ 2024-06-25  7:48                   ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2024-06-25  7:48 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Anthony PERARD

On 25.06.2024 09:44, Chen, Jiqian wrote:
> On 2024/6/24 16:17, Jan Beulich wrote:
>> On 21.06.2024 10:20, Chen, Jiqian wrote:
>>> On 2024/6/20 18:42, Jan Beulich wrote:
>>>> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
>>>> DOMID_SELF.
>>> It didn't permit DOMID_SELF since below commit. Does it still have the same problem if permit DOMID_SELF?
>>
>> To answer this, all respective callers would need auditing. However, ...
>>
>>> commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
>>> Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
>>> Date:   Tue Aug 14 09:56:46 2007 +0100
>>>
>>>     xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
>>>     This was screwing up callers that loop on getdomaininfo(), if there
>>>     was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
>>>     They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
>>>     0 of course, and then start their domain-finding loop all over again!
>>>     Found by Kouya Shimura <kouya@jp.fujitsu.com>. Thanks!
>>>     Signed-off-by: Keir Fraser <keir@xensource.com>
>>
>> ... I view this as a pretty odd justification for the change, when imo the
>> bogus loops should instead have been adjusted.
> Yes, you are right.
> And Anthony suggested to use LIBXL_TOOLSTACK_DOMID to replace 0 domid.
> It seems there is no need to change hypercall DOMCTL_getdomaininfo for now?

With Anthony's suggestion - indeed.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-21  8:20             ` Chen, Jiqian
  2024-06-24  8:17               ` Jan Beulich
@ 2024-06-24 12:33               ` Anthony PERARD
  2024-06-25  7:46                 ` Chen, Jiqian
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony PERARD @ 2024-06-24 12:33 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Jan Beulich, Keir Fraser, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel

On Fri, Jun 21, 2024 at 08:20:55AM +0000, Chen, Jiqian wrote:
> On 2024/6/20 18:42, Jan Beulich wrote:
> > On 20.06.2024 11:40, Chen, Jiqian wrote:
> >> On 2024/6/18 17:23, Jan Beulich wrote:
> >>> On 18.06.2024 10:23, Chen, Jiqian wrote:
> >>>> On 2024/6/17 23:32, Jan Beulich wrote:
> >>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
> >>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
> >>>>>>              rc = ERROR_FAIL;
> >>>>>>              goto out;
> >>>>>>          }
> >>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> >>>>>> +#ifdef CONFIG_X86
> >>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
> >>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
> >>>>>
> >>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
> >>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
> >> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
> >> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
> >>
> >>>>> you want to pass the actual domid of the local domain here.
> >> What is the local domain here?
> > 
> > The domain your code is running in.
> > 
> >> What is method for me to get its domid?
> > 
> > I hope there's an available function in one of the libraries to do that.
> I didn't find relate function.
> Hi Anthony, do you know?

Yes, I managed to find:
LIBXL_TOOLSTACK_DOMID
That's the value you can use instead of "0" do designate dom0.
(That was harder than necessary to find.)

Cheers,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-24 12:33               ` Anthony PERARD
@ 2024-06-25  7:46                 ` Chen, Jiqian
  0 siblings, 0 replies; 48+ messages in thread
From: Chen, Jiqian @ 2024-06-25  7:46 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Jan Beulich, Keir Fraser, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/6/24 20:33, Anthony PERARD wrote:
> On Fri, Jun 21, 2024 at 08:20:55AM +0000, Chen, Jiqian wrote:
>> On 2024/6/20 18:42, Jan Beulich wrote:
>>> On 20.06.2024 11:40, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:23, Jan Beulich wrote:
>>>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>              rc = ERROR_FAIL;
>>>>>>>>              goto out;
>>>>>>>>          }
>>>>>>>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +        /* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>>>> +        r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>>>>>>>
>>>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>>>> From the commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>>>> And now XEN_DOMCTL_getdomaininfo gets domain through rcu_lock_domain_by_id.
>>>>
>>>>>>> you want to pass the actual domid of the local domain here.
>>>> What is the local domain here?
>>>
>>> The domain your code is running in.
>>>
>>>> What is method for me to get its domid?
>>>
>>> I hope there's an available function in one of the libraries to do that.
>> I didn't find relate function.
>> Hi Anthony, do you know?
> 
> Yes, I managed to find:
> LIBXL_TOOLSTACK_DOMID
> That's the value you can use instead of "0" do designate dom0.
> (That was harder than necessary to find.)
Thank you very much! I will use LIBXL_TOOLSTACK_DOMID in next version.

> 
> Cheers,
> 

-- 
Best regards,
Jiqian Chen.


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-06-25  7:48 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17  9:00 [XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
2024-06-17  9:00 ` [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
2024-06-17 14:17   ` Jan Beulich
2024-06-18  6:25     ` Chen, Jiqian
2024-06-18  8:33       ` Jan Beulich
2024-06-19  3:39         ` Chen, Jiqian
2024-06-19  7:02           ` Jan Beulich
2024-06-17  9:00 ` [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
2024-06-17 14:45   ` Jan Beulich
2024-06-18  6:49     ` Chen, Jiqian
2024-06-18  8:38       ` Jan Beulich
2024-06-19  5:35         ` Chen, Jiqian
2024-06-17  9:00 ` [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
2024-06-17 14:52   ` Jan Beulich
2024-06-18  6:57     ` Chen, Jiqian
2024-06-18  8:55       ` Jan Beulich
2024-06-19  7:53         ` Chen, Jiqian
2024-06-19  8:06           ` Jan Beulich
2024-06-19  8:51             ` Chen, Jiqian
2024-06-19  9:49               ` Jan Beulich
2024-06-19 10:10                 ` Chen, Jiqian
2024-06-17  9:00 ` [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev Jiqian Chen
2024-06-17 15:10   ` Jan Beulich
2024-06-18  8:10     ` Chen, Jiqian
2024-06-18  9:13       ` Jan Beulich
2024-06-20  7:03         ` Chen, Jiqian
2024-06-20  7:43           ` Jan Beulich
2024-06-20 10:23             ` Chen, Jiqian
2024-06-20 10:37               ` Jan Beulich
2024-06-21  8:15                 ` Chen, Jiqian
2024-06-24  8:13                   ` Jan Beulich
2024-06-25  7:38                     ` Chen, Jiqian
2024-06-20 14:38   ` Anthony PERARD
2024-06-21  8:34     ` Chen, Jiqian
2024-06-24 12:08       ` Anthony PERARD
2024-06-17  9:00 ` [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
2024-06-17  9:15   ` Chen, Jiqian
2024-06-17 15:32   ` Jan Beulich
2024-06-18  8:23     ` Chen, Jiqian
2024-06-18  9:23       ` Jan Beulich
2024-06-20  9:40         ` Chen, Jiqian
2024-06-20 10:42           ` Jan Beulich
2024-06-21  8:20             ` Chen, Jiqian
2024-06-24  8:17               ` Jan Beulich
2024-06-25  7:44                 ` Chen, Jiqian
2024-06-25  7:48                   ` Jan Beulich
2024-06-24 12:33               ` Anthony PERARD
2024-06-25  7:46                 ` Chen, Jiqian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.